Smalt - bwHPC Wiki

Smalt

Description          Content
module load          bio/smalt
Availability         bwUniCluster
License              GPLv3
Citing               SMALT is Copyright (C) 2010 - 2015 Genome Research Ltd.
Links                Smalt Homepage
Graphical Interface  no
Plugins              BambamlibC

1 Description/What is Smalt?

SMALT aligns DNA sequencing reads with a reference genome.
Reads from a wide range of sequencing platforms can be processed, for example Illumina, Roche-454, Ion Torrent, PacBio or ABI-Sanger. Paired reads are supported. There is no support for SOLiD reads.

A mode for the detection of split (chimeric) reads is provided. Multi-threaded program execution is supported.
For more information on features, please visit the Smalt Homepage.

2 Versions and Availability

A list of versions currently available on all bwHPC-C5 clusters can be obtained from the Cluster Information System (CIS).

On the command line, the command 'module avail bio/smalt' lists the available versions.

$ module avail bio/smalt
------------------------ /opt/bwhpc/common/modulefiles -------------------------
bio/smalt/0.7.6
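
If several versions are installed, the newest one can be picked from such a list automatically. The following is only a sketch: the helper name latest_smalt_version is hypothetical, it relies on the version sort ('sort -V') of GNU coreutils, and the usage line assumes that 'module avail -t' prints a terse list on stderr, which is common for Environment Modules but should be verified on your cluster.

```shell
#!/bin/bash
# Hypothetical helper: read module names (one per line) on stdin and
# print the highest bio/smalt version found.
# Assumption: GNU sort with -V (version sort) is available.
latest_smalt_version() {
    grep -o 'bio/smalt/[0-9][0-9.]*' | sort -V | tail -n 1
}

# Possible usage on a cluster (assumption: terse output goes to stderr):
#   module avail -t bio/smalt 2>&1 | latest_smalt_version
```

Version sort is used instead of a plain lexical sort so that, for example, 0.7.10 ranks above 0.7.6.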


3 Usage

3.1 Loading the module

3.1.1 Default

You can load the default version of Smalt with the command 'module load bio/smalt'.

$ module avail bio/smalt
------------------------ /opt/bwhpc/common/modulefiles -------------------------
bio/smalt/0.7.6
$ module load bio/smalt
$ module list
Currently Loaded Modulefiles:
  1) bio/smalt/0.7.6

The module will try to load modules it needs to function. If loading the module fails, check if you have already loaded one of those modules, but not in the version needed for Smalt.

3.1.2 Special Version

If you wish to load a specific version of Smalt, use 'module load bio/smalt/<version>', where <version> is the version you want.

Example:
$ module avail bio/smalt
------------------------ /opt/bwhpc/common/modulefiles -------------------------
bio/smalt/0.7.6
$ module load bio/smalt/0.7.6
$ module list
Currently Loaded Modulefiles:
  1) bio/smalt/0.7.6

3.2 Program Binaries

You can find the binaries in the bin folder of the Smalt home folder. After loading the Smalt module ('module load bio/smalt'), the bin folder is added to $PATH and the home folder is stored in the environment variable $SMALT_HOME.
Smalt is a command-line program and is usually used as part of a pipeline.

$ ls -RxF $SMALT_HOME
# Smalt Home folder: $SMALT_HOME
/opt/bwhpc/common/bio/smalt/0.7.6:
bin/  bwhpc-examples/  modulefiles/  share/  smalt_install_log/
# Binary files : $SMALT_BIN_DIR
/opt/bwhpc/common/bio/smalt/0.7.6/bin:
basqcol*     fetchseq*    mixreads*    readstats*  simqual*  simread*  smalt*
splitmates*  splitreads*  trunkreads*
# Examples : $SMALT_EXA_DIR (IMPORTANT!)
/opt/bwhpc/common/bio/smalt/0.7.6/bwhpc-examples:
bwhpc-smalt-example.moab  genome.fa  README.bwhpc-examples  Sp_ds.left.fq
Sp_ds.right.fq
# Modulefile 
/opt/bwhpc/common/bio/smalt/0.7.6/modulefiles:
bio-smalt-0.7.6
# Python Scripts
/opt/bwhpc/common/bio/smalt/0.7.6/share:
bam_cigar_test.py  cigar_test.py         formats.py             ioform_test.py
mthread_test.py    ouform_cigar_test.py  results_split_test.py  sample_test.py
SAM.py             splitReads_test.py    testdata.py            xali_test.py
# Installation Logs
/opt/bwhpc/common/bio/smalt/0.7.6/smalt_install_log:
bambamc_autoreconf.out  bambamc_configure.out  bambamc_make_install.out
bambamc_make.out        configure.out          make_install.out
make.out

'*' indicates that the file is executable. '/' indicates that it is a folder.
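
To confirm that the binaries are actually reachable through $PATH after loading the module, a small check like the one below may help. It is a minimal sketch: the function name missing_tools is hypothetical, and the tool names are taken from the listing above.

```shell
#!/bin/bash
# Hypothetical helper: print every name from the argument list that is
# NOT found on $PATH. Prints nothing if all tools are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# A few of the binaries from the listing above:
missing_tools smalt splitmates splitreads
```

If the call prints any names, the module is probably not loaded in the current shell.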

4 BambamC Plugin

BambamC is a lightweight implementation for reading and writing BAM (genome alignment) files.
The BAM Format is a binary format for storing sequence data.
Bambamc Repository

5 bwHPC Examples for Smalt

  • As of March 2016, Smalt does not implement MPI.
  • Smalt mapping ('smalt map') can run multithreaded (option '-n').
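
The thread count passed to '-n' should match the cores available to the job. The sketch below shows one way to choose it. The function name pick_threads and the override variable SMALT_THREADS are hypothetical and not part of Smalt or the module; 'nproc' is assumed to come from GNU coreutils.

```shell
#!/bin/bash
# Hypothetical helper: choose a thread count for 'smalt map -n'.
# Order of preference:
#   1. SMALT_THREADS, a hypothetical user-defined override
#   2. the number of CPUs reported by nproc
#   3. the number of online CPUs reported by getconf
#   4. a single thread
pick_threads() {
    local n="${SMALT_THREADS:-}"
    [ -n "$n" ] || n=$(nproc 2>/dev/null) \
                || n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) \
                || n=1
    echo "$n"
}

# Possible usage inside a job script:
#   smalt map -n "$(pick_threads)" -o mapped.sam hs38_k14s8 Sp_ds.left.fq Sp_ds.right.fq
```

In a batch job, the value should not exceed the number of cores requested from the scheduler.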


In the folder $SMALT_EXA_DIR you'll find an example of how to use Smalt.

$ ls -l $SMALT_EXA_DIR
[...] bwhpc-smalt-example.moab # Moab example script for use with 'msub'-command
[...] genome.fa # example human (GRCh38) reference genome file in FASTA format
[...] README.bwhpc-examples # using Smalt on bwUniCluster readme
[...] Sp_ds.left.fq # mates1 file in Fastq format 
[...] Sp_ds.right.fq # mates2 file in Fastq format


5.1 Smalt command line options

$ smalt help

    SMALT - Sequence Mapping and Alignment Tool

SYNOPSIS:
    smalt <task> [TASK_OPTIONS] [<index_name> <file_name_A> [<file_name_B>]]

Available tasks:
    smalt check   - checks FASTA/FASTQ input
    smalt help    - prints a brief summary of this software
    smalt index   - builds an index of k-mer words for the reference
    smalt map     - maps single or paired reads onto the reference
    smalt sample  - sample insert sizes for paired reads
    smalt version - prints version information

Help on individual tasks:
    smalt <task> -H
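
The tasks listed above are typically chained: check the reads, index the reference, then map. The following sketch prints the commands by default and only executes them when DRY_RUN=0 is set. The names run, smalt_workflow and DRY_RUN are hypothetical. Passing both mate files to 'smalt check' is an assumption based on the synopsis; confirm it with 'smalt check -H'. The index and map options are the ones used in the bwHPC example script.

```shell
#!/bin/bash
# Hypothetical helper: print a command; execute it only if DRY_RUN=0.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = "0" ] || return 0
    "$@"
}

# Hypothetical wrapper: check the reads, index the reference, map the pairs.
# Arguments: <reference.fa> <index_name> <mates1.fq> <mates2.fq>
smalt_workflow() {
    run smalt check "$3" "$4" &&
    run smalt index -k 14 -s 8 "$2" "$1" &&
    run smalt map -o mapped.sam "$2" "$3" "$4"
}

# Dry run with the file names from $SMALT_EXA_DIR (prints three commands):
smalt_workflow genome.fa hs38_k14s8 Sp_ds.left.fq Sp_ds.right.fq
```

Because the steps are joined with '&&', a failing step stops the chain and its exit status becomes the status of the wrapper.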

5.2 bwhpc-example file

  • bwhpc-smalt-example.moab

Use this Moab start script as a template for your own Smalt runs. Locate the Smalt section inside the file (an excerpt is shown in 5.2.2) and make your modifications there.

5.2.1 How to use the Smalt Test-Script

  • Create your own work-space
#           WS-Name        Days alive (max. 60)
ws_allocate smalt_repo 30
  • Change dir to your workspace
cd $(ws_find smalt_repo)
  • Copy the moab-example file you'll find in this folder and make your modifications
cp $SMALT_EXA_DIR/bwhpc-smalt-example.moab .
  • Submit your job
msub bwhpc-smalt-example.moab
  • Wait a while...

... until you see some more files created (e.g. a tarball). The *.tgz file contains your data.

Use 'tar xvzf *.tgz' to extract the file contents.
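
When a workspace holds more than one result archive, it can be useful to list the contents of each tarball before extracting it. The helper below is a sketch of that idea; the function name extract_results is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: list, then extract, every *.tgz archive in the
# given directory (default: current directory).
# Returns non-zero if no archive is found or if tar fails.
extract_results() {
    local dir="${1:-.}" found=1 archive
    for archive in "$dir"/*.tgz; do
        [ -e "$archive" ] || continue          # glob matched nothing
        found=0
        echo "Contents of $archive:"
        tar tzf "$archive" || return 1         # list before extracting
        tar xzf "$archive" -C "$dir" || return 1
    done
    return "$found"
}

# Possible usage in the workspace created above:
#   extract_results "$(ws_find smalt_repo)"
```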


5.2.2 Excerpt from bwhpc-smalt-example.moab

These parameters apply to the use of Smalt on the bwUniCluster.

#!/bin/bash
#
#MSUB -N smalt_job
#MSUB -j oe
#MSUB -o $(JOBNAME).$(JOBID)
#MSUB -m ae
#MSUB -M 'your e-mail-address@DN'
#MSUB -q singlenode
#MSUB -l walltime=00:10:00
#
[...]
echo " "
echo "### Loading SMALT module:"
echo " "
module load bio/smalt/0.7.6
[ -z "$SMALT_HOME" ] && { echo 'ERROR: Failed to load module bio/smalt/0.7.6.'; exit 1; }
echo "SMALT_HOME = ${SMALT_HOME}"
module list

echo " "
echo "### Copying input test files for job (if required):"
echo " "
cp $SMALT_EXA_DIR/{genome.fa,Sp_ds*.fq} .

echo " "
echo "### Running Smalt in single-node mode, multithreaded..."
echo " "

echo "Build hash-index..."
smalt index -k 14 -s 8 hs38_k14s8 genome.fa
rc=$?  # save the exit status; the '[' test below would otherwise overwrite $?
[ "$rc" -ne 0 ] && { echo "smalt index returned with an error: $rc"; exit 1; }
# Builds a hash index for the human genome in the FASTA file genome.fa.
# Words of 14 base pair length are sampled at every 8th position in the genome. 
# Two files hs38_k14s8.smi (index) and hs38_k14s8.sma (sequence) are written to disk.

echo "Mapping..."
# smalt map -o mapped.sam hs38_k14s8 Sp_ds.left.fq Sp_ds.right.fq  # sequential
smalt map -n 4 -o mapped.sam hs38_k14s8 Sp_ds.left.fq Sp_ds.right.fq # multi-threaded
rc=$?  # save the exit status; the '[' test below would otherwise overwrite $?
[ "$rc" -ne 0 ] && { echo "smalt map returned with an error: $rc"; exit 1; }
# Loads the hash table created by the previous step into memory and 
# maps paired-end reads in the files Sp_ds.left.fq and Sp_ds.right.fq. 
# The output is written to the file mapped.sam in SAM output format.
echo "done"
[...]


Piping the command to 'parallel' will not work!
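
The exit-status checks after each Smalt call in the excerpt can also be factored into a small helper. This is only a sketch and not part of the bwHPC example script; the function name try is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run a command; on failure, report its exit status
# on stderr and return that status to the caller.
# The status is saved in 'rc' right away, because any later command
# (including a '[' test) overwrites $?.
try() {
    "$@"
    local rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "ERROR: '$*' returned $rc" >&2
    fi
    return "$rc"
}

# Possible usage inside a job script (commands from the excerpt above):
#   try smalt index -k 14 -s 8 hs38_k14s8 genome.fa || exit 1
#   try smalt map -n 4 -o mapped.sam hs38_k14s8 Sp_ds.left.fq Sp_ds.right.fq || exit 1
```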

6 Smalt-Specific Environments

To see a list of all Smalt environment variables set by the 'module load' command, use 'env | grep SMALT'. Alternatively, use the command 'module display bio/smalt'.

$ module display bio/smalt
-------------------------------------------------------------------
/opt/bwhpc/common/modulefiles/bio/smalt/0.7.6:

module-whatis	 Smalt 0.7.6 Smalt is a program for aligning sequencing reads 
    against a large reference genome (e.g. human genome). 
setenv		 SMALT_VERSION 0.7.6 
setenv		 SMALT_HOME /opt/bwhpc/common/bio/smalt/0.7.6 
setenv		 SMALT_EXA_DIR /opt/bwhpc/common/bio/smalt/0.7.6/bwhpc-examples 
setenv		 SMALT_BIN_DIR /opt/bwhpc/common/bio/smalt/0.7.6/bin 
setenv		 SMALT_SHARE_DIR /opt/bwhpc/common/bio/smalt/0.7.6/share 
prepend-path	 LD_LIBRARY_PATH /opt/bwhpc/common/bio/smalt/0.7.6/../bambamclib/lib 
prepend-path	 PATH /opt/bwhpc/common/bio/smalt/0.7.6 
prepend-path	 PATH /opt/bwhpc/common/bio/smalt/0.7.6/bin 
conflict	 bio/smalt 
-------------------------------------------------------------------


The module display command will not load the module!
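
Inside a script, the same information can be used to verify that the module has been loaded in the current shell. A minimal sketch follows; the function name show_smalt_env is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print all SMALT_* environment variables.
# Returns non-zero (with a hint on stderr) if none are set.
show_smalt_env() {
    if ! env | grep '^SMALT_'; then
        echo "No SMALT_* variables set; run 'module load bio/smalt' first." >&2
        return 1
    fi
}

# Possible usage:
#   show_smalt_env || exit 1
```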

7 Version-Specific Information

For more detailed information on a specific Smalt version, use the command 'module help bio/smalt/<version>'.
For a short summary of what Smalt does, use the command 'module whatis bio/smalt'.
Example:

$ module whatis bio/smalt
bio/smalt            : Smalt 0.7.6 Smalt is a program for aligning sequencing reads
    against a large reference genome (e.g. human genome).

$ module help bio/smalt
----------- Module Specific Help for 'bio/smalt/0.7.6' ------------
DESCRIPTION
   Smalt is a software package for mapping low-divergent sequences 
   against a large reference genome, such as the human genome.
   It has two major components, one for read shorter than 150bp 
   and the other for longer reads.  
[...]
DOCUMENTATION

*  Get started
   http://www.sanger.ac.uk/science/tools/smalt-0

*  Smalt documentation
   http://sourceforge.net/projects/smalt/files/smalt_manual.pdf   

*  Smalt repository (binaries/sources)
   http://sourceforge.net/projects/smalt/ 

*  bwHPC examples and a moab example script can be found here:
   /opt/bwhpc/common/bio/smalt/0.7.6/bwhpc-examples
   Please read the 'README.bwhpc-examples' file.
[...]