Bowtie2 - bwHPC Wiki

Bowtie2

Description          Content
module load          bio/bowtie2
License              Artistic License/GPLv3
Citing               Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.
Links                Homepage | Documentation
Graphical Interface  No

1 Versions and Availability

A list of versions currently available on all bwHPC-C5-Clusters can be obtained from the Cluster Information System (CIS).

On the command line interface of any bwHPC cluster, a list of the available versions can be displayed using

$ module avail bio/bowtie2

2 License

Copyright 2014, Ben Langmead

Bowtie 2 is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

Bowtie 2 is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Bowtie 2. If not, see GPL.

3 Usage

3.1 Loading the module

You can load the default version of Bowtie 2 with the command

$ module load bio/bowtie2

The module will try to load modules it needs to function (e.g. compiler/intel). If loading the module fails, check if you have already loaded one of those modules, but not in the version needed for Bowtie 2. If you wish to load a specific (older) version, you can do so using e.g.

$ module load bio/bowtie2/2.1.0

to load the version 2.1.0.
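Before starting a longer job, it can help to confirm that loading the module actually put the binaries on the PATH. The following is a minimal sketch of such a check. The function name check_tools is hypothetical and not something the module provides.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 if all commands are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage after 'module load bio/bowtie2' (not executed here):
#   check_tools bowtie2 bowtie2-build bowtie2-inspect || echo "is the module loaded?"
```

The check relies only on `command -v`, so it behaves the same for any module version.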

3.2 Program Binaries

$ bowtie2

Bowtie 2 is an ultrafast, memory-efficient short read aligner geared toward quickly aligning large sets of short DNA sequences (reads) to large genomes. Bowtie 2 takes an index and a set of reads as input and outputs a list of alignments.

$ bowtie2-build

bowtie2-build builds a Bowtie 2 index from a set of DNA sequences. bowtie2-build outputs a set of 6 files with suffixes .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2, and .rev.2.bt2. (If the total length of all the input sequences is greater than about 4 billion nucleotides, a large index is built and the files end in bt2l instead of bt2.) These files together constitute the index: they are all that is needed to align reads to that reference. The original sequence files are no longer used by Bowtie 2 once the index is built.
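Because an alignment fails if one of the index files is missing, a quick completeness check before submitting a job can save time. The sketch below assumes the Bowtie 2 suffixes .bt2 (small index) and .bt2l (large index). The function name index_complete is hypothetical.

```shell
# Hypothetical helper: succeed if all six files of a Bowtie 2 index exist
# for the given basename, with either the .bt2 or the .bt2l (large) suffix.
index_complete() {
    base=$1
    for ext in bt2 bt2l; do
        found=0
        for part in 1 2 3 4 rev.1 rev.2; do
            if [ -f "$base.$part.$ext" ]; then
                found=$((found + 1))
            fi
        done
        if [ "$found" -eq 6 ]; then
            return 0
        fi
    done
    return 1
}

# Possible usage (the basename is whatever was passed to bowtie2-build):
#   index_complete hg19.bowtie || echo "index incomplete" >&2
```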

$ bowtie2-inspect

bowtie2-inspect extracts information from a Bowtie 2 index about what kind of index it is and what reference sequences were used to build it. When run without any options, the tool will output a FASTA file containing the sequences of the original references (with all non-A/C/G/T characters converted to Ns). It can also be used to extract just the reference sequence names using the -n/--names option or a more verbose summary using the -s/--summary option.

3.3 Disk Usage

Scratch files are written to the current directory by default. Please change to a local directory before starting your calculations. For example

$ mkdir -p /tmp/$USER/job_sub_dir 
$ cd /tmp/$USER/job_sub_dir 

In case of multi-node parallel jobs, you might need to create the directory on all nodes used.

However, you can also run your calculations in a workspace located on the parallel file system. This is especially useful because the input and output data for aligning sequences are rather large, and because the results remain available for subsequent analysis.

$ WS_PATH=`ws_allocate bowtie2_test 20`
$ cd ${WS_PATH}/
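For the node-local variant, scratch data should be removed again when the calculation ends, even if the command fails. The following is a minimal sketch of that pattern. The function name run_in_scratch and the variable SCRATCH_BASE are hypothetical, and the base directory falls back to $TMPDIR or /tmp.

```shell
# Sketch: run a command inside a private scratch directory and remove the
# directory afterwards, whether or not the command succeeded.
# SCRATCH_BASE is a hypothetical variable; it defaults to $TMPDIR or /tmp.
run_in_scratch() {
    scratch=$(mktemp -d "${SCRATCH_BASE:-${TMPDIR:-/tmp}}/bowtie2_job.XXXXXX") || return 1
    # Subshell: the 'cd' does not leak into the calling shell.
    ( cd "$scratch" && "$@" )
    status=$?
    rm -rf "$scratch"
    return "$status"
}

# Possible usage (not executed here); results must be copied out by the
# command itself, because the scratch directory is deleted afterwards:
#   run_in_scratch sh -c 'bowtie2 ... && cp bowtie2.sam "$HOME"/'
```

Using mktemp instead of a fixed directory name avoids collisions when several jobs of the same user share a node.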

3.4 Bowtie-Indices

Please contact the HPC-Competence Center for Bioinformatics and Astrophysics via the bwSupport Portal if you need a Bowtie 2 index permanently. The indices usually need a lot of disk space. Therefore it is better to make them available to all users in a common location like ${DBDATA_BOWTIE2_INDEX_DNA}.

4 Examples

You can copy a simple interactive example to your home directory and run it, using:

$ mkdir ~/bowtie2-examples/
$ cp -r $BOWTIE2_EXA_DIR/ ~/bowtie2-examples/
$ cd ~/bowtie2-examples/

4.1 Aligning

The following examples show how to align simulated short reads against the human genome hg19:

Single End Aligning

$ msub -I -lnodes=1:ppn=2,walltime=00:00:30:00
$ START_DIR=`pwd`
$ TMP_DIR=$TMP/$USER/job_sub_dir
$ mkdir -p $TMP_DIR
$ cd $TMP_DIR
$ module load bio/bowtie2
$ module load dbdata/homo_sapiens/hg19_ncbi
$ time bowtie2 -p ${MOAB_PROCCOUNT} \
 -x ${DBDATA_BOWTIE2_INDEX_DNA} \
 -S bowtie2.sam \
 ${BOWTIE2_EXA_DIR}/hg19_sim.read1.fastq \
 &>statistics.txt
$ mkdir -p $START_DIR/bowtie2_test_results/
$ mv * $START_DIR/bowtie2_test_results/
$ cd $START_DIR/bowtie2_test_results/
$ rm -rfv $TMP_DIR/


Explanation of the parameters:

-S Output is written in SAM format to the given file
-x Path to the Bowtie 2 index, in this case hg19 is used
-p Number of cores used for the calculation. The value is taken from the MOAB_PROCCOUNT environment variable. This calculation runs on two cores since we requested them with -lnodes=1:ppn=2
${DBDATA_BOWTIE2_INDEX_DNA} Location of the Bowtie 2 index, in this case hg19 is used
${BOWTIE2_EXA_DIR}/hg19_sim.read1.fastq Input file containing the short reads. In this example simulated short reads created with dwgsim 0.1.11 are used.
bowtie2.sam Output file in SAM format named bowtie2.sam
&>statistics.txt Statistics (standard output and standard error) are redirected into the file statistics.txt
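Once the run has finished, the redirected summary can be inspected from a script. The sketch below extracts the overall alignment rate, assuming the summary contains a line of the form "NN.NN% overall alignment rate", as printed by bowtie2. The function name overall_rate is hypothetical.

```shell
# Hypothetical helper: print the overall alignment rate (in percent) found in
# a file holding the bowtie2 summary, e.g. statistics.txt from the example.
# Prints nothing if no matching line is present.
overall_rate() {
    sed -n 's/^\([0-9.][0-9.]*\)% overall alignment rate.*$/\1/p' "$1"
}

# Possible usage (not executed here):
#   rate=$(overall_rate statistics.txt)
#   echo "overall alignment rate: ${rate:-unknown}%"
```

An empty result usually means the run did not reach the point where bowtie2 prints its summary, so the file is worth reading in full in that case.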



Paired End Aligning

$ msub -I -lnodes=1:ppn=2,walltime=00:00:30:00
$ START_DIR=`pwd`
$ TMP_DIR=$TMP/$USER/job_sub_dir
$ mkdir -p $TMP_DIR
$ cd $TMP_DIR
$ module load bio/bowtie2
$ module load dbdata/homo_sapiens/hg19_ncbi
$ time bowtie2 -p ${MOAB_PROCCOUNT} \
 -x ${DBDATA_BOWTIE2_INDEX_DNA} \
 -S bowtie2.sam \
 -1 ${BOWTIE2_EXA_DIR}/hg19_sim.read1.fastq \
 -2 ${BOWTIE2_EXA_DIR}/hg19_sim.read2.fastq \
 &>statistics.txt
$ mkdir -p $START_DIR/bowtie2_test_results/
$ mv * $START_DIR/bowtie2_test_results/
$ cd $START_DIR/bowtie2_test_results/
$ rm -rfv $TMP_DIR/


Explanation of the parameters:

-p Number of cores used for the calculation. The value is taken from the MOAB_PROCCOUNT environment variable. This calculation runs on two cores since we requested them with -lnodes=1:ppn=2
-x Path to the Bowtie 2 index, in this case hg19 is used
-S Output is written in SAM format to the given file
-1 Path to the first FASTQ file of the paired alignment, containing the first mates of the short reads. In this example simulated short reads created with dwgsim 0.1.11 are used.
-2 Path to the second FASTQ file of the paired alignment, containing the second mates of the short reads. In this example simulated short reads created with dwgsim 0.1.11 are used.
bowtie2.sam Output file in SAM format named bowtie2.sam
&>statistics.txt Statistics (standard output and standard error) are redirected into the file statistics.txt
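Paired-end alignment expects the two input files to contain the same number of reads in the same order. A simple sanity check on the read counts can be sketched as follows, assuming uncompressed FASTQ files with four lines per read. The function names fastq_reads and mates_match are hypothetical.

```shell
# Hypothetical helper: count the records in an uncompressed FASTQ file,
# assuming the common four-lines-per-read layout.
fastq_reads() {
    [ -r "$1" ] || return 1
    awk 'END { print int(NR / 4) }' "$1"
}

# Hypothetical helper: succeed only if both mate files hold the same number of reads.
mates_match() {
    n1=$(fastq_reads "$1") || return 1
    n2=$(fastq_reads "$2") || return 1
    [ "$n1" -eq "$n2" ]
}

# Possible usage (not executed here):
#   mates_match "${BOWTIE2_EXA_DIR}/hg19_sim.read1.fastq" \
#               "${BOWTIE2_EXA_DIR}/hg19_sim.read2.fastq" \
#       || echo "mate files differ in read count" >&2
```

The check compares counts only. It does not verify that the read names of the two files correspond.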


4.2 Indexing

The following script can be used to create a Bowtie 2 index. However, please contact the HPC-Competence Center for Bioinformatics and Astrophysics (bwSupport Portal) if you need additional Bowtie 2 indices that are not already located in $DBDATA_BOWTIE2_INDEX_DNA/.

Content of the batch script create_bowtie2_indices.moab

#!/bin/bash
#MSUB -l nodes=1:ppn=1
#MSUB -l walltime=01:00:00:00
#MSUB -m abe
##MSUB -M PUT_YOUR_EMAIL
#MSUB -l mem=20gb

module load bio/bowtie2
cd $MOAB_SUBMITDIR/
time bowtie2-build hg19.fa hg19.bowtie


More examples can be found in the $BOWTIE2_EXA_DIR.

4.3 Version-Specific Information

For information specific to a single version, see the information available via the module system with the command

$ module help bio/bowtie2