FastQ screen - bwHPC Wiki

FastQ screen

From bwHPC Wiki
Description          Content
module load          bio/fastq_screen
Availability         bwUniCluster
License              GPLv3
Citing               n/a
Links                Babraham Bioinformatics
Graphical Interface  no
Requirements         Bowtie2 | bio/bowtie2/2.2.3 (module is loaded automatically).
                     A suitable Perl runtime environment with the GD::Graph plugin (optional)


1 Description/What is FastQ Screen

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
When running a sequencing pipeline it is useful to know that your sequencing runs contain the types of sequence they're supposed to.
FastQ Screen allows you to set up a standard set of libraries against which all of your sequences can be searched. Your search libraries might contain the genomes of all of the organisms you work on, along with PhiX, Vectors or other contaminants commonly seen in sequencing experiments.
The program produces both text-based and graphical output which summarises the mapping of your sequences against each of your libraries, so that, for example, when you screen mouse sequences you can see whether they map to the genome you expect.
For more information on features, please visit the FastQ Screen information page.

2 Versions and Availability

A list of versions currently available on all bwHPC-C5 clusters can be obtained from the Cluster Information System (CIS).

On the command line interface you'll get a list of available versions by using the command module avail bio/fastq_screen.

$ module avail bio/fastq_screen
----------------- /opt/bwhpc/common/modulefiles --------------------
bio/fastq_screen/0.5.2


3 License

FastQ Screen is free software, released under the GPLv3.

4 Usage

4.1 Loading the module

4.1.1 Default

You can load the default version of FastQ Screen with the command module load bio/fastq_screen.

$ module avail bio/fastq_screen
------------- /opt/bwhpc/common/modulefiles ------------------
bio/fastq_screen/0.5.2
$ module load bio/fastq_screen
$ module list
Currently Loaded Modulefiles:
  1) bio/bowtie2/2.2.3        2) bio/fastq_screen/0.5.2

The module will try to load modules it needs to function. If loading the module fails, check if you have already loaded one of those modules, but not in the version needed for FastQ Screen.
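
A minimal sketch of such a check, assuming your module system maintains the colon-separated LOADEDMODULES variable (Environment Modules does). The function name find_conflicting_bowtie2 is hypothetical and not part of the bwHPC installation.

```shell
#!/bin/bash
# Sketch: report a loaded Bowtie2 module whose version differs from the one
# FastQ Screen expects. The function name is hypothetical.
find_conflicting_bowtie2() {
    local wanted="$1"          # expected module, e.g. bio/bowtie2/2.2.3
    local mod
    local IFS=':'              # LOADEDMODULES is a colon-separated list
    for mod in $LOADEDMODULES; do
        case "$mod" in
            "$wanted") ;;                            # expected version, fine
            bio/bowtie2/*) echo "$mod"; return 0 ;;  # another version is loaded
        esac
    done
    return 1                   # no conflicting Bowtie2 module found
}

# Usage: unload the conflicting module before loading FastQ Screen.
# if other=$(find_conflicting_bowtie2 bio/bowtie2/2.2.3); then
#     module unload "$other"
# fi
# module load bio/fastq_screen
```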

4.1.2 Special Version

If you wish to load a specific version of FastQ Screen, use module load bio/fastq_screen/'version', replacing 'version' with the version you need.

Example:
$ module avail bio/fastq_screen
----------------- /opt/bwhpc/common/modulefiles -------------------
bio/fastq_screen/0.5.2
$ module load bio/fastq_screen/0.5.2
$ module list
Currently Loaded Modulefiles:
  1) bio/bowtie2/2.2.3        2) bio/fastq_screen/0.5.2

4.2 Program Binaries

  • This version of FastQ Screen is for the command-line only.
  • FastQ Screen is intended to be used as part of a QC pipeline.

It allows you to take a sequence dataset and search it against a set of Bowtie databases. It will then generate both a text and a graphical summary of the results to see if the sequence dataset contains the kind of sequences you expect or not.

$ ls -xF $FASTQ_SCREEN_HOME
aln-pe.sam            bwhpc-examples/            database/    fastq_screen*
fastq_screen.conf     fastq_screen.conf.example  license.txt  modulefiles/
OpenSans-Regular.ttf  README-PERL.bwhpc          README.txt   RELEASE_NOTES.txt

'*' indicates that the file is executable. '/' indicates that it is a folder.

5 Perl Plugin-Installation

Each new user who wants to use the FastQ Screen module on the bwUniCluster needs to install the GD::Graph plugin for Perl in their own account.

# INSTALL THE PLUGIN
$ perl -MCPAN -e shell
| > install Bundle::CPAN
| : answer all(!) questions with: _yes_
| > install GD::Graph
| : answer all(!) questions with: _yes_
| > quit

This only needs to be done once. Afterwards you'll find a $HOME/.cpan folder where your plugins are located.
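
To see whether the plugin can already be loaded from your account, a small check like the following may help. It is only a sketch: the function name have_perl_module is hypothetical, and it relies solely on the fact that 'perl -M<Module> -e 1' fails when the module cannot be loaded.

```shell
#!/bin/bash
# Sketch: test whether a Perl module can be loaded by the current user.
# The function name is hypothetical.
have_perl_module() {
    # 'perl -M<Module> -e 1' loads the module and runs an empty program;
    # the exit status is non-zero if the module (or perl itself) is missing.
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module GD::Graph; then
    echo "GD::Graph is available"
else
    echo "GD::Graph is missing; see the installation steps above"
fi
```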

6 Main Configuration File

The most important configuration file is: /opt/bwhpc/common/bio/fastq_screen/0.5.2/fastq_screen.conf(.example)
Edit this one and rename it to fastq_screen.conf.
At least check these key/value pairs (EXAMPLES ONLY):

  • THREADS 8
    FastQ Screen runs in multi-threaded mode and uses 8 cores by default.
    This can be changed by editing the 'fastq_screen.conf' file.
  • DATABASE Human /opt/bwhpc/common/bio/fastq_screen/0.5.2/database/grch38_genome
    If the bowtie AND bowtie2 indices of a given genome reside in the SAME FOLDER, a SINGLE path may be provided to BOTH sets of indices. [...]

Beware!
In the main location of FastQ Screen ($FASTQ_SCREEN_HOME) you will find a folder named 'database'.
It is supplied by us as an example only.
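
Editing the THREADS entry can be sketched as follows, assuming you keep a private copy of the configuration file in a directory you can write to. The function name make_fastq_screen_conf and its arguments are hypothetical; only the THREADS key and the file names come from the description above.

```shell
#!/bin/bash
# Sketch: copy a FastQ Screen configuration file and set its THREADS value.
# Function name and arguments are hypothetical.
make_fastq_screen_conf() {
    local src="$1" dst="$2" threads="$3"
    # Replace the value on an existing THREADS line, keep everything else.
    sed -E "s/^THREADS[[:space:]]+[0-9]+/THREADS ${threads}/" "$src" > "$dst"
    # Append the key if the template did not contain an active THREADS line.
    grep -q '^THREADS' "$dst" || echo "THREADS ${threads}" >> "$dst"
}

# Usage on the cluster (paths taken from the module environment):
# make_fastq_screen_conf "$FASTQ_SCREEN_HOME/fastq_screen.conf.example" \
#     "$PWD/fastq_screen.conf" 6
```

The DATABASE entries still have to be adjusted by hand, because they must point to your own index files.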

7 bwHPC Examples for FastQ Screen

  • MPI is not implemented in this version of FastQ Screen.


In the folder $FASTQ_SCREEN_EXA_DIR you'll find an example of how to use FastQ Screen.

$ ls -l $FASTQ_SCREEN_EXA_DIR
[...] build_bowtie_index.sh  # starts 'bowtie2-build ' to make an index of the Sample DB
[...] BWA_aligning_indexing_example.sh # Example Burrows Wheeler Aligner indexing
[...] bwhpc-fastq_screen-example.moab # Moab submit script. Creates final screen-file (+alignments).
[...] fastq_screen_aligner.sh # FastQ Screen aligner example
[...] fastq_screen_job.msub_out # msub STDOUT example of a finished job
[...] Homo_sapiens.GRCh38.dna.chromosome.10.fa # DB example for tests 
[...] README-PERL.bwhpc # Include GD::Graph plugin for perl
[dir] result # example screen-result file
[...] Sample_ABC_L005_R1.fastq # supplied example FastQ file


7.1 bwHPC example workflow

  • bwhpc-fastq_screen-example.moab

Use this Moab start-script to start your own FastQ Screen session in interactive mode. Look for the relevant section inside the file and make your modifications.

7.1.1 How to use the bwhpc-FastQ Screen Test-Script

  • Create your own work-space
#           WS-Name           Days alive (max. 60)
ws_allocate fastq_screen_repo 30
  • Change dir to your workspace
cd $(ws_find fastq_screen_repo)
  • Copy the moab-example file you'll find in this folder and make your modifications
cp $FASTQ_SCREEN_EXA_DIR/bwhpc-fastq_screen-example.moab .
  • Submit your job
msub bwhpc-fastq_screen-example.moab
  • Wait for a while...

... until you see some more files created. The *.tgz-file contains your data.

tar xvzf *.tgz # to extract the file-contents
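
If the job produces more than one archive, the extraction step can be wrapped in a small helper. This is a sketch only; the function name extract_results is hypothetical, and it assumes that the archives are gzip-compressed tar files ending in .tgz, as described above.

```shell
#!/bin/bash
# Sketch: unpack every *.tgz result archive found in a directory.
# The function name is hypothetical.
extract_results() {
    local dir="${1:-.}" archive found=1
    for archive in "$dir"/*.tgz; do
        [ -e "$archive" ] || continue      # the glob did not match anything
        tar -xzf "$archive" -C "$dir" || return 1
        echo "extracted: $archive"
        found=0
    done
    return "$found"                        # 1 if no archive exists yet
}

# Usage inside the workspace: extract_results "$(ws_find fastq_screen_repo)"
```

A non-zero return value simply means that no archive was found yet, so the call can be repeated after the job has finished.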


7.1.2 Excerpt from bwhpc-fastq_screen-example.moab

These parameters apply to the use of FastQ Screen on the bwUniCluster.

#!/bin/bash
#
#MSUB -N fastq_screen_job
#MSUB -j oe
#MSUB -o $(JOBNAME).$(JOBID)
#MSUB -m ae
#MSUB -M 'your e-mail@DN'
#MSUB -q singlenode
#MSUB -l walltime=00:10:00
#
[...]

echo " "
echo "### Loading Bowtie2, FastQ Screen module:"
echo " "
module load bio/bowtie2/2.2.3
[ -z "$BOWTIE2_HOME" ] && { echo 'ERROR: Failed to load module bio/bowtie2/2.2.3'; exit 1; }
module load bio/fastq_screen/0.5.2
[ -z "$FASTQ_SCREEN_HOME" ] && { echo 'ERROR: Failed to load module bio/fastq_screen/0.5.2'; exit 1; }
module list
[...]
echo " "
echo "### Copying input test files for job (if required):"
echo " "
cp -v ${FASTQ_SCREEN_EXA_DIR}/{Sample_ABC_L005_R1.fastq,Homo*} .
[...]

echo " "
echo "### Run FastQ Screen in 'threads-mode'..."
echo " "
bowtie2-build Homo_sapiens.GRCh38.dna.chromosome.10.fa grch38_genome
fastq_screen --aligner bowtie2 --subset 1000 --threads 6 Sample_ABC_L005_R1.fastq
fastq_screen --threads 6 Sample_ABC_L005_R1.fastq
rc=$?  # save the exit status; the test below would otherwise overwrite $?
[ "$rc" -ne 0 ] && { echo "fastq_screen returned with an error: $rc"; exit 1; }
echo "done"
[...]


Piping the command to 'parallel' will not work!
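
One way to handle several input files without 'parallel' is a plain sequential loop; FastQ Screen itself already runs multi-threaded. The following is a sketch under that assumption. The function name screen_all and the SCREEN_CMD override are hypothetical, and the default command line is copied from the excerpt above.

```shell
#!/bin/bash
# Sketch: screen several FastQ files one after another.
# screen_all and SCREEN_CMD are hypothetical names; SCREEN_CMD only exists
# so that the loop can be tried out without FastQ Screen being installed.
screen_all() {
    local fq rc
    for fq in "$@"; do
        ${SCREEN_CMD:-fastq_screen --threads 6} "$fq"
        rc=$?
        if [ "$rc" -ne 0 ]; then
            echo "fastq_screen failed on $fq (exit status $rc)" >&2
            return "$rc"
        fi
    done
}

# Usage:   screen_all *.fastq
# Dry run: SCREEN_CMD=echo screen_all *.fastq
```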

8 FastQ Screen-Specific Environments

To see a list of all FastQ Screen environment variables set by the module load command, use env | grep FASTQ_SCREEN, or use the command module display bio/fastq_screen.

$ module display  bio/fastq_screen/0.5.2
-------------------------------------------------------------------
/opt/bwhpc/common/modulefiles/bio/fastq_screen/0.5.2:
module-whatis	 FastQ screen 0.5.2 Contmaination Screening for large data sets 
setenv		 FASTQ_SCREEN_VERSION 0.5.2 
setenv		 FASTQ_SCREEN_HOME /opt/bwhpc/common/bio/fastq_screen/0.5.2 
setenv		 FASTQ_SCREEN_EXA_DIR /opt/bwhpc/common/bio/fastq_screen/0.5.2/bwhpc-examples 
setenv		 FASTQ_SCREEN_BIN_DIR /opt/bwhpc/common/bio/fastq_screen/0.5.2 
prepend-path	 PATH /opt/bwhpc/common/bio/fastq_screen/0.5.2 
conflict	 bio/fastq_screen 

The module display command will not load the module!
You may check the Bowtie2 environments, too.

$ module display  bio/bowtie2/2.2.3
-------------------------------------------------------------------
/opt/bwhpc/common/modulefiles/bio/bowtie2/2.2.3:
module-whatis	 2.2.3 Bowtie 2 is an ultrafast and memory-efficient tool for aligning 
     sequencing reads to long reference sequences.
[...]  
setenv		 BOWTIE2_VERSION 2.2.3 
setenv		 BOWTIE2_HOME /opt/bwhpc/common/bio/bowtie2/2.2.3 
setenv		 BOWTIE2_BIN_DIR /opt/bwhpc/common/bio/bowtie2/2.2.3/bin 
setenv		 BOWTIE2_DOC_DIR /opt/bwhpc/common/bio/bowtie2/2.2.3/doc 
setenv		 BOWTIE2_EXA_DIR /opt/bwhpc/common/bio/bowtie2/2.2.3/examples 
prepend-path	 PATH /opt/bwhpc/common/bio/bowtie2/2.2.3/bin:/opt/bwhpc/common/bio/bowtie2/2.2.3/ 
conflict	 bio/bowtie2 
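
A job script can verify these variables before it starts any work. The following sketch assumes that the modules export the variables listed above (setenv does); the function name require_env is hypothetical.

```shell
#!/bin/bash
# Sketch: verify that the variables set by the modules are present.
# The function name is hypothetical; the variable names in the usage line
# are those shown by 'module display'.
require_env() {
    local name missing=0
    for name in "$@"; do
        # printenv prints the value of an exported variable, or nothing.
        if [ -z "$(printenv "$name")" ]; then
            echo "ERROR: $name is not set; is the module loaded?" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage in a job script:
# require_env FASTQ_SCREEN_HOME FASTQ_SCREEN_EXA_DIR BOWTIE2_HOME || exit 1
```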


9 Version-Specific Information

For more detailed information about a specific FastQ Screen version, see the information available via the module system with the command module help bio/fastq_screen.
For a short summary of what FastQ Screen is about, use the command module whatis bio/fastq_screen.
Example:

$ module whatis bio/fastq_screen
bio/fastq_screen     : FastQ screen 0.5.2 Contmaination Screening for large data sets

$ module help bio/fastq_screen
----------- Module Specific Help for 'bio/fastq_screen/0.5.2' ---------------------------
DESCRIPTION
   FastQ Screen allows you to screen a library of sequences in FastQ 
   format against a set of sequence databases so you can see if the 
   composition of the library matches with what you expect. 
[...]

DOCUMENTATION
*  Get started:
   http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/README.txt

*  Release notes:
   http://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/RELEASE_NOTES.txt 
[...]

MAIN CONFIGURATION FILE

The most important configuration file is:

/opt/bwhpc/common/bio/fastq_screen/0.5.2/fastq_screen.conf(.example)

Edit this one and rename it to 'fastq_screen.conf'.

FastQ Screen runs in multi-threaded mode and uses 8 cores by default.
This can be changed by editing the 'fastq_screen.conf' file.
[...]