FastQC - bwHPC Wiki

FastQC

From bwHPC Wiki
Description           Content
module load           bio/fastqc
Availability          bwUniCluster
License               GPLv3
Citing                n/a
Links                 Babraham Bioinformatics
Graphical Interface   Yes
Requirements          A suitable Java Runtime Environment

1 Description/What is FastQC?

Modern high-throughput sequencers can generate hundreds of millions of sequences in a single run. Before analysing these sequences to draw biological conclusions, you should always perform some simple quality control checks to ensure that the raw data looks good and that there are no problems or biases in your data that may limit how you can use it.
Most sequencers will generate a QC report as part of their analysis pipeline, but this is usually only focused on identifying problems which were generated by the sequencer itself. FastQC aims to provide a QC report which can spot problems which originate either in the sequencer or in the starting library material.
FastQC can be run in one of two modes. It can either run as a standalone interactive application for the immediate analysis of a small number of FastQ files, or in a non-interactive mode suitable for integration into a larger analysis pipeline for the systematic processing of large numbers of files.
The main functions of FastQC are:

  • Import of data from BAM, SAM or FastQ files (any variant)
  • Providing a quick overview to tell you in which areas there may be problems
  • Summary graphs and tables to quickly assess your data
  • Export of results to an HTML based permanent report
  • Offline operation to allow automated generation of reports without running the interactive application


For more information on features, please visit the FastQC homepage.

2 Versions and Availability

A list of versions currently available on all bwHPC-C5 clusters can be obtained from the Cluster Information System (CIS).

On the command line, you can list the available versions with the command 'module avail bio/fastqc'.

$ module avail bio/fastqc
----------------------- /opt/bwhpc/common/modulefiles -----------------------
bio/fastqc/0.11.4
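If a job script should pick a version automatically instead of relying on the default, the version numbers can be extracted from the 'module avail' listing. The following is a minimal sketch, assuming module names of the form bio/fastqc/<version> as shown above. The function name fastqc_versions is hypothetical. Many module systems print this listing on stderr, so the usage line redirects it with 2>&1.

```shell
# Hypothetical helper: read 'module avail' output on stdin and print the
# FastQC version numbers it mentions, one per line, oldest first.
# 'sort -V' (natural version sort) is a GNU coreutils option.
fastqc_versions() {
    grep -o 'bio/fastqc/[0-9][0-9.]*' | sed 's#^bio/fastqc/##' | sort -u -V
}

# Usage on the cluster (assumes the listing is written to stderr):
#   module avail bio/fastqc 2>&1 | fastqc_versions | tail -n 1
```

Version sorting matters once two-digit components appear: 0.11.10 sorts after 0.11.4, whereas a plain text sort would put it first.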


3 License

FastQC is free software, released under the GPLv3 license.

4 Usage

4.1 Loading the module

4.1.1 Default

You can load the default version of FastQC with the command 'module load bio/fastqc'.

$ module purge
$ module load bio/fastqc
$ module list
Currently Loaded Modulefiles:
  1) bio/fastqc/0.11.4

The module will try to load any other modules it depends on. If loading fails, check whether you have already loaded one of these dependencies in a version other than the one FastQC needs.

4.1.2 Special Version

If you wish to load a specific version of FastQC, use module load bio/fastqc/'version', where 'version' is the version you need.

Example:
$ module avail bio/fastqc
--------------------------- /opt/bwhpc/common/modulefiles ---------------------------
bio/fastqc/0.11.4
$ module load bio/fastqc/0.11.4
$ module list
Currently Loaded Modulefiles:
  1) bio/fastqc/0.11.4

4.2 Program Binaries

FastQC is a Java application. To run, it needs a suitable Java Runtime Environment (JRE) on your system, so make sure one is available before you try to run FastQC.
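The FastQC module may load a Java module itself (see the note on dependencies under "Loading the module"). If in doubt, a quick pre-flight check can report which Java runtime is on $PATH before FastQC is started. This is a sketch only: the function name check_java is hypothetical, and the hint to load a Java module is an assumption about your environment.

```shell
# Hypothetical pre-flight check: is a Java runtime available on $PATH?
check_java() {
    if command -v java >/dev/null 2>&1; then
        # 'java -version' prints its banner on stderr, so merge it into stdout
        java -version 2>&1 | head -n 1
    else
        echo "no java on \$PATH: load a Java module or install a JRE" >&2
        return 1
    fi
}
```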
The binaries are located in the main FastQC installation folder. After you load the FastQC module ('module load bio/fastqc'), this folder is added to your $PATH and stored in the environment variable $FASTQC_HOME.
You can run FastQC in one of two modes: as an interactive graphical application, in which you can load FastQ files dynamically, or as a command-line program.

$ ls -xF $FASTQC_HOME
bwhpc-examples/    cisd-jhdf5.jar     Configuration/  fastqc*       fastqc_icon.ico  Help/  INSTALL.txt
jbzip2-0.9.jar     LICENSE_JHDF5.txt  LICENSE.txt     modulefiles/  net/             org/   README.txt
RELEASE_NOTES.txt  sam-1.103.jar      Templates/      uk/

'*' marks an executable file, '/' marks a directory.

4.2.1 Command Line Version

To run FastQC non-interactively, use the fastqc wrapper script and specify the list of files to process on the command line:

fastqc somefile.txt someotherfile.txt
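For a whole directory of reads, the wrapper can be called once with all files. The following is a minimal sketch, assuming the reads end in .fastq or .fastq.gz. run_fastqc_dir is a hypothetical helper name. FastQC's --outdir option expects an existing directory, so the helper creates it first. If fastqc is not on $PATH (module not loaded), the helper only prints the command it would have run.

```shell
# Hypothetical helper: run FastQC on every FastQ file in a directory.
# Usage: run_fastqc_dir <input_dir> <output_dir>
run_fastqc_dir() {
    in_dir=$1
    out_dir=$2
    set --                                   # reuse "$@" as a POSIX-sh file list
    for f in "$in_dir"/*.fastq "$in_dir"/*.fastq.gz; do
        [ -e "$f" ] && set -- "$@" "$f"      # skip patterns that matched nothing
    done
    if [ "$#" -eq 0 ]; then
        echo "no FastQ files found in $in_dir" >&2
        return 1
    fi
    mkdir -p "$out_dir"                      # FastQC does not create --outdir itself
    if command -v fastqc >/dev/null 2>&1; then
        fastqc --outdir="$out_dir" "$@"
    else
        echo "fastqc --outdir=$out_dir $*"   # dry run: module not loaded
    fi
}
```

On the cluster this would be called as, for example, run_fastqc_dir reads qc_reports after 'module load bio/fastqc' (both directory names are placeholders).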

4.2.2 Graphical User Interface Version

You can also run FastQC in graphical mode, where you load the files you want to process and the FastQC GUI displays the results on screen (see example).

  • You need a running X server on your local system (X11 forwarding).
  • Start the SSH session with the option "-X" (ssh -X 'your-id'@'your-cluster-DN').
$ pwd
/opt/bwhpc/common/bio/fastqc/0.11.4
$ ls -l fastqc
-rwxr-xr-x. 1 kn_pop123456 uc1-adm-sw 13751 16. Feb 17:43 fastqc
$ fastqc &


(Screenshot: FastQC GUI version)
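If the SSH session was opened without -X, the GUI has no display to open its window on. The following small wrapper checks the DISPLAY variable, which X forwarding normally sets, before starting FastQC in the background. It is a sketch, and the name start_fastqc_gui is hypothetical.

```shell
# Hypothetical wrapper: only start the FastQC GUI if an X display is set.
start_fastqc_gui() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: reconnect with 'ssh -X' to enable X forwarding" >&2
        return 1
    fi
    fastqc &    # background the GUI so the shell stays usable
}
```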

5 bwHPC Examples for FastQC

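  • FastQC does not support MPI (as of March 2016).
  • FastQC can run multithreaded with option '-t' (e.g. 6 threads at 250 MB of memory each). Each thread processes one input file, so extra threads only pay off when several files are analysed.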
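Choosing a '-t' value is simple arithmetic: use one thread per input file up to a cap, and budget 250 MB per thread. The following is a minimal sketch, assuming the 250 MB per thread quoted above and a cap of 6 threads as in the example script. Both function names are hypothetical.

```shell
# Hypothetical helpers for sizing a FastQC run.
# fastqc_threads <number_of_files> [max_threads]: one thread per file, capped.
fastqc_threads() {
    n=$1
    max=${2:-6}
    [ "$n" -lt 1 ] && n=1          # always use at least one thread
    [ "$n" -gt "$max" ] && n=$max  # never exceed the cap
    echo "$n"
}

# fastqc_mem_mb <threads>: memory estimate at 250 MB per thread.
fastqc_mem_mb() {
    echo $(( $1 * 250 ))
}
```

For example, 'fastqc_threads 4' prints 4 and 'fastqc_mem_mb 4' prints 1000, while 'fastqc_threads 10' is capped at 6 threads (1500 MB).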


The folder $FASTQC_EXA_DIR contains an example of how to use FastQC.

$ echo $FASTQC_EXA_DIR
/opt/bwhpc/common/bio/fastqc/0.11.4/bwhpc-examples
$ ls -l $FASTQC_EXA_DIR
[...] bwhpc-fastqc-example.moab  # Moab submit script (batch mode)
[...] Sample_ABC_L005_R1.fastq   # example sequence file (for use in GUI or command-line mode)


5.1 bwhpc-example file

  • bwhpc-fastqc-example.moab

Use this Moab submit script as a template for your own FastQC batch jobs. Adapt the relevant section inside the file (see the excerpt in 5.1.2) to your needs.

5.1.1 How to use the FastQC Test-Script

  • Create your own workspace
#           WS-Name        Days alive (max. 60)
ws_allocate fastqc_repo 30
  • Change dir to your workspace
cd $(ws_find fastqc_repo)
  • Copy the Moab example file from $FASTQC_EXA_DIR and adapt it to your needs
cp $FASTQC_EXA_DIR/bwhpc-fastqc-example.moab .
  • Submit your job
msub bwhpc-fastqc-example.moab
  • Wait a while...

... until more files appear (including some HTML files). The *.tgz file contains your data.

Run 'tar xvzf *.tgz' to extract the file contents.
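With a single result archive, 'tar xvzf *.tgz' works as described. If the job leaves several .tgz files, however, tar treats every name after the first as a member to look for inside the first archive, and the extraction fails. The following loop unpacks each archive separately. It is a sketch, and the name extract_all is hypothetical.

```shell
# Hypothetical helper: unpack every .tgz in a directory (default: current).
extract_all() {
    dir=${1:-.}
    status=0
    for archive in "$dir"/*.tgz; do
        [ -e "$archive" ] || continue          # no match: the pattern stays literal
        tar -xzf "$archive" -C "$dir" || status=1
    done
    return "$status"
}
```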


5.1.2 Excerpt from bwhpc-fastqc-example.moab

These parameters apply to running FastQC on bwUniCluster.

#!/bin/bash
#
#MSUB -N fastqc_job
#MSUB -j oe
#MSUB -o $(JOBNAME).$(JOBID)
#MSUB -m ae
#MSUB -M 'your e-mail address here'
#MSUB -q singlenode
#MSUB -l walltime=00:10:00
#
[...]
echo " "
echo "### Copying input test files for job (if required):"
echo " "
cp $FASTQC_EXA_DIR/*astq . 

echo " "
echo "### Run FastQC in single-node-mode using 6 Threads (-t) ..."
echo " "
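fastqc Sample_ABC_L005_R1.fastq -t 6 --extract --outdir="$(pwd)"
# Save fastqc's exit status at once: inside the braces below, $? would
# already hold the status of the '[' test instead of that of fastqc.
rc=$?
[ "$rc" -ne 0 ] && { echo "fastqc returned with an error: $rc"; exit 1; }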
echo "done"
[...]


Piping the command to 'parallel' will not work!

6 FastQC-Specific Environment Variables

To list all FastQC environment variables set by the 'module load' command, use env | grep FASTQC, or use the command module display bio/fastqc.

$ module display bio/fastqc
-------------------------------------------------------------------
/opt/bwhpc/common/modulefiles/bio/fastqc/0.11.4:
module-whatis	 FastQC 0.11.4 A quality control application for high throughput sequence data. 
setenv		 FASTQC_VERSION 0.11.4 
setenv		 FASTQC_HOME /opt/bwhpc/common/bio/fastqc/0.11.4 
setenv		 FASTQC_EXA_DIR /opt/bwhpc/common/bio/fastqc/0.11.4/bwhpc-examples 
setenv		 FASTQC_BIN_DIR /opt/bwhpc/common/bio/fastqc/0.11.4 
prepend-path	 PATH /opt/bwhpc/common/bio/fastqc/0.11.4 
conflict	 bio/fastqc 


The module display command will not load the module!
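In a job script it can be worth failing early when the module environment is missing. The following check looks for the four variables listed in the module display output above. It is a sketch, and the name check_fastqc_env is hypothetical.

```shell
# Hypothetical check: are the variables set by 'module load bio/fastqc' present?
check_fastqc_env() {
    missing=0
    for var in FASTQC_VERSION FASTQC_HOME FASTQC_EXA_DIR FASTQC_BIN_DIR; do
        eval "val=\${$var:-}"                  # POSIX-sh indirect variable lookup
        if [ -z "$val" ]; then
            echo "missing: $var (run 'module load bio/fastqc' first)" >&2
            missing=1
        fi
    done
    return "$missing"
}
```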

7 Version-Specific Information

For more detailed information on a specific FastQC version, use the command module help bio/fastqc (optionally followed by a version, e.g. bio/fastqc/0.11.4).
For a short summary of what FastQC does, use the command module whatis bio/fastqc.
Example:

$ module whatis bio/fastqc
bio/fastqc           : FastQC 0.11.4 A quality control application for high
     throughput sequence data.
$
$ module help bio/fastqc
----------- Module Specific Help for 'bio/fastqc/0.11.4' ----------
DESCRIPTION
   FastQC is an application which takes a FastQ file and runs a series
   of tests on it to generate a comprehensive QC report.  This will
   tell you if there is anything unusual about your sequence.  Each
   test is flagged as a pass, warning or fail depending on how far it
   departs from what you'd expect from a normal large dataset with no
[...]
DOCUMENTATION

*  Get started:
   http://www.bioinformatics.babraham.ac.uk/projects/fastqc/README.txt   

*  Full manual, command-line options and more:
   http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
   /opt/bwhpc/common/bio/fastqc/0.11.4/Help/* (same as above)
   /opt/bwhpc/common/bio/fastqc/0.11.4/INSTALL.txt

*  Wikipedia FASTQ Format Page
   http://en.wikipedia.org/wiki/FASTQ_format

*  bwHPC examples and a moab example script can be found here:
   /opt/bwhpc/common/bio/fastqc/0.11.4/bwhpc-examples
   Fastqc will run multi threaded (6 threads à 250 MB).
[...]


8 Useful Links