Structure - bwHPC Wiki

Structure

Description: Content
module load: bio/structure
License: Free Software
Citing: The basic algorithm was described by Pritchard, Stephens & Donnelly (2000). Extensions to the method were published by Falush, Stephens and Pritchard (2003, 2007) and by Hubisz, Falush, Stephens and Pritchard (2009).
Links: Structure Software Page | Structure Documentation
Graphical Interface: Yes
Included modules: none


1 Description

The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs.

For more information on features please visit the Structure Software homepage.

2 Versions and Availability

A list of the versions currently available on all bwHPC-C5 clusters can be obtained from the Cluster Information System (CIS).

On the command line of any bwHPC cluster, the command 'module avail bio/structure' lists the available versions.

$ module avail bio/structure
------------------------ /opt/bwhpc/common/modulefiles -------------------------
bio/structure/2.3.4(default) bio/structure/2.3.4_gui


3 License

The program structure is a free software package.

4 Loading the module

4.1 default

You can load the default version of Structure with the command 'module load bio/structure'.

$ module load bio/structure
$ module list
Currently Loaded Modulefiles:
  1) bio/structure/2.3.4(default)

The module tries to load any other modules it depends on. If loading fails, check whether one of those modules is already loaded in a version other than the one Structure needs. Note that the default version is the one without a GUI.

4.2 Command Line Version

If you wish to load a specific command line version of Structure, use 'module load bio/structure/version', replacing 'version' with the version you want. In this case, omit the module with the suffix _gui.

Example:
$ module avail bio/structure
------------------------ /opt/bwhpc/common/modulefiles -------------------------
bio/structure/2.3.4(default) bio/structure/2.3.4_gui
$ module load bio/structure/2.3.4
$ module list
Currently Loaded Modulefiles:
  1) bio/structure/2.3.4(default)
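Picking the command line modules out of the listing can be sketched as a small filter. This is only an illustration: cli_versions is a hypothetical helper name, and the sample text is copied from the listing above rather than read from a live cluster.

```shell
#!/bin/bash
# Sketch: list command-line (non-GUI) Structure modules from 'module avail' text.
# cli_versions is a hypothetical helper, not part of the module system.
cli_versions() {
    # Split on whitespace, keep Structure module names, drop the _gui
    # variants, and strip a trailing "(default)" marker.
    tr -s ' \t' '\n' \
        | grep '^bio/structure/' \
        | grep -v '_gui' \
        | sed 's/(default)$//'
}

# Sample text copied from the listing above.
sample='bio/structure/2.3.4(default) bio/structure/2.3.4_gui'
printf '%s\n' "$sample" | cli_versions
```

On a cluster the filter would be fed from 'module avail bio/structure 2>&1', assuming the installed module system prints its listing to standard error, as some versions do.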


4.3 GUI (Graphical User Interface) version

The module with the suffix _gui provides a graphical front end that you operate with the mouse.

  • You must have a running X-Window server on your local system (X-forwarding).
  • Start the ssh-session with the option "-X" (ssh -X 'your-id'@bwunicluster.scc.kit.edu).
$ module avail bio/structure
------------------------ /opt/bwhpc/common/modulefiles -------------------------
bio/structure/2.3.4(default) bio/structure/2.3.4_gui
$ module load bio/structure/2.3.4_gui
$ module list
Currently Loaded Modulefiles:
  1) bio/structure/2.3.4_gui


5 Program Binaries

You can find the Structure binaries in the main folder of the Structure installation. After loading the Structure module ('module load bio/structure'), this folder is added to $PATH and its location is stored in the environment variable $STRUCTURE_HOME.

5.1 Command Line Version

The CL version of Structure contains some interactive programs.

$ ls -F $STRUCTURE_HOME
extraparams  mainparams  seed.txt  structure*  structure_doc.pdf  testdata1

'*' indicates the file is executable.

5.2 Graphical User Interface Version

To start the GUI version of Structure, just load the 'bio/structure/2.3.4_gui' module and type 'structure &'.

$ module list
Currently Loaded Modulefiles:
  1) bio/structure/2.3.4_gui
$ structure &
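Because the GUI needs X-forwarding, it can help to check for a display before starting it. The following is a minimal sketch: have_x_display is a hypothetical helper name, and a set DISPLAY variable is only a necessary condition, not proof that forwarding actually works.

```shell
#!/bin/bash
# Sketch: check whether an X display is announced in the environment.
# have_x_display is a hypothetical helper name.
have_x_display() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set; reconnect with 'ssh -X' first." >&2
        return 1
    fi
    return 0
}
```

A possible use is 'have_x_display && structure &', which starts the GUI only when the check passes.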


There is only a single file in the 'bin' folder of the GUI Structure software tree.

$ ls -F $STRUCTURE_HOME/bin
structure*
$

'*' at the end of the file name indicates it's an executable.

6 bwHPC Examples for Serial Jobs with Structure

  • As of May 2015, STRUCTURE does not implement MPI.
  • The bwUniCluster does not support array jobs, so the examples use a different approach to run many serial Structure jobs.

These are examples for the bwUniCluster only.
In the folder $STRUCTURE_EXA_DIR you will find an example of how to use Structure.

$ echo $STRUCTURE_EXA_DIR
/opt/bwhpc/common/bio/structure/2.3.4/bwhpc-examples
$ cd $STRUCTURE_EXA_DIR
$ ls -l
[...] bwhpc-structure-example.moab
[...] cucullata-no-clones.txt
[...] div-scripts
[...] extraparams
[...] mainparams
[...] README.bwhpc
[...] run-parallel-jobs.bash


6.1 Synopsis

  • bwhpc-structure-example.moab

Moab start script. Use this one to start your Structure calculation. Find the following section in the file and adapt it to your needs.

####################################
# Your own local setups start here
# MODIFY TO YOUR OWN REQUIREMENTS!
####################################
REPLICATES=10                     # define the number of replicates for each k
KLOW=1                            # define the lower range of k
KUP=5                             # define the upper range of k
INFILE="cucullata-no-clones.txt"; # provide name of input file
OUTFILE="daphnia_cuc";            # name the outfile basis
  • cucullata-no-clones.txt

Input file for Structure.

  • div-scripts

A folder with alternate start-scripts. Some of them are not fully implemented or tested.

  • extraparams and mainparams

Parameter files for Structure.

  • README.bwhpc
  • run-parallel-jobs.bash

Local test script. Do not use it for large amounts of data.
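The settings in bwhpc-structure-example.moab determine how many Structure runs a job performs: one run per replicate for every K from KLOW to KUP. The arithmetic can be sketched as follows. CORES is an illustrative variable taken from the ppn=16 request in the example job script, and the idea of "waves" assumes one run per core at a time.

```shell
#!/bin/bash
# Sketch: how many Structure runs the default example settings produce.
REPLICATES=10
KLOW=1
KUP=5
CORES=16    # assumption: one run per core, as requested with ppn=16

RUNS=$(( (KUP - KLOW + 1) * REPLICATES ))
# Integer ceiling division: batches needed when CORES runs execute at once.
WAVES=$(( (RUNS + CORES - 1) / CORES ))
echo "runs=${RUNS} waves=${WAVES}"
```

Such an estimate may help when choosing a walltime, since the job has to last for roughly WAVES times the duration of a single run.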

6.2 How to use the Structure Test-Script

  • Create your own work-space
#           WS-Name        Days alive (max. 60)
ws_allocate structure_repo 30
  • Change dir to your workspace
cd $(ws_find structure_repo)
  • Copy the Moab example file from $STRUCTURE_EXA_DIR and make your modifications
cp $STRUCTURE_EXA_DIR/bwhpc-structure-example.moab .
  • Submit your job
msub bwhpc-structure-example.moab
  • Wait a while...

... until more files have been created. The *.tgz file contains your data.

Use 'tar xvzf *.tgz' to extract its contents.
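If the workspace holds more than one archive, 'tar xvzf *.tgz' does not unpack them all: tar treats the second and later names as members to look for inside the first archive. A minimal sketch that calls tar once per file is shown here; extract_results is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: unpack every result archive in a directory, one tar call per file.
# extract_results is a hypothetical helper name.
extract_results() {
    local dir="${1:-.}" archive
    for archive in "$dir"/*.tgz; do
        [ -e "$archive" ] || continue      # the pattern matched no archive
        tar -xzf "$archive" -C "$dir"
    done
}
```

Called as 'extract_results "$(ws_find structure_repo)"', it would unpack all archives in the workspace created above.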


6.3 Excerpt from bwhpc-structure-example.moab

These parameters apply to the use of Structure on the bwUniCluster.

#!/bin/bash
# rainer.rutka@uni-konstanz.de
# 
# -N structure_job
#MSUB -j oe
#MSUB -o $(JOBNAME).$(JOBID)
# -m ae
#
##### QUEUE SPECIFICATION:
# # singlenode: node=1, processes=16
# # this is mandatory for Structure. ONLY SINGLENODE!
#MSUB -q singlenode
#MSUB -l nodes=1:ppn=16
#MSUB -l mem=30000mb
#MSUB -l walltime=00:05:00
[...]
echo " "
echo "### Loading STRUCTURE module:"
echo " "
# Load version 2.3.4
module load bio/structure/2.3.4
[ ! -z "$STRUCTURE_VERSION" ] || { echo "ERROR: Failed to load module 'bio/structure/2.3.4'."; exit 1; }
echo "STRUCTURE_BIN_DIR  = ${STRUCTURE_HOME}/bin"
[...]
####################################
# Your own local setups start here
# MODIFY TO YOUR OWN REQUIREMENTS!
####################################
REPLICATES=10                     # define the number of replicates for each k
KLOW=1                            # define the lower range of k
KUP=5                             # define the upper range of k
INFILE="cucullata-no-clones.txt"; # provide name of input file
OUTFILE="daphnia_cuc";            # name the outfile basis
echo " "
echo "### Copying input test files for job (if required):"
echo " "
cp -v ${STRUCTURE_EXA_DIR}/{*.txt,extraparams,mainparams} ${TMP_WORK_DIR}/.

echo " "
echo "### Calling structure command ..."
echo " "
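for K in $(seq ${KLOW} ${KUP})
do
   for R in $(seq 1 ${REPLICATES})
   do
      # seed for -D: two 15-bit $RANDOM draws combined, at most 7 digits
      random=$(( (RANDOM*32768 + RANDOM) % 10000000 ))
      # print the command line; 'parallel' reads it from stdin and runs it
      echo "structure -i ${INFILE} -K ${K} -D ${random} -o ${OUTFILE}_k${K}_rep${R} > screenout_k${K}_rep${R}"
   done
done | parallel --no-notice -j $MOAB_PROCCOUNT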
[...]


parallel: build and execute shell command lines from standard input in parallel
See parallel manpage for more information ($ man parallel).
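Since parallel executes the command lines it reads from standard input, it can be useful to inspect those lines before submitting a job. The following sketch prints one line per (K, replicate) pair without running anything. gen_structure_cmds is a hypothetical helper; the flags are the ones used in the excerpt above.

```shell
#!/bin/bash
# Sketch: print one Structure command line per (K, replicate) pair.
# gen_structure_cmds is a hypothetical helper; it only prints, it runs nothing.
gen_structure_cmds() {
    local infile="$1" outfile="$2" klow="$3" kup="$4" reps="$5"
    local k r seed
    for k in $(seq "$klow" "$kup"); do
        for r in $(seq 1 "$reps"); do
            # Two 15-bit $RANDOM draws combined, reduced to at most 7 digits.
            seed=$(( (RANDOM * 32768 + RANDOM) % 10000000 ))
            echo "structure -i ${infile} -K ${k} -D ${seed} -o ${outfile}_k${k}_rep${r} > screenout_k${k}_rep${r}"
        done
    done
}

# Preview: K = 1..2 with a single replicate each gives two command lines.
gen_structure_cmds cucullata-no-clones.txt daphnia_cuc 1 2 1
```

Piping the same output into 'parallel -j $MOAB_PROCCOUNT' would then execute the lines, several at a time.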

7 Structure-Specific Environments

To see a list of all Structure environment variables set by the 'module load' command, use 'env | grep STRUCTURE' or the command 'module display bio/structure'.

$ : COMMANDLINE VERSION
$ module list
Currently Loaded Modulefiles:
  1) bio/structure/2.3.4(default)
$ env | grep STRUCTURE
STRUCTURE_EXA_DIR=/opt/bwhpc/common/bio/structure/2.3.4/bwhpc-examples
STRUCTURE_VERSION=2.3.4
STRUCTURE_DOC_DIR=/opt/bwhpc/common/bio/structure/2.3.4/console
STRUCTURE_HOME=/opt/bwhpc/common/bio/structure/2.3.4/console
$ module clear
Are you sure you want to clear all loaded modules!? [n] y
$ : GUI VERSION
$ module load bio/structure/2.3.4_gui
$ env | grep STRUCTURE
STRUCTURE_EXA_DIR=/opt/bwhpc/common/bio/structure/2.3.4/bwhpc-examples
STRUCTURE_VERSION=2.3.4
STRUCTURE_DOC_DIR=/opt/bwhpc/common/bio/structure/2.3.4/frontend/doc
STRUCTURE_HOME=/opt/bwhpc/common/bio/structure/2.3.4/frontend


8 Version-Specific Information

For more detailed information about a specific Structure version, use the command 'module help bio/structure/version'.
For a short summary of what Structure does, use the command 'module whatis bio/structure'.
Example (COMMAND LINE VERSION)

$ module load bio/structure/2.3.4
$ module whatis bio/structure/2.3.4
bio/structure/2.3.4  : STRUCTURE Command-Line-Version. SW package for using
  multi-locus genotype data to investigate population structure
  (command 'structure PATH_TO_MAINPARAMS-FILE')
$ module help bio/structure/2.3.4
----------- Module Specific Help for 'bio/structure/2.3.4' (EXCERPT ONLY) --------
  The program structure is a free software package for using multi-locus
  genotype data to investigate population structure. Its uses include
[...]
  In order to simplify batch runs and make it easier to run simulations 
  involving structure, the run-time version of Structure has added 
  command-line flags that update the values of certain parameters, 
  over-riding the values set in 'mainparams'.
  These are as follows:
  -m (mainparams) Read a different parameter input file instead of
     mainparams
  -e (extraparams) Read a different parameter input file instead of
     extraparams
  -s (stratparams) Read a different parameter input file instead of
     stratparams. (For use with the accompanying program, STRAT, 
     for association mapping.)
  -K (MAXPOPS) Change the number of populations.
  -L (NUMLOCI) Change the number of loci.  
  -N (NUMINDS) Change the number of individuals.
  -i (input file) Read data from a different input file.
  -o (output file) Print results to a different output file.
  -D (SEED) Initialize the random number generation using the value SEED. 
     (Note that RANDOMIZE MUST be set to 0 to use this option.)
[...]
           *** THIS IS THE COMMAND-LINE version ***
ATTENTION: We need at least a 'mainparams' and 'extraparams'-file
           in your current directory to get 'structure' started.
           Examples are found in: 
           /opt/bwhpc/common/bio/structure/2.3.4/console/mainparams
           /opt/bwhpc/common/bio/structure/2.3.4/console/extraparams
           /opt/bwhpc/common/bio/structure/2.3.4/bwhpc-examples/*
  and READ CAREFULLY!
           /opt/bwhpc/common/bio/structure/2.3.4/bwhpc-examples/README.bwhpc
[...]
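The two requirements stated in the help text (parameter files in the current directory, and RANDOMIZE set to 0 when -D is used) can be checked before a run. This is a minimal sketch under stated assumptions: both function names are hypothetical, and the check assumes that parameters are written as '#define NAME value' lines, which should be verified against your own mainparams and extraparams files.

```shell
#!/bin/bash
# Sketch: sanity checks for a Structure run directory.
# check_structure_dir and seed_flag_usable are hypothetical helper names.
check_structure_dir() {
    local dir="${1:-.}" f status=0
    for f in mainparams extraparams; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing parameter file: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Assumption: parameters appear as '#define NAME value' lines.
# Succeeds only if either file sets RANDOMIZE to 0.
seed_flag_usable() {
    local dir="${1:-.}"
    grep -Eqs '^[[:space:]]*#define[[:space:]]+RANDOMIZE[[:space:]]+0([^0-9]|$)' \
        "$dir/mainparams" "$dir/extraparams"
}
```

A job script could call 'check_structure_dir . && seed_flag_usable .' and stop with an error message if either check fails.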