
Resource Managers


1 TORQUE Resource Manager

The Terascale Open-source Resource and QUEue Manager (TORQUE) is a distributed resource manager providing control over batch jobs and distributed compute nodes. TORQUE can integrate with the Moab Workload Manager to improve overall utilization, scheduling and administration on an HPC-C4 cluster.

1.1 Batch Job (PBS) Variables : bwForCluster

When a job is launched into execution, TORQUE defines multiple environment variables that can be used from within the submission script to control the workflow of the job.
Since the workload manager Moab on BwForCluster_Chemistry uses the resource manager TORQUE, the following TORQUE environment variables are added to your environment once your job has started (an excerpt of the most important ones).

Environment variable Brief explanation
PBS_O_WORKDIR Directory where the qsub command is issued (Job's submission directory)
PBS_O_SHELL Script shell
PBS_O_PATH Path variable used to locate executables within job script
PBS_O_HOST Host on which job script is currently running
PBS_O_HOME Home directory of submitting user
PBS_O_LOGNAME Name of submitting user
PBS_NODEFILE File that lists the hosts (compute nodes) on which the job runs, one host per line.
PBS_JOBNAME User specified jobname
PBS_NODENUM Node offset number
PBS_NUM_NODES Number of nodes allocated to the job
PBS_QUEUE Job queue
PBS_NP Number of execution slots (cores) for the job
PBS_NUM_PPN Number of procs per node allocated to the job
PBS_JOBID Unique number PBS assigns to a job
PBS_TASKNUM Number of tasks requested
TMPDIR Directory on the scratch (local and fast) disk space that is unique to a job
  • PBS_O_WORKDIR is typically used at the beginning of a script to go to the directory where the qsub command was issued, which is frequently also the directory containing the input data for the job, etc. The typical use is
    cd $PBS_O_WORKDIR
    inside a submission script.
  • PBS_NODEFILE is typically used to set up the environment for a parallel run, in particular for mpirun. Normally this usage is hidden from users inside a wrapper script (e.g. enable_arcus_mpi.sh), which defines the environment for the user.
  • PBS_JOBID is useful to tag job specific files and directories, typically output files or run directories. For instance, the submission script line
    myApp > $PBS_JOBID.out
    runs the application myApp and redirects the standard output to a file whose name is given by the job ID. (NB: the job ID is a number assigned by TORQUE and differs from the name string the user gives the job in the submission script.)
  • TMPDIR is the name of a scratch disk directory unique to the job. The scratch disk space typically offers faster access than the disk space holding the user home and data areas, and it benefits applications with a sustained and large amount of I/O. Such a job normally copies its input files to the scratch space, runs the application on scratch and copies the results back to the submission directory, as sketched in the example below.
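
A minimal submission script following this copy-run-copy-back pattern might look like the sketch below; the resource request, the application myApp and the file name input.dat are placeholders, not site-specific recommendations.

#!/bin/bash
#MSUB -l nodes=1:ppn=1          # illustrative resource request
#MSUB -l walltime=01:00:00

# Copy the input data from the submission directory to the fast, job-private scratch space
cp $PBS_O_WORKDIR/input.dat $TMPDIR/

# Run the application on the scratch space
cd $TMPDIR
myApp input.dat > $PBS_JOBID.out

# Copy the results back to the submission directory
cp $PBS_JOBID.out $PBS_O_WORKDIR/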


1.2 Job Exit Status

Once a job under TORQUE has completed, the exit_status attribute contains the result code returned by the job script. This attribute can be seen by running qstat -f, which shows the entire set of information associated with a job; the exit_status field is found near the bottom of the output.

$ msub bwforcluster-quantum-example.moab
760236
$ qstat
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
760236.adm01               ...-example.moab kn_l_pop235844  00:00:00 C short          
$
$ qstat -f 760236
Job Id: 760236.adm01
    Job_Name = bwforcluster-quantum-example.moab
    Job_Owner = kn_l_pop235844@adm01
    resources_used.cput = 00:00:00
    resources_used.mem = 0kb
    resources_used.vmem = 0kb
    resources_used.walltime = 00:00:11
    job_state = C
    queue = short
    server = adm01
    Checkpoint = u
    ctime = Fri Jan 15 10:10:23 2016
    Error_Path = adm01:/nfs/home1/kn/kn_kn/kn_l_pop235844/src/chem/quantum_esp
	resso/4.0.3_epw/bwhpc-examples/bwforcluster-quantum-example.moab.76023
	6
    exec_host = n0301/0
    exec_port = 15003
    group_list = kn_kn
    Hold_Types = n
    Join_Path = oe
    Keep_Files = n
    Mail_Points = a
    mtime = Fri Jan 15 10:10:44 2016
    Output_Path = adm01:/nfs/home1/kn/kn_kn/kn_l_pop235844/src/chem/quantum_es
	presso/4.0.3_epw/bwhpc-examples/bwforcluster-quantum-example.moab.7602
	36
    Priority = 0
    qtime = Fri Jan 15 10:10:23 2016
    Rerunable = False
    Resource_List.mem = 4000mb
    Resource_List.nodect = 1
    Resource_List.nodes = 1:ppn=1
    Resource_List.walltime = 48:00:00
    session_id = 1349
    Variable_List = MOAB_BATCH=,MOAB_CLASS=short,MOAB_GROUP=kn_kn,
	MOAB_JOBID=760236,MOAB_JOBNAME=bwforcluster-quantum-example.moab,
	MOAB_MACHINE=torque,MOAB_NODECOUNT=1,MOAB_NODELIST=n0301,
	MOAB_PARTITION=Moab,MOAB_PROCCOUNT=1,MOAB_TASKMAP=n0301:1,
	MOAB_USER=kn_l_pop235844,PBS_O_QUEUE=short,PBS_O_HOME=/,
	PBS_O_PATH=/sbin:/usr/sbin:/bin:/usr/bin,
	PBS_O_WORKDIR=/nfs/home1/kn/kn_kn/kn_l_pop235844/src/chem/quantum_esp
	resso/4.0.3_epw/bwhpc-examples,PBS_O_HOST=adm01,PBS_O_SERVER=adm01,
	MOAB_SUBMITDIR=/home/kn/kn_kn/kn_l_pop235844/src/chem/quantum_espress
	o/4.0.3_epw/bwhpc-examples,PATH=/sbin:/usr/sbin:/bin:/usr/bin
    euser = kn_l_pop235844
    egroup = kn_kn
    queue_type = E
    etime = Fri Jan 15 10:10:23 2016
 
--> exit_status = 0

    start_time = Fri Jan 15 10:10:33 2016
    start_count = 1
    fault_tolerant = False
    comp_time = Fri Jan 15 10:10:45 2016
    job_radix = 0
    total_runtime = 14.439113
    proxy_user = kn_l_pop235844
    submit_host = adm01
    job_id = 760236
    x = ENVREQUESTED:TRUE;SID:Moab;SJID:760236;SRMJID:760236

The exit status is useful for diagnosing problems with jobs that terminated unexpectedly.
If TORQUE was unable to start the job, this field contains a negative number produced by the pbs_mom daemon (see the table below).
Otherwise, if the job script was started successfully, the value in this field is the return value of the script.
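
For a quick check after a job has finished, the exit_status line can be filtered out of the full qstat output, e.g. for the job from the example above (this works as long as the completed job is still known to the server):

$ qstat -f 760236 | grep exit_status
    exit_status = 0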

1.2.1 TORQUE Supplied Exit Codes

Name Value Description
JOB_EXEC_OK 0 job exec successful
JOB_EXEC_FAIL1 -1 job exec failed, before files, no retry
JOB_EXEC_FAIL2 -2 job exec failed, after files, no retry
JOB_EXEC_RETRY -3 job execution failed, do retry
JOB_EXEC_INITABT -4 job aborted on MOM initialization
JOB_EXEC_INITRST -5 job aborted on MOM init, chkpt, no migrate
JOB_EXEC_INITRMG -6 job aborted on MOM init, chkpt, ok migrate
JOB_EXEC_BADRESRT -7 job restart failed
JOB_EXEC_CMDFAIL -8 exec() of user command failed
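
As an illustration only, a small helper script (hypothetical, not part of TORQUE) could translate such a status value into the symbolic name from the table above:

#!/bin/bash
# torque_exit_name.sh - hypothetical helper: map a TORQUE exit_status to its symbolic name
case "$1" in
   0) echo "JOB_EXEC_OK" ;;
  -1) echo "JOB_EXEC_FAIL1 (failed before files, no retry)" ;;
  -2) echo "JOB_EXEC_FAIL2 (failed after files, no retry)" ;;
  -3) echo "JOB_EXEC_RETRY (execution failed, retry)" ;;
  -4) echo "JOB_EXEC_INITABT (aborted on MOM initialization)" ;;
  -5) echo "JOB_EXEC_INITRST (aborted on MOM init, checkpoint, no migrate)" ;;
  -6) echo "JOB_EXEC_INITRMG (aborted on MOM init, checkpoint, ok migrate)" ;;
  -7) echo "JOB_EXEC_BADRESRT (restart failed)" ;;
  -8) echo "JOB_EXEC_CMDFAIL (exec() of user command failed)" ;;
   *) echo "user script exit code $1" ;;
esac

Calling it with, for example, ./torque_exit_name.sh -3 prints "JOB_EXEC_RETRY (execution failed, retry)".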

2 Slurm Resource Manager

The Slurm Resource and Workload Manager (formerly known as the Simple Linux Utility for Resource Management, SLURM) is a free and open-source job scheduler.

2.1 Batch Job (Slurm) Variables : bwUniCluster

Since the workload manager MOAB on bwUniCluster uses the resource manager Slurm, the following Slurm environment variables are added to your environment once your job has started (only an excerpt of the most important ones).

Environment variable Brief explanation
SLURM_JOB_CPUS_PER_NODE Number of processes per node dedicated to the job
SLURM_JOB_NODELIST List of nodes dedicated to the job
SLURM_JOB_NUM_NODES Number of nodes dedicated to the job
SLURM_MEM_PER_NODE Memory per node dedicated to the job
SLURM_NPROCS Total number of processes dedicated to the job
SLURM_CLUSTER_NAME Name of the cluster executing the job
SLURM_CPUS_PER_TASK Number of CPUs requested per task
SLURM_JOB_ACCOUNT Account name
SLURM_JOB_ID Job ID
SLURM_JOB_NAME Job Name
SLURM_JOB_PARTITION Partition/queue running the job
SLURM_JOB_UID User ID of the job's owner
SLURM_SUBMIT_DIR Job submit folder. The directory from which msub was invoked.
SLURM_JOB_USER User name of the job's owner
SLURM_RESTART_COUNT Number of times job has restarted
SLURM_PROCID Task ID (MPI rank)
SLURM_NTASKS The total number of tasks available for the job
SLURM_STEP_ID Job step ID
SLURM_STEP_NUM_TASKS Task count (number of MPI ranks)
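
As a minimal sketch, a job script can use these variables to document and steer the run; the #MSUB resource requests and the application myApp below are placeholders only, not site-specific recommendations.

#!/bin/bash
#MSUB -l nodes=1:ppn=4          # illustrative resource request
#MSUB -l walltime=00:30:00
#MSUB -N slurm-var-demo

# Change to the directory from which msub was invoked
cd $SLURM_SUBMIT_DIR

# Record some job information for later reference
echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) on cluster $SLURM_CLUSTER_NAME"
echo "Nodes: $SLURM_JOB_NUM_NODES ($SLURM_JOB_NODELIST), partition: $SLURM_JOB_PARTITION"
echo "Total tasks: $SLURM_NTASKS"

# Start the (placeholder) application with as many tasks as were allocated
mpirun -n $SLURM_NTASKS ./myApp > $SLURM_JOB_ID.out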


2.2 Job Exit Codes

A job's exit code (also known as exit status, return code or completion code) is captured by Slurm and saved as part of the job record.
Any non-zero exit code is assumed to indicate a job failure and results in a job state of FAILED with a reason of "NonZeroExitCode".
The exit code is an 8-bit unsigned number ranging between 0 and 255. While it is possible for a job to return a negative exit code, Slurm will display it as an unsigned value in the 0-255 range.
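
As a small illustration of this truncation (the script below is only a sketch), an exit value outside the 0-255 range is reported modulo 256, so a return value of -2 shows up as 254 and 300 as 44:

#!/bin/bash
# Sketch only: the exit status is truncated to its least significant 8 bits,
# so Slurm would record this job with exit code 44 (300 mod 256).
exit 300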

2.2.1 Displaying Exit Codes and Signals

Slurm displays a job's exit code in the output of the scontrol show job command and in the sview utility; job step exit codes are shown by scontrol show step and sview.
When a signal was responsible for a job's or step's termination, the signal number is displayed after the exit code, delineated by a colon (:).
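
For example, the ExitCode field (in the format exit_code:signal) can be picked out of the scontrol output as shown below; the job ID is a placeholder, and a value of 0:0 indicates a job that finished successfully and was not terminated by a signal:

$ scontrol show job 123456 | grep -ow 'ExitCode=[0-9]*:[0-9]*'
ExitCode=0:0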