Coral Slurm Guide
Table of Contents
- 1. Introduction
- 2. Anatomy of a Batch Script
- 2.1. Specify Your Shell
- 2.2. Required Scheduler Directives
- 3. Submitting Your Job
- 4. Simple Batch Script Example
- 5. Job Management Commands
- 6. Optional Slurm Directives
- 6.1. Job Identification Directives
- 6.2. Job Environment Directives
- 6.3. Reporting Directives
- 6.4. Job Dependency Directives
- 7. Environment Variables
- 7.1. Slurm Environment Variables
- 7.2. Other Important Environment Variables
- 8. Example Scripts
- 8.1. MPI Script
- 9. Batch Scheduler Rosetta
1. Introduction
On large-scale computers, many users must share available resources. Because of this, you can't just log on to one of these systems, upload your programs, and start running them. Essentially, your programs must "get in line" and wait their turn, and there is more than one of these lines, or queues, from which to choose. Some queues have a higher priority than others (like the express checkout at the grocery store). The queues available to you are determined by the projects you are involved with.
The jobs in the queues are managed and controlled by a batch queuing system; without one, users could overload the system, resulting in tremendous performance degradation. The queuing system will run your job as soon as it can while still honoring the following:
- Meeting your resource requests
- Not overloading the system
- Running higher priority jobs first
- Maximizing overall throughput
We use the Slurm queuing system. The Slurm module should be loaded automatically for you at login, allowing you access to the Slurm commands.
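To confirm that the Slurm commands are available in your session, you can list your loaded modules and check the Slurm version; the exact module name varies by system, and these commands are shown only as an illustration:
module list        # the slurm module should appear in the listing
sinfo --version    # prints the installed Slurm version if the commands are on your PATH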
2. Anatomy of a Batch Script
A batch script is simply a small text file that can be created with a text editor such as vi or notepad. Although the specifics of a batch script will differ slightly from system to system, a basic set of components is always required, and a few others are simply good ideas. The basic components of a batch script must appear in the following order:
- Specify Your Shell
- Required Slurm Directives
- The Execution Block
Note: Not all applications on Linux systems can read DOS-formatted text files. Slurm does not handle ^M characters well nor do some compilers. To avoid complications, please remember to convert all DOS-formatted ASCII text files with the dos2unix utility before use on any HPC system. Users are also cautioned against relying on ASCII transfer mode to strip these characters, as some file transfer tools do not perform this function.
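For example, to convert a DOS-formatted batch script named run.Slurm in place before using it (the file name is just an illustration):
dos2unix run.Slurm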
2.1. Specify Your Shell
First, remember your batch script is a script. It is a good idea to specify which shell your script is written in. Unless you specify otherwise, Slurm will use your default login shell to run your script. To tell Slurm which shell to use, start your script with a line of the following form:
#!/bin/shell
where shell is bash (Bourne-Again shell), sh (Bourne shell), ksh (Korn shell), csh (C shell), tcsh (enhanced C shell), or zsh (Z shell).
2.2. Required Scheduler Directives
The next block of your script will tell Slurm about the resources your job needs by including Slurm directives. These directives are a special form of comment, beginning with "#SBATCH". As you might suspect, the # character tells the shell to ignore the line, but Slurm reads these directives and uses them to set various values. IMPORTANT!! All Slurm directives MUST come before the first line of executable code in your script; otherwise, they will be ignored.
Every script must include directives for the following:
- The number of nodes and processes per node you are requesting
- How nodes should be allocated
- The maximum amount of time your job should run
- Which queue (partition) you want your job to run in
- Your Project ID
Slurm also provides additional optional directives. These are discussed in Optional Slurm Directives, below.
Number of Nodes and Processes per Node
Before Slurm can schedule your job, it needs to know how many nodes you want. Before your job can be run, it will also need to know how many processes you want to run on each of those nodes. In general, you would specify one process per core, but you might want more or fewer processes depending on the programming model you are using. See Example Scripts (below) for alternate use cases.
#SBATCH --ntasks=76             # Number of MPI tasks (i.e. processes)
#SBATCH --cpus-per-task=1       # Number of cores per MPI task
#SBATCH --nodes=2               # Max number of nodes to be allocated
#SBATCH --ntasks-per-node=38    # Max number of tasks on each node
#SBATCH --ntasks-per-socket=19  # Max number of tasks on each socket
How Nodes Should Be Allocated
Some of Slurm's default behaviors can seriously impair the ability of your scripts to run in certain situations and can impose restrictions on submitted jobs that cause them to wait in the queue much longer than necessary. To prevent these situations, the following Slurm directive is required in all batch scripts on Coral:
#SBATCH --distribution=cyclic:cyclic  # Distribute tasks cyclically first among nodes
                                      # and then among sockets within a node
For an explanation of what this directive means, see the sbatch man page.
2.2.1. How Long to Run
Next, Slurm needs to know how long your job will run. For this you will have to make an estimate. There are three things to keep in mind:
- Your estimate is a limit. If your job hasn't completed within your estimate, it will be terminated.
- Your estimate will affect how long your job waits in the queue. In general, shorter jobs will run before longer jobs.
- Each queue has a maximum time limit. You cannot request more time than the queue allows.
To specify how long your job will run, include the following directive:
#SBATCH --time=00:05:00  # Wall clock limit (hrs:min:sec or days-hrs:min:sec)
2.2.2. Which Queue to Run In
Now, Slurm needs to know which queue you want your job to run in. Your options here are determined by the current cluster topology and your project's usage of cluster resources. Coral is currently partitioned into a CPU-only (non-GPU) partition and one partition for each of the two available GPU types. Other queues may be created from time to time; access to them is restricted to projects that have been granted special privileges due to urgency or importance, and they are not discussed here.
To see the list of queues available on the system, use the sinfo command. To specify the queue you want your job to run in, include the following directive:
#SBATCH --partition=standard # Run job in the CPU only partition
2.2.3. Your Project ID
Slurm now needs to know which project ID to charge for your job. You can use the show_usage command to find the projects available to you and their associated project IDs. In the show_usage output, project IDs appear in the column labeled "Subproject." Note: Users with access to multiple projects should remember the project they specify may limit their choice of queues.
To specify the Project ID for your job, include the following directive:
#SBATCH --account=Project_ID
2.2.4. The Execution Block
Once the Slurm directives have been supplied, the execution block may begin. This is the section of your script that contains the actual work to be done. A well-written execution block will generally contain the following stages (a minimal sketch follows the list):
- Environment Setup - This might include setting environment variables, loading modules, creating directories, copying files, initializing data, etc. As the last step in this stage, you will generally cd to the directory you want your script to execute in. Otherwise, your script would execute by default in your home directory. Most users use "cd $SLURM_SUBMIT_DIR" to run the batch script from the directory where they typed srun or sbatch to submit the job.
- Compilation - You may need to compile your application if you do not already have a pre-compiled executable available.
- Launching - Your application is launched with an appropriate launch command, such as mpirun, mpiexec, or srun.
- Clean up - This usually includes archiving your results and removing temporary files and directories.
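The sketch below illustrates these four stages in a single execution block; the module names, source file, and executable names are placeholders, not site-specific requirements:
# Environment Setup: load compiler/MPI modules and move to a job-specific directory
module load gcc openmpi
cd ${SLURM_SUBMIT_DIR}
mkdir -p ${SLURM_JOBID} && cd ${SLURM_JOBID}
# Compilation: only needed if you do not already have a pre-compiled executable
mpicc -O2 -o my_prog.exe ${SLURM_SUBMIT_DIR}/my_prog.c
# Launching: run the application with your launch command
mpirun ./my_prog.exe > my_prog.out
# Clean up: remove temporary files
rm -f *.o *.temp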
3. Submitting Your Job
Once your batch script is complete, you will need to submit it to Slurm for execution using the sbatch command. For example, if you have saved your script into a text file named run.Slurm, you would type sbatch run.Slurm.
Occasionally you may want to supply one or more directives directly on the sbatch command line. Directives supplied in this way override the same directives if they are already included in your script. The syntax to supply directives on the command line is the same as within a script except that the #SBATCH prefix is not used. For example:
sbatch --time=HHH:MM:SS run.Slurm
4. Simple Batch Script Example
The batch script below contains all the required directives and common script components discussed above.
#!/bin/bash
## Required Slurm Directives --------------------------------------
#SBATCH --account=Project_ID
#SBATCH --partition=standard
#SBATCH --ntasks=76
#SBATCH --cpus-per-task=1
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=38
#SBATCH --ntasks-per-socket=19
#SBATCH --time=12:00:00
#SBATCH --output=mpi-array-%j.out

## Execution Block
# Environment Setup ---------------------------------------------
# cd to your scratch directory in /work1/scratch/<username>
cd ${WORKDIR}
# create a job-specific subdirectory based on JOBID and cd to it
mkdir -p ${SLURM_JOBID}
cd ${SLURM_JOBID}

# Launching ----------------------------------------------------
# copy executable from $HOME and submit it
cp ${HOME}/my_prog.exe .
mpirun ./my_prog.exe > my_prog.out

# Clean up -----------------------------------------------------
## Remove temporary files
rm *.o *.temp
5. Job Management Commands
The table below contains commands for managing your jobs on Slurm.
Command | Description |
---|---|
srun | Submit a job |
sbatch | Submit a batch job |
squeue | List jobs in a queue |
sstat | Check the status of a running job (only job steps launched with srun report data) |
sinfo | Display the status of all Slurm queues |
sinfo -r | Display all offline Slurm batch nodes |
scancel | Delete a job |
scontrol hold | Place a job on hold. |
scontrol release | Release a job from hold |
sacct | Display job accounting data from a completed job |
BCT Commands that must be ported for Slurm | |
qview | A more user-friendly version of squeue |
show_queues | A more user-friendly version of sinfo |
qpeek | Lets you see the stdout and stderr of your running job |
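For example, to list your own queued and running jobs and then delete one of them (the job ID shown is hypothetical):
squeue -u $USER
scancel 123456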
6. Optional Slurm Directives
In addition to the required directives mentioned above, Slurm has many other directives, but most users will only ever use a few of them. Some of the more useful optional directives are listed below.
6.1. Job Identification Directives
Job identification directives allow you to identify characteristics of your jobs. These directives are voluntary but strongly encouraged. The following table lists useful job identification directives.
Directive | Options | Description |
---|---|---|
--job-name | Job_name | Name of your job |
--mail-user | e-mail address | e-mail recipient |
--mail-type | NONE, BEGIN, END, FAIL, ALL | E-mail event type |
--output | filename | Redirect standard output (stdout) to the named file |
--error | filename | Redirect standard error (stderr) to the named file |
--time | time | Wall clock time limit (hrs:min:sec) |
--array | Indexes | Submit array job |
--begin | Time | Job start time |
--comment | String | Arbitrary comment enclosed in double quotes |
--contiguous | | Allocated nodes must form a contiguous set |
--deadline | Time | Remove job if no ending is possible before deadline |
--dependency | Dependency list | Defer job start until all dependencies are met |
--distribution | block, cyclic, plane, arbitrary | Specify alternate distribution methods for remote processes |
--exclusive | user or mcs | The job allocation cannot share nodes with other running jobs |
--account | account_name | Charge resources used by this job to the specified account |
6.1.1. Job Name
The --job-name (-J) directive allows you to assign a name to your job. In addition to being easier to remember than a numeric job ID, the Slurm environment variable, $SLURM_JOB_NAME, inherits this value and can be used instead of the job ID to create job-specific output directories. To use this directive, add a line in the following form to your batch script:
#SBATCH --job-name=job_20
or to your srun command
srun -J job_20...
6.2. Job Environment Directives
Job environment directives allow you to control the environment in which your script will operate. The following table contains a few useful job environment directives.
Directive | Options | Description |
---|---|---|
--export | Key and value | Pass variables into Slurm |
6.2.1. Interactive Batch Shell
To run an interactive job:
srun --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
6.2.2. Export Variables
The --export=ALL directive instructs Slurm to export all the environment variables from your login environment into your batch environment. To use this directive, add a line in the following form to your batch script:
#SBATCH --export=ALL
or to your srun command
srun --export=ALL ...
6.2.3. Export Specific Variables
You can also tell Slurm to export specific environment variables from your login environment into your batch environment. To use this directive, add a line in one of the following forms to your batch script:
#SBATCH --export=ALL,A=foo,B=fee
or to your srun command
srun --export=ALL,A=foo,B=fee
Using either of these methods, multiple comma-separated variables can be included. It is also possible to set values for variables exported in this way, as follows:
srun --export=variable1=value1,variable2=value2,variable3=my_value3
6.3. Reporting Directives
Reporting directives allow you to control what happens to standard output and standard error messages generated by your script. They also allow you to request e-mail notifications at the beginning and end of your job.
6.3.1. Redirecting Stdout and Stderr
By default, messages written to stdout and stderr are merged for you into a single file named x-job_id.out, where x is either the name of the script or the name specified with the --job-name (-J) directive, and job_id is the ID of the job. If you want to change this behavior, the --error and --output directives allow you to redirect stdout and stderr messages to different named files.
Directive | Options | Description |
---|---|---|
--error | file_name | Redirect standard error to the named file |
--output | file_name | Redirect standard output to the named file |
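For example, to write stdout and stderr to separate, job-specific files (the %j token expands to the job ID):
#SBATCH --output=my_job-%j.out
#SBATCH --error=my_job-%j.err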
6.3.2. Creating E-mail Alerts
Many users want to be notified when their jobs begin and end. The --mail-type directive makes this possible. If you use this directive, you will also need to supply the --mail-user directive with one or more e-mail addresses to be used.
Directive | Options | Description |
---|---|---|
--mail-type | NONE, BEGIN, END, FAIL, ALL | Event types for which to send e-mail |
--mail-user | e-mail address(es) | Set the e-mail address(es) to be used |
For example:
#SBATCH --mail-type=END,FAIL             # Events: NONE, BEGIN, END, FAIL, ALL
#SBATCH --mail-user=smith@mhpcc.hpc.mil  # Where to send mail
6.4. Job Dependency Directives
Job dependency directives allow you to specify dependencies your job may have on other jobs, letting you control when your job runs relative to those jobs. These directives generally take the following form:
#SBATCH --dependency=dependency_expression
dependency_expression is a comma-delimited list of one or more dependencies, and each dependency is of the form:
type:jobids
type is one of the directives listed below, and jobids is a colon-delimited list of one or more job IDs your job is dependent upon.
Directive | Description |
---|---|
after | Execute this job after the listed jobs have begun. |
afterany | Execute this job after the listed jobs have terminated. |
afterburstbuffer | Execute this job after the listed jobs have terminated and any associated burst buffer stage-out operations have completed. |
aftercorr | A task of this job array can begin execution after the corresponding task ID in the specified job has completed successfully (ran to completion with an exit code of zero). |
afternotok | Execute this job after the listed jobs have terminated in some failed state (non-zero exit code, node failure, timed out, etc.). |
afterok | Execute this job after the listed jobs have completed successfully (ran to completion with an exit code of zero). |
expand | Resources allocated to this job should be used to expand the specified job. The job to expand must share the same Quality of Service (QOS) and partition. Gang scheduling of resources in the partition is also not supported. |
singleton | This job can begin execution after any previously launched jobs sharing the same job name and user have terminated. In other words, only one job by that name and owned by that user can be running or suspended at any point in time. |
For example, run a job after completion (success or failure) of job ID 1234:
#SBATCH --dependency=afterany:1234
Or, run a job after successful completion of job ID 1234:
#SBATCH --dependency=afterok:1234
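Dependencies can also be set up at submission time. The sbatch --parsable option prints just the new job ID, which can then be passed to a dependent submission; the script names below are placeholders:
JOBID=$(sbatch --parsable pre_process.Slurm)
sbatch --dependency=afterok:${JOBID} main_job.Slurm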
For more information about job dependencies, see the sbatch and srun man pages.
7. Environment Variables
7.1. Slurm Environment Variables
Slurm Variable | Description |
---|---|
$SLURM_JOBID or $SLURM_JOB_ID | Job identifier assigned to a job or job array by the batch system. |
$SLURM_NODELIST or $SLURM_JOB_NODELIST | List of nodes allocated to the job. |
$SLURM_SUBMIT_DIR | The absolute path of the directory where sbatch or srun was executed. |
$SLURM_JOB_NAME | The job name supplied by the user. |
The following additional Slurm variables may be useful to some users.
Variable | Description |
---|---|
$SLURM_ARRAY_JOB_ID | The job ID for a job array |
$SLURM_JOB_ACCOUNT | The Project ID charged for the job |
$SLURM_CPUS_PER_TASK | Number of cpus requested per task. Only set if the cpus-per-task option is specified. |
$SLURM_JOB_CPUS_PER_NODE | Count of processors available to the job on this node |
$SLURM_NTASKS or $SLURM_NPROCS | Same as -n, --ntasks |
$SLURM_SUBMIT_HOST | The hostname of the node from which sbatch was executed |
$SLURM_O_HOST | Host name on which the job submission command was executed |
$SLURMD_NODENAME | Name of the node running the job script |
$SLURM_JOB_PARTITION | Name of the partition in which the job is running |
$SLURM_O_PATH | Value of PATH from submission environment |
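These variables can be used directly in your batch script. For example, the following lines (purely illustrative) record basic job information in your job's output for later troubleshooting:
echo "Job ${SLURM_JOBID} (${SLURM_JOB_NAME}) is running in partition ${SLURM_JOB_PARTITION}"
echo "Nodes allocated: ${SLURM_JOB_NODELIST}"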
7.2. Other Important Environment Variables
In addition to the Slurm environment variables, the table below lists a few other variables which are not specifically associated with Slurm. These variables are not generally required but may be important depending on your job.
Variable | Description |
---|---|
$OMP_NUM_THREADS | The number of OpenMP threads per node |
$XT_LINUX_SHMEM_STACK_SIZE | Controls the size of the stack per process |
$XT_LINUX_SHMEM_HEAP_SIZE | Controls the size of the private heap per process |
$XT_SYMMETRIC_HEAP_SIZE | Controls the size of the symmetric heap per process |
$MPI_DSM_DISTRIBUTE | Ensures that memory is assigned closest to the physical core where each MPI process is running |
$MPI_GROUP_MAX | Maximum number of groups within a communicator |
$SLURM_QUEUE | The name of the queue from which the job is executed |
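For example, an OpenMP or hybrid MPI/OpenMP job commonly sets its thread count from the value requested with --cpus-per-task (this works only if --cpus-per-task was specified, as noted above; the convention itself is a suggestion, not a requirement):
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}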
8. Example Scripts
Where present, the "Clean up" section in the example scripts below demonstrates how to automatically archive your data using the transfer queue and clean up your $WORKDIR after your job completes. Using this method helps avoid data loss and ensures your allocation is not charged for idle cores while performing file transfer operations.
8.1. MPI Script
The following script is for a 76-core MPI job (2 nodes with 38 tasks each) running for 20 hours in the standard queue.
#!/bin/ksh
## Required Directives ------------------------------------
#SBATCH --nodes=2
#SBATCH --tasks-per-node=38
#SBATCH --distribution=cyclic:cyclic
#SBATCH --time=20:00:00
#SBATCH --partition=standard
#SBATCH --account=Project_ID

## Optional Directives ------------------------------------
#SBATCH --job-name=testjob
#SBATCH --output=testjob-%j.txt
#SBATCH --mail-user="your.email@site.mil"
#SBATCH --mail-type=BEGIN,END,FAIL

## Execution Block ----------------------------------------
# Environment Setup
# the following environment variable is not required, but will
# optimally assign processes to cores and improve memory use.
export MPI_DSM_DISTRIBUTE=yes
. /cm/local/apps/environment-modules/4.2.1/init/bash
module purge
module load gcc/8.2.0 slurm/18.08.9 openmpi/4.0.3-aspen

# cd to your scratch directory in /work1
cd ${WORKDIR}
# create a job-specific subdirectory based on JOBID and cd to it
mkdir -p ${SLURM_JOBID}
cd ${SLURM_JOBID}

## Launching ----------------------------------------------
mpiexec -np 76 /executable/path/my_prog.exe > my_prog.out
9. Batch Scheduler Rosetta
User Commands | PBS | Slurm |
---|---|---|
Job Submission | qsub Script_File | sbatch Script_File |
Job Deletion | qdel Job_ID | scancel Job_ID |
Job status (by job) | qstat Job_ID | squeue Job_ID |
Job status (by user) | qstat -u User_Name | squeue -u User_Name |
Job hold | qhold Job_ID | scontrol hold Job_ID |
Job release | qrls Job_ID | scontrol release Job_ID |
Queue list | qstat -Q | squeue |
Node list | pbsnodes -l | sinfo -N OR scontrol show nodes |
Cluster status | qstat -a | sinfo |
GUI | xpbsmon | sview |

Environment | PBS | Slurm |
---|---|---|
Job ID | $PBS_JOBID | $SLURM_JOBID |
Submit Directory | $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR |
Submit Host | $PBS_O_HOST | $SLURM_SUBMIT_HOST |
Node List | $PBS_NODEFILE | $SLURM_JOB_NODELIST |
Job Array Index | $PBS_ARRAYID | $SLURM_ARRAY_TASK_ID |

Job Specification | PBS | Slurm |
---|---|---|
Script Directive | #PBS | #SBATCH |
Queue | -q Queue_Name | ARL: -p Queue_Name; AFRL and Navy: -q Queue_Name |
Node Count | -l select=N1:ncpus=N2:mpiprocs=N3 (N1 = node count, N2 = max cores per node, N3 = cores to use per node) | -N min[-max] |
Core Count | -l select=N1:ncpus=N2:mpiprocs=N3 (N1 = node count, N2 = max cores per node, N3 = cores to use per node, core count = N1 x N3) | --ntasks=total_cores_in_run |
Wall Clock Limit | -l walltime=hh:mm:ss | -t min OR -t days-hh:mm:ss |
Standard Output File | -o File_Name | -o File_Name |
Standard Error File | -e File_Name | -e File_Name |
Combine stdout/err | -j oe (both to stdout) OR -j eo (both to stderr) | (use -o without -e) |
Copy Environment | -V | --export=ALL|NONE|Variable_List |
Event Notification | -m [a][b][e] | --mail-type=[BEGIN],[END],[FAIL] |
Email Address | -M Email_Address | --mail-user=Email_Address |
Job Name | -N Job_Name | --job-name=Job_Name |
Job Restart | -r y|n | --requeue OR --no-requeue (NOTE: configurable default) |
Working Directory | No option – defaults to home directory | --workdir=/Directory/Path |
Resource Sharing | -l place=scatter:excl | --exclusive OR --shared |
Account to charge | -A Project_ID | --account=Project_ID |
Tasks per Node | -l select=N1:ncpus=N2:mpiprocs=N3 (N1 = node count, N2 = max cores per node, N3 = cores to use per node) | --tasks-per-node=count |
Job Dependency | -W depend=state:Job_ID[:Job_ID...][,state:Job_ID[:Job_ID...]] | --depend=state:Job_ID |
Job host preference | | --nodelist=nodes AND/OR --exclude=nodes |
Job Arrays | -J N-M[:step][%Max_Jobs] | --array=N-M[:step] |
Generic Resources | -l other=Resource_Spec | --gres=Resource_Spec |
Licenses | -l app=number (Example: -l abaqus=21; Note: license resource allocation) | -L app:number |
Begin Time | -a [[[YYYY]MM]DD]hhmm[.ss] (Note: no delimiters) | --begin=YYYY-MM-DD[Thh:mm[:ss]] |