Using Slurm Job Arrays
Slurm job arrays allow a large collection of similar Slurm runs to be submitted and managed through a single script. Job arrays can significantly reduce the strain on the Slurm queueing system, and they make it much easier to run a large number of jobs that vary by only one or two parameters. We highly recommend employing job arrays whenever you need to run roughly 200 or more Slurm jobs that differ in only one or a few parameters.
Implementing Slurm job arrays in your Slurm job script is easy. The inclusion of the Slurm directive #SBATCH --array=n-m:step (where n is the starting index, m is the ending index, and the optional step is the step size) tells Slurm that this script is a job array, and Slurm queues it as FLOOR((m-n)/step)+1 instances. The environment variable $SLURM_ARRAY_TASK_ID contains the index of the instance your script is running as, so the command echo $SLURM_ARRAY_TASK_ID outputs the current job index (e.g., 7). When you submit a job array, squeue displays the job ID with the pending array indices appended in square brackets, so a job that would normally look something like "384294" appears as "384294_[...]" in the Slurm squeue command.
As an explicit example of the Slurm job array directive, including #SBATCH --array=1-999:2 in your Slurm script causes Slurm to run 500 instances ( FLOOR((999-1)/2)+1 = 500 ) of your script. Each instance of your script executes with $SLURM_ARRAY_TASK_ID uniquely set to one of 1, 3, 5, 7, ..., 999.
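The instance-count formula can be sanity-checked with ordinary shell arithmetic (a sketch; n, m, and step here are plain shell variables standing in for the directive's values, not Slurm settings):

```shell
#!/bin/bash
# Number of instances for #SBATCH --array=n-m:step is
# FLOOR((m - n) / step) + 1; bash integer division already
# floors for these non-negative values.
n=1; m=999; step=2
count=$(( (m - n) / step + 1 ))
echo "--array=${n}-${m}:${step} runs ${count} instances"
# prints: --array=1-999:2 runs 500 instances
```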
The following example Slurm job array script runs five MPI jobs. Note the Slurm job array directive on line 7: #SBATCH --array=0-12:3.
 1 #!/bin/bash
 2 #SBATCH -t 00:20:00
 3 #SBATCH -A Project_ID
 4 #SBATCH -q debug
 5 #SBATCH -N 1
 6 #SBATCH --job-name=Job_Array_Test
 7 #SBATCH --array=0-12:3
 8 # #SBATCH --array=0-12:3 signifies a job array from 0 to 12 in steps of 3.
 9
10 echo "Slurm Job Id SLURM_ARRAY_JOB_ID is ${SLURM_ARRAY_JOB_ID}"
11 echo "Slurm job array index SLURM_ARRAY_TASK_ID value is ${SLURM_ARRAY_TASK_ID}"
12
13 cd $WORKDIR
14
15 # Make a directory of the main Slurm JOBID and subdir with Job Array Index
16 mkdir -p ${SLURM_ARRAY_JOB_ID}/${SLURM_ARRAY_TASK_ID}
17
18 # Make a variable that has full path to this run
19 # TMPD might look like this /p/work1/smith/392813/9/
20 TMPD=${WORKDIR}/${SLURM_ARRAY_JOB_ID}/${SLURM_ARRAY_TASK_ID}
21
22 # copy executable or do a module load to get paths to executables
23 cp $WORKDIR/picalc.exe ${TMPD}/picalc.exe
24
25 # Though not used here, OPTIONAL_INPUT could hold various inputs
26 # that have different parameters set
27 OPTIONAL_INPUT=$WORKDIR/input.${SLURM_ARRAY_TASK_ID}
28
29 # cd into directory that will contain all output
30 # from this SLURM_ARRAY_TASK_ID run
31 cd ${TMPD}
32
33 # run job and redirect output
34 mpiexec_mpt -n 48 ./picalc.exe ${OPTIONAL_INPUT} >& output.o$SLURM_ARRAY_JOB_ID
35
36 exit
After submitting the sample job array script, the following shows the output of the squeue command (e.g., squeue --me) for a job array that is still entirely queued.
JOBID              PARTITION NAME     USER  ST TIME NODES NODELIST(REASON)
102575_[0,3,6,9,12 general   Job_Arra smith PD 0:00 1     (Priority)
The following shows the Slurm output from the squeue command for a partially running job array. Note that Slurm leaves the remaining queued jobs on the first line and the running job in the array on the subsequent lines. In this example, job array index 0 is running and the remaining (3-12) are still queued.
JOBID            PARTITION NAME     USER  ST TIME NODES NODELIST(REASON)
102575_[3,6,9,12] general  Job_Arra smith PD 0:00 1     (Priority)
102575_0          general  Job_Arra smith R  0:35 1     n1271
When this job array script is run, it executes five MPI jobs. Each job array instance executes the submitted Slurm script with the environment variable $SLURM_ARRAY_TASK_ID set to 0, 3, 6, 9, or 12. Line 20 combines the $SLURM_ARRAY_TASK_ID environment variable with the Slurm job number environment variable, $SLURM_ARRAY_JOB_ID, and $WORKDIR to set $TMPD to a full directory path for each job array instance. The $TMPD variable holds paths that look like /p/work1/smith/476538/0, /p/work1/smith/476538/3, /p/work1/smith/476538/6, /p/work1/smith/476538/9, and /p/work1/smith/476538/12 for the separate run instances. At line 23, the script copies the picalc.exe executable into that full path; it then changes directory to that path on line 31 and runs the picalc program. After the job has completed, executing ls in /p/work1/smith/476538 shows five directories: 0/, 12/, 3/, 6/, and 9/. Each directory contains the picalc.exe executable and the output from that program, as follows:
-bash-4.2$ pwd
/p/work1/smith/476538
-bash-4.2$ ls -F
0/  12/  3/  6/  9/
-bash-4.2$ cd 6
-bash-4.2$ ls -ltrFa
-rwxr-----. 1 smith msrc 817032 Jul 23 10:59 picalc.exe*
-rw-r--r--. 1 smith msrc   7326 Jul 23 10:59 output.o476538
drwxr-xr-x. 2 smith msrc   4096 Jul 23 10:59 ./
drwxr-xr-x. 7 smith msrc   4096 Jul 23 11:00 ../
The standard output/error file from each array task takes the form slurm-<SLURM_ARRAY_JOB_ID>_<SLURM_ARRAY_TASK_ID>.out. The files are written to the directory from which the job array script was submitted with sbatch. In this case, the job array was submitted from $WORKDIR (/p/work1/smith/), and the files appear as follows:
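If the default naming is not desired, sbatch filename patterns can place each task's output elsewhere. In the sketch below, %A expands to the array master job ID and %a to the array task ID; check the sbatch man page on your system for the full pattern list:

```shell
# Write each array task's stdout/stderr to its own named file.
# %A = SLURM_ARRAY_JOB_ID, %a = SLURM_ARRAY_TASK_ID.
#SBATCH --output=Job_Array_Test_%A_%a.out
#SBATCH --error=Job_Array_Test_%A_%a.err
```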
-bash-4.2$ ls -ltrF slurm-476538*
-rw-------. 1 smith msrc 510 Jul 23 11:00 slurm-476538_9.out
-rw-------. 1 smith msrc 567 Jul 23 11:00 slurm-476538_6.out
-rw-------. 1 smith msrc 567 Jul 23 11:00 slurm-476538_3.out
-rw-------. 1 smith msrc 884 Jul 23 11:00 slurm-476538_0.out
-rw-------. 1 smith msrc 889 Jul 23 11:01 slurm-476538_12.out
It is important to note that array indices might be executed out of order; array task 12 could run before array task 0. One last note: to see more detail on the status of all jobs in the array, either use the -r flag to squeue or use the following command:
scontrol show job 104267 | grep -e JobState -e JobId
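The grep simply keeps the job-identity and state lines from scontrol's multi-line report. As a sketch, with an illustrative (not live) scontrol report captured in a shell variable:

```shell
#!/bin/bash
# Illustrative fragment of 'scontrol show job' output for one
# array task; a real report contains many more fields.
report='JobId=104267 ArrayJobId=104267 ArrayTaskId=0 JobName=Job_Array_Test
   JobState=RUNNING Reason=None Dependency=(null)'

# Keep only the lines that mention JobId or JobState.
echo "$report" | grep -e JobState -e JobId
```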
Some users might want to utilize multiple varying indices. For example, you might want a series of jobs that cycle through a set of Cartesian coordinates. Although Slurm only provides a single parameter index, you can partition this single index into as many indices as you would like. In this case we will use two indices: one index for x and one index for y. Shown below is how to map a Slurm index of 1,200 values ranging from 0 to 1,199 into x and y indices varying from 0 to 29 for x and 0 to 39 for y.
#!/bin/bash
#SBATCH -t 00:20:00
#SBATCH -A Project_ID
#SBATCH -q debug
#SBATCH -N 1
#SBATCH --job-name=Job_Array_Test2
#SBATCH --array=0-1199

cd $WORKDIR

j=${SLURM_ARRAY_TASK_ID}
(( y=j % 40 ))
(( x=j/40 ))

echo "For index $j, my x and y indices are=$x and $y"
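The mapping can be checked outside of Slurm by looping over a few sample index values (a sketch; j stands in for $SLURM_ARRAY_TASK_ID):

```shell
#!/bin/bash
# x = j / 40 ranges over 0..29 and y = j % 40 over 0..39
# as j runs from 0 to 1199.
for j in 0 39 40 41 1199; do
  (( y = j % 40 ))
  (( x = j / 40 ))
  echo "For index $j, my x and y indices are=$x and $y"
done
```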
Submitting the script above to Slurm produces 1,200 output files. Selected output from the first 90 files is shown below:

-bash-4.2$ grep 'For index' slurm-619717*
slurm-619717_0.out:For index 0, my x and y indices are=0 and 0
slurm-619717_1.out:For index 1, my x and y indices are=0 and 1
slurm-619717_2.out:For index 2, my x and y indices are=0 and 2
slurm-619717_20.out:For index 20, my x and y indices are=0 and 20
slurm-619717_30.out:For index 30, my x and y indices are=0 and 30
slurm-619717_31.out:For index 31, my x and y indices are=0 and 31
slurm-619717_39.out:For index 39, my x and y indices are=0 and 39
slurm-619717_40.out:For index 40, my x and y indices are=1 and 0
slurm-619717_41.out:For index 41, my x and y indices are=1 and 1
slurm-619717_49.out:For index 49, my x and y indices are=1 and 9
slurm-619717_77.out:For index 77, my x and y indices are=1 and 37
slurm-619717_78.out:For index 78, my x and y indices are=1 and 38
slurm-619717_79.out:For index 79, my x and y indices are=1 and 39
slurm-619717_80.out:For index 80, my x and y indices are=2 and 0
slurm-619717_81.out:For index 81, my x and y indices are=2 and 1
slurm-619717_89.out:For index 89, my x and y indices are=2 and 9
We hope users who run large numbers of jobs that vary by only one or two parameters will see the benefits, both for themselves and for the Slurm system, of using job arrays.