Using Slurm Job Arrays

Slurm job arrays allow a large collection of Slurm runs to be executed from one script. Job arrays can significantly relieve the strain on the Slurm queueing system, and they provide a much easier way to run a large number of jobs that vary by only one or two parameters. We highly recommend job arrays whenever you have 200 or more Slurm jobs that vary by only a few parameters.

Implementing Slurm job arrays in your Slurm job script is easy. The Slurm directive #SBATCH --array=n-m:step (where n is the starting index, m is the ending index, and the optional step is the step size) tells Slurm that this script is a job array, and Slurm queues it as FLOOR[(m-n)/step+1] instances. The environment variable $SLURM_ARRAY_TASK_ID contains the index of the instance your script is running as; the command echo $SLURM_ARRAY_TASK_ID outputs that index (e.g., 7). When you submit a job array, the Slurm job number is displayed with square brackets, [], appended to it. A job that would normally look something like "384294" appears as "384294[]" in the output of the Slurm squeue command.

As an explicit example of the Slurm job array directive, the inclusion of the Slurm directive #SBATCH --array=1-999:2 in your Slurm script causes Slurm to run 500 instances ( FLOOR( (999-1)/2 + 1 ) = 500 ) of your script. Each instance of your script has $SLURM_ARRAY_TASK_ID uniquely set to one of 1, 3, 5, 7, ..., 999 when executed.
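The instance-count arithmetic can be sanity-checked in plain bash, with no Slurm required. The values below are the n, m, and step from the directive above:

```shell
# Instance count for #SBATCH --array=1-999:2 using FLOOR[(m-n)/step+1];
# bash integer division already floors for non-negative operands.
n=1; m=999; step=2
count=$(( (m - n) / step + 1 ))
echo "$count"   # → 500
```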

The following example Slurm job array script runs five MPI jobs. Note the Slurm job array directive on line 7: #SBATCH --array=0-12:3.

 1  #!/bin/bash
 2  #SBATCH -t 00:20:00
 3  #SBATCH -A Project_ID
 4  #SBATCH -q debug
 5  #SBATCH -N 1
 6  #SBATCH --job-name=Job_Array_Test
 7  #SBATCH --array=0-12:3
 8  #  #SBATCH --array=0-12:3 signifies a job array from 0 to 12 in steps of 3.
 9
10  echo "Slurm Job Id SLURM_ARRAY_JOB_ID is ${SLURM_ARRAY_JOB_ID}"
11  echo "Slurm job array index SLURM_ARRAY_TASK_ID value is ${SLURM_ARRAY_TASKID}"
12
13  cd $WORKDIR
14
15  # Make a directory of the main Slurm JOBID and subdir with Job Array Index
16  mkdir -p ${SLURM_ARRAY_JOB_ID}/${SLURM_ARRAY_TASK_ID}
17
18  # Make a variable that has full path to this run
19  # TMPD might look like this /p/work1/smith/392813/9/
20  TMPD=${WORKDIR}/${SLURM_ARRAY_JOB_ID}/${SLURM_ARRAY_TASK_ID}
21
22  # copy executable or do a module load to get paths to executables
23  cp $WORKDIR/picalc.exe ${TMPD}/picalc.exe
24
25  # Though not used here, OPTIONAL_INPUT could hold various inputs
26  # that have different parameters set
27
28
29  # cd into directory that will contain all output
30  # from this SLURM_ARRAY_TASK_ID run
31  cd ${TMPD}
32
33  # run job and redirect output
34  mpiexec_mpt -n 48 ./picalc.exe ${OPTIONAL_INPUT}  >& output.o$SLURM_ARRAY_JOB_ID
35
36  exit

After the sample job array script is submitted, the squeue command (e.g., squeue --me) shows the queued job array as follows:

102575_[0,3,6,9,12]  general Job_Arra smith    PD       0:00      1 (Priority)

The following shows the squeue output for a partially running job array. Note that Slurm keeps the remaining queued tasks on the first line and lists each running task on its own subsequent line. In this example, array index 0 is running and the remaining indices (3-12) are still queued.

102575_[3,6,9,12]    general Job_Arra smith    PD       0:00      1 (Priority)
102575_0             general Job_Arra smith     R       0:35      1 n1271

When this job array script runs, it executes five MPI jobs. Each job array instance executes the submitted Slurm script with the environment variable $SLURM_ARRAY_TASK_ID set to 0, 3, 6, 9, or 12. Line 20 uses the $SLURM_ARRAY_TASK_ID environment variable, along with the Slurm job number environment variable, $SLURM_ARRAY_JOB_ID, and $WORKDIR, to set $TMPD to a full directory path for each job array instance. The $TMPD variable holds paths such as /p/work1/smith/476538/0, /p/work1/smith/476538/3, /p/work1/smith/476538/6, /p/work1/smith/476538/9, and /p/work1/smith/476538/12 for each separate run instance. At line 23, the script copies the picalc.exe executable into that directory, changes into it on line 31, and runs the picalc program. After the job has completed, running ls in /p/work1/smith/476538 shows five directories: 0/, 12/, 3/, 6/, and 9/. Each directory contains the picalc.exe executable and the output from that program, as follows:

-bash-4.2$ pwd
-bash-4.2$ ls -F
0/  12/  3/  6/  9/
-bash-4.2$ cd 6
-bash-4.2$ ls -ltrFa
-rwxr-----. 1 smith msrc 817032 Jul 23 10:59 picalc.exe*
-rw-r--r--. 1 smith msrc   7326 Jul 23 10:59 output.o476538
drwxr-xr-x. 2 smith msrc   4096 Jul 23 10:59 ./
drwxr-xr-x. 7 smith msrc   4096 Jul 23 11:00 ../

By default, the standard output and error from each task in the job array are joined in a file named slurm-<SLURM_ARRAY_JOB_ID>_<SLURM_ARRAY_TASK_ID>.out, written to the directory from which the job array script was submitted with sbatch. In this case, the job array was submitted from $WORKDIR (/p/work1/smith/) and appears as follows:

-bash-4.2$ ls -ltrF slurm-476538_*
-rw-------. 1 smith msrc 510 Jul 23 11:00 slurm-476538_9.out
-rw-------. 1 smith msrc 567 Jul 23 11:00 slurm-476538_6.out
-rw-------. 1 smith msrc 567 Jul 23 11:00 slurm-476538_3.out
-rw-------. 1 smith msrc 884 Jul 23 11:00 slurm-476538_0.out
-rw-------. 1 smith msrc 889 Jul 23 11:01 slurm-476538_12.out
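
If you would rather have output filenames that include the job name, Slurm's filename patterns can be used in an --output directive: %x expands to the job name, %A to the array master job ID, and %a to the array task index. A minimal sketch (the directive below is optional and not part of the example script above):

```shell
# Optional: name each task's output file after the job name,
# array job ID, and task index, e.g., Job_Array_Test_476538_6.out
#SBATCH --output=%x_%A_%a.out
```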

It is important to note that indices might execute "out of order": array task 12 could run before array task 0. One last note: to see more detail on the status of every task in the array, either add the -r flag to squeue or use a command such as: scontrol show job 102575 | grep -e JobState -e JobId

Some users might want to utilize multiple varying indices. For example, you might want a series of jobs that cycle through a set of Cartesian coordinates. Although Slurm only provides a single index, you can partition that single index into as many indices as you would like. In this case we use two indices: one for x and one for y. Shown below is how to map a Slurm index of 1,200 values, ranging from 0 to 1,199, into an x index ranging from 0 to 29 and a y index ranging from 0 to 39.

 1  #!/bin/bash
 2  #SBATCH -t 00:20:00
 3  #SBATCH -A Project_ID
 4  #SBATCH -q debug
 5  #SBATCH -N 1
 6  #SBATCH --job-name=Job_Array_Test2
 7  #SBATCH --array=0-1199
 8
 9  cd $WORKDIR
10
11  j=${SLURM_ARRAY_TASK_ID}
12  (( y=j % 40 ))
13  (( x=j/40 ))
14
15  echo "For index $j, my x and y indices are=$x and $y"

Submitting the script above to Slurm produces 1,200 output files. Selected output from the first 90 files is shown below:

-bash-4.2$ grep 'For index' slurm-619717_*
slurm-619717_0.out:For index 0, my x and y indices are=0 and 0
slurm-619717_1.out:For index 1, my x and y indices are=0 and 1
slurm-619717_2.out:For index 2, my x and y indices are=0 and 2
slurm-619717_20.out:For index 20, my x and y indices are=0 and 20
slurm-619717_30.out:For index 30, my x and y indices are=0 and 30
slurm-619717_31.out:For index 31, my x and y indices are=0 and 31
slurm-619717_39.out:For index 39, my x and y indices are=0 and 39
slurm-619717_40.out:For index 40, my x and y indices are=1 and 0
slurm-619717_41.out:For index 41, my x and y indices are=1 and 1
slurm-619717_49.out:For index 49, my x and y indices are=1 and 9
slurm-619717_77.out:For index 77, my x and y indices are=1 and 37
slurm-619717_78.out:For index 78, my x and y indices are=1 and 38
slurm-619717_79.out:For index 79, my x and y indices are=1 and 39
slurm-619717_80.out:For index 80, my x and y indices are=2 and 0
slurm-619717_81.out:For index 81, my x and y indices are=2 and 1
slurm-619717_89.out:For index 89, my x and y indices are=2 and 9
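The same trick extends to more than two indices by repeated division and modulo. As a sketch (the ranges below are illustrative, not taken from the example above), a single index from --array=0-5999 could be split into three indices x (0-9), y (0-19), and z (0-29):

```shell
# Split one array index j into (x, y, z); j would normally come from
# ${SLURM_ARRAY_TASK_ID} -- a fixed value is used here for illustration.
j=4321
(( z = j % 30 ))            # innermost index, 0-29
(( y = (j / 30) % 20 ))     # middle index, 0-19
(( x = j / (30 * 20) ))     # outermost index, 0-9
echo "For index $j, x, y, z = $x, $y, $z"   # → For index 4321, x, y, z = 7, 4, 1
```

The general pattern: list your index ranges from outermost to innermost, divide by the product of all inner range sizes, and take the remainder modulo each range size.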

We hope users who run large numbers of jobs that vary by only one or two parameters will see the benefits of job arrays, both for themselves and for the Slurm system.