Using PBS Job Arrays

PBS job arrays allow a large collection of PBS runs to be executed from one script. Job arrays can significantly relieve the strain on the PBS queueing system, and they make it much easier to run a large number of jobs that vary by only one or two parameters. Users are strongly encouraged to use job arrays whenever they need to run 200 or more PBS jobs that differ by only one to three parameters.

Implementing a job array in your PBS job script is straightforward. Including the PBS directive #PBS -J n-m:step (where n is the starting index, m is the ending index, and the optional step is the step size) tells PBS that the script is a job array, and PBS will queue "FLOOR[(m-n)/step+1]" instances of it. Within each instance, the environment variable $PBS_ARRAY_INDEX holds the index of that instance, so the command echo $PBS_ARRAY_INDEX prints the index under which the script is currently executing (e.g., 7). When you submit a job array, left and right brackets, "[]", are appended to the PBS job number. A job that would normally appear as "384294" appears as "384294[]" in the output of the PBS qstat command.

As an explicit example, including the directive #PBS -J 1-999:2 in your PBS script causes PBS to run 500 instances ( FLOOR[(999-1)/2+1] = 500 ) of your script. In each instance, $PBS_ARRAY_INDEX is set to a unique value from 1, 3, 5, 7, ..., 999.
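The instance count and index values can be checked before submitting. The following is a minimal sketch that applies the FLOOR[(m-n)/step+1] formula from the text; the function names array_count and array_indices are hypothetical helpers written for this illustration and are not part of PBS.

```shell
#!/bin/bash
# Number of instances PBS would queue for "#PBS -J n-m:step",
# using the FLOOR[(m-n)/step+1] formula. step defaults to 1.
array_count() {
    local n=$1 m=$2 step=${3:-1}
    # Integer division truncates, which matches FLOOR when n <= m.
    echo $(( (m - n) / step + 1 ))
}

# Space-separated list of the values $PBS_ARRAY_INDEX would take.
array_indices() {
    local n=$1 m=$2 step=${3:-1}
    local i=$n out=""
    while [ "$i" -le "$m" ]; do
        out="$out$i "
        i=$(( i + step ))
    done
    echo "${out% }"
}

array_count 1 999 2      # prints 500
array_count 0 12 3       # prints 5
array_indices 0 12 3     # prints 0 3 6 9 12
```

Running this on a login node before adding the #PBS -J directive is a quick way to confirm that the range produces the number of instances you expect.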

The following example PBS job array script runs five MPI jobs on Gaffney. Note the PBS job array directive on line 7: #PBS -J 0-12:3.

 1  #!/bin/bash
 2  #PBS -l select=1:ncpus=48:mpiprocs=48
 3  #PBS -l walltime=00:20:00
 4  #PBS -A Project_ID
 5  #PBS -q debug
 6  #PBS -N Job_Array_Test
 7  #PBS -J 0-12:3
 8  #PBS -j oe	
 9  #PBS -V
11  #
12  #  PBS -J 0-12:3 signifies a job array from 0 to 12 in steps of 3.
13  #
15  echo "PBS Job Id PBS_JOBID is ${PBS_JOBID}"
17  echo "PBS job array index PBS_ARRAY_INDEX value is ${PBS_ARRAY_INDEX}"
19  #
20  #  To isolate the job id number, cut on the character "[" instead of
21  #  ".".  PBS_JOBID might look like "48274[].server" rather than "48274.server"
22  #  in job arrays
23  #
24  JOBID=`echo ${PBS_JOBID} | cut -d'[' -f1`
26  cd $WORKDIR
28  # Make a directory of the main PBS JOBID (-p: another instance may have made it)
29  mkdir -p ${JOBID}
31  # go in said directory
32  cd ${JOBID}
34  # Make a subdirectory with the current PBS Job Array Index
35  mkdir ${PBS_ARRAY_INDEX}
37  # Make a variable that has full path to this run
38  # TMPD might look like this /p/work1/smith/392813/9/
39  TMPD=${WORKDIR}/${JOBID}/${PBS_ARRAY_INDEX}
41  # copy executable or do a module load to get paths to executables
42  cp $WORKDIR/picalc.exe ${TMPD}/picalc.exe
44  # Though not used here, OPTIONAL_INPUT could hold various inputs
45  # that have different parameters set
48  # cd into directory that will contain all output
49  # from this PBS_ARRAY_INDEX run
50  cd ${TMPD}
52  # run job and redirect output
53  mpiexec_mpt -n 48 ./picalc.exe ${OPTIONAL_INPUT}  >& output.o$JOBID
55  exit
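
The script's comment notes that OPTIONAL_INPUT could hold different inputs for each instance. One possible way to do that, shown here only as a sketch, is to keep one line of arguments per instance in a text file and pick the line that corresponds to $PBS_ARRAY_INDEX. The function input_for_index, the parameter file layout, and the file names in it are all assumptions made for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: print the input arguments for one array instance.
#   $1 = parameter file with one line of arguments per instance (assumed layout)
#   $2 = array index, $3 = first index, $4 = step from "#PBS -J first-last:step"
input_for_index() {
    local file=$1 index=$2 first=$3 step=$4
    # Convert the array index to a 1-based line number: 0->1, 3->2, 6->3, ...
    local line=$(( (index - first) / step + 1 ))
    sed -n "${line}p" "$file"
}

# Demonstration with a throwaway parameter file (contents are made up).
PARAMS=$(mktemp)
printf '%s\n' "case_a.inp" "case_b.inp" "case_c.inp" "case_d.inp" "case_e.inp" > "$PARAMS"

# Inside a job, PBS sets PBS_ARRAY_INDEX; default to 6 so this runs standalone.
OPTIONAL_INPUT=$(input_for_index "$PARAMS" "${PBS_ARRAY_INDEX:-6}" 0 3)
echo "OPTIONAL_INPUT is: ${OPTIONAL_INPUT}"

rm -f "$PARAMS"
```

With #PBS -J 0-12:3, the instance with index 6 would read the third line of the file. In a real job script the parameter file would be a permanent file, for example under $WORKDIR, rather than a temporary one.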

After submitting the sample job array script, the following shows the output from a qstat -sw command (e.g., qstat -sw 468028[]) for a queued job array job.

                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
468028[].pbsser smith    debug    Job_Array_    --    1  48    --  00:20 Q   --

The following shows the PBS output from the qstat -sw command (e.g., qstat -sw 468028[]) for a job array that is running. Note that PBS places a "B" rather than an "R" in the status column "S" for a running job array job.

                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
468028[].pbsser smith    debug    Job_Array_    --    1  48    --  00:20 B   --

When this job array script runs, it executes five MPI jobs. Each instance executes the submitted script with $PBS_ARRAY_INDEX set to 0, 3, 6, 9, or 12. Line 39 combines $PBS_ARRAY_INDEX with the script variable $JOBID (the job number extracted from $PBS_JOBID) and the environment variable $WORKDIR to set $TMPD to a full directory path that is unique to each instance. For the five instances, $TMPD will hold paths such as /p/work1/smith/476538/0, /p/work1/smith/476538/3, /p/work1/smith/476538/6, /p/work1/smith/476538/9, and /p/work1/smith/476538/12. At line 42, the script copies the picalc.exe executable into that directory; at line 50 it changes into that directory, and at line 53 it runs the picalc program. After the job has completed, running "ls" in /p/work1/smith/476538 shows five directories: 0/, 12/, 3/, 6/, and 9/. Each directory contains the "picalc.exe" executable and the output from that program, as follows:

-bash-4.2$ pwd
/p/work1/smith/476538
-bash-4.2$ ls -F
0/  12/  3/  6/  9/
-bash-4.2$ cd 6
-bash-4.2$ ls -ltrFa
-rwxr-----. 1 smith msrc 817032 Jul 23 10:59 picalc.exe*
-rw-r--r--. 1 smith msrc   7326 Jul 23 10:59 output.o476538
drwxr-xr-x. 2 smith msrc   4096 Jul 23 10:59 ./
drwxr-xr-x. 7 smith msrc   4096 Jul 23 11:00 ../

Note that the indices may be executed out of order; for example, instance 12 could run before instance 0. Finally, to display the indices that were submitted and those that still remain, run qstat -f 468477[] | grep array_indices, as follows:

[smith@gaffney06 smith]$ qstat -f 468477[] | grep array_indices
array_indices_submitted = 0-12:3
array_indices_remaining = 12

Some users may want to vary more than one index. For example, one might want a series of jobs that cycle through a set of Cartesian coordinates. Although PBS provides only a single array index, you can partition it into as many indices as you like. The example below uses two indices, x and y. It maps a single PBS index with 1,200 values (0 to 1,199) onto an x index that runs from 0 to 29 and a y index that runs from 0 to 39, using x = j/40 (integer division) and y = j % 40.

#!/bin/bash
#PBS -l select=1:ncpus=48:mpiprocs=48
#PBS -l walltime=0:10:00
#PBS -A Project_ID
#PBS -q debug
#PBS -N Job_Array_Test
#PBS -J 0-1199
#PBS -j oe
#PBS -V

cd $WORKDIR

# Copy the single PBS array index (0 to 1199) into j
j=${PBS_ARRAY_INDEX}

# Split j into two indices: y cycles through 0 to 39, x steps through 0 to 29
(( y = j % 40 ))
(( x = j / 40 ))

echo "For index $j, my x and y indices are=$x and $y"

Submitting the script above to PBS produces 1,200 output files, one per array index. Selected output from the first 90 files is shown below:

-bash-4.2$ grep 'For index' Job_Array_Test*
Job_Array_Test.o619717.0:For index 0, my x and y indices are=0 and 0
Job_Array_Test.o619717.1:For index 1, my x and y indices are=0 and 1
Job_Array_Test.o619717.2:For index 2, my x and y indices are=0 and 2
Job_Array_Test.o619717.20:For index 20, my x and y indices are=0 and 20
Job_Array_Test.o619717.30:For index 30, my x and y indices are=0 and 30
Job_Array_Test.o619717.31:For index 31, my x and y indices are=0 and 31
Job_Array_Test.o619717.39:For index 39, my x and y indices are=0 and 39
Job_Array_Test.o619717.40:For index 40, my x and y indices are=1 and 0
Job_Array_Test.o619717.41:For index 41, my x and y indices are=1 and 1
Job_Array_Test.o619717.49:For index 49, my x and y indices are=1 and 9
Job_Array_Test.o619717.77:For index 77, my x and y indices are=1 and 37
Job_Array_Test.o619717.78:For index 78, my x and y indices are=1 and 38
Job_Array_Test.o619717.79:For index 79, my x and y indices are=1 and 39
Job_Array_Test.o619717.80:For index 80, my x and y indices are=2 and 0
Job_Array_Test.o619717.81:For index 81, my x and y indices are=2 and 1
Job_Array_Test.o619717.89:For index 89, my x and y indices are=2 and 9
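
The same modulo and integer-division arithmetic extends to more than two indices. The sketch below splits one array index into three indices; the sizes NY=4 and NZ=5 and the function name split_index are assumptions chosen for this illustration. With an x range of 0 to 2, the matching directive would be #PBS -J 0-59.

```shell
#!/bin/bash
# Split a single array index j into three indices (x, y, z).
# NY and NZ are assumed sizes for this sketch; z varies fastest.
NY=4
NZ=5

split_index() {
    local j=$1
    local z=$(( j % NZ ))
    local y=$(( (j / NZ) % NY ))
    local x=$(( j / (NY * NZ) ))
    echo "$x $y $z"
}

# Inside a job, PBS sets PBS_ARRAY_INDEX; default to 0 so this runs standalone.
j=${PBS_ARRAY_INDEX:-0}
set -- $(split_index "$j")
x=$1; y=$2; z=$3
echo "For index $j, my x, y and z indices are=$x, $y and $z"
```

For example, index 27 maps to x=1, y=1, z=2, because 27 = 1*20 + 1*5 + 2.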

We hope that users who run a large number of jobs that only vary by a parameter or two will see the benefits of using PBS job arrays for themselves and also for the PBS system.