Using PBS Job Arrays
PBS job arrays allow a large collection of PBS runs to be executed from a single script. Job arrays can significantly relieve the strain on the PBS queueing system, and they make it much easier to run a large number of jobs that vary by only one or two parameters. Users are strongly encouraged to use job arrays whenever they need to run 200 or more PBS jobs that differ by only one, two, or three parameters.
Implementing a job array in your PBS job script is easy. Including the PBS directive #PBS -J n-m:step (where n is the starting index, m is the ending index, and the optional step is the step size) tells PBS that the script is a job array, and PBS will queue the script in FLOOR[(m-n)/step+1] instances. In each instance, the environment variable $PBS_ARRAY_INDEX contains the index of that instance, so the command echo $PBS_ARRAY_INDEX outputs the index under which the script is executing (e.g., 7). When you submit a job array, the PBS job number has left and right brackets, "[]", appended to it. A job that would normally look something like "384294" will instead appear as "384294[]" in the output of the PBS qstat command.
As an explicit example of the PBS job array directive, including the directive #PBS -J 1-999:2 in your PBS script causes PBS to run 500 instances (FLOOR((999-1)/2+1) = 500) of your script. Each instance of your script will have $PBS_ARRAY_INDEX uniquely set to one of 1, 3, 5, 7, ..., 999 when it is executed.
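The instance count is easy to check before submitting. The following is a minimal bash sketch of the FLOOR[(m-n)/step+1] formula; the function names array_count and array_indices are hypothetical helpers, not part of PBS.

```bash
#!/bin/bash
# Number of instances PBS creates for "#PBS -J n-m:step".
# Bash integer division truncates, which matches FLOOR for non-negative values.
array_count() {
    local n=$1 m=$2 step=${3:-1}
    echo $(( (m - n) / step + 1 ))
}

# Print every index in the range n-m:step, one per line.
array_indices() {
    local n=$1 m=$2 step=${3:-1} i
    for (( i = n; i <= m; i += step )); do
        echo "$i"
    done
}

array_count 1 999 2             # prints 500
array_indices 0 12 3 | xargs    # prints 0 3 6 9 12
```

Running a check like this first helps confirm that the directive covers exactly the indices you expect.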
The following example PBS job array script runs five MPI jobs on Gaffney. Note the PBS job array directive on line 7: #PBS -J 0-12:3.
 1 #!/bin/bash
 2 #PBS -l select=1:ncpus=48:mpiprocs=48
 3 #PBS -l walltime=00:20:00
 4 #PBS -A Project_ID
 5 #PBS -q debug
 6 #PBS -N Job_Array_Test
 7 #PBS -J 0-12:3
 8 #PBS -j oe
 9 #PBS -V
10
11 #
12 # PBS -J 0-12:3 signifies a job array from 0 to 12 in steps of 3.
13 #
14
15 echo "PBS Job Id PBS_JOBID is ${PBS_JOBID}"
16
17 echo "PBS job array index PBS_ARRAY_INDEX value is ${PBS_ARRAY_INDEX}"
18
19 #
20 # To isolate the job id number, cut on the character "[" instead of
21 # ".". PBS_JOBID might look like "48274[].server" rather than "48274.server"
22 # in job arrays
23 #
24 JOBID=`echo ${PBS_JOBID} | cut -d'[' -f1`
25
26 cd $WORKDIR
27
28 # Make a directory of the main PBS JOBID (-p: another instance may have made it)
29 mkdir -p ${JOBID}
30
31 # go in said directory
32 cd ${JOBID}
33
34 # Make a subdirectory with the current PBS Job Array Index
35 mkdir ${PBS_ARRAY_INDEX}
36
37 # Make a variable that has full path to this run
38 # TMPD might look like this /p/work1/smith/392813/9/
39 TMPD=${WORKDIR}/${JOBID}/${PBS_ARRAY_INDEX}
40
41 # copy executable or do a module load to get paths to executables
42 cp $WORKDIR/picalc.exe ${TMPD}/picalc.exe
43
44 # Though not used here, OPTIONAL_INPUT could hold various inputs
45 # that have different parameters set
46 OPTIONAL_INPUT=$WORKDIR/input.${PBS_ARRAY_INDEX}
47
48 # cd into directory that will contain all output
49 # from this PBS_ARRAY_INDEX run
50 cd ${TMPD}
51
52 # run job and redirect output
53 mpiexec_mpt -n 48 ./picalc.exe ${OPTIONAL_INPUT} >& output.o$JOBID
54
55 exit
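Line 24 of the script isolates the job number with cut. As an alternative, the same result can be sketched with bash parameter expansion, which avoids starting extra processes. The function name base_jobid is hypothetical, and the "48274[7].server" form (brackets containing an index) is an assumption about how the ID may appear inside a running instance.

```bash
#!/bin/bash
# Return the numeric part of a PBS job ID, for array and non-array jobs alike.
base_jobid() {
    local id=$1
    id=${id%%.*}     # drop ".server" and anything after the first dot
    id=${id%%\[*}    # drop "[]" or "[7]" if this is a job array
    echo "$id"
}

base_jobid "48274[].server"     # prints 48274
base_jobid "48274[7].server"    # prints 48274 (assumed sub-job form)
base_jobid "48274.server"       # prints 48274
```

Inside a job script, the call would be JOBID=$(base_jobid "${PBS_JOBID}").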
After the sample job array script is submitted, a qstat -sw command (e.g., qstat -sw 468028[]) shows the following output for a queued job array:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
468028[].pbsser smith    debug    Job_Array_     --   1  48     -- 00:20 Q    --
The following shows the output from the qstat -sw command (e.g., qstat -sw 468028[]) for a job array that is running. Note that PBS places a "B" rather than an "R" in the status column "S" for a running job array.
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
468028[].pbsser smith    debug    Job_Array_     --   1  48     -- 00:20 B    --
While this job array is running, it executes five MPI jobs. Each job array instance executes the submitted PBS script with the environment variable $PBS_ARRAY_INDEX set to 0, 3, 6, 9, or 12. Line 39 combines $PBS_ARRAY_INDEX with $WORKDIR and the $JOBID variable (the job number extracted from $PBS_JOBID on line 24) to set $TMPD to a full directory path for each job array instance. The $TMPD variable will hold paths that look like /p/work1/smith/476538/0, /p/work1/smith/476538/3, /p/work1/smith/476538/6, /p/work1/smith/476538/9, and /p/work1/smith/476538/12, one for each separate run instance. On line 42, the script copies the picalc.exe executable into that directory; on line 50, it changes to that directory; and on line 53, it runs the picalc program. After the job has completed, manually executing an "ls" in /p/work1/smith/476538 will show five directories: 0/, 12/, 3/, 6/, and 9/. Each directory will contain the "picalc.exe" executable and the output from that program, as follows:
-bash-4.2$ pwd
/p/work1/smith/476538
-bash-4.2$ ls -F
0/  12/  3/  6/  9/
-bash-4.2$ cd 6
-bash-4.2$ ls -ltrFa
-rwxr-----. 1 smith msrc 817032 Jul 23 10:59 picalc.exe*
-rw-r--r--. 1 smith msrc   7326 Jul 23 10:59 output.o476538
drwxr-xr-x. 2 smith msrc   4096 Jul 23 10:59 ./
drwxr-xr-x. 7 smith msrc   4096 Jul 23 11:00 ../
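With many instances, checking each directory by hand becomes tedious. The following is a minimal sketch that reports which indices lack a non-empty output file, assuming the directory layout and the output.o<jobid> naming used by the sample script. The function name missing_outputs is hypothetical.

```bash
#!/bin/bash
# List array indices whose run directory lacks a non-empty output file.
# Arguments: run directory, job id, first index, last index, step (default 1).
missing_outputs() {
    local rundir=$1 jobid=$2 n=$3 m=$4 step=${5:-1} i
    for (( i = n; i <= m; i += step )); do
        # -s is true only if the file exists and has a size greater than zero
        [ -s "${rundir}/${i}/output.o${jobid}" ] || echo "$i"
    done
}

# Example call for the sample job (path and job id taken from the text above):
# missing_outputs /p/work1/smith/476538 476538 0 12 3
```

An empty result means every instance wrote some output; it does not by itself confirm that each run finished successfully.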
It is important to note that indices might be executed out of order; for example, array index 12 could run before index 0. Finally, to check the progress of a job array, a user can execute qstat -f 468477[] | grep array_indices, which reports the indices submitted and the indices remaining, as follows:
[smith@gaffney06 smith]$ qstat -f 468477[] | grep array_indices
    array_indices_submitted = 0-12:3
    array_indices_remaining = 12
Some users may want to vary more than one index. For example, one might want a series of jobs that cycle through a set of Cartesian coordinates. Although PBS provides only a single index, you can partition that index into as many indices as you like. In this case we'll use two indices, one for x and one for y. The script below maps a PBS index of 1,200 values, running from 0 to 1,199, onto an x index that varies from 0 to 29 and a y index that varies from 0 to 39.
#!/bin/bash
#PBS -l select=1:ncpus=48:mpiprocs=48
#PBS -l walltime=0:10:00
#PBS -A Project_ID
#PBS -q debug
#PBS -N Job_Array_Test
#PBS -J 0-1199
#PBS -j oe
#PBS -V

cd $WORKDIR

j=${PBS_ARRAY_INDEX}
# y varies fastest (0 to 39); x advances once every 40 indices (0 to 29)
(( y=j % 40 ))
(( x=j/40 ))

echo "For index $j, my x and y indices are=$x and $y"
Submitting the script above to PBS will produce 1,200 output files. Selected output from the first 90 files (indices 0 to 89) is shown below:
-bash-4.2$ grep 'For index' Job_Array_Test*
Job_Array_Test.o619717.0:For index 0, my x and y indices are=0 and 0
Job_Array_Test.o619717.1:For index 1, my x and y indices are=0 and 1
Job_Array_Test.o619717.2:For index 2, my x and y indices are=0 and 2
Job_Array_Test.o619717.20:For index 20, my x and y indices are=0 and 20
Job_Array_Test.o619717.30:For index 30, my x and y indices are=0 and 30
Job_Array_Test.o619717.31:For index 31, my x and y indices are=0 and 31
Job_Array_Test.o619717.39:For index 39, my x and y indices are=0 and 39
Job_Array_Test.o619717.40:For index 40, my x and y indices are=1 and 0
Job_Array_Test.o619717.41:For index 41, my x and y indices are=1 and 1
Job_Array_Test.o619717.49:For index 49, my x and y indices are=1 and 9
Job_Array_Test.o619717.77:For index 77, my x and y indices are=1 and 37
Job_Array_Test.o619717.78:For index 78, my x and y indices are=1 and 38
Job_Array_Test.o619717.79:For index 79, my x and y indices are=1 and 39
Job_Array_Test.o619717.80:For index 80, my x and y indices are=2 and 0
Job_Array_Test.o619717.81:For index 81, my x and y indices are=2 and 1
Job_Array_Test.o619717.89:For index 89, my x and y indices are=2 and 9
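The same partitioning extends to three indices. The following is a minimal sketch, assuming a hypothetical 10 x 12 x 10 grid that also fits in a 0-1199 array; the function name split_index and the grid extents are illustrative only.

```bash
#!/bin/bash
# Split a flat array index into (x, y, z) given the extents ny and nz.
# z varies fastest, then y, then x, matching the x/y ordering used above.
split_index() {
    local j=$1 ny=$2 nz=$3
    local z=$(( j % nz ))
    local y=$(( (j / nz) % ny ))
    local x=$(( j / (nz * ny) ))
    echo "$x $y $z"
}

# Two indices are the special case nz=1: index 77 on the 30 x 40 grid
split_index 77 40 1      # prints 1 37 0

# Hypothetical 10 x 12 x 10 grid, submitted with "#PBS -J 0-1199"
read -r x y z <<< "$(split_index 1199 12 10)"
echo "x=$x y=$y z=$z"    # prints x=9 y=11 z=9
```

In a job script, the first argument would be ${PBS_ARRAY_INDEX}, and the product of the extents should equal the number of indices in the #PBS -J range.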
We hope that users who run a large number of jobs that only vary by a parameter or two will see the benefits of using PBS job arrays for themselves and also for the PBS system.