Using PBS Job Arrays
PBS job arrays allow a large collection of PBS runs to be submitted from a single script. Job arrays can significantly reduce the load on the PBS queueing system, and they make it much easier to run a large number of jobs that differ in only one or two parameters. We highly recommend job arrays whenever you need to run 200 or more PBS jobs that differ in only one to three parameters.
Adding a job array to your PBS job script is straightforward. The PBS directive #PBS -J n-m:step (where n is the starting index, m is the ending index, and the optional step is the step size) tells PBS that the script is a job array, and PBS queues FLOOR((m-n)/step)+1 instances of it. Within each instance, the environment variable $PBS_ARRAY_INDEX contains the index of that instance, so the command echo $PBS_ARRAY_INDEX prints the index under which the script is executing (e.g., 7). When you submit a job array, the PBS job number has left and right brackets, [], appended to it. A job that would normally look something like "384294" appears as "384294[]" in the output of the PBS qstat command.
As an explicit example of the PBS job array directive, including #PBS -J 1-999:2 in your PBS script causes PBS to run 500 instances of the script ( FLOOR((999-1)/2)+1 = 500 ). Each instance has $PBS_ARRAY_INDEX uniquely set to one of 1, 3, 5, 7, ..., 999 when executed.
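The instance count can be checked before submitting. The following is a minimal bash sketch of the formula above; array_count and array_indices are hypothetical helper names chosen for illustration and are not PBS commands.

```shell
#!/bin/bash
# Hypothetical helpers (not part of PBS) that mirror "#PBS -J n-m:step".

# Number of instances: FLOOR((m-n)/step)+1. Bash integer division
# truncates, which equals FLOOR for the non-negative values used here.
array_count() {
    local n=$1 m=$2 step=${3:-1}
    echo $(( (m - n) / step + 1 ))
}

# The index values each instance would see in PBS_ARRAY_INDEX.
array_indices() {
    local n=$1 m=$2 step=${3:-1}
    seq "$n" "$step" "$m"
}

array_count 1 999 2             # prints 500
array_count 0 12 3              # prints 5
echo $(array_indices 0 12 3)    # prints 0 3 6 9 12
```

Running the helper on the range you intend to pass to -J is a quick way to confirm that the number of instances matches the number of input cases you have prepared.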
The following example PBS job array script runs five MPI jobs. Note the PBS job array directive on line 7: #PBS -J 0-12:3.
 1  #!/bin/bash
 2  #PBS -l select=1:ncpus=48:mpiprocs=48
 3  #PBS -l walltime=00:20:00
 4  #PBS -A Project_ID
 5  #PBS -q debug
 6  #PBS -N Job_Array_Test
 7  #PBS -J 0-12:3
 8  #PBS -j oe
 9  #PBS -V
10
11  #
12  # PBS -J 0-12:3 signifies a job array from 0 to 12 in steps of 3.
13  #
14
15  echo "PBS Job Id PBS_JOBID is ${PBS_JOBID}"
16
17  echo "PBS job array index PBS_ARRAY_INDEX value is ${PBS_ARRAY_INDEX}"
18
19  #
20  # To isolate the job id number, cut on the character "[" instead of
21  # ".". PBS_JOBID might look like "48274[].server" rather than "48274.server"
22  # in job arrays
23  #
24  JOBID=`echo ${PBS_JOBID} | cut -d'[' -f1`
25
26  cd $WORKDIR
27
28  # Make a directory of the main PBS JOBID (-p: another instance may have made it already)
29  mkdir -p ${JOBID}
30
31  # go in said directory
32  cd ${JOBID}
33
34  # Make a subdirectory with the current PBS Job Array Index
35  mkdir ${PBS_ARRAY_INDEX}
36
37  # Make a variable that has full path to this run
38  # TMPD might look like this /p/work1/smith/392813/9/
39  TMPD=${WORKDIR}/${JOBID}/${PBS_ARRAY_INDEX}
40
41  # copy executable or do a module load to get paths to executables
42  cp $WORKDIR/picalc.exe ${TMPD}/picalc.exe
43
44  # Though not used here, OPTIONAL_INPUT could hold various inputs
45  # that have different parameters set
46  OPTIONAL_INPUT=$WORKDIR/input.${PBS_ARRAY_INDEX}
47
48  # cd into directory that will contain all output
49  # from this PBS_ARRAY_INDEX run
50  cd ${TMPD}
51
52  # run job and redirect output
53  mpiexec_mpt -n 48 ./picalc.exe ${OPTIONAL_INPUT} >& output.o$JOBID
54
55  exit
After submitting the sample job array script, the following shows the output from the qstat -sw command (e.g., qstat -sw 468028[]) for a queued job array job.
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
468028[].pbsser smith    debug    Job_Array_     --   1  48     -- 00:20 Q    --
The following shows the PBS output from the qstat -sw command (e.g., qstat -sw 468028[]) for a running job array. Note that PBS places a "B" rather than an "R" in the status column "S" for a running job array job.
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
468028[].pbsser smith    debug    Job_Array_     --   1  48     -- 00:20 B    --
When this job array script is run, it executes five MPI jobs. Each PBS job array instance executes the submitted PBS script with the environment variable $PBS_ARRAY_INDEX set to 0, 3, 6, 9, or 12. Line 39 combines $PBS_ARRAY_INDEX with $WORKDIR and the script variable $JOBID (the PBS job number extracted from $PBS_JOBID on line 24) to set $TMPD to a full directory path for each job array instance. The $TMPD variable holds paths that look like /p/work1/smith/476538/0, /p/work1/smith/476538/3, /p/work1/smith/476538/6, /p/work1/smith/476538/9, and /p/work1/smith/476538/12, one for each run instance. On line 42, the script copies the picalc.exe executable into that directory; on line 50, it changes to that directory; and on line 53, it runs the picalc program. After the job has completed, manually executing ls in /p/work1/smith/476538 will show five directories: 0/, 12/, 3/, 6/, and 9/. Each directory will contain the picalc.exe executable and the output from that program, as follows:
-bash-4.2$ pwd
/p/work1/smith/476538
-bash-4.2$ ls -F
0/  12/  3/  6/  9/
-bash-4.2$ cd 6
-bash-4.2$ ls -ltrFa
-rwxr-----. 1 smith msrc 817032 Jul 23 10:59 picalc.exe*
-rw-r--r--. 1 smith msrc   7326 Jul 23 10:59 output.o476538
drwxr-xr-x. 2 smith msrc   4096 Jul 23 10:59 ./
drwxr-xr-x. 7 smith msrc   4096 Jul 23 11:00 ../
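The directory logic in the script can be tried outside of PBS by passing the values in by hand. The following is a minimal bash sketch; base_job_id and run_dir are hypothetical helper names, and parameter expansion is used here in place of the cut command from the script. The helper also accepts an ID with an index inside the brackets, should your system report one.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; they are not PBS commands.

# Reduce a PBS job ID such as "48274[].server" or "48274.server"
# to the bare job number "48274".
base_job_id() {
    local id=$1
    id=${id%%.*}     # drop everything from the first "." (server name)
    id=${id%%\[*}    # drop everything from the first "[" (array brackets)
    echo "$id"
}

# Build the per-instance run directory: <workdir>/<job number>/<index>.
run_dir() {
    local workdir=${1%/} jobid=$2 index=$3
    echo "${workdir}/${jobid}/${index}"
}

base_job_id "48274[].server"        # prints 48274
base_job_id "48274.server"          # prints 48274
run_dir /p/work1/smith 392813 9     # prints /p/work1/smith/392813/9
```

Inside a job script, the same helpers would be called with ${PBS_JOBID}, ${WORKDIR}, and ${PBS_ARRAY_INDEX} in place of the literal values shown.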
The standard output/error file from each job array instance is named <PBS job name from '-N'>.o<PBS job number>.<PBS_ARRAY_INDEX>. The files are usually copied back to the directory from which the job array script was submitted with qsub. In this case, the job array was submitted from $WORKDIR (/p/work1/smith/), and the files appear as follows:
-bash-4.2$ ls -ltrF Job_Array_Test*
-rw-------. 1 smith msrc 510 Jul 23 11:00 Job_Array_Test.o476538.9
-rw-------. 1 smith msrc 567 Jul 23 11:00 Job_Array_Test.o476538.6
-rw-------. 1 smith msrc 567 Jul 23 11:00 Job_Array_Test.o476538.3
-rw-------. 1 smith msrc 884 Jul 23 11:00 Job_Array_Test.o476538.0
-rw-------. 1 smith msrc 889 Jul 23 11:01 Job_Array_Test.o476538.12
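With hundreds of instances, it can help to check that every expected output file came back. The following is a minimal bash sketch that assumes the file naming seen in the listing above (job name, ".o", job number, ".", index); expected_outputs and missing_outputs are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; they are not PBS commands.

# Print the output file name expected for each index of "-J n-m:step".
expected_outputs() {
    local name=$1 jobid=$2 n=$3 m=$4 step=${5:-1} i
    for i in $(seq "$n" "$step" "$m"); do
        echo "${name}.o${jobid}.${i}"
    done
}

# Print the expected file names that do not exist in the given directory.
missing_outputs() {
    local dir=$1 f
    shift
    for f in $(expected_outputs "$@"); do
        [ -e "${dir}/${f}" ] || echo "$f"
    done
}

# Lists the five file names from Job_Array_Test.o476538.0
# through Job_Array_Test.o476538.12.
expected_outputs Job_Array_Test 476538 0 12 3
```

Calling missing_outputs with the submission directory followed by the same arguments prints nothing when all instances have returned their output.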
Note that indices might be executed out of order: array job 12 could run before array job 0. One last note: to see which indices were submitted and which still remain to be completed, use the command qstat -f 468477[] | grep array_indices, as follows:
[smith@gaffney06 smith]$ qstat -f 468477[] | grep array_indices
    array_indices_submitted = 0-12:3
    array_indices_remaining = 12
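For use in a script, the value of one of these attributes can be pulled out on its own. The following is a minimal bash sketch that reads text in the "name = value" layout shown above; array_attr is a hypothetical helper name, and the sample input is copied from the output above so that the sketch runs without PBS.

```shell
#!/bin/bash
# Hypothetical helper for illustration; it is not a PBS command.

# Read "name = value" lines on standard input and print the value
# of the attribute whose name contains the given key.
array_attr() {
    awk -F' = ' -v key="$1" '$1 ~ key { print $2 }'
}

# Sample input taken from the qstat -f output shown above.
array_attr array_indices_remaining <<'EOF'
    array_indices_submitted = 0-12:3
    array_indices_remaining = 12
EOF
# prints 12
```

On a system with PBS, the same helper would be fed from the real command, for example qstat -f "468477[]" | array_attr array_indices_remaining.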
Some users might want to utilize multiple varying indices. For example, you might want a series of jobs that cycle through a set of Cartesian coordinates. Although PBS only provides a single parameter index, you can partition this single index into as many indices as you would like. In this case we will use two indices: one index for x and one index for y. Shown below is how to map a PBS index of 1,200 values ranging from 0 to 1,199 into x and y indices varying from 0 to 29 for x and 0 to 39 for y.
#!/bin/bash
#PBS -l select=1:ncpus=48:mpiprocs=48
#PBS -l walltime=0:10:00
#PBS -A Project_ID
#PBS -q debug
#PBS -N Job_Array_Test
#PBS -J 0-1199
#PBS -j oe
#PBS -V

cd $WORKDIR

# Split the single PBS index j (0 to 1199) into two indices:
# x = j / 40 ranges from 0 to 29, and y = j % 40 ranges from 0 to 39.
j=${PBS_ARRAY_INDEX}
(( y=j % 40 ))
(( x=j/40 ))

echo "For index $j, my x and y indices are=$x and $y"
Submitting the script above to PBS produces 1,200 files. Selected output from the first 90 files is shown below:
-bash-4.2$ grep 'For index' Job_Array_Test*
Job_Array_Test.o619717.0:For index 0, my x and y indices are=0 and 0
Job_Array_Test.o619717.1:For index 1, my x and y indices are=0 and 1
Job_Array_Test.o619717.2:For index 2, my x and y indices are=0 and 2
Job_Array_Test.o619717.20:For index 20, my x and y indices are=0 and 20
Job_Array_Test.o619717.30:For index 30, my x and y indices are=0 and 30
Job_Array_Test.o619717.31:For index 31, my x and y indices are=0 and 31
Job_Array_Test.o619717.39:For index 39, my x and y indices are=0 and 39
Job_Array_Test.o619717.40:For index 40, my x and y indices are=1 and 0
Job_Array_Test.o619717.41:For index 41, my x and y indices are=1 and 1
Job_Array_Test.o619717.49:For index 49, my x and y indices are=1 and 9
Job_Array_Test.o619717.77:For index 77, my x and y indices are=1 and 37
Job_Array_Test.o619717.78:For index 78, my x and y indices are=1 and 38
Job_Array_Test.o619717.79:For index 79, my x and y indices are=1 and 39
Job_Array_Test.o619717.80:For index 80, my x and y indices are=2 and 0
Job_Array_Test.o619717.81:For index 81, my x and y indices are=2 and 1
Job_Array_Test.o619717.89:For index 89, my x and y indices are=2 and 9
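The same division and remainder arithmetic extends to more than two indices. The following is a minimal bash sketch; index_to_xy and index_to_xyz are hypothetical helper names, and the three-index sizes (30 by 8 by 5, which also total 1,200) are chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; they are not PBS commands.

# Split index j into (x, y), where y runs fastest over 0..ny-1.
index_to_xy() {
    local j=$1 ny=$2
    echo "$(( j / ny )) $(( j % ny ))"
}

# Split index j into (x, y, z), where z runs fastest over 0..nz-1
# and y runs over 0..ny-1.
index_to_xyz() {
    local j=$1 ny=$2 nz=$3
    echo "$(( j / (ny * nz) )) $(( (j / nz) % ny )) $(( j % nz ))"
}

index_to_xy 77 40         # prints 1 37
index_to_xy 1199 40       # prints 29 39
index_to_xyz 77 8 5       # prints 1 7 2
index_to_xyz 1199 8 5     # prints 29 7 4
```

In a job script, the first argument would be ${PBS_ARRAY_INDEX}, and the upper bound given to #PBS -J would be the product of the index sizes minus one.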
We hope users who run large numbers of jobs that only vary by one or two parameters will see the benefits for themselves and for the PBS system of using job arrays.