Raider Slurm Guide
Table of Contents
- 1. Introduction
- 1.1. Document Scope
- 2. Resources and Queue Information
- 2.1. Resource Summary
- 2.2. Node Information
- 2.3. Queue Information
- 3. Anatomy of a Batch Script
- 3.1. Specify Your Shell
- 3.2. Required Scheduler Directives
- 3.3. The Execution Block
- 3.4. Requesting Specialized Nodes
- 3.5. Advanced Considerations
- 3.6. Advance Reservation System Jobs
- 4. Submitting and Managing Your Job
- 4.1. Scheduler Dos and Don'ts
- 4.2. Job Management Commands
- 4.3. Job States
- 4.4. Baseline Configuration Common Commands
- 5. Optional Directives
- 5.1. Job Application Directive (Unsupported)
- 5.2. Job Name Directive
- 5.3. Job Reporting Directives
- 5.4. Job Environment Directives
- 5.5. Job Dependency Directives
- 5.6. Slurm Input Environment Variables
- 6. Job Arrays
- 7. Example Scripts
- 7.1. Simple Batch Script
- 7.2. Job Information Batch Script
- 7.3. OpenMP Script
- 7.4. Hybrid (MPI/OpenMP) Script
- 7.5. Accessing More Memory per Process
- 7.6. GPU Script
- 7.7. Data Transfer Script
- 7.8. Job Array Script
- 7.9. Large-Memory Node Script
- 8. Hello World Examples
- 8.1. C Program - hello.c
- 8.2. OpenMP - hello-OpenMP.c
- 8.3. Hybrid MPI/OpenMP - hello-hybrid.c
- 8.4. Cuda - hello-cuda.cu
- 9. Batch Scheduler Rosetta
- 10. Glossary
1. Introduction
On large-scale computers, many users must share available resources. Because of this, you can't just log on to one of these systems, upload your programs, and start running them. Essentially, your programs must "get in line" and wait their turn, and there is more than one of these lines, or queues, from which to choose. Some queues have a higher priority than others (like the express checkout at the grocery store). The queues available to you are determined by the projects you are involved with.
To perform any task on the compute cluster, you must submit it as a "job" to a special piece of software called the scheduler or batch queueing system. At its most basic, a job is simply a command run non-interactively, but any command (or series of commands) you want to run on the system is called a job.
Before you can submit your job to the scheduler, you must describe it, usually in the form of a batch script. The batch script specifies the computing resources needed, identifies an application to be run (along with its input data and environment variables), and describes how best to deliver the output data.
The process of using a scheduler to run the job is called batch job submission. When you submit a job, it is placed in a queue with jobs from other users. The scheduler then manages which jobs run, where, and when. Without the scheduler, users could overload the system, resulting in tremendous performance degradation for everyone. The queuing system runs your job as soon as it can do so while still honoring the following:
- Meeting your resource requests
- Not overloading the system
- Running higher priority jobs first
- Maximizing overall throughput
The process can be summarized as:
- Create a batch script.
- Submit a job.
- Monitor a job.
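For example, assuming a batch script named run_job.sh (a hypothetical name), this workflow might look like the following from the command line:
sbatch run_job.sh
squeue -u $USER
scancel job_id
Here sbatch submits the script and prints a job ID, squeue shows the status of your queued and running jobs, and scancel deletes a job if needed. These commands are covered in detail in Section 4.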
1.1. Document Scope
This document provides an overview and introduction to the use of the Slurm batch scheduler on the Penguin Computing TrueHPC (Raider) located at the AFRL DSRC. The intent of this guide is to provide information to enable the average user to submit jobs on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:
- Use of the Linux operating system
- Use of an editor (e.g., vi or emacs)
- Remote use of computer systems via network
- A selected programming language and its related tools and libraries
We suggest you review the Raider User Guide before using this guide.
2. Resources and Queue Information
2.1. Resource Summary
When working on an HPC system you must specify the resources your job needs to run. This lets the scheduler find the right time and place to schedule your job. Strict adherence to resource requests allows Slurm to find the best possible place for your jobs and ensures no user can use more resources than they've been given. You should always try to specify resource limits that are close to but greater than your requirements so your job can be scheduled more quickly. This is because Slurm must wait until the requested resources are available before it can run your job. You cannot request more resources than are available on the system, and you cannot use more resources than you request. If you do, your job may be rejected, fail, or remain indefinitely in the queue.
Raider is a batch-scheduled HPC system with numerous nodes (batch-scheduled means users request compute nodes via commands to the batch scheduler software and wait in a queue until the requested nodes become available). All jobs that require large amounts of system resources must be submitted as a batch script (a script that provides resource requirements and commands for the job). As discussed in Section 3, scripts are used to submit a series of directives that define the resources required by your job. The most basic resources include time, nodes, and memory.
2.2. Node Information
Below is a summary of node types available on Raider. Refer to the Raider User Guide for in-depth information.
- Login nodes - Access points for submitting jobs on Raider. Login nodes are intended for basic tasks such as uploading data, managing files, compiling software, editing scripts, and checking on or managing your jobs. DO NOT run your computations on the login nodes.
- Compute nodes - Node types such as "Standard", "Large-Memory", "GPU", etc. are considered compute nodes. Compute nodes can include:
- Standard nodes - The compute node type that is standard on Raider.
- Large-Memory Nodes - Large-memory nodes have more memory than standard nodes and are intended for jobs that require a large amount of memory.
- GPU nodes - GPU nodes are specialized accelerated compute nodes with additional hardware to speed up work, often with parallel processing that bundles frequently occurring tasks.
- Machine Learning Accelerator (MLA) nodes - MLA nodes are specialized GPU nodes intended for machine learning and other compute-intensive applications. There is no significant difference between MLA and GPU nodes.
- Visualization nodes - Visualization nodes are GPU nodes intended for visualization applications.
- High Clock nodes - These nodes have the benefit of higher clock speeds but the drawback of a lower thread count and smaller cache.
- Transfer nodes - These nodes exist to help users conserve allocation when transferring data.
A summary of the node configuration on Raider is presented in the following table.
 | Login | Login-viz | Standard | Large-Memory | Visualization | MLA | High Clock | Transfer |
---|---|---|---|---|---|---|---|---|
Total Nodes | 6 | 4 | 1,480 | 8 | 24 | 32 | 64 | 2 |
Processor | AMD 7713 Milan | AMD 7713 Milan | AMD 7713 Milan | AMD 7713 Milan | AMD 7713 Milan | AMD 7713 Milan | AMD 73F3 Milan | AMD 7713 Milan |
Processor Speed | 2.0 GHz | 2.0 GHz | 2.0 GHz | 2.0 GHz | 2.0 GHz | 2.0 GHz | 3.4 GHz | 2.0 GHz |
Sockets / Node | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
Cores / Node | 128 | 128 | 128 | 128 | 128 | 128 | 32 | 128 |
Total CPU Cores | 768 | 512 | 189,440 | 1,024 | 3,072 | 4,096 | 2,048 | 256 |
Usable Memory / Node | 503 GB | 503 GB | 251 GB | 2.0 TB | 503 GB | 503 GB | 503 GB | 503 GB |
Accelerators / Node | 1 | 1 | None | None | 1 | 4 | None | None |
Accelerator | NVIDIA A40 PCIe 4 | NVIDIA A100 SXM 4 | N/A | N/A | NVIDIA A40 PCIe 4 | NVIDIA A100 SXM 4 | N/A | N/A |
Memory / Accelerator | 45 GB | 40 GB | N/A | N/A | 45 GB | 40 GB | N/A | N/A |
Storage on Node | 960 GB NVMe SSD | 960 GB NVMe SSD | 1.91 TB NVMe SSD | 7.68 TB NVMe SSD | None | 3.84 TB NVMe SSD | None | None |
Interconnect | HDR InfiniBand | HDR InfiniBand | HDR InfiniBand | HDR InfiniBand | HDR InfiniBand | HDR InfiniBand | HDR InfiniBand | HDR InfiniBand |
Operating System | RHEL | RHEL | RHEL | RHEL | RHEL | RHEL | RHEL | RHEL |
2.3. Queue Information
Queues are where your jobs run. Think of queues as a resource used to control how your job is placed on the available hardware. Queues address hardware considerations and define policies such as what type of jobs can run in the queues, how long your job can run, how much memory your job can use, etc. Every queue has its own limits, behavior, and default values.
On a first-come, first-served basis, the scheduler checks whether the resources are available for the first job in the queue. If so, the job is executed without further delay. If not, the scheduler goes through the rest of the queue to check whether another job can be executed without extending the waiting time of the first job in the queue. If it finds such a job, the scheduler backfills it. Backfill scheduling allows out-of-order jobs to use the reserved job slots as long as these jobs do not delay the start of another job. Therefore, smaller jobs (i.e., jobs needing only a few resources) usually encounter short queue times.
On Raider, quality of service is used to define scheduling priority and job limits. Your queue options are determined by your projects. Most users have access to the debug, standard, background, transfer, and HPC Interactive Environment (HIE) queues. Other queues exist, but access to these queues is restricted to projects that are granted special privileges due to urgency or importance, and they are not discussed here. To see the list of queues available on the system, use the squeue command. Use the squeue -l --qos queue command to get full details about a specific queue.
Standard Queue
As its name suggests, the standard queue is the
most common queue and should be used for normal day-to-day jobs.
Debug Queue
When determining why your job is failing, it is very
helpful to use the debug queue. It is restricted to user testing and debugging
jobs and has a maximum walltime of one hour. Because of the resource and time
limits, jobs progress through the debug queue more quickly, so you don't have to
wait many hours to get results.
Background Queue
The background queue is a bit special. Although
it has the lowest priority, jobs in this queue are not charged against
your project allocation. You may choose to run in the background queue for
several reasons:
- You don't care how long it takes for your job to begin running.
- You are trying to conserve your allocation.
- You have used up your allocation.
Transfer Queue
The transfer queue exists to help users conserve
allocation when transferring data to and from Raider from within batch scripts.
It has a wall clock limit of 24 hours, and jobs run in this queue are not charged
against a user's allocation. It supports all environment variables defined by
BC policy FY05-04
(Environment Variables), including those referring to storage locations.
Note: Users who require more walltime for a transfer job should contact the HPC Help Desk for assistance.
Note: Currently, Raider does not have a maximum walltime configured for the transfer queue. This will change in the near future. In the interim, users are asked to respect the requested limits and not abuse the transfer queue.
Users can submit batch scripts in this queue to move data between various storage areas, file systems, or other systems. The following storage areas are accessible from the transfer queue:
- $WORKDIR - Your temporary work directory on Raider
- $CENTER - Your directory on the Center Wide File System (CWFS)
- $ARCHIVE_HOME - Your directory on the mass storage archival system (MSAS) at AFRL
- $HOME - Your home directory
HPC Interactive Environment (HIE) Queue
The HIE is both a queue
configuration and a computing environment intended to deliver rapid response
and high availability to support the following services:
- Remote visualization
- Application development for GPU-accelerated applications
- Application development for other non-standard processors on a particular system
There is a very limited number of nodes available to the HIE queue, and they should be reserved for appropriate use cases. The use of the HIE queue for regular batch processing is considered abuse and is closely monitored. The HIE queue should not be used simply as a mechanism to give your regular batch jobs higher priority. Refer to the HIE User Guide for more information.
Priority Queues
The HPCMP has designated three restricted queues
that require special permission for job submission. If your project
is not authorized to submit jobs to these queues, your submission will fail.
These queues include:
- Urgent queue - Jobs belonging to DoD HPCMP Urgent Projects
- High queue - Jobs belonging to DoD HPCMP High Priority Projects
- Frontier queue - Jobs belonging to DoD HPCMP Frontier Projects
The following table describes the Slurm queues available on Raider:
Priority | Queue Name | Max Wall Clock Time | Max Cores Per Job | Max Queued Per User | Max Running Per User | Description |
---|---|---|---|---|---|---|
Highest | urgent | 168 Hours | 92,160 | N/A | N/A | Jobs belonging to DoD HPCMP Urgent Projects |
 | debug | 1 Hour | 3,840 | 15 | 4 | Time/resource-limited for user testing and debug purposes |
 | high | 168 Hours | 92,160 | N/A | N/A | Jobs belonging to DoD HPCMP High Priority Projects |
 | frontier | 168 Hours | 92,160 | N/A | N/A | Jobs belonging to DoD HPCMP Frontier Projects |
 | standard | 168 Hours | 92,160 | N/A | N/A | Standard jobs |
 | HIE | 24 Hours | 256 | 2 | 2 | Rapid response for interactive work. For more information see the HPC Interactive Environment (HIE) User Guide. |
 | transfer | 48 Hours | 1 | N/A | 12 | Data transfer for user jobs. Not charged against project allocation. See the AFRL DSRC Archive Guide, section 5.2. |
Lowest | background | 24 Hours | 3,840 | 35 | 10 | User jobs that are not charged against the project allocation |
* The running job limit on the debug queue per user is 4.
** The running job limit on the background queue per user is 10.
3. Anatomy of a Batch Script
The Slurm scheduler is currently running on Raider. It schedules jobs, manages resources and job queues, and can be accessed through the interactive batch environment or by submitting a batch script. Slurm can manage both single-processor and multiprocessor jobs. The appropriate module is automatically loaded for you when you log in. This section is a brief introduction to Slurm. More advanced topics are discussed later in this document.
Batch Script Life Cycle
Let's start with what happens in the typical
life cycle of a batch script, where an application is run in a batch submission:
- The user submits a batch script, which is put into the queue.
- Once the resources are allocated, the scheduler executes the batch script on one node, and the script has access to the typical environment variables the scheduler defines.
- The executable command in the script is encountered and executed. If using a launch command, the launch command examines the scheduler environment variables to determine the node list in the allocation, as well as parameters, such as the number of total processes, and launches the required number of processes.
- Once the executing process(es) have terminated, the batch script moves to the next line of execution or terminates if there are no more lines.
Batch Script Anatomy
A batch script is a small text file
created with a text editor such as vi or notepad. Although the specifics
of batch scripts may differ slightly from system to system, a basic set of
components are always required, and a few components are just always good ideas.
The basic components of a simple batch script must appear in the following order:
- Specify Your Shell
- Scheduler Directives
- Required Directives
- Optional Directives
- The Execution Block
To simplify things, several template scripts are included in Section 7, where you can fill in required commands and resources.
Cautions About Special Characters
Some special characters are not handled well by schedulers. This is especially true of the following:
- ^M characters - Scripts created on an MS Windows system, which usually contain ^M characters, should be converted with dos2unix before use.
- Smart quotes - MS Word autocorrects normal straight single and double quotation marks into "smart quotes." Ensure your script only uses normal straight quotation marks.
- Em dash, en dash, and hyphens - MS Word often autocorrects regular hyphens into em dash or en dash characters. Ensure your script only uses normal hyphens.
- Tab characters - many editors insert tabs instead of spaces for various reasons. Ensure your script does not contain tabs.
3.1. Specify Your Shell
Your batch script is a shell script. So, it's good practice to specify which
shell your script is written in for execution. If you do not specify your shell
within the script, the scheduler uses your default login shell. To tell
the scheduler which shell to use, the first line of your script should be:
#!/bin/shell
where shell is one of bash (Bourne-Again shell), sh (Bourne shell), ksh
(Korn shell), csh (C shell), tcsh (enhanced C shell), or zsh (Z shell).
3.2. Required Scheduler Directives
After specifying the script shell, the next section of the script sets the scheduler directives, which define your resource requests to the scheduler. These include how many nodes are needed, how many cores per node, what queue the job will run in, and how long these resources are required (walltime).
Directives are a special form of comment, beginning with #SBATCH. As you might suspect, the # character tells the shell to ignore the line, but the scheduler reads these lines and uses the directives to set various values. IMPORTANT!! All directives MUST come before the first line of executable code in your script, otherwise they are ignored.
The scheduler has numerous directives to assist you in setting up how your job will run on the system. Some directives are required. Others are optional. Required directives specify resources needed to run the application. If your script does not define these directives, your job will be rejected by the scheduler or use center-defined defaults. Caution: default values may not be in line with your job requirements and may vary by center. Optional directives are discussed in Section 5.
To schedule your job, the scheduler must know:
- The queue to run your job in.
- The maximum time needed for your job.
- The Project ID to charge for your job.
- The number of nodes you are requesting.
- The number of processes per node you are requesting.
- The number of cores per node.
- The total number of cores.
- How nodes should/can be allocated.
3.2.1. Specifying the Queue
You must choose which queue you want your job to run in. Each queue has different
priorities and limits and may target different node types with different hardware
resources. To specify the queue, include the following directive:
#SBATCH --qos=queue_name
or
#SBATCH -q queue_name
3.2.2. How Long to Run
Next, the scheduler needs the maximum time you expect your job to run. This is referred to as walltime, as in clock on the wall. The walltime helps the scheduler identify appropriate run windows for your job. For accounting purposes, your allocation is charged for how long your job actually runs, which is typically less than the requested walltime.
In estimating your walltime, there are three things to keep in mind.
- Your estimate is a limit. If your job hasn't completed within your estimate, it is terminated. So, you should always add a buffer to account for variability in run time because you don't want your job to be killed when it is 99.9% complete. And, if your job is terminated, your account is still charged for the time.
- Your estimate affects how long your job waits in the queue. In general, shorter jobs run before longer jobs. If you specify a time that is too long, your job will likely sit in the queue longer than it should.
- Each queue has a maximum time limit. You cannot request more time than the queue allows.
To specify your walltime, include the following directive:
#SBATCH --time=DD-HH:MM:SS
or
#SBATCH -t DD-HH:MM:SS
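For example, a job expected to need roughly one day might request a small buffer beyond that, such as 26 hours:
#SBATCH --time=01-02:00:00
Here 01-02:00:00 means one day and two hours in the days-hours:minutes:seconds format shown above.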
3.2.3. Your Project ID
The scheduler needs to know which project ID to charge for your job. You can use the show_usage command to find the projects available to you and their associated project IDs. In the show_usage output, project IDs appear in the column labeled "Subproject."
Note: If you have access to multiple projects, remember the project you specify may limit your choice of queues.
To specify the project ID for your job, include the following directive:
#SBATCH --account=Project_ID
or
#SBATCH -A Project_ID
3.2.4. Number of Nodes, Processes, and Cores
There are two types of computational resources: hardware (compute nodes and cores) and virtual (processes). A node is a computer system with a single operating system image, a unified memory space, and one or more cores. Every script must include directives for the node, process, and task selection. Nodes are allocated exclusively to your job and not shared with other users.
Before Slurm can run your job, it needs to know how many nodes you want, the total number of tasks (processes), and the number of tasks per node. In general, you would specify one task per core, but you might want fewer tasks depending on the programming model you are using. See Example Scripts (below) for alternate use cases.
The number of nodes, the number of tasks, and the number of tasks per node are
specified using the directives:
#SBATCH --nodes=N1
#SBATCH --ntasks=N2
#SBATCH --ntasks-per-node=N3
or
#SBATCH -N N1
#SBATCH -n N2
#SBATCH --ntasks-per-node=N3
where N1 specifies the number of nodes you are requesting, N2
is the number of tasks, and N3 is the number of tasks per node.
Generally, you only need to use any two of these three directives. For example,
you could specify the total number of nodes and total tasks and let Slurm decide
the number of tasks per node. In this case the directives would be:
#SBATCH --nodes=N1
#SBATCH --ntasks=N2
where N1 is the number of nodes you are requesting and N2 is the
total number of tasks.
In general, the --ntasks-per-node default is the total number of cores on the node, but there may be situations where you might want to specify a lower value. If you are porting a PBS script to Slurm, using --nodes and --ntasks-per-node is the simplest conversion for the select and mpiprocs values.
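Putting these required directives together, a minimal illustrative request header might look like the following sketch; the queue, walltime, Project_ID, and node counts are placeholders you must replace with values appropriate to your job:
#SBATCH --qos=standard
#SBATCH --time=00-12:00:00
#SBATCH --account=Project_ID
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
This sketch requests two standard nodes for 12 hours with one task per core (standard nodes on Raider have 128 cores each).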
3.2.5. SLB Directives
The Shared License Buffer (SLB) regulates shared license usage across all
HPC systems by granting and enforcing license reservations for certain commercial
software packages. If your job requires enterprise licenses controlled by
SLB, you must enter the software and requested number of licenses, using the
following directive:
#SBATCH --licenses=software:number_of_licenses
or
#SBATCH -L software:number_of_licenses
To request licenses for multiple applications, separate them by commas, as follows:
#SBATCH -L software:number_of_licenses,software:number_of_licenses
For more information about SLB, please see the SLB User Guide.
3.3. The Execution Block
After the directives have been supplied, the execution block begins. The execution block is the section of your script containing the actual work to be done. This includes any modules to be loaded and commands to be executed. This could also include executing or sourcing other scripts.
3.3.1. Basic Execution Scheme
The following describes the most basic scheme for a batch script. PLEASE ADOPT THIS BASIC EXECUTION SCHEME IN YOUR OWN BATCH SCRIPTS.
Setup
- Set environment variables, load modules, create directories, transfer input files.
- Changing to the right directory - By default Slurm runs your job in the directory from which it is submitted, which can cause problems. To avoid this, cd into your $WORKDIR directory to run it on the local high-speed disk.
Launching the executable
- Launch your executable using the launch command on Raider specific to your programming model.
Cleaning up
- Archive your results and remove temporary files and directories.
- Copy any necessary files to your home directory.
3.3.1.1. Setup
Using the batch script to set up your environment ensures your script runs in an automatic and consistent manner, but not all environment-setup tasks can be accomplished via scheduler directives, so you may have to set some environment variables yourself. Remember that commands to set up the environment must come after the scheduler directives. For MPI jobs, each MPI process is separate and inherits the environment set up by the batch script.
As part of the Baseline Configuration (BC) initiative, there is a common set of environment variables on all HPCMP allocated systems. These variables are predefined in your login, batch, and compute environments, making them automatically available at each center. We encourage you to use these variables in your scripts where possible. Doing so helps to simplify your scripts and reduce portability issues if you ever need to run those scripts on other systems within the HPCMP. Some BC environment variables are shown in the table below.
Variable | Description |
---|---|
$WORKDIR | Your work directory on the local temporary file system (i.e., local high-speed disk). $WORKDIR is visible to both the login and compute nodes and should be used for temporary storage of active data related to your batch jobs. |
$CENTER | Your directory on the Center-Wide File System (CWFS). |
$ARCHIVE_HOME | This is your directory on the archival file system that serves a given compute platform. |
The complete list of BC environment variables is available in BC Policy FY05-04.
Setup considerations in customizing your batch job may include:
- Creating a directory in $WORKDIR for your job to run in.
NEWDIR=$WORKDIR/MyDir
mkdir -p $NEWDIR
- Changing to the directory from which the job will run.
cd $NEWDIR
- Copying required input files to the job directory.
cp From_directory/file $NEWDIR
- Ensuring required modules are loaded.
module load module_name
3.3.1.2. Launching an Executable
The command you'll use to launch a parallel executable within a batch script depends on the parallel library loaded at compile and execution time, the programming model, and the machine used to launch the application. It does not depend on the scheduler. Launch commands on Raider are discussed in detail in the Raider User Guide.
On Raider, the mpiexec command is used to launch a parallel
executable. The basic syntax for launching an MPI executable is:
mpiexec args executable pgmargs
where args are command-line arguments for mpiexec, executable is the name of an executable program, and pgmargs are command-line arguments for the executable.
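For example, a hypothetical MPI executable named my_app that reads an input file might be launched across all allocated tasks with:
mpiexec -n $SLURM_NTASKS ./my_app input.dat
Here my_app and input.dat are placeholders, and the exact mpiexec arguments accepted depend on the MPI library loaded in your environment.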
3.3.1.3. Cleaning Up
You are responsible for cleaning up and monitoring your workspace. The clean-up process generally entails deleting unneeded files and transferring important data left in the job directory after the job is completed. It is important to remember that $WORKDIR is a "scratch" file system and is not backed up. Currently, $WORKDIR files older than 30 days are subject to being purged. If it is determined as part of the normal purge cycle that files in your $WORKDIR directory must be deleted, you WILL be notified prior to deletion. Similarly, files transferred to $CENTER are not backed up, and files older than 120 days are subject to being purged. To prevent automatic deletion by the purge scripts, important data should be archived. See the AFRL DSRC Archive Guide for more information on archiving data.
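The following is a minimal clean-up sketch (directory and file names are placeholders) that archives results and removes the temporary job directory; see the AFRL DSRC Archive Guide for the recommended archiving methods:
cd $WORKDIR
tar -czf results.tar.gz MyDir/*.out      # bundle the results
cp results.tar.gz $ARCHIVE_HOME          # archive (normally done from the transfer queue)
cp MyDir/summary.txt $HOME               # keep a small summary in your home directory
rm -rf $WORKDIR/MyDir                    # remove the temporary job directory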
3.3.2. Advanced Execution Methods
A batch script is a text file containing directives and execution steps you "submit" to Slurm. This script can be as simple as the basic execution scheme discussed above or include more complex customizations, such as compiling within the script or loading a file with a list of modules required for the executable. Below are additional considerations for the execution block of the batch script.
3.3.2.1. Environment Variables set by the Scheduler
In addition to environment variables inherited from your user environment (see Section 3.3.1.1), Slurm sets other environment variables for batch jobs. The following table contains commonly used Slurm environment variables.
Variable | Description |
---|---|
$SLURM_JOB_ACCOUNT | The Project ID charged for the job |
$SLURM_JOB_ID | The job identifier assigned to a job or job array by the batch system |
$SLURM_JOBID (deprecated) | Identical to $SLURM_JOB_ID. Included for backwards compatibility |
$SLURM_SUBMIT_DIR | The absolute path of directory where the job was submitted |
$SLURM_JOB_NAME | The job name supplied by the user |
$SLURM_JOB_PARTITION | The partition in which the job executes |
$SLURM_JOB_QOS | The Quality of Service (QOS) i.e., job queue, of the job |
$SLURM_SUBMIT_HOST | The hostname of the node from which sbatch was executed |
$SLURM_NTASKS | The total number of tasks in the job |
$SLURM_JOB_NODELIST | The list of nodes allocated to the job |
$SLURM_JOB_NUM_NODES | The total number of nodes allocated to the job |
$SLURM_ARRAY_JOB_ID | The master job ID for a job array |
$SLURM_ARRAY_TASK_ID | The index number for a sub job in a job array |
$SLURM_ARRAY_TASK_COUNT | Total number of tasks in a job array |
$SLURM_ARRAY_TASK_MAX | A job array's maximum index number |
$SLURM_ARRAY_TASK_MIN | A job array's minimum index number |
$SLURM_ARRAY_TASK_STEP | A job array's index step size |
See the sbatch man page for a complete list of environment variables set by Slurm |
Baseline Configuration Policy (BC Policy FY05-04) defines an additional set of environment variables with related functionality available on all systems. These variables can also be found in the Raider User Guide.
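As a brief illustration (an assumed snippet, not taken from the template scripts), the Slurm-set variables above can be used to label output and create per-job work areas:
echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) submitted from $SLURM_SUBMIT_DIR"
mkdir -p $WORKDIR/$SLURM_JOB_ID
cd $WORKDIR/$SLURM_JOB_ID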
3.3.2.2. Loading Modules
Software modules are a very convenient way to set needed environment variables and include necessary directories in your path so commands for applications can be found. For a full discussion on modules see the Raider User Guide and the AFRL DSRC Modules Guide.
To ensure required modules are loaded at runtime, you can load them within
the batch script before the executable code by using the command:
module load module_name
3.3.2.3. Compiling on the Compute Nodes
You can compile on the compute nodes, either interactively or within a job script. On most systems this is the same as compiling on the login nodes, though in some cases there are differences between the login and compute nodes. See the Raider User Guide for more information.
3.3.2.4. Using the Transfer Nodes
Unlike PBS and LSF, Slurm allows a transfer job to select either a transfer node, the transfer queue, or both, resulting in a job running on a transfer node in the transfer queue. Before a job can run, the input data needs to be copied into a directory accessible to the job script. This can be done in a separate job script using the transfer node. Because jobs on a transfer node cost no allocation, the transfer node is advantageous for large file transfers, such as during data staging or cleanup to move data left in your $WORKDIR after your application completes.
When using a transfer node or the transfer queue, keep in mind:
- The transfer node may have additional bandwidth for data transfers.
- You share the node with other users, so your available compute and memory are likely lower.
- Your allocation is not charged when using a transfer node.
See Example Scripts for an example of using the transfer nodes.
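As a hedged sketch, a small stand-alone transfer job that stages results from $WORKDIR to $CENTER might look like the following, where the directory name and Project_ID are placeholders; a fuller example is given in Section 7.7:
#!/bin/bash
#SBATCH --qos=transfer
#SBATCH --ntasks=1
#SBATCH --time=2:00:00
#SBATCH --account=Project_ID
cp -r $WORKDIR/MyDir/results $CENTER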
3.4. Requesting Specialized Nodes
Node types are selected by specifying the following node features: standard,
viz, mla, xfer, bigmem,
highclock. This is done using the --prefer (soft)
or -C or --constraint (hard) options. The soft
option asks Slurm to provide the node type if available but allows for another
node type if not available. For example:
--prefer=viz
The hard option requires Slurm to wait in the queue until the exact request can be met. Examples of the hard option are provided below.
The standard node is selected by default. There is no need to specify a standard node unless as part of a heterogeneous request.
You may also specify GPU hardware through the --gres option as discussed below.
Note: Slurm offers short versions (-C) and long versions (--constraint) of many options.
3.4.1. GPU Nodes
The graphics processing unit, or GPU, has become one of the most important types of computing technology. The GPU is made up of many synchronized cores working together for specialized tasks.
GPU nodes must be requested as viz or mla nodes
(see respective sections below). You may also specify the GPU hardware
(i.e., a100 or a40) using the --gres directive, as
follows:
#SBATCH --gres=gpu:a100:4
#SBATCH --gres=gpu:a40:1
If you don't have a preference, simply ask for a number of GPUs:
#SBATCH --gres=gpu:2
Note that if you use --gres, it must be done in tandem with the
viz or mla node options below.
3.4.2. Visualization Nodes
Visualization nodes are GPU nodes with specialized hardware or software
to support visualization tasks. To request a visualization node, add the
--constraint=viz directive, as follows:
#SBATCH --constraint=viz
#SBATCH -C viz
3.4.3. Machine Learning Accelerator (MLA) Nodes
MLA nodes are GPU nodes with specialized hardware and software to support
machine-learning tasks. To request an MLA node, use the directive:
#SBATCH --constraint=mla
or
#SBATCH -C mla
3.4.4. High Clock Nodes
High Clock nodes have the benefit of higher clock speeds, but the drawback
of lower thread size and cache size. To request high clock nodes, use the following
directive.
#SBATCH --constraint=highclock
or
#SBATCH -C highclock
3.4.5. Transfer Nodes
Transfer nodes exist to help users conserve allocation when transferring data.
To request transfer nodes, use the following directive.
#SBATCH --constraint=xfer
or
#SBATCH -C xfer
3.5. Advanced Considerations
3.5.1. Heterogeneous Computing (Using Multiple Node Types) and Node Distribution
Heterogeneous computing refers to using more than one type of node, such as CPU, GPU, or large-memory nodes. By assigning different workloads to specialized nodes suited for diverse purposes, performance and energy efficiency can be vastly improved. Node distribution refers to assigning tasks to groups or chunks of nodes. This section discusses how to schedule different node types and organize groups of nodes (heterogeneous or homogeneous) so they can be assigned different tasks.
On Raider, heterogeneous nodes are selected using the --constraint
or -C directive, followed by the constraint format:
"[type*number&type*number...]". For
example, to select three mla nodes, one viz node,
two bigmem nodes, and 94 standard nodes,
use the following:
#SBATCH --nodes=100 --constraint=[mla*3&viz*1&bigmem*2&standard*94]
The distribution of tasks to the nodes and cores on those nodes can be controlled using the --distribution or -m directive, which has the following options:
--distribution=*|block|cyclic|arbitrary|plane=size [:*|block|cyclic|fcyclic[:*|block|cyclic|fcyclic]] [,Pack|NoPack]
The first distribution method (before the first ":") controls the distribution of tasks to nodes. The second controls the distribution of tasks across sockets. The third controls the distribution of tasks across cores. The second and third distributions apply only if task pinning is enabled. (Pinning threads for shared-memory parallelism, or binding processes for distributed-memory parallelism, is an advanced way to control how the system distributes threads or processes across the available cores.) An example directive is shown after the table below.
The following table describes the distribution options:
Variable | Description |
---|---|
block | Distributes tasks to a node such that consecutive tasks share a node. This is the default distribution method |
cyclic | Distributes tasks to a node such that consecutive tasks are distributed over consecutive nodes (in a round-robin fashion). |
plane | The tasks are distributed in blocks of the specified size. |
arbitrary | Processes are allocated in the order as listed in the file designated by the environment variable $SLURM_HOSTFILE. If this variable is listed, it overrides any other method specified. If not set, the method defaults to block. |
fcyclic | Distributes the tasks to consecutive sockets in a round-robin fashion across the sockets. Tasks requiring more than one core have each allocated in a cyclic fashion across sockets. |
Pack | Rather than distributing a job step's tasks evenly across its allocated nodes, pack them as tightly as possible on the nodes. This only applies when the "block" task distribution method is used. |
NoPack | Rather than packing a job step's tasks as tightly as possible on the nodes, distribute them evenly. |
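For example, to spread consecutive tasks round-robin over the allocated nodes while keeping the default block placement within each node, one hedged combination would be:
#SBATCH --distribution=cyclic:block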
3.6. Advance Reservation System Jobs
The Advance Reservation Service (ARS) provides a web-based interface to batch schedulers on most allocated HPC resources in the HPCMP. This service allows allocated users to reserve resources for use at specific times and for specific durations. It works in tandem with selected schedulers to allow restricted access to those reserved resources.
Note: Raider is unavailable on ARS at this time. The AFRL DSRC will notify users when they may use ARS to schedule reservations on Raider.
For Advance Reservation System (ARS) jobs you must submit a reservation request. Upon successful completion of the reservation request, a confirmation page is presented, and an email is sent notifying you of all pertinent data concerning the reservation, including the ARS_ID. It is your responsibility to either use or cancel your reservation. Unless you cancel it, your allocation is charged for the full time on the reserved nodes whether you use them or not. For information, such as how to cancel a reservation, see the ARS User Guide.
To use the reserved nodes, you must log onto the selected system and submit
a job specifying the ARS_ID, as follows:
#SBATCH --reservation=ARS_ID
4. Submitting and Managing Your Job
Once your batch script is ready, you need to submit it to the scheduler
for execution, and the scheduler will generate a job according to the parameters
set in the script. Submitting a batch script can be done with the sbatch
command:
sbatch batch-script-name
Because batch scripts specify the resources for your job, you won't need
to specify any resources on the command line. However, you can override
or add any job parameter by providing the specific resource as a flag
directly on the sbatch command line. Directives supplied in
this way override the same directives if they are already included in your
script. The syntax to supply directives on the command line is the same
as within a script except #SBATCH is not used. For
example, to override the time use:
sbatch --time=days-hh:mm:ss batch-script-name
4.1. Scheduler Dos and Don'ts
When submitting your job, it's important to keep in mind these general guidelines:
- Request only the resources you need.
- Be aware of limits. If you request more resources than the hardware can offer, the scheduler may not reject the job outright; instead, it may sit in the queue forever.
- Be aware of the available memory limit. In general, the available memory per core is (memory_per_node)/(cores_in_use_on_the_node).
- The scheduler might not support pinning, so you might want to do this manually.
- There may be per-user quotas on the system.
You should also keep in mind that Raider is a shared resource. Behavior that negatively impacts other users or stresses the system administrators is not desirable. Below are some suggestions to be followed for a happy HPC community.
- Submitting 1000 jobs to perform 1000 tasks is naïve and can overload the scheduler. If these tasks are serial, it also wastes your allocation hours across 1000 nodes. Job arrays are strongly encouraged; see Section 6.
- If you expect your job to run for several days, split it into smaller jobs. You'll get reduced queue time and increased stability (e.g., against node failure). You can either split your job manually and submit as separate jobs or submit your jobs sequentially within a single script as described in the AFRL DSRC Archive Guide.
- Send your job to the right queue. It is important to understand in which queue the scheduler will run your job as most queues have core and walltime limits.
- Do not run compute-intensive tasks from a login node. Doing so slows the login nodes, causing login delays for other users and may prompt administrators to terminate your tasks, often without notice.
4.2. Job Management Commands
Once you submit your job, there are commands available to check and manage your job submission. For example:
- Determining the status of your job
- Cancelling your job
- Putting your job on hold
- Releasing a job from hold
The table below contains commands for managing your jobs. Use man command or the command --help option to get more information about a command.
Command | Description |
---|---|
sacct -j job_id -l | Display job accounting data from a completed job. |
sbatch script_file | Submit a job. |
sinfo -o "%20N %16F %8c %9m %15f %17G" | Display a list of available resources. |
scancel job_id | Delete a job. |
scontrol hold job_id | Place a job on hold. |
scontrol release job_id | Release a job from hold. |
sinfo | Reports the state of queues and nodes managed by Slurm. |
sinfo -N or sinfo --Node | Display a list of nodes. |
scontrol show nodes | Display a list of nodes with detailed node information |
squeue | Display the list of all jobs across all queues. |
sacctmgr show qos format="name%-14,Description%-20,priority,maxwall" | Display a neatly formatted list of queues |
squeue -j job_id | Check the status of a job. |
squeue -u user_name or squeue --user=user_name | Check the status of all jobs submitted by the user. Can use $USER in place of user_name. |
squeue --format "specs" | Display custom job information. For example, to display an output similar
to PBS qstat output (i.e., account, user, queue, job name,
job status, job execution time) use: squeue --format"%.10a %.10u %.10q %.12j %.4t %M" |
sstat job_id | Display information about the resources utilized by a running job or job step. Can only sstat your jobs. |
4.3. Job States
When checking the status of a job, the state of a job is listed. Jobs typically pass through several states during their execution. The main job cycle states are QUEUED/PENDING, RUNNING, SUSPENDED, COMPLETING, and COMPLETED. An explanation of each state follows.
Command | Description |
---|---|
PD | The job is queued, eligible to run, or routed. |
R | The job is running. |
S | Job has an allocation, but execution has been suspended, and CPUs have been released for other jobs. |
CP | The job is in the process of completing. |
CD | The job is completed with an exit code of zero. |
CA | The job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated. |
Slurm has an extensive list of states. See man squeue for a complete list. |
4.4. Baseline Configuration Common Commands
The Baseline Configuration Team (BCT) has established the following set of common commands that are consistent across all systems. Most are custom and not inherent in the scheduler.
Command | Description |
---|---|
bcmodule | Executes like the standard module command but has numerous improvements
and new features.
Note: the utility is located in /p/app/BCT_module, and there is currently no system-wide alias set. Users may need to create an alias for bcmodule in their personal shell configuration file, located in their $HOME directory. |
check_license | Checks the status of HPCMP shared applications grouped into two distinct categories: Software License Buffer (SLB) applications and non-SLB applications. |
cqstat | Displays information about jobs in the batch queueing system. |
node_use | Displays memory-use and load-average information for all login nodes of the system on which it is executed. |
qflag | Collects information about the user and the user's jobs and sends a message about the reported problem without any need to leave the HPC system. |
qhist | Prints a full report on a running or completed batch job with an option to include chronological log file entries for the job from the batch queueing system. The command can also list all completed jobs for a given user over a specified number of days in the past. |
qpeek | Returns the standard output (stdout) and standard error (stderr) messages for any submitted batch job from the start of execution until the job run is complete. |
qview | Displays various reports about jobs in the batch queuing system. |
show_queues | Displays current batch queuing system information. |
show_storage | Produces two reports on quota and usage information. |
show_usage | Produces two reports on the allocation and usage status of each subproject under which a user may compute. |
5. Optional Directives
In addition to the required directives mentioned above, Slurm has many other directives, but most users only use a few of them. Some of the more useful optional directives are summarized below.
5.1. Job Application Directive (Unsupported)
The application directive allows you to identify the application being used by your job. This directive is used for HPCMP accountability and administrative purposes and helps the HPCMP accurately assess application usage and ensure adequate software licenses and appropriate software are purchased. While not required, using this directive is strongly encouraged as it provides valuable data to the HPCMP regarding application use.
Slurm currently does not support an application directive.
5.2. Job Name Directive
The job_name directive allows you to give your job a name that's
easier to remember than a numeric job ID. The Slurm environment variable,
$SLURM_JOB_NAME, inherits this value and can be used instead of
the job ID to create job-specific output directories. To use this directive,
add the following to your batch script:
#SBATCH --job-name=job_name
or
#SBATCH -J job_name
or, to your sbatch command
sbatch --job-name=job_name ...
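For instance, the name you choose can be reused inside the script through $SLURM_JOB_NAME to keep each run's files separate; the job name below is hypothetical, with the directive at the top of the script and the mkdir/cd lines in the execution block:
#SBATCH --job-name=airfoil_test
mkdir -p $WORKDIR/$SLURM_JOB_NAME
cd $WORKDIR/$SLURM_JOB_NAME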
5.3. Job Reporting Directives
Job reporting directives allow you to control what happens to standard output and standard error messages generated by your script. They also allow you to specify e-mail options to be executed at the beginning and end of your job. The following table and sections describe the job reporting directives:
Directive | Description |
---|---|
-e filename, or --error=filename | Redirect standard error (stderr) to the named file. |
-o filename, or --output=filename | Redirect standard output (stdout) to the named file. |
--open-mode=mode_to_open | Specifies whether output and error files are appended or overwritten. A value of append adds the output to the file. A value of truncate overwrites the file if it exists. |
--mail-user=user_name | Linux user name to notify about state changes as defined by --mail-type. The default value is the submitting user. |
--mail-type=event | Send email when the specified job event occurs. Valid events include BEGIN, END, FAIL, ALL, and TIME_LIMIT. |
--mail-type=END | Send email when the job ends. |
--mail-type=BEGIN,END | Send email when the job begins and ends. |
5.3.1. Redirecting stderr and stdout
By default, messages written to stdout and stderr are combined and written
to a single file with a default filename slurm-%j.out, where
%j is replaced by the job id. If you want to change this behavior,
the -o or --output and -e or --error
directives allow you to redirect stdout and stderr messages to different named
files. To instruct Slurm to write stdout to a specific file, use the directive:
#SBATCH --output filename.out
or
#SBATCH -o filename.out
To instruct Slurm to write stderr to a specific file, use the directive:
#SBATCH --error filename.err
or
#SBATCH -e filename.err
To instruct Slurm to write both stdout and stderr to a single file, specify only the --output directive and omit the --error directive.
By default, Slurm overwrites output and error files. To append instead of overwriting,
use the following directive:
--open-mode=append
5.3.2. Setting up E-mail Alerts
Note: Email alert functionality has not yet been enabled on Raider; however, we expect the feature to be made available to users soon. The HPC Help Desk will advise users when they may set up email alerts in their Slurm job submissions.
Mail is sent to the email address associated with your pIE account.
Many users want to be notified when their jobs begin and end. The --mail-user
and --mail-type directives make this possible. If you use the
--mail-user directive, you must supply the directive with one
or more e-mail addresses to be used. For example:
#SBATCH --mail-user=user
If you use the --mail-type directive, you must supply the directive
with the event type, which can be BEGIN, END, FAIL, ALL, TIME_LIMIT, TIME_LIMIT_90
(90% of time limit), TIME_LIMIT_80, TIME_LIMIT_50, or ARRAY_TASKS (mail for
each array task). For example:
#SBATCH --mail-type=BEGIN,END
5.4. Job Environment Directives
Job environment directives allow you to control the environment in which your script will operate. This section describes some useful variables in setting up the script environment.
Directive | Description |
---|---|
salloc slurm_options --x11 | Request an interactive job that runs on a compute node. |
#SBATCH --export=ALL | Export all environment variables from your login environment into your batch environment. |
#SBATCH --export=NONE | Export no environment variables from your login environment into your batch environment. |
#SBATCH --export=variable1,variable2 | Export specific environment variables from your login environment into your batch environment. |
#SBATCH --mem=size[K|M|G|T] | Memory size per node. |
5.4.1. Set Working Directory
By default, Slurm executes your job from the current directory where you
submit the job. To change the working directory, cd to it in the
script. You can also set the working directory of the batch script to path
before it is executed using the -D or --chdir directive.
--chdir=path
or
-D path
The path can be a full or relative path to the directory where the command is executed. The advantage of using the --chdir directive (instead of cd in the script) is that stdout/stderr output files are also output into the new directory.
5.4.2. Interactive Batch Shell
When you log into Raider, you will be running in an interactive shell on a login node. The login nodes provide login access for Raider and support such activities as compiling, editing, and general interactive use by all users. Please note the AFRL DSRC Login Node Abuse policy.
The preferred method to run resource intensive interactive executions is to use an interactive batch session. An interactive batch session allows you to run interactively (in a command shell) on a compute node after waiting in the batch queue.
Note: Once an interactive session starts, it uses the entire requested block of CPU time and other resources unless you exit from it early, even if you don't use it. To avoid unnecessary charges to your project, don't forget to exit an interactive session once finished.
A Slurm interactive session reserves resources on compute nodes, allowing
you to use them interactively as you would the login node. On Raider, use the
salloc command, as follows:
salloc your_slurm_options --x11
The Slurm options for your job are described in Required Scheduler Directives above. The command will run using your default shell. The --x11 directive enables X-Windows access, so it may be omitted if your interactive job does not use a GUI. The salloc command adds your job to the queue and starts a new bash session on a compute node. Any subsequent commands in that session occur within the running job on the compute nodes.
Interactive batch sessions are scheduled just like normal batch jobs, so depending on how many other batch jobs are queued, it may take some time. Once your interactive batch shell starts, you will be logged into the first compute node of those assigned to your job. At this point, you can run or debug interactive applications, execute job scripts, post-process data, etc. You can launch parallel applications on your assigned compute nodes by using an MPI or other parallel launch command.
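For example, a one-hour, single-node interactive session in the debug queue with X-Windows support might be requested as follows (Project_ID is a placeholder, and the appropriate queue depends on your project access):
salloc -A Project_ID -q debug -N 1 --ntasks-per-node=128 -t 1:00:00 --x11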
The HPC Interactive Environment (HIE) provides an HIE queue specifically for interactive jobs. It offers longer job times and has nodes reserved only for HIE, so queue wait times are sometimes much shorter. However, HIE has limitations, such as only allowing the use of a single node at a time. See the HIE User Guide for more information before using the HIE queue.
5.4.3. Export Environment Variables
Batch jobs run with their own environment, separate from the login environment from which the batch job is launched. If your application is dependent on environment variables set in the login environment, you need to export these variables from the login environment to the batch environment.
The --export directive tells Slurm to export all environment
variables from your login environment to your batch environment. SLURM_*
variables are always propagated. To use this directive, add the following
line to your batch script:
#SBATCH --export=ALL
To export none of your own environment variables and only SLURM_* variables
from your environment use the directive:
#SBATCH --export=NONE
To export all SLURM_* environment variables along with explicitly defined variables
use the directive:
#SBATCH --export=my_variable1,my_variable2,...
It is also possible to set values for variables exported in this way, as follows:
#SBATCH --export=my_variable=my_value,my_variable2=my_value2...
5.4.4. Memory Size
The --mem=size directive is used to specify the maximum amount of memory
required per node for the job; a unit suffix of K, M, G, or T may be appended.
For example, if the job needs up to 2 GB of memory per node, the directive would read:
#SBATCH --mem=2G
5.5. Job Dependency Directives
Directive | Description |
---|---|
after:job_id[[+time][:job_id[+time]...]] | This job may begin time minutes after the specified jobs start or are cancelled. If no time is given there is no delay after start or cancellation. |
afterany:job_id[:job_id...] | This job may begin after the specified jobs terminate. This is the default dependency type. |
aftercorr:job_id[:job_id...] | A task of this job array may begin after the corresponding task ID in the specified job completes successfully (i.e., runs to completion with an exit code of zero). |
afternotok:job_id[:job_id...] | This job may begin after the specified jobs terminate in a failed state (non-zero exit code, node failure, timed out, etc.). |
afterok:job_id[:job_id...] | This job may begin after the specified jobs run successfully (i.e., to completion with a zero exit code). |
singleton | This job may begin after the termination of any previously launched jobs that have the same job name and user. |
Job dependency directives allow you to specify dependencies your job
may have on other jobs. This allows you to control the order jobs run in.
These directives generally take the following form:
#SBATCH --dependency=dependency_expression
or
#SBATCH -d dependency_expression
where dependency_expression is a comma-delimited list of one or more
dependencies, and each dependency is of the form:
type:job_id[:job_id][,type:job_id[:job_id]]
or
type:job_id[:job_id][?type:job_id[:job_id]]
where type is one of the directives listed below, and job_id
is a colon-delimited list of one or more job IDs your job is dependent
upon. All dependencies must be satisfied if the "," separator is used. Any
dependency may be satisfied if the "?" separator is used. For more information
about job dependencies, see the sbatch man page.
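As an illustrative sketch (script names are placeholders), dependencies are often chained from the command line using sbatch's --parsable option, which prints just the job ID of the submitted job:
PREP_ID=$(sbatch --parsable preprocess.sh)
sbatch --dependency=afterok:$PREP_ID solve.sh
sbatch --dependency=afterany:$PREP_ID cleanup.sh
In this sketch, solve.sh runs only if preprocess.sh completes successfully, while cleanup.sh runs once preprocess.sh terminates for any reason.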
5.6. Slurm Input Environment Variables
In addition to environment variables inherited from your user environment and environment variables set by the scheduler, Slurm has environment variables that can be set by the user. Upon startup, sbatch reads and handles the options set in the following environment variables. While there are many environment variables, you only need to know a few important ones to get started. Commonly used environment variables you can set include:
Variable | Description |
---|---|
$SBATCH_ACCOUNT | Account name associated with the job allocation. Same as -A or --account. |
$SBATCH_ARRAY_INX | Submit a job array. Same as -a or --array. |
$SBATCH_CONSTRAINT | Specify which constraints are required by your job. Same as -C or --constraint. |
$SBATCH_DISTRIBUTION | Same as -m or --distribution |
$SBATCH_EXPORT | Same as --export |
$SBATCH_GET_USER_ENV | Retrieve the login environment variables. Same as --get-user-env |
$SBATCH_GRES | Specifies a comma-delimited list of generic resources (e.g., GPU types). Same as --gres. |
$SBATCH_GRES_FLAGS | Specify node task pinning options. Same as --gres-flags. |
$SBATCH_MEM_PER_NODE | Specify the real memory required per node. Same as --mem |
$SBATCH_QOS | Request a quality of service for the job. Same as --qos or -q. |
$SBATCH_RESERVATION | Allocate resources from the named reservation. Same as --reservation |
$SBATCH_TIMELIMIT | Set a limit on the total run time. Same as -t or --time |
See the sbatch man page for a complete list of environment variables. |
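As an illustration, setting one of these variables in your shell before calling sbatch has the same effect as supplying the corresponding option on the command line or in the batch script. A minimal sketch, where the project ID and script name are placeholders:

# Applies to every subsequent sbatch call in this shell session
export SBATCH_ACCOUNT=Project_ID
export SBATCH_TIMELIMIT=00:30:00

# Submits with the account and time limit taken from the environment
sbatch myjob.sh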
6. Job Arrays
Imagine you have several hundred jobs that are all identical except for two or three parameters whose values vary across a range of input values. Submitting all these jobs individually would not only be tedious but would also incur a lot of overhead, which would impose a significant strain on the scheduler, negatively impacting all scheduled jobs. This example is not an uncommon use case, and it is the reason why job arrays were invented.
Job arrays let you submit and manage collections of similar jobs quickly and easily within a single script, which can significantly relieve the strain on the queueing system. Resource directives are specified once at the top of a job array script and are applied to each array task. As a result, each task has the same initial options (e.g., size, wall time, etc.) but may have different input values.
If your use case includes 200 or more similar jobs that vary by only a few parameters, job arrays are highly recommended.
To implement a Slurm job array in your job script, include the directives:
#SBATCH --array=n-m[:step],...
or
#SBATCH -a n-m[:step],...
where n is the starting index, m is the ending index, and the
optional step is the increment. Slurm then queues this script in
FLOOR[(m-n)/step+1] instances, each of which
receives its index in the $SLURM_ARRAY_TASK_ID environment variable.
You can use the command echo $SLURM_ARRAY_TASK_ID to
output the unique index of a job instance.
A step function is specified with a suffix containing a colon and number.
For example,
#SBATCH --array=0-15:4
which is equivalent to:
#SBATCH --array=0,4,8,12
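Inside the execution block of an array job script, the index is commonly used to select a task-specific input. The following is a minimal sketch, assuming hypothetical input files named input_0.dat, input_4.dat, and so on in the working directory:

#SBATCH --array=0-15:4

# Each array task selects the input file that matches its index
INPUT=input_${SLURM_ARRAY_TASK_ID}.dat
./executable ${INPUT} > output_${SLURM_ARRAY_TASK_ID}.log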
When working with Slurm job arrays, other Slurm environment variables that come into play include:
- SLURM_ARRAY_JOB_ID - Job array's master job ID number.
- SLURM_ARRAY_TASK_COUNT - Total number of tasks in a job array.
- SLURM_ARRAY_TASK_MAX - Job array's maximum ID (index) number.
- SLURM_ARRAY_TASK_MIN - Job array's minimum ID (index) number.
- SLURM_ARRAY_TASK_STEP - Job array's index step size.
See the sbatch man page for more information on these variables.
7. Example Scripts
This section provides sample scripts you may copy and use. All scripts follow the anatomy presented in Section 3 and have been tested in their respective scheduler environment. When you use any of these examples, remember to substitute your own Project_ID, job name, output and error files, executable, and clean up. More advanced scripts can be found under the $SAMPLES_HOME directory on the system. Assorted flavors of Hello World are provided in Section 8. These simple programs can be used to test these scripts.
The following Baseline Configuration variables are used in the scripts below.
Variable | Description |
---|---|
$BC_CORES_PER_NODE | The number of cores per node for the compute node on which a job is running. |
$BC_MEM_PER_NODE | The approximate maximum memory per node available to an end user program (in integer MB) for the compute node type to which a job is being submitted. |
$BC_MPI_TASKS_ALLOC | Intended to be referenced from inside a job script, contains the number of MPI tasks/ranks allocated for a particular job. |
$BC_NODE_ALLOC | Intended to be referenced from inside a job script, contains the number of nodes allocated for a particular job. |
7.1. Simple Batch Script
The following is a very basic script to demonstrate requesting resources (including all required directives), setting up the environment, specifying the execution block (i.e., the commands to be executed), and cleaning up after your job completes. Save this as a regular text file using your editor of choice.
#!/bin/bash
##################################################################
# Description: This is a basic bash shell script for a simple job.
#              The job can be submitted to the standard queue
#              with the sbatch command.
#              Use the "show_usage" command to get your Project_ID(s).
##################################################################
# REQUIRED DIRECTIVES -------------------------------------------
##################################################################
# Account to be charged
#SBATCH --account=Project_ID
## or
##SBATCH -A Project_ID

# Run the job in the standard queue
#SBATCH -q standard

# Select 4 nodes
#SBATCH --nodes=4
## or
##SBATCH -N 4

# Total tasks count
#SBATCH --ntasks=8
## or
##SBATCH -n 8

# Set max wall time to 10 minutes
#SBATCH --time=00:10:00
## or
##SBATCH -t 00:10:00

##################################################################
# OPTIONAL DIRECTIVES -------------------------------------------
##################################################################
# Name the job 'jobName'
#SBATCH --job-name=jobName
## or
##SBATCH -J jobName

# Change stdout and stderr filenames
#SBATCH --output=filename.out
## or
##SBATCH -o filename.out
#SBATCH --error=filename.err
## or
##SBATCH -e filename.err

##################################################################
# EXECUTION BLOCK -----------------------------------------------
##################################################################
# Change to the default working directory
cd ${WORKDIR}
echo "working directory is ${WORKDIR}"

# Run the job from the default working directory
echo
echo "-----------------------"
echo "-- Executable Output --"
echo "-----------------------"
mpiexec ./executable

##################################################################
# CLEAN UP ------------------------------------------------------
##################################################################
# Remove temporary files and
# move data to a non-scratch directory (Home or archive).
# See the "Archival In Compute Jobs" section (Section 4) of the
# AFRL DSRC Archive Guide for a detailed example of performing
# archival operations within a job script.
exit
7.2. Job Information Batch Script
The following examples can be included in the Execution block of any job script. The first example shows Baseline Configuration environment variables available on all HPCMP systems. The second example shows scheduler-specific variables.
#################################################################
# Job information set by Baseline Configuration variables
#################################################################
echo ----------------------------------------------------------
echo "Type of node                   " $BC_NODE_TYPE
echo "CPU cores per node             " $BC_CORES_PER_NODE
echo "CPU cores per standard node    " $BC_STANDARD_NODE_CORES
echo "CPU cores per accelerator node " $BC_ACCELERATOR_NODE_CORES
echo "CPU cores per big memory node  " $BC_BIGMEM_NODE_CORES
echo "Hostname                       " $BC_HOST
echo "Maximum memory per node        " $BC_MEM_PER_NODE
echo "Number of tasks allocated      " $BC_MPI_TASKS_ALLOC
echo "Number of nodes allocated      " $BC_NODE_ALLOC
echo "Working directory              " $WORKDIR
echo ----------------------------------------------------------

##############################################################
# Output some useful job information.
##############################################################
echo "-------------------------------------------------------"
echo "Project ID                     " $SLURM_JOB_ACCOUNT
echo "Job submission directory       " $SLURM_SUBMIT_DIR
echo "Submit host                    " $SLURM_SUBMIT_HOST
echo "Job name                       " $SLURM_JOB_NAME
echo "Job identifier (SLURM_JOB_ID)  " $SLURM_JOB_ID
echo "Job identifier (SLURM_JOBID)   " $SLURM_JOBID
echo "Working directory              " $WORKDIR
echo "Job partition                  " $SLURM_JOB_PARTITION
echo "Job queue (QOS)                " $SLURM_JOB_QOS
echo "Job number of nodes            " $SLURM_JOB_NUM_NODES
echo "Job node list                  " $SLURM_JOB_NODELIST
echo "Number of nodes                " $SLURM_NNODES
echo "Number of tasks                " $SLURM_NTASKS
echo "Node list                      " $SLURM_NODELIST
echo "-------------------------------------------------------"
echo
7.3. OpenMP Script
To run a pure OpenMP job, request the number of cores you want on the node and set $OMP_NUM_THREADS to match. If $OMP_NUM_THREADS is not set, the OpenMP runtime chooses its own thread count, which may not match your request and can result in poor performance. Differences between the Simple Batch Script and this script are highlighted.
#!/bin/bash
##################################################################
# REQUIRED DIRECTIVES -------------------------------------------
##################################################################
# Account to be charged
#SBATCH --account=Project_ID

# Run the job in the standard queue
#SBATCH --qos=standard

# Select 4 nodes
#SBATCH --nodes=4

# Total tasks count
#SBATCH --ntasks=8

# Set max wall time to 10 minutes
#SBATCH --time=00:10:00

##################################################################
# OPTIONAL DIRECTIVES -------------------------------------------
##################################################################
# Name the job 'jobName'
#SBATCH --job-name=jobName

# Change stdout and stderr filenames
#SBATCH --output=filename.out
#SBATCH --error=filename.err

##################################################################
# EXECUTION BLOCK -----------------------------------------------
##################################################################
# Change to the default working directory
cd ${WORKDIR}
echo "working directory is ${WORKDIR}"

export OMP_NUM_THREADS=${BC_CORES_PER_NODE}

# Run the job from the default working directory
./openMP_executable

##################################################################
# CLEAN UP ------------------------------------------------------
##################################################################
# Remove temporary files and
# move data to a non-scratch directory (Home or archive).
# See the "Archival In Compute Jobs" section (Section 4) of the
# AFRL DSRC Archive Guide for a detailed example of performing
# archival operations within a job script.
exit
7.4. Hybrid (MPI/OpenMP) Script
Hybrid MPI/OpenMP scripts are used for executables that run MPI between nodes and OpenMP among the cores within each node. The following script is an example of a hybrid MPI/OpenMP job. Differences between the Simple Batch Script and this script are highlighted.
#!/bin/bash
##################################################################
# REQUIRED DIRECTIVES -------------------------------------------
##################################################################
# Account to be charged
#SBATCH --account=Project_ID

# Run the job in the standard queue
#SBATCH -q standard

# Select 4 nodes
#SBATCH --nodes=4

# Total tasks count, 1 per node
#SBATCH --ntasks=4

# Set max wall time to 10 minutes
#SBATCH --time=00:10:00

##################################################################
# OPTIONAL DIRECTIVES -------------------------------------------
##################################################################
# Name the job 'jobName'
#SBATCH --job-name=jobName

# Change stdout and stderr filenames
#SBATCH --output=filename.out
#SBATCH --error=filename.err

##################################################################
# EXECUTION BLOCK -----------------------------------------------
##################################################################
# Change to the default working directory
cd ${WORKDIR}
echo "working directory is ${WORKDIR}"

# One thread per core on each node
export OMP_NUM_THREADS=$BC_CORES_PER_NODE

# Run the job from the default working directory
mpiexec ./hybrid_executable

##################################################################
# CLEAN UP ------------------------------------------------------
##################################################################
# Remove temporary files and
# move data to a non-scratch directory (Home or archive).
# See the "Archival In Compute Jobs" section (Section 4) of the
# AFRL DSRC Archive Guide for a detailed example of performing
# archival operations within a job script.
7.5. Accessing More Memory per Process
By default, an MPI job runs one process per core, with all processes sharing
the available memory on the node. On Raider each compute node has 128 cores
and 237 GB of memory. Assuming one process per core, the memory per process is:
memory per process = 237 GB / 128 ≈ 1.85 GB
If you need more memory per process, your job needs to run fewer MPI processes per node; that is, number_of_processes_per_node < 128. For example, if you request 4 nodes and use only 16 of the 128 cores on each node, the job runs a total of 4*16 = 64 MPI processes. Each of the 16 MPI processes per node then has access to approximately 14.8 GB (237 GB/16) of memory.
The following script demonstrates this example by requesting 4 nodes and setting 16 processes per node. The job runs for 2 hours in the standard queue. For more information, refer to the Samples section in the Raider User Guide. Note: Differences between the Simple Batch Script and this script are highlighted.
Another way to get more memory per process is to run on bigmem nodes, which is discussed in the next section. However, because there are few bigmem nodes on the system, if you need many cores, bigmem nodes may not be an option.
#!/bin/bash
##################################################################
# REQUIRED DIRECTIVES -------------------------------------------
##################################################################
# Account to be charged
#SBATCH --account=Project_ID

# Run the job in the standard queue
#SBATCH -q standard

# Start 64 MPI processes; only 16 processes on each node.
# This gives each process approximately 14.8 GB (237 GB/16) of memory.
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16

# Set max wall time to 2 hours
#SBATCH --time=02:00:00

##################################################################
# OPTIONAL DIRECTIVES -------------------------------------------
##################################################################
# Name the job 'jobName'
#SBATCH --job-name=jobName

# Change stdout and stderr filenames
#SBATCH --output=filename.out
#SBATCH --error=filename.err

##################################################################
# EXECUTION BLOCK -----------------------------------------------
##################################################################
# Change to the default working directory
cd ${WORKDIR}
echo "working directory is ${WORKDIR}"

# Run the job from the default working directory
mpiexec ./executable

##################################################################
# CLEAN UP ------------------------------------------------------
##################################################################
# Remove temporary files and
# move data to a non-scratch directory (Home or archive).
# See the "Archival In Compute Jobs" section (Section 4) of the
# AFRL DSRC Archive Guide for a detailed example of performing
# archival operations within a job script.
exit
7.6. GPU Script
Here is a short example of a script for submitting jobs to a GPU node. Differences between the Simple Batch Script and this script are highlighted.
#!/bin/bash
##################################################################
# REQUIRED DIRECTIVES -------------------------------------------
##################################################################
# Account to be charged
#SBATCH --account=Project_ID

# Run the job in the standard queue
#SBATCH -q standard

# Select 4 nodes
#SBATCH --nodes=4

# Total tasks count
#SBATCH --ntasks=8

# Set max wall time to 10 minutes
#SBATCH --time=00:10:00

# Request GPU nodes - uncomment one of the options below
##SBATCH --constraint=mla
##SBATCH -C mla
##SBATCH --constraint=viz
##SBATCH -C viz

##################################################################
# OPTIONAL DIRECTIVES -------------------------------------------
##################################################################
# Name the job 'jobName'
#SBATCH --job-name=jobName

# Change stdout and stderr filenames
#SBATCH --output=filename.out
#SBATCH --error=filename.err

##################################################################
# EXECUTION BLOCK -----------------------------------------------
##################################################################
# Change to the default working directory
cd ${WORKDIR}
echo "working directory is ${WORKDIR}"

# Run the job from the default working directory
./GPU_executable

##################################################################
# CLEAN UP ------------------------------------------------------
##################################################################
# Remove temporary files and
# move data to a non-scratch directory (Home or archive).
# See the "Archival In Compute Jobs" section (Section 4) of the
# AFRL DSRC Archive Guide for a detailed example of performing
# archival operations within a job script.
exit
7.7. Data Transfer Script
The transfer queue is a special-purpose queue for transferring or archiving files. It has access to $HOME, $ARCHIVE_HOME, $WORKDIR, and $CENTER. Jobs running in the transfer queue are charged for a single core against your allocation. Differences between the Simple Batch Script and this script are highlighted.
#!/bin/bash
##################################################################
# REQUIRED DIRECTIVES -------------------------------------------
##################################################################
# Account to be charged
#SBATCH --account=Project_ID

# Set max wall time to ten minutes
#SBATCH --time=00:10:00

# Request transfer nodes
#SBATCH --qos=transfer
#SBATCH --constraint=xfer

##################################################################
# OPTIONAL DIRECTIVES -------------------------------------------
##################################################################
# Name the job 'jobName'
#SBATCH --job-name=jobName

# Change stdout and stderr filenames
#SBATCH --output=filename.out
#SBATCH --error=filename.err

##################################################################
# EXECUTION BLOCK -----------------------------------------------
##################################################################
cd $WORKDIR
echo "New directory = ${WORKDIR}"

# Assume all files to be transferred are in $WORKDIR/from_dir
export FROM_DIR=$WORKDIR/from_dir

# Assume all files are to be transferred to $ARCHIVE_HOME
export TO_DIR=$ARCHIVE_HOME

# Create a gzipped tar file to reduce data transfer time
tar -czf $FROM_DIR.gz -C $FROM_DIR .

# If needed, uncomment to create a directory on the archive server
# archive mkdir -C $TO_DIR

# Use archive commands to transfer the data
archive put -C $TO_DIR $FROM_DIR.gz

# List the archive directory contents to verify the data transfer
archive ls -al -C $TO_DIR

echo "Transfer job ended"
7.8. Job Array Script
As discussed in Section 6, job arrays allow you to leverage the scheduler's ability to create multiple jobs from one script. Situations where this is useful include:
- Running a list of commands, with a separate job created for each command in the list.
- Running one analysis program against many parameter sets.
- Running the same program multiple times with different sets of data.
#!/bin/bash
##################################################################
# REQUIRED DIRECTIVES -------------------------------------------
##################################################################
# Account to be charged
#SBATCH --account=Project_ID

# Run the job in the standard queue
#SBATCH -q standard

# Select 4 nodes
#SBATCH --nodes=4

# Total tasks count
#SBATCH --ntasks=8

# Set max wall time to 10 minutes
#SBATCH --time=00:10:00

# Create the job array (indices 0 through 4 shown here as an example)
#SBATCH --array=0-4

##################################################################
# OPTIONAL DIRECTIVES -------------------------------------------
##################################################################
# Name the job 'jobName'
#SBATCH --job-name=jobName

# Change stdout and stderr filenames
#SBATCH --output=filename.out
#SBATCH --error=filename.err

##################################################################
# EXECUTION BLOCK -----------------------------------------------
##################################################################
cd $WORKDIR/SCRIPTS

export JA_ID=$SLURM_ARRAY_JOB_ID
export JA_DIR=$WORKDIR/Job_Array.o${JA_ID}

# Output Job ID and Job array index information
echo "Slurm Job Array ID is $SLURM_ARRAY_JOB_ID"
echo "Slurm Job array index is $SLURM_ARRAY_TASK_ID"
echo "Slurm Job ID is $JA_ID"
echo "Job array directory is $JA_DIR"
#
# Make a shared directory for the job array
# (-p avoids an error if another task has already created it)
mkdir -p $JA_DIR
#
# Change into the job array directory to run each task
cd $JA_DIR
#
# Retrieve the job's binary
cp $WORKDIR/executable $JA_DIR/executable_$SLURM_ARRAY_TASK_ID
#
# Run the job and redirect output to a task-specific file
export outfile=$JA_DIR/${JA_ID}_${SLURM_ARRAY_TASK_ID}
mpiexec $JA_DIR/executable_$SLURM_ARRAY_TASK_ID &> $outfile
7.9. Large-Memory Node Script
The standard compute nodes on Raider contain 237 GB of RAM and 128 cores. That works out to about 1.85 GB/core (237 GB/128). This is fine for most jobs running on the system. However, some jobs require more memory per core. To accommodate these jobs, Raider has 8 large-memory nodes with 998 GB of memory. You can allocate a job on the large-memory nodes by submitting a large-memory job script. Differences between the Simple Batch Script and this script are highlighted.
#!/bin/bash
##################################################################
# REQUIRED DIRECTIVES -------------------------------------------
##################################################################
# Account to be charged
#SBATCH --account=Project_ID

# Run the job in the standard queue
#SBATCH -q standard

# Select 4 nodes
#SBATCH --nodes=4

# Total tasks count
#SBATCH --ntasks=8

# Set max wall time to 10 minutes
#SBATCH --time=00:10:00

# Request big memory nodes
#SBATCH --constraint=bigmem

##################################################################
# OPTIONAL DIRECTIVES -------------------------------------------
##################################################################
# Name the job 'jobName'
#SBATCH --job-name=jobName

# Change stdout and stderr filenames
#SBATCH --output=filename.out
#SBATCH --error=filename.err

##################################################################
# EXECUTION BLOCK -----------------------------------------------
##################################################################
# Change to the default working directory
cd ${WORKDIR}
echo "working directory is ${WORKDIR}"

# Run the job from the default working directory
mpiexec ./executable

##################################################################
# CLEAN UP ------------------------------------------------------
##################################################################
# Remove temporary files and
# move data to a non-scratch directory (Home or archive).
# See the "Archival In Compute Jobs" section (Section 4) of the
# AFRL DSRC Archive Guide for a detailed example of performing
# archival operations within a job script.
8. Hello World Examples
This section provides source code for several variations of the basic Hello World program. Refer to the Raider User Guide for information about compiling.
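As an illustration only, a typical compile-and-submit sequence for the MPI example might look like the following sketch. The compiler wrapper and script name are assumptions; check the Raider User Guide for the supported programming environments and compiler wrappers.

# Compile the MPI Hello World example
# (mpicc is assumed to be provided by the loaded MPI/programming environment)
mpicc -o hello hello.c

# Submit a batch script (e.g., the Simple Batch Script in Section 7.1)
# whose execution block runs: mpiexec ./hello
sbatch simple_job.sh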
8.1. C Program - hello.c
/**************************************************************
 * A simple program to demonstrate an MPI executable
 ***************************************************************/
#include <mpi.h>
#include <stdio.h>

int rank;
int numRanks;
char processorName[MPI_MAX_PROCESSOR_NAME];
int nameLen;

int main(int argc, char** argv)
{
   // Initialize the MPI environment
   MPI_Init(NULL, NULL);

   // Get the number of ranks
   MPI_Comm_size(MPI_COMM_WORLD, &numRanks);

   // Get the rank of this process
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);

   // Get the name of the processor
   MPI_Get_processor_name(processorName, &nameLen);

   // Print messages from each processor
   printf("Hello from processor %s - ", processorName);
   printf("I am rank %d out of %d ranks\n", rank, numRanks);

   // Finalize the MPI environment
   MPI_Finalize();
} // end main
8.2. OpenMP - hello-OpenMP.c
/*************************************************************
 * A simple program to demonstrate a pure OpenMP executable
 ***************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>        // needed for OpenMP
#include <unistd.h>     // only needed for definition of gethostname
#include <sys/param.h>  // only needed for definition of MAXHOSTNAMELEN

int main (int argc, char *argv[])
{
   int th_id, nthreads;
   char foo[] = "Hello";
   char bar[] = "World";
   char hostname[MAXHOSTNAMELEN];
   gethostname(hostname, MAXHOSTNAMELEN);

   #pragma omp parallel private(th_id)
   {
      th_id = omp_get_thread_num();
      printf("%s %s from thread %d on %s!\n", foo, bar, th_id, hostname);
      #pragma omp barrier
      if ( th_id == 0 )
      {
         nthreads = omp_get_num_threads();
         printf("There were %d threads on %s!\n", nthreads, hostname);
      }
   }
   return EXIT_SUCCESS;
}
8.3. Hybrid MPI/Open MP - hello-hybrid.c
/*****************************************************************
 * A simple program to demonstrate a Hybrid MPI/OpenMP executable
 *****************************************************************/
#include <stdio.h>
#include <omp.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
   int numprocs, rank, namelen;
   char processor_name[MPI_MAX_PROCESSOR_NAME];
   int iam = 0, np = 1;

   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Get_processor_name(processor_name, &namelen);

   #pragma omp parallel default(shared) private(iam, np)
   {
      np = omp_get_num_threads();
      iam = omp_get_thread_num();
      printf("Hello from thread %d of %d ", iam, np);
      printf("from process %d out of %d on %s\n", rank, numprocs, processor_name);
   }

   MPI_Finalize();
}
8.4. Cuda - hello-cuda.cu
/*****************************************************************
 * A simple program to demonstrate a CUDA/GPU executable
 * May require a module swap:
 *    module swap PrgEnv-cray PrgEnv-nvidia
 * Check the Raider User Guide for compiling GPU code
 ******************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

void cuda_device_init(void)
{
   int ndev;
   cudaGetDeviceCount(&ndev);
   cudaDeviceSynchronize();
   if (ndev == 1)
      printf("There is %d GPU.\n", ndev);
   else
      printf("There are %d GPUs\n", ndev);

   for (int i = 0; i < ndev; i++)
   {
      cudaDeviceProp pdev;
      cudaGetDeviceProperties(&pdev, i);
      cudaDeviceSynchronize();
      printf("Hello from GPU %d\n", i);
      printf("GPU type     : %s\n", pdev.name);
      printf("Memory Global: %zu Mb\n", (pdev.totalGlobalMem + 1024*1024)/1024/1024);
      printf("Memory Const : %zu Kb\n", pdev.totalConstMem/1024);
      printf("Memory Shared: %zu Kb\n", pdev.sharedMemPerBlock/1024);
      printf("Clock Rate   : %.3f GHz\n", pdev.clockRate/1000000.0);
      printf("Number of Processors : %d\n", pdev.multiProcessorCount);
      printf("Number of Cores      : %d\n", 8*pdev.multiProcessorCount);
      printf("Warp Size    : %d\n", pdev.warpSize);
      printf("Max Thr/Blk  : %d\n", pdev.maxThreadsPerBlock);
      printf("Max Blk Size : %d %d %d\n",
             pdev.maxThreadsDim[0], pdev.maxThreadsDim[1], pdev.maxThreadsDim[2]);
      printf("Max Grid Size: %d %d %d\n",
             pdev.maxGridSize[0], pdev.maxGridSize[1], pdev.maxGridSize[2]);
   }
}

int main(int argc, char * argv[])
{
   cuda_device_init();
   return 0;
}

/**************************************************************
 * Compile Script for hello-cuda on Raider
 ***************************************************************/
#!/bin/bash
#
. $MODULESHOME/init/bash
module load cuda
#
set -x
#
nvcc -o hello-cuda.exe hello-cuda.cu
chmod 750 hello-cuda.exe
9. Batch Scheduler Rosetta
User Commands | PBS | Slurm | LSF |
---|---|---|---|
Job Submission | qsub Script_File | sbatch Script_File | bsub < Script_File |
Job Deletion | qdel Job_ID | scancel Job_ID | bkill Job_ID |
Job status (by job) | qstat Job_ID | squeue -j Job_ID | bjobs Job_ID |
Job status (by user) | qstat -u User_Name | squeue -u User_Name | bjobs -u User_Name |
Job hold | qhold Job_ID | scontrol hold Job_ID | bstop Job_ID |
Job release | qrls Job_ID | scontrol release Job_ID | bresume Job_ID |
Queue list | qstat -Q | squeue | bqueues |
Node list | pbsnodes -l | sinfo -N OR scontrol show nodes | bhosts |
Cluster status | qstat -a | sinfo | bqueues |
GUI | xpbsmon | sview | xlsf OR xlsbatch |
Environment | PBS | Slurm | LSF |
Job ID | $PBS_JOBID | $SLURM_JOBID | $LSB_JOBID |
Submit Directory | $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR | $LSB_SUBCWD |
Submit Host | $PBS_O_HOST | $SLURM_SUBMIT_HOST | $LSB_SUB_HOST |
Node List | $PBS_NODEFILE | $SLURM_JOB_NODELIST | $LSB_HOSTS/LSB_MCPU_HOST |
Job Array Index | $PBS_ARRAYID | $SLURM_ARRAY_TASK_ID | $LSB_JOBINDEX |
Job Specification | PBS | Slurm | LSF |
Script Directive | #PBS | #SBATCH | #BSUB |
Queue | -q Queue_Name | ARL: -p Queue_Name; AFRL and Navy: -q Queue_Name | -q Queue_Name |
Node Count | -l select=N1:ncpus=N2:mpiprocs=N3 (N1 = node count, N2 = max cores per node, N3 = cores to use per node) | -N min[-max] | -n Core_Count -R "span[ptile=Cores_Per_Node]" (Node Count = Core_Count / Cores_Per_Node) |
Core Count | -l select=N1:ncpus=N2:mpiprocs=N3 (N1 = node count, N2 = max cores per node, N3 = cores to use per node; Core Count = N1 x N3) | --ntasks=total_cores_in_run | -n Core_Count |
Wall Clock Limit | -l walltime=hh:mm:ss | -t min OR -t days-hh:mm:ss | -W hh:mm |
Standard Output File | -o File_Name | -o File_Name | -o File_Name |
Standard Error File | -e File_Name | -e File_Name | -e File_Name |
Combine stdout/err | -j oe (both to stdout) OR -j eo (both to stderr) | (use -o without -e) | (use -o without -e) |
Copy Environment | -V | --export=ALL|NONE|Variable_List | |
Event Notification | -m [a][b][e] | --mail-type=[BEGIN],[END],[FAIL] | -B or -N |
Email Address | -M Email_Address | --mail-user=Email_Address | -u Email_Address |
Job Name | -N Job_Name | --job-name=Job_Name | -J Job_Name |
Job Restart | -r y|n | --requeue OR --no-requeue (NOTE: configurable default) | -r |
Working Directory | No option – defaults to home directory | --chdir=/Directory/Path | No option – defaults to submission directory |
Resource Sharing | -l place=scatter:excl | --exclusive OR --shared | -x |
Account to charge | -A Project_ID | --account=Project_ID | -P Project_ID |
Tasks per Node | -l select=N1:ncpus=N2:mpiprocs=N3 (N1 = node count, N2 = max cores per node, N3 = cores to use per node) | --ntasks-per-node=count | |
Job Dependency | -W depend=state:Job_ID[:Job_ID...][,state:Job_ID[:Job_ID...]] | --dependency=state:Job_ID | -w done|exit|finish |
Job host preference | | --nodelist=nodes AND/OR --exclude=nodes | -m Node_List (e.g., "inf001" or inf[001-128]) OR -m node_type (e.g., "inference", "training", or "visualization") |
Job Arrays | -J N-M[:step][%Max_Jobs] | --array=N-M[:step] | -J "Array_Name[N-M[:step]][%Max_Jobs]" (Note: brackets are literal) |
Generic Resources | -l other=Resource_Spec | --gres=Resource_Spec | |
Licenses | -l app=number Example: -l abaqus=21 (Note: license resource allocation) | -L app:number Example: -L abaqus:21 | -R "rusage[License_Spec]" (Note: brackets are literal) |
Begin Time | -a [[[YYYY]MM]DD]hhmm[.ss] (Note: no delimiters) | --begin=YYYY-MM-DD[Thh:mm[:ss]] | -b [[YYYY:][MM:]DD:]hh:mm |
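As a quick illustration of how the table above is used, here is a small PBS directive block and one possible Slurm equivalent; the directive values are placeholders, not a complete script.

# PBS version
#PBS -A Project_ID
#PBS -q standard
#PBS -l select=2:ncpus=128:mpiprocs=128
#PBS -l walltime=01:00:00
#PBS -N jobName

# Slurm version
#SBATCH --account=Project_ID
#SBATCH -q standard
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
#SBATCH --time=01:00:00
#SBATCH --job-name=jobName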
10. Glossary
- Batch-scheduled :
- Users request compute nodes via commands to the batch scheduler software and wait in a queue until the requested nodes become available.
- Batch Script :
- A script that provides resource requirements and commands for the job.
- Pinning :
- Pinning threads for shared-memory parallelism or binding processes for distributed-memory parallelism is an advanced way to control how your system distributes the threads or processes across the available cores.