Coral User Guide
Table of Contents
- 1. Introduction
- 1.1. Document Scope and Assumptions
- 1.2. DSRC Policies
- 1.3. Obtaining an Account
- 1.4. Training
- 1.5. Requesting Assistance
- 2. System Configuration
- 2.1. System Summary
- 2.2. Login and Compute Nodes
- 3. Accessing the System
- 3.1. Kerberos
- 3.2. Logging In
- 3.3. File Transfers
- 4. User Environment
- 4.1. User Directories
- 4.2. Shells
- 4.3. Environment Variables
- 4.4. Archive Usage
- 5. Program Development
- 5.1. Modules
- 5.2. Programming Models
- 5.3. Available Compilers
- 5.4. Libraries
- 5.5. Debuggers
- 5.6. Code Profiling
- 5.7. Compiler Optimization Options
- 6. Batch Scheduling
- 6.1. Scheduler
- 6.2. Queue Information
- 6.3. Interactive Logins
- 6.4. Batch Request Submission
- 6.5. Batch Resource Directives
- 6.6. Interactive Batch Sessions
- 6.7. Launch Commands
- 6.8. Sample Scripts
- 6.9. Slurm Commands
- 6.10. Advance Reservations
- 7. Software Resources
- 7.1. Application Software
- 7.2. Useful Utilities
- 7.3. Sample Code Repository
- 8. Links to Vendor Documentation
- 8.1. GNU Links
- 9. Glossary
1. Introduction
1.1. Document Scope and Assumptions
This document provides an overview and introduction to the use of the Aspen Systems Linux Cluster system (Coral) located at the MHPCC DSRC, along with a description of the specific computing environment on the system. The intent of this guide is to provide information that will enable the average user to perform computational tasks on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:
- Use of the Linux operating system
- Use of an editor (e.g., vi or emacs)
- Remote use of computer systems via network
- A selected programming language and its related tools and libraries
1.2. DSRC Policies
All policies are discussed in the Policies Section of the MHPCC DSRC Introductory Site Guide. All users running at the MHPCC DSRC are expected to know, understand, and follow the policies discussed. If you have any questions about the MHPCC DSRC's policies, please contact the HPC Help Desk.
1.3. Obtaining an Account
To begin the account application process, visit the Obtaining an Account page and follow the instructions presented there. An HPC Help Desk video is available to guide you through the process.
1.4. Training
Training on a number of topics in this User Guide is available at the PET Knowledge Management Learning System. New account holders should strongly consider attending HPCMP New Account Orientation, which is provided via live webcast every month and available as an on-demand video.
1.5. Requesting Assistance
The HPC Help Desk is available to assist users with unclassified problems, issues, or questions. Technicians are on duty 8:00 a.m. to 8:00 p.m. Eastern, Monday - Friday (excluding Federal holidays).
- Service Portal: https://helpdesk.hpc.mil/hpc
- E-mail: help@helpdesk.hpc.mil
- Phone: 1-877-222-2039 or 937-255-0679
For after-hours support and for support services not provided by the HPC Help Desk, you can contact the MHPCC DSRC in any of the following ways:
- Phone: (808) 879-5077
- Fax: (808) 879-5018
- U.S. Mail:
MHPCC High Performance Computing Center
550 Lipoa Parkway
Kihei, Maui HI 96753
For more information about requesting assistance, see the HPC Help Desk dropdown.
2. System Configuration
2.1. System Summary
Coral is an Aspen Systems Linux Cluster. It has two login nodes and three types of compute nodes for job execution. Coral uses InfiniBand as its high-speed interconnect for MPI messages and I/O traffic, and it uses WEKA to manage its parallel file system.
Node Type | Login | Standard | Large-Memory | GPU | GPU | GPU | DPU | DPU |
---|---|---|---|---|---|---|---|---|
Total Nodes | 2 | 4 | 8 | 8 | 4 | 4 | 2 | 4 |
Processor | Intel Xeon Gold 6326 | Ampere Altra | Intel Xeon Gold 6338 | Intel Xeon Gold 6338 | AMD EPYC 7513 | Ampere Altra | Intel Xeon Gold 6338 | Ampere Altra |
Processor Speed | 2.9 GHz | 3.0 GHz | 2.0 GHz | 2.0 GHz | 2.6 GHz | 3.0 GHz | 2.0 GHz | 3.0 GHz |
Sockets / Node | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 1 |
Cores / Node | 16 | 32 | 32 | 32 | 32 | 80 | 32 | 80 |
Total CPU Cores | 32 | 128 | 256 | 256 | 128 | 320 | 64 | 320 |
Usable Memory / Node | 489 GB | 489 GB | 1.9 TB | 1.9 TB | 981 GB | 489 GB | 489 GB | 489 GB |
Accelerators / Node | None | None | None | None | None | None | 1 | 2 |
Accelerator | N/A | N/A | N/A | N/A | N/A | N/A | NVIDIA InfiniBand BlueField-2 (Rev 1) | NVIDIA InfiniBand BlueField-2 (Rev 1) |
Memory / Accelerator | N/A | N/A | N/A | N/A | N/A | N/A | 16 GB | 16 GB |
Storage on Node | 3.5 TB NVMe | 3.5 TB NVMe | 3.5 TB NVMe | 3.5 TB NVMe | 3.5 TB NVMe | 5 TB NVMe | 3.5 TB NVMe | 5 TB NVMe |
Interconnect | 200 Gbps HDR InfiniBand | 200 Gbps HDR InfiniBand | 200 Gbps HDR InfiniBand | 200 Gbps HDR InfiniBand | 200 Gbps HDR InfiniBand | 200 Gbps HDR InfiniBand | 200 Gbps HDR InfiniBand | 200 Gbps HDR InfiniBand |
Operating System | RHEL8 | RHEL8 | RHEL8 | RHEL8 | RHEL8 | RHEL8 | RHEL8 | RHEL8 |
2.2. Login and Compute Nodes
Coral is intended as a batch-scheduled HPC system with numerous nodes. Its login nodes are for minor setup, housekeeping, and job preparation tasks and are not to be used for large computational work (e.g., work that is memory- or I/O-intensive or long-running). All executions that require large amounts of system resources must be sent to the compute nodes by batch job submission. Node types such as "Standard", "Large-Memory", and "GPU-Accelerated" are considered compute nodes. Coral uses both shared and distributed memory models: memory is shared among all the cores on one node but is not shared among the nodes across the cluster.
Coral's login nodes use Intel 6326 Ice Lake processors with 489 GB of usable memory. All memory and cores on the node are shared among all users who are logged in. Therefore, users should not use more than 2 GB of memory at any one time.
Coral's standard compute nodes use Ampere M128-30 Altra processors. Each node contains 489 GB of usable shared memory and 5 TB of on-node NVMe SSD storage. Standard compute nodes are intended for typical compute jobs.
Coral's large-memory compute nodes use Intel 6338 Ice Lake processors. Each node contains 1.9 TB of usable shared memory and 3.5 TB of on-node NVMe SSD storage. Large-memory compute nodes are intended for jobs requiring large amounts of memory.
Coral's DPU nodes consist of Ampere M128-30 Altra processors paired with one NVIDIA InfiniBand BlueField-2 (Rev 1) DPU. Each node contains 489 GB of usable shared memory, as well as 16 GB of shared memory internal to each accelerator, and 3.5 TB of on-node NVMe SSD storage. DPU compute nodes are intended for users wishing to use a DPU.
3. Accessing the System
3.1. Kerberos
For security purposes, you must have a current Kerberos ticket on your computer before attempting to connect to Coral. To obtain a ticket, you must either install a Kerberos client kit on your desktop or connect via the HPC Portal. Visit the Kerberos & Authentication page for information about installing Kerberos clients on your Windows, Linux, or Mac desktop. Instructions are also available on those pages for getting a ticket and logging into the HPC systems from each platform.
3.2. Logging In
The system host name for the Coral cluster is coral.mhpcc.hpc.mil, which redirects you to one of the login nodes. Hostnames and IP addresses of these nodes are available upon request from the HPC Help Desk.
The preferred way to login to Coral is via ssh, as follows:
% ssh username@coral.mhpcc.hpc.mil
3.3. File Transfers
File transfers to DSRC systems (except for those to the local archive system) must be performed using the following HPCMP Kerberized tools: scp, mpscp, sftp, scampi, or tube. Windows users may use a graphical secure file transfer protocol (sftp) client such as FileZilla. See the HPC Help Desk Video on Using FileZilla. Before using any of these tools (except tube), you must use a Kerberos client to obtain a Kerberos ticket. Information about installing and using a Kerberos client can be found on the Kerberos & Authentication page.
The command below uses secure copy (scp) to copy a single local
file into a destination directory on a Coral login node. The
mpscp command is similar to the scp command, but it
has a different underlying means of data transfer and may enable a greater transfer
rate. The mpscp command has the same syntax as scp.
% scp local_file username@coral.mhpcc.hpc.mil:/target_dir
Both scp and mpscp can be used to send multiple files.
This command transfers all files with the .txt extension to the same destination
directory.
% scp *.txt username@coral.mhpcc.hpc.mil:/target_dir
The example below uses the secure file transfer protocol (sftp)
to connect to Coral, then uses sftp's cd and put
commands to change to the destination directory and copy a local file there.
The sftp quit command ends the sftp session. Use the sftp
help command to see a list of all sftp commands.
% sftp username@coral.mhpcc.hpc.mil
sftp> cd target_dir
sftp> put local_file
sftp> quit
4. User Environment
4.1. User Directories
The following user directories are provided for all users on Coral:
Path | Formatted Capacity | File System Type | Storage Type | User Quota | Minimum File Retention |
---|---|---|---|---|---|
/wdata/home ($HOME) | 50 TB | WEKA | SAS | 100 GB | None |
/wdata/scratch ($WORKDIR) | 50 TB | WEKA | SAS | None | 21 Days |
/p/cwfs ($CENTER) | 753 TB | NFS | SAS | 100 TB | 120 Days |
/wdatap/app ($PROJECTS_HOME) | 50 TB | WEKA | SAS | None | None |
4.1.1. Home Directory ($HOME)
When you log in, you are placed in your home directory, /wdata/home/username. It is accessible from the login and compute nodes and can be referenced by the environment variable $HOME.
Your home directory is intended for storage of frequently used files, scripts, and small utility programs. It has a 100-GB quota, and files stored there are not subject to automatic deletion based on age. It is backed up weekly to enable file restoration in the event of catastrophic system failure.
Important! The home file system is not tuned for parallel I/O and does not support application-level I/O. Jobs performing intensive file I/O in your home directory will perform poorly and cause problems for everyone on the system. Running jobs should use the work file system ($WORKDIR) for file I/O.
4.1.2. Work Directory ($WORKDIR)
The work file system is a large, high-performance WEKA-based file system tuned for parallel application-level I/O. It is accessible from the login and compute nodes and provides temporary file storage for queued and running jobs.
All users have a work directory, /wdata/scratch/username, on this file system, which can be referenced by the environment variable, $WORKDIR. This directory should be used for all application file I/O. NEVER allow your jobs to perform file I/O in $HOME.
$WORKDIR has no quota. It is not backed up or exported to any other system and is subject to an automated deletion cycle. If available disk space gets too low, files that have not been accessed in 30 days may be deleted. If this happens or if catastrophic disk failure occurs, lost files are irretrievable. To prevent the loss of important files, transfer them to a long-term storage area, such as your archival directory ($ARCHIVE_HOME, see Archive Usage), which has no quota, or, for smaller files, to your home directory ($HOME).
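For example, a job might create its own directory under $WORKDIR, run there, and then copy anything worth keeping to long-term storage before the scrubber can remove it. This is only a sketch; my_program and the file names are placeholders:
mkdir $WORKDIR/my_job
cd $WORKDIR/my_job
cp $HOME/my_program .
./my_program > results.out
cp results.out $HOME/    # or transfer it to $ARCHIVE_HOME (see Archive Usage)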
4.1.3. Center Directory ($CENTER)
The Center-Wide File System (CWFS) is an NFS-mounted file system. It is accessible from the login nodes of all HPC systems at the center and from the HPC Portal. It provides centralized, shared storage that enables users to easily access data from multiple systems. The CWFS is not tuned for parallel I/O and does not support application-level I/O.
All users have a directory on the CWFS. The name of your directory may vary between systems and between centers, but the environment variable $CENTER always refers to this directory.
$CENTER has a quota of 100 TB. It is not backed up or exported to any other system and is subject to an automated deletion cycle. If available disk space gets too low, files that have not been accessed in 120 days may be deleted. If this happens or if catastrophic disk failure occurs, lost files are irretrievable. To prevent the loss of important files, transfer them to a long-term storage area, such as your archival directory ($ARCHIVE_HOME, see Archive Usage), which has no quota, or, for smaller files, to your home directory ($HOME).
4.1.4. Projects Directory ($PROJECTS_HOME)
The Projects directory, $PROJECTS_HOME, is a file system set aside for group-shared storage. It is intended for storage of semi-permanent files, similar to a home directory, but typically larger and shared by a group. It is not meant for high-speed application output ($WORKDIR, see Work Directory). A new project sub-directory can be created via an HPC Help Desk request and appears as follows: $PROJECTS_HOME/new_group_dir. The HPC Help Desk request must specify a UNIX group to be assigned to the project sub-directory. Users can create and manage UNIX groups in the Portal to the Information Environment, allowing the creator of the assigned group to manage the members of the group with access to the project sub-directory.
4.1.5. Storage On-Node ($LOCALWORKDIR)
Some compute nodes (see the Node Configuration table) include a local NVMe solid-state storage device that is accessible only by that node and can be referenced via the environment variable $LOCALWORKDIR. It offers improved local bandwidth and latency, but each device is a separate drive with no parallel read/write capability. Files stored on this device must be relocated at the end of a job, or they may be lost when the node is reassigned to a new job.
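For example, a batch script might stage temporary files on the node-local device and copy results back to $WORKDIR before the job ends. This is only a sketch; my_program, input.dat, and output.dat are hypothetical names:
cd $LOCALWORKDIR
cp $WORKDIR/input.dat .
$WORKDIR/my_program input.dat > output.dat
cp output.dat $WORKDIR/    # relocate results before the node is reassigned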
4.1.6. Specialized Temporary Directories
Each node includes several specialized directories.
The /tmp and /var/tmp directories are usually intended for temporary files as created by the operating system. Do not use these directories for your own files, as filling up these file systems can cause issues.
Coral also provides a "virtual" file system (i.e., "RAM disk") called /dev/shm which is local to each compute node. You may use this file system to store files in memory. It automatically increases in size as needed, up to half of the memory of the node. It is extremely fast, but it is also small and takes available node memory away from your application. An example use case is performing significant I/O with many small files when the memory is not otherwise needed by the application.
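For example, an application that creates many small scratch files could be pointed at the RAM disk, with only the needed results copied back afterward. The directory and file names below are illustrative only:
mkdir /dev/shm/$USER
cd /dev/shm/$USER
$WORKDIR/my_program    # writes its small temporary files here
cp results.dat $WORKDIR/
rm -rf /dev/shm/$USER    # free the node memory when finished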
4.2. Shells
The following shells are available on Coral: csh, bash, ksh, tcsh, sh, and zsh.
To change your default shell, log into the Portal to the Information Environment and go to "User Information Environment" > "View/Modify personal account information". Scroll down to "Preferred Shell" and select your desired default shell. Then scroll to the bottom and click "Save Changes". Your requested change should take effect within 24 hours.
4.3. Environment Variables
A number of environment variables are provided by default on all HPCMP high performance computing (HPC) systems. We encourage you to use these variables in your scripts where possible. Doing so will help simplify your scripts and reduce portability issues if you ever need to run those scripts on other systems.
4.3.1. Common Environment Variables
The following environment variables are automatically set in both your login and batch environments:
Variable | Description |
---|---|
$ARCHIVE_HOME | Your directory on the archive system |
$ARCHIVE_HOST | The host name of the archive system |
$BC_ACCELERATOR_NODE_CORES | The number of CPU cores per node for a compute node which features CPUs and a hosted accelerator processor |
$BC_BIGMEM_NODE_CORES | The number of cores per node for a big memory (BIGMEM) compute node |
$BC_CORES_PER_NODE | The number of CPU cores per node for the node type on which the variable is queried |
$BC_HOST | The generic (not node specific) name of the system. Examples include centennial, mustang, onyx and gaffney |
$BC_NODE_TYPE | The type of node on which the variable is queried. Values of $BC_NODE_TYPE are: LOGIN, STANDARD, PHI, BIGMEM, BATCH, or ACCELERATOR |
$BC_PHI_NODE_CORES | The number of Phi cores per node, if the system has any Phi nodes. It will be set to 0 on systems without Phi nodes |
$BC_STANDARD_NODE_CORES | The number of CPU cores per node for a standard compute node |
$CC | The currently selected C compiler. This variable is automatically updated when a new compiler environment is loaded |
$CENTER | Your directory on the Center-Wide File System (CWFS) |
$CSE_HOME | The top-level directory for the Computational Science Environment (CSE) tools and applications |
$CXX | The currently selected C++ compiler. This variable is automatically updated when a new compiler environment is loaded |
$DAAC_HOME | The top level directory for the DAAC (Data Analysis and Assessment Center) supported tools |
$F77 | The currently selected Fortran 77 compiler. This variable is automatically updated when a new compiler environment is loaded |
$F90 | The currently selected Fortran 90 compiler. This variable is automatically updated when a new compiler environment is loaded |
$HOME | Your home directory on the system |
$JAVA_HOME | The directory containing the default installation of JAVA |
$KRB5_HOME | The directory containing the Kerberos utilities |
$LOCALWORKDIR | A high-speed work directory that is local and unique to an individual node, if the node provides such space |
$PET_HOME | The directory containing tools installed by PET staff, which are considered experimental or under evaluation. Certain older packages have been migrated to $CSE_HOME, as appropriate |
$PROJECTS_ARCHIVE | The directory on the archive system in which user-supported applications, code, and data may be kept |
$PROJECTS_HOME | The directory in which user-supported applications and codes may be installed |
$SAMPLES_HOME | A directory that contains the Sample Code Repository, a variety of sample codes and scripts provided by a center's staff |
$WORKDIR | Your work directory on the local temporary file system (i.e., local high-speed disk) |
4.3.2. Batch-Only Environment Variables
In addition to the variables listed above, the following variables are automatically set only in your batch environment. That is, your batch scripts can see them when they run. These variables are supplied for your convenience and are intended for use inside your batch scripts.
Variable | Description |
---|---|
$BC_MEM_PER_NODE | The approximate maximum memory (in integer MB) per node available to an end user program for the compute node type to which a job is being submitted |
$BC_MPI_TASKS_ALLOC | The number of MPI tasks allocated for a particular job |
$BC_NODE_ALLOC | The number of nodes allocated for a particular job |
$JOBDIR | Job-specific directory in $WORKDIR immune to scrubbing while job is active. |
Please refer to the Coral Slurm Guide for a number of helpful environment variables provided during batch runs.
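For example, a batch script can use these variables instead of hard-coded values so it ports more easily between HPCMP systems. This is a minimal sketch; my_mpi_program is a placeholder executable:
echo "Running on $BC_NODE_ALLOC nodes with $BC_CORES_PER_NODE cores per node"
cd $JOBDIR
mpirun -n $BC_MPI_TASKS_ALLOC ./my_mpi_program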
4.4. Archive Usage
All our HPC systems have access to an online archival mass storage system that provides long-term storage for users' files on a petascale tape file system residing on a robotic tape library system. A 400-TB disk cache sits in front of the tape file system and temporarily holds files while they are being transferred to or from tape.
Tape file systems have very slow access times. The tapes must be robotically pulled from the tape library, mounted in one of the limited number of tape drives, and wound into position for file archival or retrieval. For this reason, users should always tar up their small files in a large tarball when archiving a significant number of files. A good size range for tarballs is about 10 GB. At that size, the time required for file transfer and tape I/O is reasonable. Files larger than 10 TB will greatly increase the time required for both archival and retrieval. Files larger than 400 TB will not be archived.
The environment variable $ARCHIVE_HOME is automatically set for you and can be used to reference your archive directory when using archive commands.
4.4.1. Archive Command Synopsis
After using the cd command to set your current working directory to $ARCHIVE_HOME, Linux file and directory commands may be used to interact with the MHPCC archive system. For information on additional capabilities, see the MHPCC DSRC Archive Guide or read the online man page available on each system. The archive command is non-Kerberized and can be used in batch submission scripts if desired.
Copy one or more files from the archive system:
cp [cpopts] file1 [file2...] destdir
List files and directory contents on the archive system:
ls [lsopts] [file/dir ...]
Create directories on the archive system:
mkdir [mkdiropts] dir1 [dir2 ...]
Copy one or more files to the archive system:
cp [cpopts] file1 [file2...] .
Move or rename files and directories on the archive server:
mv [mvopts] file1 [file2...] target
Remove files and directories from the archive server:
rm [rmopts] file1 [file2...]
Remove empty directories from the archive server:
rmdir [rmdiropts] dir1 [dir2 ...]
Change permissions of files and directories on the archive server:
chmod [chmodopts] mode file1 [file2...]
Change the group of files and directories on the archive server:
chgrp [chgrpopts] group file1 [file2...]
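For example, to bundle a job's results into a single tarball and copy it to the archive system, a sequence like the following could be used (the directory and file names are placeholders):
cd $WORKDIR/my_job
tar -czf my_job_results.tar.gz *.out *.dat
cd $ARCHIVE_HOME
cp $WORKDIR/my_job/my_job_results.tar.gz .
ls -l my_job_results.tar.gz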
5. Program Development
5.1. Modules
Software modules are a convenient way to set needed environment variables and include necessary directories in your path so commands for particular applications can be found. Coral also uses modules to initialize your environment with application software, system commands, libraries, and compiler suites.
A number of modules are loaded automatically as soon as you log in. To see the currently loaded modules, use the module list command. To see the entire list of available modules, use the module avail command. You can modify the configuration of your environment by loading and unloading modules. For complete information on how to do this and other information on using modules, see the MHPCC DSRC Modules Guide.
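For example, a typical session might look like the following. Module names and versions vary, so check module avail on the system before loading anything:
module list              # show currently loaded modules
module avail             # list all available modules
module load mpich        # load an additional module
module unload mpich      # remove it again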
5.2. Programming Models
Coral supports several parallel programming models. A programming model augments a programming language with parallel processing capability. Different programming models may use a different approach to express parallelism, such as message passing, threads, distributed memory, shared memory, etc.
Note, if an application is not programmed for distributed memory, then only the cores on a single node can be used. This is limited to 32 cores on Coral's standard nodes. See the Node Configuration table for core counts on other nodes.
Note, keep the system architecture in mind during code development. For instance, if your program requires more memory than is available on a single node, then you need to parallelize your code so it can function across multiple nodes.
Key supported programming models are discussed in each subsection below.
5.2.1. Message Passing Interface (MPI)
Coral's default MPI stack supports the MPI 3.1 Standard. MPI is part of the software support for parallel programming across a network of computer systems through a technique known as message passing. MPI establishes a practical, portable, efficient, and flexible standard for high-performance message passing. See man openmpi for additional information.
When creating an MPI program, ensure the default MPI module (openmpi) or another available MPI module (e.g., mpich) is loaded. To check this, run the module list command. To load the desired module, run the following command:
module load openmpi
Also, ensure the source code contains one of the following for the MPI library:
INCLUDE "mpif.h"      ## for older Fortran
USE mpi               ## for newer Fortran
#include <mpi.h>      ## for C/C++
To compile an MPI program, use one of the following:
mpifort -o MPI_executable mpi_program.f      ## for Fortran
mpicc -o MPI_executable mpi_program.c        ## for C
mpic++ -o MPI_executable mpi_program.cpp     ## for C++
For more information on compilers, compiler wrappers, and compiler options, see Available Compilers.
To run an MPI program within a batch script, load the same modules as
used to compile the application before using the following command to launch your executable:
mpirun -n mpi_procs ./MPI_executable [user_arguments]
where mpi_procs is the number of MPI processes being started. For example:
#### The following starts 30 MPI processes
#### (the placement of the processes on nodes is handled by the batch scheduler)
mpirun -n 30 ./MPI_executable
For more information about Open MPI, type man openmpi.
For more information on which MPI Standard features are supported by the default MPI on the system, check the BC MPI Test Suite page.
5.2.2. Open Multi-Processing (OpenMP)
OpenMP is a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications. It supports shared-memory multiprocessing programming in C, C++, and Fortran and consists of a set of compiler directives, library routines, and environment variables that influence compilation and run-time behavior.
When creating an OpenMP program, if using OpenMP functions (e.g., omp_get_wtime), ensure the source code includes one of the following lines:
INCLUDE "omp.h"       ## for older Fortran
USE omp_lib           ## for newer Fortran
#include <omp.h>      ## for C/C++
To compile an OpenMP program, ensure the desired compiler module is loaded. Use the following compiler commands and flags:
ifort -o OpenMP_executable -qopenmp openmp_program.f        ## for Intel Fortran
gfortran -o OpenMP_executable -fopenmp openmp_program.f     ## for GNU Fortran
icc -o OpenMP_executable -qopenmp openmp_program.c          ## for Intel C
gcc -o OpenMP_executable -fopenmp openmp_program.c          ## for GNU C
icpc -o OpenMP_executable -qopenmp openmp_program.cpp       ## for Intel C++
g++ -o OpenMP_executable -fopenmp openmp_program.cpp        ## for GNU C++
For more information on compilers, compiler wrappers, and compiler options, see Available Compilers.
When running OpenMP applications, the $OMP_NUM_THREADS
environment variable must be used to specify the number of threads. For example:
#### run 32 threads on one node
export OMP_NUM_THREADS=32
./OpenMP_executable [user_arguments]
In the example above, the application starts the OpenMP_executable on one node and spawns a total of 32 threads. Since Coral's standard compute nodes have 32 cores each, this runs one thread per core.
5.2.3. Hybrid MPI/OpenMP
An application built with the hybrid model of parallel programming can run using both OpenMP and Message Passing Interface (MPI). This allows the application to run on multiple nodes yet leverages OpenMP's advantages within each node. In hybrid applications, multiple OpenMP threads are spawned by MPI processes, but MPI calls should not be issued from OpenMP parallel regions or by an OpenMP thread.
When creating a hybrid MPI/OpenMP program, follow the instructions in both the MPI and OpenMP sections above for creating your program.
To compile a hybrid program, use the MPI compilers in conjunction with the OpenMP options, as follows:
mpif77 -o hybrid_executable -qopenmp hybrid_program.f       ## for Intel Fortran
mpif77 -o hybrid_executable -fopenmp hybrid_program.f       ## for GNU Fortran
mpicc -o hybrid_executable -qopenmp hybrid_program.c        ## for Intel C
mpicc -o hybrid_executable -fopenmp hybrid_program.c        ## for GNU C
mpic++ -o hybrid_executable -qopenmp hybrid_program.cpp     ## for Intel C++
mpic++ -o hybrid_executable -fopenmp hybrid_program.cpp     ## for GNU C++
For more information on compilers, compiler wrappers, and compiler options, see Available Compilers.
When running hybrid MPI/OpenMP programs, use the MPI launcher as in
MPI programs along with the $OMP_NUM_THREADS environment variable
to specify the number of threads per MPI process. In the following example,
four MPI processes will spawn eight threads each for a total of 32 threads:
#### run 32 hybrid threads (4 MPI procs, 8 threads per proc)
export OMP_NUM_THREADS=8
mpirun -n 4 -N 8 ./hybrid_executable [user_arguments]
Ensure the number of threads per node does not exceed the number of cores on each node. See the mpirun man page and the Batch Scheduling section for more detail on how MPI processes and threads are allocated on the nodes.
5.2.4. Co-Array Fortran
The Intel & GNU compilers support Co-Array Fortran (CAF). This is a set of Partitioned Global Address Space (PGAS) extensions that lets you reference memory locations on any node without the need for message-passing protocols. This can greatly simplify writing and debugging parallel code.
To compile a CAF program, use the following compilers:
ifort -o CAF_executable -coarray caf_program.f               ## for Intel
gfortran -o CAF_executable -fcoarray=<option> caf_program.f  ## for GNU
5.3. Available Compilers
Coral has two compiler suites:
- Intel
- GNU
The GNU compiler suite module is loaded by default.
Compiling can be affected by which MPI stack is being used. Coral has two MPI stacks:
- Open MPI
- MPICH
For more information about MPI, or if you are using another programming model besides MPI, see Programming Models above.
All versions of MPI share a common base set of compilers that are available on both the login and compute nodes. Codes running on the login nodes must be serial. The following table lists serial compiler commands for each language.
Compiler | Intel | GNU |
---|---|---|
C | icc | gcc |
C++ | icpc | g++ |
Fortran 77 | ifort | gfortran |
Fortran 90 | ifort | gfortran |
Codes running on compute nodes may be serial or parallel. To compile parallel codes with Open MPI, use the openmpi module and the following compiler wrappers:
Compiler | Intel | GNU |
---|---|---|
C | mpicc | mpicc |
C++ | mpic++ | mpic++ |
Fortran 77 | mpif77 | mpif77 |
Fortran 90 | mpifort | mpifort |
To compile parallel codes with MPICH, use the mpich module and the following compiler wrappers:
Compiler | Intel | GNU |
---|---|---|
C | mpicc | mpicc |
C++ | mpicxx | mpicxx |
Fortran 77 | mpif77 | mpif77 |
Fortran 90 | mpifort | mpifort |
For more information about compiling with MPI, see Programming Models above.
5.3.1. Intel Compiler Environment
The Intel compiler is a highly optimizing compiler typically producing very fast executables for Intel processors. This compiler can be loaded with the compiler/intel module. The following table lists some of the more common options you may use.
Option | Purpose |
---|---|
-c | Generate intermediate object file but do not attempt to link |
-I directory | Search in directory for include or module files |
-L directory | Search in directory for libraries |
-o outfile | Name executable "outfile" rather than the default "a.out" |
-Olevel | Set the optimization level. For more information on optimization, see the sections on Compiler Optimization and Code Profiling |
-g | Generate symbolic debug information |
-fPIC | Generate position-independent code for shared libraries |
-ip | Single-file interprocedural optimization. See the sections on Compiler Optimization and Code Profiling |
-ipo | Multi-file interprocedural optimization. See the sections on Compiler Optimization and Code Profiling |
-free | Process Fortran codes using free form |
-convert big_endian | Big-endian files; the default is little-endian |
-qopenmp | Recognize OpenMP directives |
-Bdynamic | Compiling using shared objects |
-fpe-all=0 | Trap floating-point invalid, divide-by-zero, and overflow exceptions |
Detailed information about these and other compiler options is available in the Intel compiler (ifort, icc, and icpc) man pages.
5.3.2. GNU Compiler Collection (GCC)
The GCC Programming Environment is a popular open-source compiler typically found on all Linux systems and generally works in a compatible manner across these systems. It provides many options that are the same for all compilers in the suite. This compiler can be loaded with the gcc module. The following table lists some of the more common options you may use.
Option | Purpose |
---|---|
-c | Generate intermediate object file but do not attempt to link |
-I directory | Search in directory for include or module files |
-L directory | Search in directory for libraries |
-o outfile | Name executable "outfile" rather than the default "a.out" |
-Olevel | Set the optimization level. For more information on optimization, see the sections on Compiler Optimization and Code Profiling |
-g | Generate symbolic debug information |
-fPIC | Generate position-independent code for shared libraries |
-fconvert=big-endian | Read/write big-endian files; the default is little-endian |
-Wextra -Wall | Turns on increased error reporting |
Detailed information about these and other compiler options is available in the GNU compiler (gfortran, gcc, and g++) man pages.
5.4. Libraries
Several scientific and math libraries are available on Coral. The libraries provided by the vendor and/or compiler are typically faster than the open-source equivalents (CSE).
5.4.1. Intel Math Kernel Library (MKL)
Coral provides the Intel Math Kernel Library (Intel MKL), a set of numerical routines tuned specifically for Intel platform processors and optimized for math, scientific, and engineering applications. The routines, which are available via both Fortran and C interfaces, include:
- LAPACK plus BLAS (Levels 1, 2, and 3)
- ScaLAPACK plus PBLAS (Levels 1, 2, and 3)
- Fast Fourier Transform (FFT) routines for single-precision, double-precision, single-precision complex, and double-precision complex data types
- Discrete Fourier Transforms (DFTs)
- Fast Math and Fast Vector Library
- Vector Statistical Library Functions (VSL)
- Vector Transcendental Math Functions (VML)
The MKL routines are part of the Intel Programming Environment as Intel's MKL is bundled with the Intel Compiler Suite.
Linking to the Intel Math Kernel Libraries can be complex and is beyond the scope of this introductory guide. Documentation explaining the full feature set along with instructions for linking can be found at the Intel Math Kernel Library documentation page.
Intel also makes a link advisor available to assist users with selecting proper linker and compiler options: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html.
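For straightforward cases, the Intel compilers can also link MKL with a single option; recent releases accept -qmkl, while older releases use -mkl. The source file name below is a placeholder:
ifort -o my_program my_program.f90 -qmkl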
5.4.2. Additional Libraries
There is also an extensive set of math, I/O, and other libraries available in the $CSE_HOME directory on Coral. Information about these libraries can be found on the Baseline Configuration website at BC policy FY13-01 and the CSE Quick Reference Guide.
5.5. Debuggers
Coral has the GNU Project Debugger (gdb). It can perform a variety of tasks ranging from analyzing core files to setting breakpoints and debugging running parallel programs. As a rule, your code must be compiled using the -g command-line option.
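For example, building a debuggable executable typically means adding -g and lowering the optimization level; the source file name here is illustrative:
gcc -g -O0 -o my_program my_program.c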
For in-depth training using debuggers, visit the PET Knowledge Management Learning System and search for "debug" or use the following search link.
5.5.1. GNU Project Debugger (gdb)
The gdb debugger is a source-level debugger that can be invoked
either with a program for execution or a running process id. It is serial-only.
To launch your program under gdb for debugging, or to examine a core file it produced, use the following command:
gdb a.out [corefile]
To attach gdb to a program that is already executing on a node, use the following
command:
gdb a.out pid
For more information, the GDB manual can be found at http://www.gnu.org/software/gdb.
5.6. Code Profiling
Profiling is the process of analyzing the execution flow and characteristics of your program to identify sections of code that are likely candidates for optimization, which increases the performance of a program by modifying certain aspects for increased efficiency.
We provide gprof and VTune to assist in the profiling process. In addition, a basic overview of optimization methods with information about how they may improve the performance of your code can be found in the Techniques for Improving Performance guide.
For in-depth training on using profiling tools, visit the PET Knowledge Management Learning System and search for "optimiz" or use the following search link.
5.6.1. GNU Project Profiler (gprof)
The gprof profiler shows how your program is spending its time and which function calls are made. It works best for serial codes but can be used for small parallel codes, though it will not provide MPI or threaded information.
To profile code using gprof, use the -pg option during compilation. It will automatically generate profile information when executed. Use the gprof command to view the profile information. See man gprof on Coral or the gprof web site for more information.
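For example, a typical gprof workflow looks like the following (the program name is a placeholder):
gcc -pg -o my_program my_program.c            # compile with profiling enabled
./my_program                                  # run normally; writes gmon.out
gprof ./my_program gmon.out > profile.txt     # generate the profile report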
5.6.2. Additional Profiling Tools
There is also a set of profiling tools available in CSE. Information about these tools may be found on the Baseline Configuration website at BC policy FY13-01 and the CSE Quick Reference Guide.
5.7. Compiler Optimization Options
The -Olevel option enables code optimization when compiling. The level you choose (0-4 depending upon the compiler) determines how aggressive the optimization will be. Increasing levels of optimization may increase performance significantly but may also cause a loss of precision. There are additional options that may enable further optimizations. The following table contains the most commonly used options.
Option | Purpose | Compiler Suite |
---|---|---|
-O0 | No Optimization. (default in GNU) | All |
-O1 | Scheduling within extended basic blocks is performed. Some register allocation is performed. No global optimization | All |
-O2 | Level 1 plus traditional scalar optimizations such as induction recognition and loop invariant motion are performed by the global optimizer. Generally safe and beneficial. (default in Cray and Intel) | All |
-O3 | Levels 1 and 2 plus more aggressive code hoisting and scalar replacement optimizations that may or may not be profitable. Generally beneficial | All |
-fipa-* | The GNU compilers automatically enable IPA at various -O levels. To set these manually, see the options beginning with -fipa in the gcc man page | GNU |
-finline-functions | Enables function inlining within a single file | Intel |
-ip | Enables interprocedural optimization within single files at a time | Intel |
-ipo[n] | Enables interprocedural optimization between files and produces up to n object files (default: n=0) | Intel |
-inline-level=n | Number of levels of inlining (default: n=2) | Intel |
-opt-report[n] | Generate optimization report with n levels of detail | Intel |
-xHost | Generate code with the highest vector instruction set available on the processor | Intel |
-fp-model model | Used to tune the floating-point optimizations, typically to override -On. -O3 uses model=fast, which may be considered too imprecise for scientific codes, so -O3 is often used in conjunction with -fp-model precise, consistent, or strict | Intel |
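For example, a common starting point for Intel compiles of scientific codes combines a higher optimization level with a stricter floating-point model (the source file name is a placeholder):
ifort -O3 -xHost -fp-model precise -o my_program my_program.f90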
6. Batch Scheduling
6.1. Scheduler
The Slurm Workload Manager (Slurm) is currently running on Coral. It schedules jobs, manages resources and job queues, and can be accessed through the interactive batch environment or by submitting a batch request. Slurm can manage both single-processor and multiprocessor jobs. The appropriate module is automatically loaded for you when you log in. This section is merely a brief introduction to Slurm; please see the Coral Slurm Guide for more details.
6.2. Queue Information
The following table describes the QOS (queues) available on Coral.
Priority | Queue Name | Max Wall Clock Time | Max Cores Per Job | Max Queued Per User | Max Running Per User | Description |
---|---|---|---|---|---|---|
Highest | debug | 30 Minutes | 1 | N/A | N/A | Time/resource-limited for user testing and debug purposes |
 | standard | 7 Days | 7 | N/A | N/A | Standard jobs |
Lowest | transfer | 2 Days | 1 | N/A | N/A | Data transfer for user jobs. Not charged against project allocation. See the MHPCC DSRC Archive Guide, section 5.2. |
6.3. Interactive Logins
When you log in to Coral, you will be running in an interactive shell on a login node. The login nodes provide login access for Coral and support such activities as compiling, editing, and general interactive use by all users. Please note the MHPCC DSRC Login Node Abuse policy. The preferred method to run resource-intensive interactive executions is to use an interactive batch session (see Interactive Batch Sessions below).
6.4. Batch Request Submission
Slurm batch jobs are submitted via the sbatch command. The format
of this command is:
sbatch [ options ] batch_script_file
sbatch options may be specified on the command line or embedded
in the batch script file by lines beginning with #SBATCH. Some
of these options are discussed in Batch Resource
Directives below. The batch script file is not required for interactive
batch sessions (see Interactive Batch Sessions).
For a more thorough discussion of Slurm Batch Submission, see the Coral Slurm Guide.
6.5. Batch Resource Directives
Batch resource directives allow you to specify how your batch jobs should be run and the resources your job requires. Although Slurm has many directives, you only need to know a few to run most jobs.
Slurm directives can be specified in your batch script or on the command line.
The syntax for a batch file is as follows:
#SBATCH --directive1[=value1]
#SBATCH --directive2[=value2]
...
Command lines may use sbatch or salloc depending on
whether you are submitting for batch processing or running interactively. Syntax
is as follows:
sbatch --directive1[=value1] --directive2[=value2] ...
salloc --directive1[=value1] --directive2[=value2] ...
Some directives may require values. For example, to request 32 processes per
node, use the following:
#SBATCH --ntasks-per-node=32
The sbatch command requires a batch file. For salloc,
a batch file is optional. If no batch file is specified, then all required
directives must be specified on the command line, as follows:
salloc --nodes=N1 --ntasks=N2 --account=Project_ID --qos=Queue_Name --time=HH:MM:SS ...
You must specify the desired maximum walltime (HH:MM:SS), Project_ID, and Queue_Name. N1 is the number of nodes requested. N2 is the total number of tasks (usually one task per core unless specified otherwise).
Note, command-line use is required for interactive batch sessions (see Interactive Batch Sessions) since no batch file is specified.
The following directives are required for all jobs:
Directive (Long form) | Short form | Description |
---|---|---|
--account=Project_ID | -A | Name of the project |
--partition=Queue_Name | -p | Name of the queue |
--nodes=# | -N | Number of nodes requested |
--ntasks=# | -n | Total number of tasks (across all nodes) |
--ntasks-per-node=# | | Number of tasks to run per node (Note: You must use any two of these three directives; ntasks-per-node defaults to the number of cores per node.) |
--time=HH:MM:SS | -t | Maximum wall time in hours, minutes, and seconds. (Note: Additional time formats are supported; see man salloc) |
The following directives are optional but are commonly used:
Directive (Long form) | Short form | Description |
---|---|---|
--gres=gpu:type:# | | Number of GPUs requested |
--exclusive | | Specifies exclusive access to nodes and resources |
--job-name=Job_Name | -J | Name of the job |
--error=File_Name | -e | Redirect standard error to the named file |
--output=File_Name | -o | Redirect standard output to the named file |
--pty | | Request a shell for an interactive job |
--export=Variable_List | | Export environment variables to the job. Use "ALL" to export all |
A more complete listing of batch resource directives is available in the Coral Slurm Guide.
6.6. Interactive Batch Sessions
An interactive batch session allows you to run interactively (in a command shell) on a compute node after waiting in the batch queue.
You can run an interactive job like this:
srun --account=Project_ID --nodes=1 --ntasks-per-node=1 --time=01:00:00 --pty bash -i
Your batch shell request will be placed in the interactive queue and scheduled for execution. This may take a few minutes or a long time depending on the system load. Once your shell starts, you will be logged into the first of the compute nodes assigned to your interactive batch job. At this point, you can run or debug applications interactively, execute job scripts, or start executions on the compute nodes you were assigned. If you need to run X-Windows (graphical) applications during the session, add an X11 forwarding option (e.g., srun's --x11); it may be omitted if that functionality is not required for the interactive job.
6.7. Launch Commands
There are different commands for launching parallel executables, including MPI, from within a batch job depending on which MPI implementation or other parallel library your code uses. See the Programming Models section for more information on launching executables within a batch session.
6.8. Sample Scripts
The following example is a good starting template for a batch script to run a serial job for one hour:
#!/bin/bash
#
# Specify name of the job (Optional Directive)
#SBATCH --job-name=serialjob
#
# Append std output to file serialjob.out (Optional Directive)
#SBATCH --output=serialjob.out
#
# Append std error to file serialjob.err (Optional Directive)
#SBATCH --error=serialjob.err
#
# Specify Project ID to be charged (Required Directive)
#SBATCH --account=Project_ID
#
# Request wall clock time of 1 hour (Required Directive)
#SBATCH --time=01:00:00
#
# Specify queue (partition) name (Required Directive)
#SBATCH --partition=standard
#
# Specify the number of nodes requested (Required Directive)
#SBATCH --nodes=1
#
# Specify the number of tasks per node (Optional Directive)
#SBATCH --ntasks-per-node=1
#
# Change to the specified directory, in this case, the user's work directory
cd $WORKDIR
#
# Execute the serial executable on 1 core
./serial_application
# End of batch job
The first few lines tell Slurm to save the standard output and error output to the given files and give the job a name. Skipping ahead, we estimate the run time to be about one hour, which we know is acceptable for the standard batch queue. We need one core in total, so we request one core. The resource allocation, however, is one full 32-core node for exclusive use by the job.
Important! Except for jobs in the transfer queue, which use shared nodes, jobs on standard nodes are charged for full 32-core nodes, even if you do not use all cores on the node.
The following example is a good starting template for a batch script to run a parallel (MPI) job for two hours:
#!/bin/bash
#
## Required Slurm Directives --------------------------------------
#SBATCH --account=Project_ID
#SBATCH --partition=standard
#SBATCH --nodes=2
# ntasks-per-node is not defined so it defaults to cores-per-node (32)
#SBATCH --time=02:00:00
#
## Optional Slurm Directives --------------------------------------
#SBATCH --job-name=Test_Run_1
#SBATCH --export=ALL
#
## Execution Block ----------------------------------------------
# Environment Setup
# Get sequence number of unique job identifier
JOBID=`echo $SLURM_JOB_ID`
#
# create and cd to job-specific directory in your personal directory
# in the scratch file system ($WORKDIR/$JOBID)
#
mkdir $WORKDIR/$JOBID
cd $WORKDIR/$JOBID
#
# Launching
# copy executable from $HOME and execute it with a .out output file
#
cp $HOME/my_mpi_program .
#
module load openmpi4-cuda11.8-ofed5-gcc11/4.1.4
mpiexec -n 64 ./my_mpi_program > my_mpi_program.out
#
# Don't forget to archive and clean up your results (see the MHPCC DSRC Archive Guide for details)
We estimate the run time to be about two hours, which we know is acceptable for the standard batch queue. The optional Slurm lines give the job a name and import all environmental variables. This job is requesting two nodes, which is 64 total cores and 32 cores per node. The default value for number of cores per node is 32.
A common concern for MPI users is the need for more memory for each process. By default, one MPI process is started on each core of a node. This means on Coral standard nodes, the available memory on the node is split 32 ways. To allow an individual process to use more of the node's memory, you need to start fewer processes on each node. To do this, you must request more nodes from Slurm but run on fewer cores on each. For example, the following Slurm statements request four nodes with 32 cores per node, but it only uses 16 of those cores for MPI processes on each node:
#!/bin/bash
#
#### Starts 64 MPI processes; only 16 on each node
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16
#SBATCH --account=Project_ID
#SBATCH --partition=standard
#SBATCH --time=02:00:00
#
## execute on 4 nodes, total of 64 MPI processes across all nodes
module load openmpi4-cuda11.8-ofed5-gcc11/4.1.4
mpiexec -n 64 ./a.out
#
# Don't forget to archive and clean up your results (see the MHPCC DSRC Archive Guide for details)
Further sample scripts can be found in the Coral Slurm Guide and in the Sample Code Repository ($SAMPLES_HOME) on the system. There is also an extensive discussion in the MHPCC DSRC Archive Guide of sample scripts to perform data staging in the transfer queue using chained batch scripts to archive and clean up your work directory results files.
6.9. Slurm Commands
The following commands provide the basic functionality for using the Slurm batch system:
Submit jobs for batch processing:
sbatch [sbatch_options] my_job_script
Check the status of submitted jobs:
squeue -j JOBID              ## check one job
squeue -u my_user_name       ## check all of your jobs
Kill queued or running jobs:
scancel JOBID
A more complete list of Slurm commands is available in the Coral Slurm Guide.
6.10. Advance Reservations
7. Software Resources
7.1. Application Software
A complete list of the software versions installed on Coral can be found on the software page. The general rule is that the two latest versions of all COTS software packages are maintained on our systems. For convenience, modules are also available for most COTS software packages. The following are other available software-related services:
- The Software License Buffer provides access to commercial software licenses on compute nodes. See the SLB User Guide.
- Singularity is the approved software for running and building containers. Containers allow you to deploy or use applications with all their software dependencies packaged together. See the Introduction to Singularity.
- The HPCMP Portal is a web interface for several graphics and web-based applications. It also includes virtual desktops for most HPC systems. See the HPC Portal Page.
- The Secure Remote Desktop (SRD) is a client-based VNC virtual desktop application that supports graphical acceleration on GPU nodes for intensive visualization. See the SRD User Guide.
- GitLab is a web-based source code management platform. See the GitLab User Guide.
7.2. Useful Utilities
The following utilities are available on Coral. For command-line syntax and examples of usage, please see each utility's online man page.
Name | Description |
---|---|
show_queues | Report current batch queue status, usage, and limits |
show_storage | Display disk/file usage and quota information |
show_usage | Display CPU allocation and usage by subproject |
7.3. Sample Code Repository
The Sample Code Repository is a directory that contains examples for COTS batch scripts, building and using serial and parallel programs, data management, and accessing and using serial and parallel math libraries. The $SAMPLES_HOME environment variable contains the path to this area and is automatically defined in your login environment.
8. Links to Vendor Documentation
8.1. GNU Links
GNU Home: https://www.gnu.org
GNU Compiler: https://gcc.gnu.org
9. Glossary
- Batch Job :
- a single request for a set of compute nodes along with a set of tasks (usually in the form of a script) to perform on those nodes
- Batch-scheduled :
- users request compute nodes via commands to batch scheduler software and wait in a queue until the requested nodes become available
- Compute Node :
- a node that performs computational tasks for the user. There may be multiple types of compute nodes for specialized purposes.
- Distributed Memory Model :
- a programming methodology where memory is distributed across multiple nodes giving processes on each node faster direct access to local memory, but requiring slower techniques such as message passing to access memory on other nodes
- Interconnect :
- a specialized, very high-speed network that connects the nodes of an HPC system together. It is typically used for application inter-process communication (e.g., message passing) and I/O traffic.
- Kerberos :
- authentication and encryption software required by the HPCMP to access HPC system login nodes and other resources. See Kerberos & Authentication
- Login Node :
- a node that serves as the user's entry point into an HPC system
- Node :
- an individual server in a cluster or collection of servers of an HPC system
- Parallel File System :
- A software component designed to store data across multiple networked servers and to facilitate high-performance access through simultaneous, coordinated input/output operations (IOPS) between clients and storage nodes.
- Shared Memory Model :
- a programming methodology where a set of processors (such as the cores within one node) have direct access to a shared pool of memory