Utility Server User Guide

1. Introduction

1.1. Document Scope and Assumptions

This document provides an overview and introduction to the use of the Utility Server and a description of the specific computing environment on the Utility Server. The intent of this guide is to provide information that will enable the average user to perform computational tasks on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:

  • Use of the UNIX operating system
  • Use of an editor (e.g., vi or emacs)
  • Remote usage of computer systems via network access
  • A selected programming language and its related tools and libraries

1.2. Policies to Review

Users are expected to be aware of the following policies for working on the Utility Server.

1.2.1. Login Node Abuse Policy

The login nodes provide login access for the Utility Server and support such activities as compiling, editing, and general interactive use by all users. Consequently, memory or CPU-intensive programs running on the login nodes can significantly affect all users of the system. Therefore, only small applications requiring less than 10 minutes of runtime and less than 8 GBytes of memory are allowed on the login nodes. Any job running on the login nodes that exceeds these limits may be unilaterally terminated.

1.2.2. Purge Policy

Close management of file system space is a high priority. Each DSRC implements local purge policies to govern how long files may be retained on the $WORKDIR and the $CENTER file systems. In general, files may be retained for 30 days after which they may be purged. However, each center's policies may vary if available space becomes critically low. For details of the purge policy on a particular Utility Server, contact the CCAC Helpdesk.

In all cases, you are responsible for archiving your own data to the long-term storage. Files not archived by the user within the retention period may be deleted and cannot be retrieved.

1.3. Obtaining an Account

Your account on the Utility Server is an unallocated account, and is automatically provided to you when your High Performance Computing Account is requested.

1.4. Requesting Assistance

The Consolidated Customer Assistance Center (CCAC) is available to help users with unclassified problems, issues, or questions. Analysts are on duty 8:00 a.m. - 11:00 p.m. Eastern, Monday - Friday (excluding Federal holidays).

For more detailed contact information, please see our Contact Page.

1.5. Available Services

The Utility Server is primarily intended to provide reliability and stability for the following services:

  • Access to the Center-Wide File System
  • Center-wide remote job management
  • Remote visualization
  • General-Purpose computing on Graphics Processing Units (GPGPU)

The following functions are available as well:

  • Interactive use, including:
      • Debugging
      • Complete pre- and post-processing
      • Building applications
      • Remote visualization tasks
  • Batch processing during low-usage times
  • Access to GPGPUs

2. System Configuration

2.1. System Summary

The Utility Server is a mixed-node cluster consisting of login nodes and three types of compute nodes: standard-memory, graphics, and large-memory. Login nodes are identical to standard-memory nodes but have only 64 GBytes of RAM. The standard-memory and graphics nodes each have two AMD Opteron 2.3-GHz processors. The graphics nodes include an NVIDIA Tesla M2050 for graphics acceleration. The large-memory nodes feature twice the number of cores and memory as the standard-memory nodes. The Utility Server uses a QDR InfiniBand network as its high-speed interconnect for MPI messages and I/O traffic.

Node Configuration

Total Cores | Nodes
  Login Nodes:       32 | 2
  Standard Memory:   704 | 44;  448 | 28 (ORS only);  192 | 12 (MHPCC only)
  Graphics:          352 | 22;  224 | 14 (ORS only);  96 | 6 (MHPCC only)
  Large Memory:      704 | 22;  448 | 14 (ORS only);  192 | 6 (MHPCC only)

Operating System         RHEL 6.4 (all node types)
Cores/Node               16 (login, standard-memory, and graphics nodes); 32 (large-memory nodes)
Core Type                AMD Opteron 6134 Magny-Cours; two per node (login, standard-memory, and graphics nodes), four per node (large-memory nodes)
Core Speed               2.3 GHz (all node types)
GPU Type                 NVIDIA Tesla M2050 (graphics nodes only)
Memory/Node              64 GBytes (login); 128 GBytes (standard-memory and graphics); 256 GBytes (large-memory)
Accessible Memory/Node   62 GBytes (login); 126 GBytes (standard-memory and graphics); 254 GBytes (large-memory)
Memory Model             Shared on node; distributed (not shared) across the cluster.
Interconnect Type        QDR InfiniBand (all node types)

2.2. Processors

The Utility Server uses eight-core 2.3-GHz AMD Opteron Magny-Cours processors. These processors have 512 KBytes of L2 cache per core and 12 MBytes of L3 cache.

The login, standard-memory, and graphics nodes have two processors per node, for a total of 16 cores per node.

The large-memory nodes have four processors per node, for a total of 32 cores per node.

2.3. Memory

The Utility Server uses both shared- and distributed-memory models. Memory is shared among all the cores on a node, but is distributed (not shared) among the nodes across the cluster.

2.4. Operating System

The operating system on the Utility Server is Red Hat Enterprise Linux (RHEL) 6.

2.5. File Systems

The Utility Server has the following file systems available for user storage:

2.5.1. /u/home/

This file system is locally mounted from the Utility Server's Panasas PanFS file system. It has a formatted capacity of 20 TBytes. All users have a home directory located on this file system that can be referenced by the environment variable $HOME. This file system is not backed up. You are responsible for making backups of your files to archive storage or to another local system.

2.5.2. /u/work/

This file system is locally mounted from the Utility Server's Panasas PanFS file system. It has a formatted capacity of 200 TBytes. All users have a work directory located on this file system which can be referenced by the environment variable $WORKDIR. This file system is not backed up. You are responsible for making backups of your files to archive storage or to another local system.

2.5.3. /p/cwfs/

The Center-Wide File System (CWFS) has a formatted capacity of 810 TBytes. All users have a directory on this file system (/p/cwfs/username) which can be referenced by the environment variable $CENTER. It is accessible from all nodes on the Utility Server and from the HPC login nodes. It is intended for short-term storage (no longer than 30 days).

2.6. Peak Performance

The Utility Server is rated at 11.51 TFLOPS, not counting the GPUs, and 39.18 TFLOPS counting the GPUs.

3. Accessing the System

3.1. Kerberos

A Kerberos client kit must be installed on your desktop to enable you to get a Kerberos ticket. Kerberos is a network authentication tool that provides secure communication by using secret cryptographic keys. Only users with a valid HPCMP Kerberos authentication can gain access to the Utility Server. More information about installing Kerberos clients on your desktop can be found at HPC Centers: Kerberos & Authentication.

3.2. Logging In

The table below shows Kerberized ssh and Kerberized rlogin commands for logging into the Utility Servers at all sites.

Login Commands
Site    Kerberized ssh              Kerberized rlogin
ARL     % ssh us.arl.hpc.mil        % krlogin us.arl.hpc.mil
AFRL    % ssh us.afrl.hpc.mil       % krlogin us.afrl.hpc.mil
ORS     % ssh us.ors.hpc.mil        % krlogin us.ors.hpc.mil
ERDC    % ssh us.erdc.hpc.mil       % krlogin us.erdc.hpc.mil
Navy    % ssh us.navo.hpc.mil       % krlogin us.navo.hpc.mil
MHPCC   % ssh us.mhpcc.hpc.mil      % krlogin us.mhpcc.hpc.mil

3.3. File Transfers

File transfers to DSRC systems (including those to the local archive server) must be performed using Kerberized versions of the following tools: rcp, scp, ftp, sftp, and mpscp. Before using any Kerberized tool, you must use a Kerberos client to obtain a Kerberos ticket. Information about installing and using a Kerberos client can be found at HPC Centers: Kerberos & Authentication.

Files can also be transferred to local systems through the CWFS using simple copy commands.

Examples:

The command below uses secure copy (scp) to copy a single local file to the local archive server.

% scp local_file ${ARCHIVE_HOST}:${ARCHIVE_HOME}

The command below uses scp to copy a single local file into a destination directory on the Navy DSRC Utility Server. The mpscp command is similar to the scp and rcp commands but uses a different underlying means of data transfer and may enable greater transfer rates. The mpscp and rcp commands have the same syntax as scp.

% scp local_file us.navo.hpc.mil:/target_dir

The three commands, scp, rcp, and mpscp, can be used to send multiple files. The following command transfers all files with the .txt extension to the same destination directory.

% scp *.txt us.navo.hpc.mil:/target_dir

The example below uses the secure file transfer protocol (sftp) to connect to the Utility Server, then uses the sftp cd and put commands to change to the destination directory and copy a local file there. The sftp quit command ends the sftp session. Use the sftp help command to see a list of all sftp commands.

% sftp us.navo.hpc.mil

sftp> cd target_dir
sftp> put local_file
sftp> quit

The Kerberized file transfer protocol (kftp) command differs from sftp in that you are prompted for your username.

% kftp us.navo.hpc.mil

username> user
kftp> cd target_dir
kftp> put local_file
kftp> quit

Windows users may use a graphical file transfer protocol (ftp) client such as FileZilla.

4. User Environment

4.1. User Directories

The following user directories are provided for all users on the Utility Server.

4.1.1. Home Directory

When you log on to the Utility Server, you will be placed in your home directory, /u/home/username. The environment variable $HOME is automatically set for you and refers to this directory. $HOME is visible to the login nodes on the HPC systems, and may be used to store day-to-day items (small user files, binaries, scripts, etc.). It has a quota of 10 GBytes and is not backed up; therefore, it should not be used for long-term storage.

4.1.2. Work Directory

The Utility Server has one large file system (/u/work) for the temporary storage of data files needed for executing programs. You may access your personal working directory under /u/work by using the $WORKDIR environment variable, which is set for you upon login. Your $WORKDIR directory has a 100-TByte quota. Because of high usage, the /u/work file system tends to fill up frequently. Please review the Purge Policy and be mindful of your disk usage.

REMEMBER: /u/work is a scratch file system and is not backed up. You are responsible for managing files in your $WORKDIR by backing up files to the archive server and deleting unneeded files when your jobs end.

All of your jobs should execute from your $WORKDIR directory, not $HOME. While not technically forbidden, jobs that are run from $HOME are subject to a much smaller quota and have a much greater chance of failing.

To avoid unusual errors that can arise from two jobs using the same scratch directory, a common technique is to create a unique subdirectory for each batch job by including the following lines in your batch script:

TMPD=${WORKDIR}/${PBS_JOBID}
mkdir -p ${TMPD}
4.1.3. Center Directory

The Center-Wide File System (/p/cwfs/username) provides short-term file storage that is accessible by the HPC login nodes and by all nodes of the Utility Server. The main purpose of this area is the staging of production system output files for post-processing on the Utility Server. It also permits file transfer between HPC systems and the Utility Server using simple copy commands.

Your personal directory on the CWFS has a 200-TByte quota and may be accessed using the $CENTER environment variable, which is set for you upon login.

Because the compute nodes on the HPC systems are unable to see /p/cwfs, you will need to transfer output files from $WORKDIR to $CENTER from a login node. This may be done manually or through the transfer queue, which executes on the login nodes.
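
For example, a manual copy from a login node might look like the following (the directory name my_job_output is a placeholder):

% cp -r ${WORKDIR}/my_job_output ${CENTER}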

REMEMBER: The CWFS is meant for short-term storage (no longer than 30 days). You must archive your own data from the CWFS to the archive system. Files not archived by the user within the CWFS 30-day retention period may be deleted and cannot be retrieved.

4.2. Shells

The following shells are available on the Utility Server: csh, bash, ksh, tcsh, and sh. To request a change of your default shell, contact the Consolidated Customer Assistance Center (CCAC).

4.3. Environment Variables

A number of environment variables are provided by default on all HPCMP high performance computing (HPC) systems. We encourage you to use these variables in your scripts where possible. Doing so will help to simplify your scripts and reduce portability issues if you ever need to run those scripts on other systems. The following environment variables are automatically set in your login environment:

4.3.1. Login Environment Variables

The following environment variables are common to both the login and batch environments:

Common Environment Variables
Variable Description
$ARCHIVE_HOME Your directory on the archive server.
$ARCHIVE_HOST The host name of the archive server.
$BC_HOST The generic (not node specific) name of the system.
$CC The currently selected C compiler. This variable is automatically updated when a new compiler environment is loaded.
$CENTER Your directory on the Center-Wide File System (CWFS).
$CSI_HOME The directory containing the following heavily used application packages: ABAQUS, Accelrys, ANSYS, CFD++, Cobalt, EnSight, Fluent, GASP, Gaussian, LS-DYNA, MATLAB, and TotalView. This collection was formerly known as the Consolidated Software Initiative (CSI) list. Other application software may also be installed here by our staff.
$CXX The currently selected C++ compiler. This variable is automatically updated when a new compiler environment is loaded.
$DAAC_HOME The directory containing DAAC supported visualization tools ParaView, VisIt, EnSight, and ezViz.
$F77 The currently selected Fortran 77 compiler. This variable is automatically updated when a new compiler environment is loaded.
$F90 The currently selected Fortran 90 compiler. This variable is automatically updated when a new compiler environment is loaded.
$HOME Your home directory on the system.
$JAVA_HOME The directory containing the default installation of Java.
$KRB5_HOME The directory containing the Kerberos utilities.
$PET_HOME The directory containing the tools installed by the PET ACE staff. The supported software includes a variety of open-source math libraries. (BC policy FY13-01)
$PROJECTS_HOME A common directory where group-owned and supported applications and codes may be maintained for use by members of a group. Any project may request a group directory under $PROJECTS_HOME.
$SAMPLES_HOME The Sample Code Repository. This is a collection of sample scripts and codes provided and maintained by our staff to help users learn to write their own scripts. There are a number of ready-to-use scripts for a variety of applications.
$WORKDIR Your work directory on the local temporary file system (i.e., local high-speed disk).
4.3.2. Batch-Only Environment Variables

In addition to the variables listed above, the following variables are automatically set only in your batch environment. That is, your batch scripts will be able to see them when they run. These variables are supplied for your convenience and are intended for use inside your batch scripts.

Batch-Only Environment Variables
Variable Description
$BC_CORES_PER_NODE The number of cores per node for the compute node on which a job is running.
$BC_MEM_PER_NODE The approximate maximum user-accessible memory per node (in integer MBytes) for the compute node on which a job is running.
$BC_MPI_TASKS_ALLOC The number of MPI tasks allocated for a job.
$BC_NODE_ALLOC The number of nodes allocated for a job.
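
For example, a batch script can use these variables to avoid hard-coding node and core counts. The sketch below assumes placeholder directives and a generic executable name (my_prog.exe):

#!/bin/bash
#PBS -A Project_ID
#PBS -q parallel
#PBS -l select=2:ncpus=16:mpiprocs=16
#PBS -l walltime=1:00:00

cd ${WORKDIR}
echo "Job running on ${BC_NODE_ALLOC} nodes with ${BC_CORES_PER_NODE} cores per node"
# launch one MPI process per allocated task without hard-coding the count
mpirun -n ${BC_MPI_TASKS_ALLOC} ./my_prog.exe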

4.4. Modules

Software modules are a convenient way to set needed environment variables and include necessary directories in your path so that commands for particular applications can be found. The Utility Server uses "modules" to initialize your environment with COTS application software, system commands and libraries, compiler suites, environment variables, and PBS batch system commands.

A number of modules are loaded automatically as soon as you log in. To see the modules which are currently loaded, use the module list command. To see the entire list of available modules, use module avail. You can modify the configuration of your environment by loading and unloading modules. For complete information on how to do this, see the Modules User Guide.
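
For example (the module names below are illustrative; use module avail to see what is actually installed):

% module list                    # show currently loaded modules
% module avail                   # show all available modules
% module load totalview          # load a module
% module unload totalview        # unload a module
% module swap compiler/pgi/version_number compiler/gnu/version_number   # swap compiler suites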

4.5. Archive Usage

The Utility Server does NOT have direct access to the archive storage capability provided by the local center. For long-term storage, users will need to manually move files to archive storage using scp, rcp, ftp, sftp, or mpscp.

The example below uses secure copy (scp) to copy a single local file to the local archive server.

% scp local_file ${ARCHIVE_HOST}:${ARCHIVE_HOME}

Remember that $CENTER and $WORKDIR are intended for short-term storage (no longer than 30 days). You must archive your own data from $CENTER and $WORKDIR to the archive storage. Files not archived by the user within the 30-day retention period may be deleted and cannot be retrieved.

5. Program Development

5.1. Programming Models

The Utility Server supports three parallel programming models: Message Passing Interface (MPI), Open Multi-Processing (OpenMP), and General-Purpose computing on Graphics Processing Units (GPGPU). A hybrid MPI/OpenMP programming model is also supported. MPI is an example of the message- or data-passing models, while OpenMP uses only shared memory on a node by spawning threads. GPGPU programming uses the NVIDIA Tesla M2050 for computation acceleration.

5.1.1. Message Passing Interface (MPI)

The Message Passing Interface (MPI) is part of the software support for parallel programming across a network of computer systems through a technique known as message passing. MPI establishes a practical, portable, efficient, and flexible standard for message passing that makes use of the most attractive features of a number of existing message-passing systems, rather than selecting one of them and adopting it as the standard. See man intro_mpi for additional information.

When creating an MPI program on the Utility Server, ensure the following:

  • That the default MPI module (mpi/pgi/openmpi/version_number) has been loaded. To check this, run the module list command. If mpi/pgi/openmpi/version_number is not listed and a different MPI module is listed, use the following command:

    module swap other_mpi_module mpi/pgi/openmpi/version_number

    If no MPI module is loaded, load the mpi/pgi/openmpi/version_number module.

    module load mpi/pgi/openmpi/version_number
  • That the source code includes one of the following lines:

    INCLUDE "mpif.h"        ## for Fortran, or
    #include <mpi.h>        ## for C

To compile an MPI program, use the following examples:

mpif90 -o mpi_program mpi_program.f         ## for Fortran, or
mpicc -o mpi_program mpi_program.c            ## for C

The program can then be launched using the mpirun command, as follows:

mpirun -n mpi_procs mpi_program [user_arguments]

where mpi_procs is the number of MPI processes being started. For example:

#### starts 64 mpi processes; 32 on each node, one per core
## request 2 nodes, each with 32 cores and 32 processes per node
#PBS -l select=2:ncpus=32:mpiprocs=32
mpirun -n 64 ./a.out

By default, one MPI process is started on each core of a node. This means that on the Utility Server, the available memory on the node is split 32 ways. A common concern for MPI users is the need for more memory for each process. To allow an individual process to use more of the node's memory, you need to allow some cores to remain idle, using the "-N" option, as follows:

mpirun -n mpi_procs -N mpi_procs_per_node mpi_program [user_arguments]

where mpi_procs_per_node is the number of MPI processes to be started on each node. For example:

####   starts 32 mpi processes; only 16 on each node
## request 2 nodes, each with 32 cores and 16 processes per node
#PBS -l select=2:ncpus=32:mpiprocs=16
mpirun -n 32 -N 16 ./a.out  ## (assigns only 16 processes per node)

For more information about mpirun, see the mpirun man page.
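
As a minimal end-to-end illustration of the steps above, the following sketch (the file name mpi_hello.c and program name are placeholders) prints the rank of each MPI process:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs;

    MPI_Init(&argc, &argv);                  /* initialize the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* total number of MPI processes */

    printf("Hello from rank %d of %d\n", rank, nprocs);

    MPI_Finalize();                          /* shut down MPI */
    return 0;
}

It could be compiled and launched as shown above, for example with mpicc -o mpi_hello mpi_hello.c followed by mpirun -n 32 ./mpi_hello.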

5.1.2. Open Multi-Processing (OpenMP)

OpenMP is a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications. It supports shared-memory multiprocessing programming in C, C++ and Fortran, and consists of a set of compiler directives, library routines, and environment variables that influence compilation and run-time behavior.

When creating an OpenMP program on the Utility Server, ensure the following:

That the source code includes one of the following lines:

INCLUDE "ompf.h"     ## for Fortran, or
#include <omp.h>     ## for C

To compile an OpenMP program, use the following examples:

For C codes:

pgcc -o OpenMP_program -mp=nonuma OpenMP_program.c  ## PGI 
icc -o OpenMP_program -openmp OpenMP_program.c      ## Intel
gcc -o OpenMP_program -fopenmp OpenMP_program.c     ## GNU

For Fortran codes:

pgf90 -o OpenMP_program -mp=nonuma OpenMP_program.f ## PGI 
ifort -o OpenMP_program -openmp OpenMP_program.f    ## Intel
f95 -o OpenMP_program -fopenmp OpenMP_program.f     ## GNU

When running OpenMP applications, the $OMP_NUM_THREADS environment variable must be used to specify the number of threads. For example:

setenv OMP_NUM_THREADS 16
./OpenMP_program [user_arguments]

In the example above, the application starts the OpenMP_program on one node and spawns a total of 16 threads. Since the Utility Server has 16 cores per compute node, this yields 1 thread per core.
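
A minimal OpenMP example in C is sketched below (the file name omp_hello.c is a placeholder); each thread prints its ID, and the thread count is taken from $OMP_NUM_THREADS at run time:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* the number of threads is controlled by $OMP_NUM_THREADS */
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

It could be built with any of the compile lines above (for example, gcc -o omp_hello -fopenmp omp_hello.c) and run after setting OMP_NUM_THREADS as shown.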

5.1.3. Hybrid Processing (MPI/OpenMP)

An application built with the hybrid model of parallel programming can run on the Utility Server by using both OpenMP and Message Passing Interface (MPI). In hybrid applications, OpenMP threads can be spawned by MPI processes, but MPI calls should not be issued from OpenMP parallel regions or by an OpenMP thread.

When creating a hybrid (MPI/OpenMP) program on the Utility Server, follow the instructions in the MPI and OpenMP sections above for creating your program. Then use the compilation instructions for OpenMP.

Use the mpirun command and the $OMP_NUM_THREADS environment variable to run a hybrid program.

mpirun -x OMP_NUM_THREADS -npernode 1 ./mpi_program [user_arguments]
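
The sketch below (the file name hybrid_hello.c is a placeholder) combines both models: each MPI process spawns a team of OpenMP threads, and no MPI calls are made inside the parallel region:

#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* each MPI process spawns $OMP_NUM_THREADS OpenMP threads */
    #pragma omp parallel
    {
        printf("MPI rank %d, OpenMP thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

Compile it with mpicc plus the OpenMP flag for the loaded compiler (for example, -mp=nonuma for PGI or -fopenmp for GNU), then launch it with the mpirun command shown above.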

5.2. Available Compilers

The Utility Server has three compiler suites:

  • PGI Accelerator Compiler Suite with OpenACC Directives (default)
  • GNU Compiler Suite
  • Intel Compiler Suite

The PGI programming environment is loaded for you by default. To use a different suite, you will need to swap modules. See Relevant Modules (below) to learn how.

5.2.1. Portland Group (PGI) Compiler Suite

The PGI Programming Environment provides a large number of options that are the same for all compilers in the suite. The following table lists some of the more common options that you may use:

PGI Compiler Options
Option          Purpose
-c Generate intermediate object file but do not attempt to link.
-I directory Search in directory for include or module files.
-L directory Search in directory for libraries.
-o outfile Name executable "outfile" rather than the default "a.out".
-Olevel Set the optimization level. For more information on optimization, see the section on Compiler Optimization Options.
-Mfree Process Fortran codes using free form.
-i8, -r8 Treat integer and real variables as 64-bit.
-Mbyteswapio Big-endian files; the default is for little-endian.
-g Generate symbolic debug information.
-Mbounds Add array bound checking.
-Minfo=all Reports detailed information about code optimizations to stdout as compile proceeds.
-Mlist Generate a file containing the compiler flags used and a line numbered listing of the source code.
-mp=nonuma Recognize OpenMP directives.
-Bdynamic Compiling using shared objects requires CCM mode for execution on compute nodes.
-Ktrap=* Trap errors such as floating point, overflow, and divide by zero (see man page).
-fPIC Generate position-independent code for shared libraries.

Detailed information about these and other compiler options is available in the PGI compiler (pgf95, pgcc, and pgCC) man pages on the Utility Server.
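
For example, a Fortran build using several of the options above might look like the following (the file and executable names are placeholders):

pgf90 -O2 -Mbyteswapio -Mbounds -g -o my_prog my_prog.f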

5.2.2. Intel Compiler Environment

The following table lists some of the more common options that you may use:

Intel Compiler Options
Option          Purpose
-c Generate intermediate object file but do not attempt to link.
-I directory Search in directory for include or module files.
-L directory Search in directory for libraries.
-o outfile Name executable outfile rather than the default a.out.
-Olevel Set the optimization level. For more information on optimization, see the section on Compiler Optimization Options.
-free Process Fortran codes using free form.
-convert big_endian Big-endian files; the default is for little-endian.
-g Generate symbolic debug information.
-openmp Recognize OpenMP directives.
-Bdynamic Compiling using shared objects requires CCM mode for execution on compute nodes.
-fpe-all=0 Trap floating point, divide by zero, and overflow exceptions.
-fPIC Generate position-independent code for shared libraries.

Detailed information about these and other compiler options is available in the Intel compiler (ifort, icc, icpc) man pages on the Utility Server.

5.2.3. GNU Compiler Collection

The GNU Programming Environment provides a large number of options that are the same for all compilers in the suite. The following table lists some of the more common options that you may use:

GNU Compiler Options
Option          Purpose
-c Generate intermediate object file but do not attempt to link.
-I directory Search in directory for include or module files.
-L directory Search in directory for libraries.
-o outfile Name executable outfile rather than the default a.out.
-Olevel Set the optimization level. For more information on optimization, see the section on Compiler Optimization Options.
-g Generate symbolic debug information.
-fconvert=big-endian Big-endian files; the default is for little-endian.
-Wextra, -Wall Turns on increased error reporting.

Detailed information about these and other compiler options is available in the GNU compiler (gfortran, gcc, and g++) man pages on the Utility Server.

5.3. Relevant Modules

By default, the Utility Server loads the PGI programming environment for you. The Intel and GNU environments are also available. To use either of these, the PGI module must be unloaded and replaced with the one you wish to use. To do this, use the module swap command as follows:

module swap compiler/pgi/version_number compiler/intel/version_number
module swap compiler/pgi/version_number compiler/gnu/version_number

Use the module avail command to see all the available compiler versions for PGI, Intel, and GNU.

5.4. Libraries

Per Baseline Configuration policy, the following libraries will be available on the Utility Server:

  • SCALASCA - Scalable trace analysis package
  • PDT - Source-level auto-instrumentation
  • Valgrind - Memory management analysis and profiling

5.5. Debuggers

Per Baseline Configuration policy, the following debuggers are available on the Utility Server:

  • GNU Project Debugger (GDB)
  • TotalView

The Utility Server provides the TotalView debugger and the DDT Debugger to assist users in debugging their code.

5.5.1. TotalView

TotalView is a debugger that supports threads, MPI, OpenMP, C/C++, Fortran, and mixed-language codes. It offers advanced features such as on-demand memory leak detection, other heap allocation debugging features, and the Standard Template Library Viewer (STLView). Unique features such as dive, a wide variety of breakpoints, the Message Queue Graph/Visualizer, powerful data analysis, and control at the thread level are also available.

Follow the steps below to use TotalView on the Utility Server via a UNIX X-Windows interface.

  1. Ensure that an X server is running on your local system. Linux users will likely have this by default, but MS Windows users will need to install a third party X Windows solution. There are various options available.
  2. For Linux users, connect to the Utility Server using "ssh -Y". Windows users will need to use PuTTY with X11 forwarding enabled (Connection->SSH->X11->Enable X11 forwarding).
  3. Compile your program on the Utility Server with the "-g" option.
  4. Submit an interactive job:

    qsub -l select=1:ncpus=32:mpiprocs=32 -A Project_ID \
         -l walltime=00:30:00 -q debug -X -I

    Once your job has been scheduled, you will be logged into an interactive batch session on a service node that is shared with other users.

  5. Load the TotalView module:

    . /usr/share/modules/init/ksh
    module load totalview
  6. Start program execution:

    totalview mpirun -a -n 4 ./my_mpi_prog.exe arg1 arg2 ...
  7. After a short delay, the TotalView windows will pop up. Click "GO" and then "Yes" to start program execution.

5.6. Compiler Optimization Options

The "-Olevel" option enables code optimization when compiling. The level that you choose (0-4) will determine how aggressive the optimization will be. Increasing levels of optimization may increase performance significantly, but you should note that a loss of precision may also occur. There are also additional options that may enable further optimizations. The following table contains the most commonly used options.

Compiler Optimization Options
Option Description Compiler Suite
-O0 No Optimization. (default in GNU) All
-O1 Scheduling within extended basic blocks is performed. Some register allocation is performed. No global optimization. All
-O2 Level 1 plus traditional scalar optimizations such as induction recognition and loop invariant motion are performed by the global optimizer. Generally safe and beneficial. (default in PGI, Cray, & Intel) All
-O3 Levels 1 and 2 plus more aggressive code hoisting and scalar replacement optimizations that may or may not be profitable. Generally beneficial. All
-O4 Levels 1, 2, and 3 plus hoisting of guarded invariant floating point expressions is enabled. PGI
-fast, -fastsse Chooses generally optimal flags for the target platform. Includes: -O2 -Munroll=c:1 -Mnoframe -Mlre -Mautoinline -Mvect=sse -Mscalarsse -Mcache_align -Mflushz. PGI
-Mipa=fast,inline Performs Interprocedural Analysis (IPA) with generally optimal IPA flags for the target platform, and inlining. IPA can be very time-consuming. Flag must be used in both compilation and linking steps. PGI
-Minline=levels:n Number of levels of inlining (default: n = 1) PGI
-fipa-* The GNU compilers automatically enable IPA at various -O levels. To set these manually, see the options beginning with -fipa in the gcc man page. GNU
-finline-functions Enables function inlining within a single file Intel
-ipo[n] Enables interprocedural optimization between files and produces up to n object files Intel
-inline-level=n Number of levels of inlining (default: n=2) Intel
-Mlist Creates a listing file with optimization info PGI
-Minfo Info about optimizations performed PGI
-Mneginfo Info on why certain optimizations are not performed PGI
-opt-report[n] Generate optimization report with n levels of detail Intel
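
For example, the following compile lines (file and executable names are placeholders) apply some of these options with each suite:

pgf90 -fast -Minfo -o my_prog my_prog.f          ## PGI
ifort -O3 -inline-level=2 -o my_prog my_prog.f   ## Intel
gfortran -O3 -o my_prog my_prog.f                ## GNU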

6. Batch Scheduling

Although the Utility Server can run batch jobs, it is not primarily intended for that purpose. Batch jobs should be reserved for times when Utility Server usage is low (i.e., evenings and weekends). During high-usage times, the Utility Server is reserved for access to the Center-Wide File System, file management, remote visualization, and job management with Center-Wide Job Management.

6.1. Scheduler

The Portable Batch System Professional™ (PBSPro) is currently running on the Utility Server. It schedules jobs, manages resources and job queues, and can be accessed through the interactive batch environment or by submitting a batch request. The PBS module is loaded automatically for you when you log in.

6.2. Queue Information

The following table describes the queues available on the Utility Server:

Queue Descriptions and Limits
Queue Name   Max Wall Clock Time   Max Cores Per Job     Comments
Serial       18 Hours              32 ncpus, 1 ngpus     This is the default queue for the Utility Server.
X11          18 Hours              32 ncpus, 1 ngpus     Graphics queue. Same priority as Serial and VNC.
VNC          18 Hours              32 ncpus, 1 ngpus     Graphics queue. Same priority as Serial and X11. Solely available for use by the Secure Remote Desktop (SRD) application.
Parallel     18 Hours              256 ncpus, 4 ngpus    The queue for parallel jobs.
Transfer     24 Hours              1 ncpus               For use in transferring data files for jobs.

6.3. Interactive Logins

When you log in to the Utility Server, you will be running in an interactive shell on a login node. The login nodes provide login access for the Utility Server and support such activities as compiling, editing, and general interactive use by all users. Please note the Login Node Abuse policy. The preferred method to run resource intensive executions is to use an interactive batch session.

6.4. Interactive Batch Sessions

To get an interactive batch session, you must first submit an interactive batch job through PBS. This is done by executing a qsub command with the -I option from within the interactive login environment. For example:

% qsub -I -l select=N1:ncpus=N2:mpiprocs=N3 -A Project_ID -q queue_name -l walltime=HHH:MM:SS

The number of nodes and the number of processes per node are both specified in the select directive, where N1 is the number of nodes you are requesting and N3 is the number of processes per node. The value of ncpus (N2) refers to the number of physical cores available on each node and must always be set to 16 for standard-memory and graphics nodes or to 32 for large-memory nodes. The remaining values specify your Project ID, the queue the job should go into, and the desired maximum walltime.
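
For example, the following requests one standard-memory node (16 cores, 16 processes) for one hour in the default Serial queue. Project_ID is a placeholder, and the exact queue name spelling may vary by center; the show_queues utility reports the queues on your system.

% qsub -I -l select=1:ncpus=16:mpiprocs=16 -A Project_ID -q serial -l walltime=01:00:00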

6.5. Batch Request Submission

PBSPro™ batch jobs are submitted via the qsub command. The format of this command is:

% qsub [ options ] batch_script_file

qsub options may be specified on the command line or embedded in the batch script file by lines beginning with #PBS.

For a more thorough discussion of PBS Batch Submission on the Utility Server, see the Utility Server PBS Guide.

6.6. Batch Resource Directives

Batch resource directives allow you to specify to PBS how your batch jobs should be run and what resources your job requires. Although PBS has many directives, you only need to know a few to run most jobs.

The basic syntax of PBS directives is as follows:

#PBS option[[=]value]

where some options may require values to be included. For example, to start a 16-process job, you would request one node of 32 cores and specify that you will be running 16 processes per node:

#PBS -l select=1:ncpus=32:mpiprocs=16

The following directives are required for all jobs:

Required Directives
Directive Value Description
-A Project_ID Name of the project
-q queue_name Name of the queue
-l select=N1:ncpus=N2:mpiprocs=N3 Number of nodes (N1), number of cores per node (N2: 16 or 32), and number of processes per node (N3)
-l walltime=HHH:MM:SS Maximum wall time
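
Taken together, the required directives at the top of a batch script might look like the following sketch (all values are placeholders):

#PBS -A Project_ID
#PBS -q parallel
#PBS -l select=2:ncpus=16:mpiprocs=16
#PBS -l walltime=8:00:00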

A more complete listing of batch resource directives is available in the Utility Server PBS Guide.

Job submissions on the Utility Server will differ based on the type of nodes being requested. The following examples demonstrate job submissions for different node types.

For centers without the $ACCOUNT environment variable, the subproject account number should be used instead.

6.6.1. Requesting Large Memory Nodes

To request a large memory node, use a variation of the following launch command:

% qsub -I -A $ACCOUNT -lselect=1:ncpus=32:mem=200GB script_name

The above command requests 32 CPUs and 200 GBytes of memory. Users may request from 128 GBytes to 250 GBytes of memory.

6.6.2. Requesting Graphics Nodes

To request a graphics node, a variation of the following launch command can be used:

% qsub -I -A $ACCOUNT -lselect=1:ngpus=1 script_name

The above command requests 1 GPU.

6.6.3. Requesting Mixed-Nodes

To request a mixture of nodes, a variation of the following launch command can be used:

% qsub -I -A $ACCOUNT -lselect=1:ncpus=16:ngpus=1 script_name

The above command requests 16 CPUs and 1 GPU.

6.7. Launch Commands

On the Utility Server, the PBS batch scripts and the PBS interactive login session run on a compute node. Once on the compute node, you can use the mpirun command to run an MPI job across the compute nodes assigned to you.

6.8. Sample Scripts

All of the script examples shown below contain a "Cleanup" section which demonstrates how to automatically place your data on the CWFS using the transfer queue and how to clean up your $WORKDIR after your job completes.

6.8.1. MPI Script

The following script is for a 128 core MPI job running for 8 hours in the parallel queue.

#!/bin/bash
## Required Directives ------------------------------------
#PBS -q parallel
#PBS -l walltime=8:00:00
#PBS -l select=8:ncpus=16:mpiprocs=16
#PBS -j oe
#PBS -A Project_ID

## Optional Directives ------------------------------------
#PBS -N testjob
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be

## Execution Block ----------------------------------------
# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
mkdir -p ${JOBID}
cd ${JOBID}

# stage input data from CWFS
cp -r ${CENTER}/my_data_dir/*.dat .

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
mpirun -np 128 ./my_prog.exe > my_prog.out

## Cleanup ------------------------------------------------
# Using the "here document" syntax, create a job script
# for placing your data on CWFS.
cd ${WORKDIR}
rm -f cwfs_job
cat > cwfs_job <<END
#!/bin/bash
#PBS -l walltime=12:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -S /bin/bash
cd ${WORKDIR}
# Copy job output to CWFS
mkdir -p ${CENTER}/${JOBID}
cp ${JOBID}/my_prog.out ${CENTER}/${JOBID}
cp -r ${JOBID}/output_data_dir ${CENTER}/${JOBID}
# Remove scratch directory from the file system.
rm -rf ${JOBID}
END

# Submit the CWFS job script.
qsub cwfs_job
6.8.2. MPI Script (accessing large memory node)

The following script is for a 256 core MPI job running for 8 hours in the parallel queue.

#!/bin/bash
## Required Directives ------------------------------------
#PBS -q parallel
#PBS -l walltime=8:00:00
#PBS -l select=8:ncpus=32:mpiprocs=32
#PBS -j oe
#PBS -A Project_ID

## Optional Directives ------------------------------------
#PBS -N testjob
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be

## Execution Block ----------------------------------------
# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
mkdir -p ${JOBID}
cd ${JOBID}

# stage input data from CWFS
cp -r ${CENTER}/my_data_dir/*.dat .

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
mpirun -np 256 ./my_prog.exe > my_prog.out

## Cleanup ------------------------------------------------
# Using the "here document" syntax, create a job script
# for placing your data on CWFS.
cd ${WORKDIR}
rm -f cwfs_job
cat > cwfs_job <<END
#!/bin/bash
#PBS -l walltime=12:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -S /bin/bash
cd ${WORKDIR}
# Copy job output to CWFS
mkdir -p ${CENTER}/${JOBID}
cp ${JOBID}/my_prog.out ${CENTER}/${JOBID}
cp -r ${JOBID}/output_data_dir ${CENTER}/${JOBID}
# Remove scratch directory from the file system.
rm -rf ${JOBID}
END

# Submit the CWFS job script.
qsub cwfs_job
6.8.3. MPI Script (accessing graphics node)

The following script is for a 128 core MPI job running for 8 hours in the parallel queue.

#!/bin/bash
## Required Directives ------------------------------------
#PBS -q parallel
#PBS -l walltime=8:00:00
#PBS -l select=8:ncpus=16:mpiprocs=16:ngpus=1
#PBS -j oe
#PBS -A Project_ID

## Optional Directives ------------------------------------
#PBS -N testjob
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be

## Execution Block ----------------------------------------
# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
mkdir -p ${JOBID}
cd ${JOBID}

# stage input data from CWFS
cp -r ${CENTER}/my_data_dir/*.dat .

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
mpirun -np 128 ./my_prog.exe > my_prog.out

## Cleanup ------------------------------------------------
# Using the "here document" syntax, create a job script
# for placing your data on CWFS.
cd ${WORKDIR}
rm -f cwfs_job
cat > cwfs_job <<END
#!/bin/bash
#PBS -l walltime=12:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -S /bin/bash
cd ${WORKDIR}
# Copy job output to CWFS
mkdir -p ${CENTER}/${JOBID}
cp ${JOBID}/my_prog.out ${CENTER}/${JOBID}
cp -r ${JOBID}/output_data_dir ${CENTER}/${JOBID}
# Remove scratch directory from the file system.
rm -rf ${JOBID}
END

# Submit the CWFS job script.
qsub cwfs_job
6.8.4. Mixed-Node Sample Script

The following full PBS script demonstrates how to request two large memory nodes and one graphics node.

#!/bin/bash
## Required Directives ------------------------------------
#PBS -q parallel
#PBS -l walltime=8:00:00
#PBS -l select=2:ncpus=32+1:ncpus=16:mpiprocs=16:ngpus=1
#PBS -j oe
#PBS -A Project_ID

## Optional Directives ------------------------------------
#PBS -N testjob
#PBS -j oe
#PBS -M my_email@yahoo.com
#PBS -m be

## Execution Block ----------------------------------------
# cd to your scratch directory in /work
cd ${WORKDIR}

# create a job-specific subdirectory based on JOBID and cd to it
JOBID=`echo ${PBS_JOBID} | cut -d '.' -f 1`
mkdir -p ${JOBID}
cd ${JOBID}

# stage input data from CWFS
cp -r ${CENTER}/my_data_dir/*.dat .

# copy the executable from $HOME
cp ${HOME}/my_prog.exe .

## Launching ----------------------------------------------
mpirun -np 128 ./my_prog.exe > my_prog.out

## Cleanup ------------------------------------------------
# Using the "here document" syntax, create a job script
# for placing your data on CWFS.
cd ${WORKDIR}
rm -f cwfs_job
cat > cwfs_job <<END
#!/bin/bash
#PBS -l walltime=12:00:00
#PBS -q transfer
#PBS -A Project_ID
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -S /bin/bash
cd ${WORKDIR}
# Copy job output to CWFS
mkdir -p ${CENTER}/${JOBID}
cp ${JOBID}/my_prog.out ${CENTER}/${JOBID}
cp -r ${JOBID}/output_data_dir ${CENTER}/${JOBID}
# Remove scratch directory from the file system.
rm -rf ${JOBID}
END

# Submit the CWFS job script.
qsub cwfs_job

6.9. PBS Commands

The following commands provide the basic functionality for using the PBS batch system:

qsub: Used to submit jobs for batch processing.
qsub [ options ] my_job_script

qstat: Used to check the status of submitted jobs.
qstat PBS_JOBID ## check one job
qstat -u my_user_name ## check all of user's jobs

qdel: Used to kill queued or running jobs.
qdel PBS_JOBID

A more complete list of PBS commands is available in the Utility Server PBS Guide.

6.10. Advance Reservations

An Advance Reservation Service (ARS) is not available on the Utility Server. However, the ARS is available for other HPC systems. The ARS is accessible via most modern web browsers at https://reservation.hpc.mil. Authenticated access is required. An ARS User's Guide is available online once you have logged in.

7. Software Resources

7.1. Application Software

All Commercial Off The Shelf (COTS) software packages can be found in the $CSI_HOME directory (/app). A complete listing of the software versions installed on the Utility Server can be found on our software page. The general rule for all COTS software packages is that the two latest versions will be maintained on our systems. For convenience, modules are also available for most COTS software packages.

7.2. Useful Utilities

The following utilities are available on the Utility Server:

Useful Utilities
Command          Description
blocker Convert a file to fixed-record-length format.
bull Display the system bulletin board.
cal2jul Converts a date into the corresponding Julian date.
datecalc Print an offset from today's date in various formats.
extabs Expand tab characters to spaces in a text file.
justify Justify a character string with padding.
lss Show unprintable characters in file names.
mpscp High-performance remote file copy.
news Print news items.
qpeek Display spooled stdout and stderr for an executing batch job.
qview Display information about batch jobs and queues.
show_queues Report current batch queue status, usage, and limits.
show_storage Display disk/file usage and quota information.
show_usage Display CPU allocation and usage by subproject.
stripdos Strip DOS end-of-record control characters from a text file.
tails Display the last five lines of one or more files.
trim Trim trailing blanks from text file lines.
vman Browse an online man page using the "view" command.

7.3. Visualization Software

The Utility Server includes the following visualization applications:

  • EnSight Suite
  • FieldView
  • MATLAB
  • NCAR Graphics Library
  • ParaView
  • Tecplot
  • VisIt Visualization Tool
  • ezViz

These applications automatically take advantage of the GPUs for graphics acceleration when run within the Secure Remote Desktop (SRD) application.

7.4. GPGPU Computing

The Utility Server includes graphics nodes which can be used for general-purpose computing on GPUs (GPGPU). Users may write a GPGPU program using CUDA or OpenCL.

The following is an example program that uses MPI and CUDA together. First we look at the main program, which uses MPI to create multiple processes across multiple nodes.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

extern void cuda_device_init(void);

int main(int argc, char **argv)
{
    int me, nprocs;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    printf("me: %d nprocs: %d\n", me, nprocs);

    cuda_device_init();

    MPI_Finalize();

    return 0;
}

Next we have the CUDA code, which in this case just queries the NVIDIA GPU for its parameters.

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>

extern "C" void cuda_device_init(void)
{
    int ndev;

    cudaGetDeviceCount(&ndev);
    cudaThreadSynchronize();
    printf("There are %d GPUs.\n", ndev);

    for (int i = 0; i < 1; i++) {
        cudaDeviceProp pdev;
        cudaGetDeviceProperties(&pdev, i);
        cudaThreadSynchronize();
        printf("Name         : %s\n", pdev.name);
        printf("Capability   : %d %d\n", pdev.major, pdev.minor);
        printf("Memory Global: %d Mb\n", (int)((pdev.totalGlobalMem+1024*1024)/1024/1024));
        printf("Memory Const : %d Kb\n", (int)(pdev.totalConstMem/1024));
        printf("Memory Shared: %d Kb\n", (int)(pdev.sharedMemPerBlock/1024));
        printf("Clock        : %.3f GHz\n", pdev.clockRate/1000000.0);
        printf("Processors   : %d\n", pdev.multiProcessorCount);
        printf("Cores        : %d\n", 8*pdev.multiProcessorCount);
        printf("Warp         : %d\n", pdev.warpSize);
        printf("Max Thr/Blk  : %d\n", pdev.maxThreadsPerBlock);
        printf("Max Blk Size : %d %d %d\n", pdev.maxThreadsDim[0], pdev.maxThreadsDim[1], pdev.maxThreadsDim[2]);
        printf("Max Grid Size: %d %d %d\n", pdev.maxGridSize[0], pdev.maxGridSize[1], pdev.maxGridSize[2]);
    }
}

With the two pieces of code above, conveniently called main.c and cuda_device_init.cu, you can proceed to compile them. However, some care is needed when compiling. First, load the most recent version of CUDA, which in this case is version 4.1.28. Then switch out the PGI compiler for the GNU compiler, and switch out the PGI version of OpenMPI for the GNU version of OpenMPI.

$ module load cuda/4.1.28
$ module switch compiler/pgi/11.10 compiler/gcc/4.4
$ module switch mpi/pgi/openmpi/1.4.3 mpi/gnu/openmpi/1.4.3

Now you can begin to compile the source code listed above.

$ mpicc -c main.c -o main.o
$ nvcc -c cuda_device_init.cu -o cuda_device_init.o
$ mpicc -o gpu_info main.o cuda_device_init.o -L/app/cuda/4.1.28/cuda/lib64/ -lcudart

You can use cudaGetDeviceProperties(&pdev,i) to look at the properties of the NVIDIA GPU. If you get output like the following:

There are 753463296 GPUs.
Name         :
Capability   : 601815808 32767
Memory Global: -4095 Mb
Memory Const : 0 Kb
Memory Shared: 0 Kb
Clock        : 0.011 GHz
Processors   : 32767
Cores        : 262136
Warp         : 0
Max Thr/Blk  : 0
Max Blk Size : 0 6300320 0
Max Grid Size: 758072856 10986 755563456

then the chances are very good that you have landed on a compute node rather than a graphics node. On a graphics node, you should instead see the following:

There are 1 GPUs.
Name         : Tesla M2050
Capability   : 2 0
Memory Global: 2688 Mb
Memory Const : 64 Kb
Memory Shared: 48 Kb
Clock        : 1.147 GHz
Processors   : 14
Cores        : 112
Warp         : 32
Max Thr/Blk  : 1024
Max Blk Size : 1024 1024 64
Max Grid Size: 65535 65535 65535

To run on two different graphics nodes, you can do something like:

$ qsub -A ERDCS97290STA -l select=2:ngpus=1 -I
$ module load cuda/4.1.28
$ mpirun -np 2 --host stutilg-0004,stutilg-0005 gpu_info

Only a certain number of GPU nodes are allocated to the parallel queue: graphics nodes 1-10. The system favors serial and VNC jobs. Jobs in the serial and vnc queues can access GPU nodes 1-22, but when such a job starts it is allocated to nodes 1-10 first, which can block parallel jobs from running on the GPU nodes.

The Data Analysis and Assessment Center (DAAC) provides GPGPU tutorials.

7.5. Sample Code Repository

The Sample Code Repository is a directory that contains examples for COTS batch scripts, building and using serial and parallel programs, data management, and accessing and using serial and parallel math libraries. The $SAMPLES_HOME environment variable contains the path to this area, and is automatically defined in your login environment.

The following table contains the standard set of sample codes that is available on all Utility Servers. Additional samples may also be available on each Utility Server to meet specific needs of users at that location.

Sample Code Repository on Utility Server
applications
  Examples illustrating various software packages available on the Utility Server.
  There are currently no samples available in this category.

dataManagement
  Information about data management techniques.
    storageManagement - Text file describing the HOME, WORKDIR, and CENTER location environment variables and preferred practices regarding files and usage for each.

debugging
  Basic information on how to start up and use the available debuggers on the Utility Server.
    core_files - Sample code and job script illustrating how core files are generated and their use with the gdb command-line debugger to debug a program.

jobSubmission
  Sample PBS batch scripts and helpful commands for monitoring job progress. Examples include options used to submit a job, such as declaring which group membership you belong to (for allocation accounting), how to request a particular software license, etc.
    dataStaging - Sample PBS batch script for data transfer from $CENTER to $PBS_O_WORKDIR prior to submitting a PBS job using that data.
    MPI_OpenMP_scripts - Sample PBS batch script demonstrating how to execute a hybrid MPI/OpenMP job.
    MPI_scripts - Sample PBS batch script demonstrating how to execute an MPI job.
    OpenMP_scripts - Sample PBS batch script demonstrating how to execute an OpenMP job.

libraries
  Listing of the various libraries available and examples of how to compile a program linking to those libraries. Example source code, Makefiles, and data files may be included.
  There are currently no samples available in this category.

parallelEnvironment
  Sample code and scripts containing compiler options for common parallel programming practices, including code profiling.
    hello_world - Example codes in Fortran and C demonstrating classic "Hello World" algorithms implemented using MPI, OpenMP, and hybrid MPI/OpenMP techniques. Sample PBS job scripts and a Makefile demonstrating compilation are included.