HPC Interactive Environment (HIE) User Guide

1. Introduction

1.1. The HPC Interactive Environment

Many HPC workloads benefit greatly from rapid response and high availability, including interactive sessions, application development, and debugging. The HPC Interactive Environment (HIE) is a queue configuration and computing environment intended to deliver this type of responsiveness and availability. This guide provides the information an average user needs to perform computational tasks in the HIE.

The HIE is primarily intended to provide high availability and response supporting the following services:

  • Remote visualization
  • Application development for General Purpose Graphics Processing Units (GPGPU)
  • Application development for other non-standard processors on a particular system. Depending on the system, this currently includes only Xeon Phi-accelerated processors and stand-alone Knights Landing (KNL) processors.

The following functions are available as well:

  • Interactive use, including:
    • Debugging
    • Complete pre- and post-processing
    • Building applications
    • Remote visualization tasks
  • Batch processing during low-usage times
  • Access to specialty nodes (GPU, large-memory, etc.)

Please note that the HIE is intended to provide faster response for interactive workloads, application development, and debugging. Only a limited number of nodes are available to the HIE, and they should be reserved for appropriate use cases. Use of the HIE for regular batch processing is considered abuse, and usage will be closely monitored. The HIE should not be used simply as a mechanism to give your regular batch jobs a higher priority.

1.2. Requesting Assistance

The HPC Help Desk is available to assist users with unclassified problems, issues, or questions. Analysts are on duty 8:00 a.m. - 8:00 p.m. Eastern, Monday - Friday (excluding Federal holidays).

For more detailed contact information, please see our Contact Page.

2. HIE Configuration

The HIE is simply a separate computing environment available on all unclassified allocated HPC systems in the Program. On each system, a select number of nodes of different types (standard, large-memory, GPU, etc.) are dedicated to the HIE for its exclusive use. This number varies from system to system. In addition, an equal number of nodes may be used to meet peak demand, provided that many are available. In the table below, each entry gives the number of nodes dedicated to the HIE on that system, followed by the maximum number of nodes of that type a single job can use.

HIE Configuration (nodes dedicated to the HIE / maximum nodes per job)

System      Standard  Large-Memory  GPU Accelerated  Phi Accelerated
Centennial  0/0       8/2           24/5             0/0
Conrad      24/5      2/1           0/0              28/6
Excalibur   0/0       8/2           24/5             0/0
Gaffney     12/2      4/1           4/1              0/0
Gordon      24/5      2/1           0/0              28/6
Koehr       12/2      4/1           4/1              0/0
Mustang     4/1       2/1           4/1              0/0
Onyx        4/1       1/1           4/1              8/2 (KNL)
Thunder     8/2       0/1           4/1              0/0
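
As an illustration of how to read the table, the sketch below looks up the per-job maximum (the number after the slash) for a given system and node type. The function names hie_row and hie_max_per_job are hypothetical, and the values are copied from the table above, so they are only as current as this guide.

```shell
# Print the HIE table row for a system: Standard, Large-Memory, GPU, Phi.
# Values are copied from the HIE Configuration table (Onyx Phi column is KNL).
hie_row() {
    case "$1" in
        Centennial|Excalibur) echo "0/0 8/2 24/5 0/0" ;;
        Conrad|Gordon)        echo "24/5 2/1 0/0 28/6" ;;
        Gaffney|Koehr)        echo "12/2 4/1 4/1 0/0" ;;
        Mustang)              echo "4/1 2/1 4/1 0/0" ;;
        Onyx)                 echo "4/1 1/1 4/1 8/2" ;;
        Thunder)              echo "8/2 0/1 4/1 0/0" ;;
        *)                    return 1 ;;
    esac
}

# Print the maximum nodes per job for a system and column number
# (1=Standard, 2=Large-Memory, 3=GPU, 4=Phi). Fails for unknown systems.
hie_max_per_job() {
    row=$(hie_row "$1") || return 1
    echo "$row" | cut -d' ' -f"$2" | cut -d/ -f2
}

hie_max_per_job Gaffney 3   # prints 1: a job may use at most one GPU node
```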

3. Batch Scheduling

Although the HIE can run batch jobs, it is not meant for that purpose. Batch jobs should be reserved for times when HIE usage is low (e.g., evenings and weekends). During high-usage times, the HIE is reserved for interactive use.

3.1. Scheduler

The Portable Batch System Professional™ (PBSPro) is the scheduler currently running on the HPCMP systems that provide the HIE. It schedules jobs, manages resources and job queues, and can be accessed through the interactive batch environment or by submitting a batch request. The PBS module is loaded automatically when you log in.

3.2. Queue Information

There is a single queue associated with the HIE, named "HIE".

3.3. Interactive Logins

When you log in to an HPC system, you will be running in an interactive shell on a login node and have access to the HIE. The login nodes provide login access for the HIE and support such activities as compiling, editing, and general interactive use by all users. Please note the Login Node Abuse policy in the specific system's User Guide. The preferred way to run resource-intensive work is in an interactive batch session.

3.4. Interactive Batch Sessions

To get an HIE interactive batch session, you must first submit an interactive batch job through PBS. To do so, run the qsub command with the -I option from within the interactive login environment. For example:

% qsub -I -l select=N1:ncpus=N2:mpiprocs=N3 -A Project_ID -q HIE -l walltime=HHH:MM:SS

The select directive specifies both the number of nodes and the number of processes per node: N1 is the number of nodes you are requesting, and N3 is the number of MPI processes per node. The value of ncpus, labeled N2, is the number of physical cores available on each node and varies by system. The remaining values are your Project ID and the desired maximum walltime (at most 24 hours in the HIE).
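
As a sketch of how the placeholders might be filled in, the helper below assembles the interactive qsub command and prints it rather than running it. The function name hie_interactive_cmd, the Project ID ABCDE01234567, and the 32-core node size are assumptions for illustration only; check your system's User Guide for the actual core count.

```shell
# Build (but do not run) an interactive HIE qsub command line.
# Arguments: nodes (N1), cores per node (N2), MPI processes per node (N3),
# Project ID, and walltime.
hie_interactive_cmd() {
    printf 'qsub -I -l select=%s:ncpus=%s:mpiprocs=%s -A %s -q HIE -l walltime=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical request: 1 node, 32 cores, 16 MPI processes, 1 hour.
hie_interactive_cmd 1 32 16 ABCDE01234567 01:00:00
```

Printing the command first lets you review it before pasting it at the login node prompt.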

3.5. Batch Request Submission

PBSPro™ batch jobs are submitted via the qsub command. The format of this command is:

% qsub [ options ] batch_script_file

qsub options may be specified on the command line or embedded in the batch script file by lines beginning with #PBS.

3.6. PBS Resource Directives

PBS resource directives allow you to specify to PBS how your batch jobs should be run and what resources your job requires. Although PBS has many directives, you only need to know a few to run most jobs.

The basic syntax of PBS directives is as follows:

#PBS option[[=]value]

where some options may require values to be included. For example, to start a 16-process job, you would request one node of 32 cores and specify you will be running 16 processes per node:

#PBS -l select=1:ncpus=32:mpiprocs=16
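
The arithmetic behind this example is that the total number of MPI processes is the number of nodes multiplied by mpiprocs; ncpus does not enter into the count. The sketch below shows this with a hypothetical helper named total_mpi_procs and an assumed 32-core node.

```shell
# Report the total MPI process count implied by a select statement.
# Arguments: nodes (N1), cores per node (N2), MPI processes per node (N3).
total_mpi_procs() {
    echo "select=$1:ncpus=$2:mpiprocs=$3 starts $(( $1 * $3 )) MPI processes"
}

total_mpi_procs 1 32 16   # the 16-process example above
total_mpi_procs 2 32 16   # the same layout on two nodes: 32 processes
```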

The following directives are required for all jobs:

Required Directives

Directive  Value                            Description
-A         Project_ID                       Name of the project
-q         queue_name                       This should be HIE
-l         select=N1:ncpus=N2:mpiprocs=N3   Number of nodes (N1),
                                            number of cores per node (N2),
                                            number of processes per node (N3)
-l         walltime=HHH:MM:SS               The maximum walltime in the HIE is 24 hours

A more complete listing of batch resource directives is available in the PBS guide for each system.
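
A minimal sketch of a batch script carrying the four required directives might look like the following. The file name hie_job.pbs, the Project ID, the core counts, and the final echo line are placeholders; replace the echo with the launch command documented in your system's User Guide.

```shell
# Write a minimal HIE batch script to hie_job.pbs (hypothetical file name).
cat > hie_job.pbs <<'EOF'
#!/bin/bash
#PBS -A ABCDE01234567
#PBS -q HIE
#PBS -l select=1:ncpus=32:mpiprocs=16
#PBS -l walltime=00:30:00

# PBS starts the job in your home directory; move to the submission directory.
cd "$PBS_O_WORKDIR"

# Placeholder: replace with the launch command from your system's User Guide.
echo "Running on $(hostname)"
EOF

# Submit it from a login node with: qsub hie_job.pbs
```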

Job submissions in the HIE will differ based on the type of nodes being requested. The following examples demonstrate job submissions for different node types.

3.6.1. Requesting Large-Memory Nodes

To request a large-memory node, a variation of the select statement can be used:

-l select=1:ncpus=N2:mpiprocs=N3:bigmem=1

The above directive requests one node and specifies that it should be a large-memory node. The number of cores available varies by system. Please see the specific system's User Guide for more information.

3.6.2. Requesting GPU Nodes

To request a GPU node, a variation of the select statement can be used:

-l select=1:ncpus=N2:mpiprocs=N3:ngpus=1

The above directive requests one node with a GPU. Please note that some systems have nodes with more than one GPU. The above directive will still work on those systems, but you can request up to the number of GPUs available on a node (for example, two on Thunder).

3.6.3. Requesting Phi Nodes

To request a Phi node, a variation of the select statement can be used:

-l select=1:ncpus=N2:mpiprocs=N3:nmics=1

The above directive requests one node with a Phi accelerator. Note that some systems have nodes with more than one Phi accelerator. The above directive will still work on those systems, but you can request up to the number of Phis available on a node (for example, two on Thunder).

3.6.4. Requesting Mixed-Nodes

To request a mixture of nodes, a variation of the select statement can be used:

-l select=1:ncpus=N2:mpiprocs=N3+1:ncpus=N4:mpiprocs=N5:ngpus=1

The above command requests one standard compute node plus a second node with a GPU.
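
The node-type variations in this section all follow one pattern: a chunk of the form N1:ncpus=N2:mpiprocs=N3, an optional resource such as bigmem=1, ngpus=1, or nmics=1, and a "+" between chunks. The sketch below assembles such a statement; the helper name select_chunk and the 32-core node size are assumptions for illustration.

```shell
# Build one select chunk. Arguments: nodes, ncpus, mpiprocs, and an optional
# extra resource (bigmem=1, ngpus=1, or nmics=1, as shown in this section).
select_chunk() {
    chunk="$1:ncpus=$2:mpiprocs=$3"
    if [ -n "${4:-}" ]; then
        chunk="$chunk:$4"
    fi
    echo "$chunk"
}

# Join chunks with "+" to request one standard node plus one GPU node.
standard=$(select_chunk 1 32 32)
gpu=$(select_chunk 1 32 1 ngpus=1)
printf '%s\n' "-l select=$standard+$gpu"
```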

3.6.5. Requesting KNL Nodes

If you are using Onyx and want to use KNL nodes within the HIE, request them the same way as outside the HIE. Please see the Onyx KNL Quick Start Guide for more information.

3.7. Launch Commands

The launch command for the HIE is the same as for non-HIE jobs on the same HPC system. Please see the User Guide for a particular HPC system for available launch commands.

3.8. Sample Scripts

Since the HIE is just an execution environment, job scripts are created the same way as for any other job on the particular HPC system, except that the queue name should be specified as "HIE". Please see the User Guide for the particular HPC system for additional information. Note that ARL systems do not have standard compute nodes assigned to the HIE. On those systems, if you do not specify either large-memory or GPU nodes, your job will still run in the HIE but will be randomly assigned to one of these node types.

3.9. PBS Commands

The following commands provide the basic functionality for using the PBS batch system:

qsub: Used to submit jobs for batch processing.
qsub [ options ] my_job_script

qstat: Used to check the status of submitted jobs.
qstat PBS_JOBID ## check one job
qstat -u my_user_name ## check all of user's jobs

qdel: Used to kill queued or running jobs.
qdel PBS_JOBID

A more complete list of PBS commands is available in the PBS guide for a particular system.
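
The three commands are typically used together: submit, check, and cancel if needed. The sketch below wraps that sequence in a hypothetical function named submit_and_check. It relies on qsub printing the new job's ID on standard output and must be run on a system where the PBS commands are available.

```shell
# Submit a job script, show its status, and print the command to cancel it.
# Argument: path to the batch script.
submit_and_check() {
    job_id=$(qsub "$1") || return 1   # qsub prints the job ID on success
    qstat "$job_id"
    echo "To cancel this job: qdel $job_id"
}

# Usage on a login node:
#   submit_and_check my_job_script
```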

3.10. Advance Reservations

The HIE is not accessible through the Advance Reservation Service (ARS). For information about the ARS, please see the ARS User Guide.