SCOUT Quick Start Guide
1. Introduction
This document provides a brief summary of information you'll need to quickly begin working on the Supercomputing Outpost (SCOUT). For more detailed information, see the SCOUT User Guide.
2. Get a Kerberos Ticket
For security purposes, you must have a current Kerberos ticket on your computer before attempting to connect to SCOUT. To get a Kerberos ticket, a Kerberos client kit must be installed on your desktop. Information about installing Kerberos clients on your Windows desktop can be found on the Kerberos & Authentication page.
3. Connect to SCOUT
SCOUT can be accessed via Kerberized ssh as follows:
% ssh user@scout.arl.hpc.mil
4. Home, Working, and Center-wide Directories
Each user has file space in the $HOME, $WORKDIR, and $CENTER directories. The $HOME, $WORKDIR, and $CENTER environment variables are predefined for you and point to the appropriate locations in the file systems. You are strongly encouraged to use these variables in your scripts.
Note: $WORKDIR is a "scratch" file system, and $CENTER is a center-wide file system accessible to all ARL DSRC production systems. Neither of these file systems is backed up. You are responsible for managing files in your $WORKDIR and $CENTER directories by backing up files to the archive system and deleting unneeded files. Currently, $WORKDIR files older than 21 days and $CENTER files older than 120 days are subject to being purged.
If it is determined as part of the normal purge cycle that files in your $WORKDIR directory must be deleted, you WILL NOT be notified prior to deletion. You are responsible for monitoring your workspace to prevent data loss.
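To help monitor your workspace, the find command can flag files approaching the purge threshold. The sketch below is hypothetical and uses a temporary demo directory in place of $WORKDIR so it can run anywhere; on SCOUT you would point it at "$WORKDIR" itself.

```shell
# Hypothetical housekeeping sketch: list files older than 21 days,
# the $WORKDIR purge threshold. DEMO_DIR stands in for $WORKDIR so
# the example is self-contained; substitute "$WORKDIR" on SCOUT.
DEMO_DIR=$(mktemp -d)
touch -d '30 days ago' "$DEMO_DIR/old_result.dat"   # past the 21-day window
touch "$DEMO_DIR/fresh_result.dat"                  # recent file, not matched
# -mtime +21 matches files last modified more than 21 days ago
find "$DEMO_DIR" -type f -mtime +21
```

Files listed this way are candidates for archiving (see section 10) or deletion before the purge cycle removes them.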
5. Login, Inference, Training, and Visualization Nodes
SCOUT has four types of nodes: login, inference, training, and visualization. When you log into the system, you are placed on a login node. Login nodes are intended for small tasks, such as editing and compiling code. When a batch script or interactive job runs, the resulting shell runs on the requested node type.
The inference, training, and visualization "compute nodes" are accessed via the mpirun command when working interactively; from within a batch job script, the mpiexec command is typically used instead. If you launch a parallel job without a batch or interactive allocation, no compute nodes are assigned, and your parallel job runs on the login node. If this were to happen, your job would interfere with (and be interfered with by) other users' login node tasks.
Inference, training, and visualization node names use the prefixes inf, tra, and vis, respectively.
6. Transfer Files and Data to SCOUT
File transfers to DSRC systems must be performed using Kerberized versions of the following tools: scp, sftp, and mpscp. For example, the command below uses secure copy (scp) to copy a local file into a destination directory on a SCOUT login node.
% scp local_file user@scout.arl.hpc.mil:/target_dir
For additional information on file transfers to and from SCOUT, see the File Transfers section of the SCOUT User Guide.
7. Submit Jobs to the Batch Queue
The IBM Spectrum Load Sharing Facility (LSF) is the workload management system for SCOUT. To submit a batch job, use the following command:
bsub < my_job_script
where my_job_script is the name of the file containing your batch script.
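For orientation, a minimal job script might look like the sketch below. The job name, resource values, and output file name are placeholders, and site-specific requirements (such as project accounting directives) are omitted; see the $SAMPLES_HOME examples on SCOUT for authoritative templates.

```shell
#!/bin/bash
# Hypothetical minimal LSF batch script; all values are placeholders.
#BSUB -J my_job              # job name
#BSUB -q standard            # queue name (queues are listed in section 8)
#BSUB -n 4                   # number of cores
#BSUB -W 01:00               # wall clock limit (HH:MM)
#BSUB -o my_job.%J.out       # stdout file; %J expands to the JobID

cd "$WORKDIR"                # run from the scratch file system
mpiexec ./my_program         # launch the parallel executable
```

Submitting it with `bsub < my_job_script` parses the #BSUB directives, so command-line options are not needed for settings given in the script.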
For more information on using LSF or job scripts, see the SCOUT User Guide, the SCOUT LSF Guide, or the sample script examples in the $SAMPLES_HOME directory on SCOUT.
8. Batch Queues
The following table describes the LSF queues available on SCOUT:
| Priority | Queue Name | Max Wall Clock Time | Max Cores Per Job | Max Queued Per User | Max Running Per User | Description |
|---|---|---|---|---|---|---|
| Highest | transfer | 48 Hours | N/A | N/A | N/A | Data transfer for user jobs. Not charged against project allocation. See the ARL DSRC Archive Guide, section 5.2. |
| | urgent | 96 Hours | N/A | N/A | N/A | Jobs belonging to DoD HPCMP Urgent Projects |
| | debug | 1 Hour | N/A | N/A | N/A | Time/resource-limited for user testing and debug purposes |
| | high | 168 Hours | N/A | N/A | N/A | Jobs belonging to DoD HPCMP High Priority Projects |
| | frontier | 168 Hours | N/A | N/A | N/A | Jobs belonging to DoD HPCMP Frontier Projects |
| | HIE | 24 Hours | N/A | N/A | N/A | Rapid response for interactive work. For more information see the HPC Interactive Environment (HIE) User Guide. |
| | interactive | 12 Hours | N/A | N/A | N/A | Interactive jobs |
| | standard | 168 Hours | N/A | N/A | N/A | Standard jobs |
| Lowest | background | 24 Hours | N/A | N/A | N/A | User jobs that are not charged against the project allocation |
9. Monitoring Your Job
You can monitor your batch jobs on SCOUT using the bjobs command.
The bjobs -u all command lists all jobs in the queue. The bjobs command with no options shows only jobs owned by the user, as follows:
% bjobs -u all
JOBID  USER      STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
9985   username  RUN   normal  login02    6*tra014   *_training  Feb 19 0:07
9986   username  RUN   normal  login02    6*tra002   *_training  Feb 19 0:07
9987   username  RUN   normal  login02    6*tra004   *_training  Feb 19 0:07
9994   username  RUN   normal  login02    4*inf033   *inference  Feb 19 0:08
9995   username  RUN   normal  login02    4*inf019   *inference  Feb 19 0:08
9996   username  RUN   normal  login02    4*inf034   *inference  Feb 19 0:08
9997   username  RUN   normal  login02    4*inf048   *inference  Feb 19 0:08

% bjobs
JOBID  USER      STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
9985   username  RUN   normal  login02    6*tra014   *_training  Feb 19 0:07
Notice the output contains the JobID for each job. This ID can be used with the bkill, bjobs, and bstop commands.
To delete a job, use the command bkill jobID.
To delete all your jobs, use bkill -u username.
To view a partially completed output file, use the bpeek jobID command.
10. Archiving Your Work
When your job completes, archive any important data to prevent automatic deletion by the purge scripts.
Copy one or more files to the archive system.
archive put [-C path] [-D] [-s] file1 [file2 ...]
Copy one or more files from the archive system.
archive get [-C path] [-s] file1 [file2 ...]
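As a hypothetical illustration of the syntax above (the directory and file names are placeholders, not actual archive contents):

```shell
# Hypothetical archive usage; my_project and results.tar are placeholders.
archive put -C my_project results.tar   # store results.tar under my_project
archive get -C my_project results.tar   # later, retrieve it from my_project
```

These commands are typically run from a transfer-queue batch job (see the transfer queue in section 8) so archiving does not tie up a login node.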
For more information on archiving your files, see the ARL DSRC Archive Guide.
11. Modules
Software modules are a convenient way to set needed environment variables and add necessary directories to your path so commands for particular applications can be found. SCOUT uses modules to initialize your environment with system commands and libraries, compiler suites, environment variables, and LSF batch system commands.
Several modules are loaded automatically as soon as you log in. To view the currently loaded modules, use module list. To see the entire list of available modules, use module avail. You can modify the configuration of your environment by loading and unloading modules. For complete information on how to do this, see the ARL DSRC Modules Guide.
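A typical session might look like the following sketch. The module name shown is hypothetical; module avail shows what actually exists on SCOUT.

```shell
module list                  # show modules currently loaded (several load at login)
module avail                 # list all available modules
module load my_app/1.0       # add a module to your environment (hypothetical name)
module unload my_app/1.0     # remove it again
```

Loading and unloading in this way adjusts your PATH and related environment variables without editing shell startup files by hand.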
12. Available Software
A list of software on SCOUT is available on the Software page.