SCOUT Quick Start Guide
1. Introduction
This document provides a brief summary of information you'll need to quickly begin working on the Supercomputing Outpost (SCOUT). For more detailed information, see the SCOUT User Guide.
2. Get a Kerberos Ticket
For security purposes, you must have a current Kerberos ticket on your computer before attempting to connect to SCOUT. To get a Kerberos ticket, a Kerberos client kit must be installed on your desktop. Information about installing Kerberos clients on your Windows desktop can be found on the Kerberos & Authentication page.
3. Connect to SCOUT
SCOUT can be accessed via Kerberized ssh as follows:
% ssh user@scout.arl.hpc.mil
4. Home, Working, and Center-wide Directories
Each user has file space in the $HOME, $WORKDIR, and $CENTER directories. The $HOME, $WORKDIR, and $CENTER environment variables are predefined for you and point to the appropriate locations in the file systems. You are strongly encouraged to use these variables in your scripts.
Note: $WORKDIR is a "scratch" file system, and $CENTER is a center-wide file system accessible to all ARL DSRC production systems. Neither of these file systems is backed up. You are responsible for managing files in your $WORKDIR and $CENTER directories by backing up files to the archive system and deleting unneeded files. Currently, $WORKDIR files older than 21 days and $CENTER files older than 120 days are subject to being purged.
If it is determined as part of the normal purge cycle that files in your $WORKDIR directory must be deleted, you WILL NOT be notified prior to deletion. You are responsible for monitoring your workspace to prevent data loss.
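To help monitor your workspace, the find command can flag files approaching the purge threshold. The sketch below is hypothetical and uses a temporary demo directory in place of $WORKDIR so it can run anywhere; on SCOUT you would point it at "$WORKDIR" itself.

```shell
# Hypothetical housekeeping sketch: list files older than 21 days,
# the $WORKDIR purge threshold. DEMO_DIR stands in for $WORKDIR so
# the example is self-contained; substitute "$WORKDIR" on SCOUT.
DEMO_DIR=$(mktemp -d)
touch -d '30 days ago' "$DEMO_DIR/old_result.dat"   # past the 21-day window
touch "$DEMO_DIR/fresh_result.dat"                  # recent file, not matched
# -mtime +21 matches files last modified more than 21 days ago
find "$DEMO_DIR" -type f -mtime +21
```

Files listed this way are candidates for archiving (see section 10) or deletion before the purge cycle removes them.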
5. Login, Inference, Training, and Visualization Nodes
SCOUT has four types of nodes: login, inference, training, and visualization. When you log into the system, you are placed on a login node. Login nodes are intended for small tasks, such as editing and compiling code. When a batch script or interactive job runs, the resulting shell runs on the requested node type.
The inference, training, and visualization "compute nodes" are accessed via the mpirun command when working interactively; from within a batch job script, the mpiexec command is typically used instead. If you launch a parallel job without a batch or interactive allocation, no compute nodes are assigned, and your parallel job runs on the login node. If this were to happen, your job would interfere with (and be interfered with by) other users' login node tasks.
Inference, training, and visualization node names use the prefixes inf, tra, and vis, respectively.
6. Transfer Files and Data to SCOUT
File transfers to DSRC systems must be performed using Kerberized versions of the following tools: scp, sftp, and mpscp. For example, the command below uses secure copy (scp) to copy a local file into a destination directory on a SCOUT login node.
% scp local_file user@scout.arl.hpc.mil:/target_dir
For additional information on file transfers to and from SCOUT, see the File Transfers section of the SCOUT User Guide.
7. Submit Jobs to the Batch Queue
The IBM Spectrum Load Sharing Facility (LSF) is the workload management system for SCOUT. To submit a batch job, use the following command:
bsub < my_job_script
where my_job_script is the name of the file containing your batch script.
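For orientation, a minimal job script might look like the sketch below. The job name, resource values, and output file name are placeholders, and site-specific requirements (such as project accounting directives) are omitted; see the $SAMPLES_HOME examples on SCOUT for authoritative templates.

```shell
#!/bin/bash
# Hypothetical minimal LSF batch script; all values are placeholders.
#BSUB -J my_job              # job name
#BSUB -q standard            # queue name (queues are listed in section 8)
#BSUB -n 4                   # number of cores
#BSUB -W 01:00               # wall clock limit (HH:MM)
#BSUB -o my_job.%J.out       # stdout file; %J expands to the JobID

cd "$WORKDIR"                # run from the scratch file system
mpiexec ./my_program         # launch the parallel executable
```

Submitting it with `bsub < my_job_script` parses the #BSUB directives, so command-line options are not needed for settings given in the script.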
For more information on using LSF or job scripts, see the SCOUT User Guide, the SCOUT LSF Guide, or the sample script examples in the $SAMPLES_HOME directory on SCOUT.
8. Batch Queues
The following table describes the LSF queues available on SCOUT:
| Priority | Queue Name | Max Wall Clock Time | Max Cores Per Job | Max Queued Per User | Max Running Per User | Description |
|---|---|---|---|---|---|---|
| Highest | transfer | 48 Hours | N/A | N/A | N/A | Data transfer for user jobs. Not charged against project allocation. See the ARL DSRC Archive Guide, section 5.2. |
| | urgent | 96 Hours | N/A | N/A | N/A | Jobs belonging to DoD HPCMP Urgent Projects |
| | debug | 1 Hour | N/A | N/A | N/A | Time/resource-limited for user testing and debug purposes |
| | high | 168 Hours | N/A | N/A | N/A | Jobs belonging to DoD HPCMP High Priority Projects |
| | frontier | 168 Hours | N/A | N/A | N/A | Jobs belonging to DoD HPCMP Frontier Projects |
| | HIE | 24 Hours | N/A | N/A | N/A | Rapid response for interactive work. For more information see the HPC Interactive Environment (HIE) User Guide. |
| | interactive | 12 Hours | N/A | N/A | N/A | Interactive jobs |
| | standard | 168 Hours | N/A | N/A | N/A | Standard jobs |
| Lowest | background | 24 Hours | N/A | N/A | N/A | User jobs that are not charged against the project allocation |
9. Monitoring Your Job
You can monitor your batch jobs on SCOUT using the bjobs command.
The bjobs -u all command lists all jobs in the queue. The bjobs command with no options shows only jobs owned by the user, as follows:
% bjobs -u all
JOBID  USER      STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
9985   username  RUN   normal  login02    6*tra014   *_training  Feb 19 0:07
9986   username  RUN   normal  login02    6*tra002   *_training  Feb 19 0:07
9987   username  RUN   normal  login02    6*tra004   *_training  Feb 19 0:07
9994   username  RUN   normal  login02    4*inf033   *inference  Feb 19 0:08
9995   username  RUN   normal  login02    4*inf019   *inference  Feb 19 0:08
9996   username  RUN   normal  login02    4*inf034   *inference  Feb 19 0:08
9997   username  RUN   normal  login02    4*inf048   *inference  Feb 19 0:08

% bjobs
JOBID  USER      STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
9985   username  RUN   normal  login02    6*tra014   *_training  Feb 19 0:07
Notice the output contains the JobID for each job. This ID can be used with the bkill, bjobs, and bstop commands.
To delete a job, use the command bkill jobID.
To delete all your jobs, use bkill -u username.
To view a partially completed output file, use the bpeek jobID command.
10. Archiving Your Work
When your job completes, archive any important data to prevent automatic deletion by the purge scripts.
Copy one or more files to the archive system.
archive put [-C path] [-D] [-s] file1 [file2 ...]
Copy one or more files from the archive system.
archive get [-C path] [-s] file1 [file2 ...]
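As a hypothetical illustration of the syntax above (the directory and file names are placeholders, not actual archive contents):

```shell
# Hypothetical archive usage; my_project and results.tar are placeholders.
archive put -C my_project results.tar   # store results.tar under my_project
archive get -C my_project results.tar   # later, retrieve it from my_project
```

These commands are typically run from a transfer-queue batch job (see the transfer queue in section 8) so archiving does not tie up a login node.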
For more information on archiving your files, see the ARL DSRC Archive Guide.
11. Modules
Software modules are a convenient way to set needed environment variables and add necessary directories to your path so commands for particular applications can be found. SCOUT uses modules to initialize your environment with system commands and libraries, compiler suites, environment variables, and LSF batch system commands.
Several modules are loaded automatically as soon as you log in. To view the currently loaded modules, use module list. To see the entire list of available modules, use module avail. You can modify the configuration of your environment by loading and unloading modules. For complete information on how to do this, see the ARL DSRC Modules Guide.
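A typical session might look like the following sketch. The module name shown is hypothetical; module avail shows what actually exists on SCOUT.

```shell
module list                  # show modules currently loaded (several load at login)
module avail                 # list all available modules
module load my_app/1.0       # add a module to your environment (hypothetical name)
module unload my_app/1.0     # remove it again
```

Loading and unloading in this way adjusts your PATH and related environment variables without editing shell startup files by hand.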
12. Available Software
A list of software on SCOUT is available on the Software page.