MHPCC DSRC Introductory Site Guide

1. Introduction

1.1. Purpose of this document

This document introduces users to the Maui High Performance Computing Center (MHPCC) DoD Supercomputing Resource Center (DSRC). It provides an overview of available resources, links to relevant documentation, essential policies governing the use of our systems, and other information to help you make efficient and effective use of your allocated hours.

1.2. About the MHPCC DSRC

The MHPCC DSRC is one of five DSRCs managed by the DoD High Performance Computing Modernization Program (HPCMP). The DSRCs deliver a range of compute-intensive and data-intensive capabilities to the DoD science and technology, test and evaluation, and acquisition engineering communities. Each DSRC operates and maintains major High Performance Computing (HPC) systems and associated infrastructure, such as data storage, in both unclassified and classified environments. The HPCMP provides user support through a centralized help desk and data analysis/visualization group.

MHPCC is located in Maui, HI and provides research and development for the HPCMP.

1.3. Whom our services are for

The HPCMP's services are available to Service and Agency researchers in the Research, Development, Test, and Evaluation (RDT&E) and acquisition engineering communities of the DoD, to DoD contractors, and to university staff working on a DoD research grant.

For more details, see the HPCMP presentation "Who may run on HPCMP Resources?"

1.4. How to get an account

Anyone meeting the above criteria may request an HPCMP account. An HPC Help Desk video is available to guide you through the process of getting an account. To begin the account application process, visit the Obtaining an Account page and follow the instructions presented there.

1.5. Visiting the MHPCC DSRC

If you need to travel to the MHPCC DSRC, there are security procedures that must be completed BEFORE planning your trip. Please see our Visit section and coordinate with your Service/Agency Approval Authority (S/AAA) to ensure all requirements are met.

2. Policies

2.1. Baseline Configuration (BC) policies

The Baseline Configuration Team sets policies that apply to all HPCMP HPC systems. The BC Policy Compliance Matrix provides an index of all BC policies and compliance status of systems at each DSRC.

2.2. Login node abuse policy

Memory or CPU-intensive programs running on the login nodes can significantly affect all users of the system. Therefore, only small applications requiring less than 10 minutes of runtime and less than 2 GB of memory are allowed on the login nodes. Any job running on the login nodes that exceeds these limits may be unilaterally terminated.
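
The limits above can be expressed as a simple pre-flight check. The following is a minimal sketch, not an MHPCC tool: the function name fits_login_node is hypothetical, and it assumes that "2 GB" means 2 x 1024^3 bytes and that "runtime" means wall-clock time, neither of which the policy specifies.

```python
# Illustrative check against the login node limits stated in this policy.
# Assumptions (not from the policy text): 1 GB = 1024**3 bytes, and
# "runtime" is wall-clock time.

MAX_RUNTIME_SECONDS = 10 * 60      # must be less than 10 minutes
MAX_MEMORY_BYTES = 2 * 1024**3     # must be less than 2 GB


def fits_login_node(runtime_seconds: float, memory_bytes: int) -> bool:
    """Return True if an estimated task stays under both login node limits."""
    return (runtime_seconds < MAX_RUNTIME_SECONDS
            and memory_bytes < MAX_MEMORY_BYTES)


if __name__ == "__main__":
    # A 3-minute task using 512 MB fits; a 15-minute task does not.
    print(fits_login_node(180, 512 * 1024**2))
    print(fits_login_node(900, 512 * 1024**2))
```

Anything that fails this kind of check is a candidate for a batch job rather than the login node.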

2.3. File space management policy

2.3.1. $WORKDIR

$WORKDIR is the local temporary file system (i.e., local high-speed disk) that is available on all MHPCC DSRC HPC systems and is available to all users.

$WORKDIR is intended to be used for executing programs and performing file I/O local to that system. Unlike your home ($HOME) or /tmp directories, $WORKDIR has no space restrictions, but it is not intended for use as a permanent file storage area.

The $WORKDIR file system is NOT backed up or exported to any other system. In the event of file or directory structure deletion or a catastrophic disk failure, such files and directory structures are lost. It is your responsibility to transfer files that need to be saved to a location that allows long-term storage, such as your archival ($ARCHIVE_HOME) or, for smaller files, home ($HOME) directory locations. Please note your archival storage area has no disk quota assigned to it, while your home directory area has a disk quota assigned.

To provide sufficient free $WORKDIR disk space to all users, all files in the $WORKDIR file system that have not been accessed in more than 30 days are subject to deletion.

2.3.2. $CENTER

$CENTER refers to the directory each user has on the Center-Wide File System (CWFS). The CWFS provides file storage accessible from the login nodes of each HPC system and from the HPC Portal. The CWFS allows file transfers and other file and directory operations from these systems using standard Linux commands.

$CENTER is to be used for short-term file storage and to enable shared file access among the MHPCC DSRC HPC systems. All files in the CWFS that have not been accessed in more than 120 days are subject to deletion.
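
Both purge rules in this policy (30 days for $WORKDIR, 120 days for $CENTER) are stated in terms of last access, so it can help to list files that are approaching or past the threshold. The sketch below is illustrative only. It assumes the file system's access time (atime) is a reasonable proxy for what the purge process examines, which this guide does not confirm, and the function name files_not_accessed_since is hypothetical.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def files_not_accessed_since(root, max_age_days, now=None):
    """Return files under root whose last access is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # lstat reads metadata only and does not follow symlinks.
                atime = path.lstat().st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Thresholds taken from the policy text: 30 days and 120 days.
    for variable, days in (("WORKDIR", 30), ("CENTER", 120)):
        root = os.environ.get(variable)
        if root is None:
            print(f"${variable} is not set on this machine")
            continue
        for path in files_not_accessed_since(root, days):
            print(f"{variable}: {path}")
```

Passing a smaller max_age_days (for example 25 instead of 30) gives advance warning of files that will soon become eligible for deletion.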

2.4. Maximum session lifetime policy

To provide users with a more secure high performance computing environment, the MHPCC DSRC has implemented a limit on the lifetime of all terminal/window sessions. Any idle terminal or window session connections to the MHPCC DSRC are terminated after 6 hours. Regardless of activity, any terminal or window session connections to the MHPCC DSRC are terminated after 24 hours.
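
The two limits interact: a session ends at whichever cutoff comes first. The following sketch models that rule under the assumption that the idle timer restarts at the most recent activity; how the center actually measures idleness is not described in this guide, and the function name session_deadline is hypothetical.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(hours=6)       # idle sessions end after 6 hours
SESSION_LIMIT = timedelta(hours=24)   # all sessions end after 24 hours


def session_deadline(started: datetime, last_activity: datetime) -> datetime:
    """Return the earlier of the idle cutoff and the absolute session cutoff."""
    return min(last_activity + IDLE_LIMIT, started + SESSION_LIMIT)


if __name__ == "__main__":
    start = datetime(2025, 1, 6, 8, 0)
    # Idle since 09:30: the 6-hour idle limit applies first.
    print(session_deadline(start, datetime(2025, 1, 6, 9, 30)))
    # Active until 05:00 the next day: the 24-hour limit applies first.
    print(session_deadline(start, datetime(2025, 1, 7, 5, 0)))
```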

2.5. Batch use policy

The batch queues and internal policies of the batch job scheduler have been configured to allow great flexibility in job execution. However, improper use of the queueing system can result in delayed or cancelled job execution and potential system issues.

2.6. Special request policy

MHPCC does not currently accept special requests for our systems.

2.7. Account removal policy

The MHPCC DSRC follows Baseline Configuration (BC) policy FY13-02 (Data Removal at Account Closure) for actions related to user account removal.

2.8. Communications policy

MHPCC is fully compliant with Baseline Configuration (BC) policy FY06-11 (Announcing and Logging Changes). The key methods we use to communicate announcements and important information to our users about HPC systems and the environment include:

  • Mass emails sent to all users or those assigned to a particular HPC system
  • Maintenance notices posted on the HPC Centers public site at https://centers.hpc.mil
  • System login messages posted to the appropriate HPC systems

It is vital to the MHPCC DSRC's communication process, and mutually beneficial to our users, to understand the responsibilities of being a good citizen of the MHPCC DSRC. We ask that users:

  • Keep the MHPCC DSRC apprised of current email addresses. This way we can ensure that vital information about our Center reaches you. Please contact your S/AAA to update your email address. If the email address you provide is behind a firewall, you may need to ask your local system administrator to allow email from the MHPCC DSRC to pass through the firewall boundary to your work site.
  • Check the HPC Centers website for current news and information on topics such as HPC resource availability, upcoming training opportunities, or updates to our user guides and the policies and procedures documentation.

2.9. Account sharing policy

You are responsible for all passwords, accounts, YubiKeys, and associated Personal Identification Numbers (PINs) issued to you, and you are not to share them with any other individual for any reason. Doing so is a violation of the contract you are required to sign to obtain access to HPCMP computational resources.

Upon discovery/notification of a violation of the above policy, your account will be disabled and access to your account assets will be restricted. Any executing jobs will be allowed to complete, but queued jobs will be deleted. The S/AAA who authorized your account will be notified of the policy violation and the actions taken.

2.10. Scheduled Maintenance policy

The Maui High Performance Computing Center may reserve the entire system on site for regularly scheduled maintenance on the third Wednesday of every month from 0800 to 2200 (HST). The reservation is scheduled the previous Friday, and every Monday afternoon a committee convenes to determine whether maintenance will be performed.

Additionally, the system may be down periodically for software and hardware upgrades at other times. Users are usually notified of such times in advance by "What's New" and by the login banner. Unscheduled downtimes are unusual but do occur. In such cases, notification to users may not be possible. If you cannot access the system during a non-scheduled downtime period, please send an email or call the HPC Help Desk.
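
Because the regular maintenance window in this section follows a fixed calendar rule, its date can be computed ahead of time when planning long jobs. The sketch below is illustrative: the function names are hypothetical, and it assumes "the previous Friday" means the Friday immediately before the maintenance Wednesday. It only identifies the candidate date; whether maintenance actually takes place is decided by the committee.

```python
import calendar
from datetime import date, timedelta


def third_wednesday(year: int, month: int) -> date:
    """Return the date of the third Wednesday of the given month."""
    first_weekday, _days_in_month = calendar.monthrange(year, month)
    # Days from the 1st to the first Wednesday (Monday == 0, Wednesday == 2).
    offset = (calendar.WEDNESDAY - first_weekday) % 7
    return date(year, month, 1 + offset + 14)


def reservation_friday(maintenance_day: date) -> date:
    """Return the Friday immediately before the maintenance Wednesday."""
    return maintenance_day - timedelta(days=5)


if __name__ == "__main__":
    day = third_wednesday(2025, 1)
    print("Candidate maintenance day:", day)
    print("Reservation scheduled on: ", reservation_friday(day))
```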

2.11. Archive policy

The MHPCC DSRC Archive Guide provides information on best use of the archive. Users who read or write thousands of files, or very large files, to the archive adversely impact the performance of the archive for all users. A user negatively impacting the performance of the archive will be notified and advised of how to best use the archive. After being notified, if the user continues to adversely impact the archive, the user's access to the archive will be suspended until the user has agreed to follow best use practices. Data stored on the archive must be for legitimate projects or task orders. Users will be asked to remove data from the archive that is not for a sanctioned project or task order. If the user does not remove the unacceptable data from the archive, it will be removed by the MHPCC storage administrator.

3. Available resources

3.1. Non-Allocated systems

The MHPCC DSRC Non-Allocated systems are accessible through the Defense Research and Engineering Network (DREN) to all active users. Our current Non-Allocated systems are:

Builder is a single-node Aspen Systems Linux Gigabyte server intended primarily to provide a platform for building and sharing Singularity software containers. The system is populated with AMD 7742 processors and Nvidia V100 graphics processing units. Builder uses Intel Gigabit Ethernet as its high-speed network for I/O traffic. Builder uses AVAGO MegaRAID to manage its local file system that targets 130 TB of disk storage. Builder is equipped with two 64-core processors (128 total cores) running the RHEL 8 operating system, two GPUs, 1,024 gigabytes of memory, with no user-accessible swap space.

See the Systems page for more information about Builder.

Coral is an Aspen Systems Linux Cluster with x86_64 and ARM nodes, enabling architecture comparisons. It features 8-way A100 GPUs, AMD MI100/MI250 GPUs, and DPUs for accelerated computing. With standard and large-memory configurations, InfiniBand interconnect, and a Weka parallel file system, Coral supports diverse HPC tasks requiring batch scheduling and parallel processing. It serves as a testbed for ARM-based HPC evaluation.

See the Systems page for more information about Coral.

Reef is an Aspen Systems Linux Cluster. The login and compute nodes are populated with Intel 2.5-GHz Cascade Lake processors. Reef has 5 CPU-only and 11 GPU compute nodes. Each compute node has two 20-core processors, sharing 768 GB of DDR4 memory, with no user-accessible swap space. Reef has 109 TB (formatted) of disk storage. Reef is intended to be used as a batch scheduled HPC system.

Note: the configuration of Reef is subject to change without notice.

See the Systems page for more information about Reef.

3.2. Data storage

3.2.1. File systems

Each HPC system has several file systems available for storing user data. Your personal directories on these file systems are commonly referenced via the $HOME, $WORKDIR, $CENTER, and $ARCHIVE_HOME environment variables. Other file systems may be available as well.

File System Environment Variables

Environment Variable    Description
$HOME                   Your home directory on the system
$WORKDIR                Your temporary work directory on a high-capacity, high-speed scratch file system used by running jobs
$CENTER                 Your short-term (120-day) storage directory on the Center-Wide File System (CWFS)
$ARCHIVE_HOME           Your archival directory on the archive server

For details about the specific file systems on each system, see the system user guides on the MHPCC DSRC Documentation page.
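
Scripts that run on more than one system can look these variables up at run time instead of hard-coding paths. The following is a minimal sketch; the function name storage_locations is hypothetical, and the variables are only expected to be defined when the script runs on an HPCMP system.

```python
import os

# The four variables listed in the table in this section.
STORAGE_VARIABLES = ("HOME", "WORKDIR", "CENTER", "ARCHIVE_HOME")


def storage_locations(environ=None):
    """Map each storage variable to its value, or None when it is not set."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in STORAGE_VARIABLES}


if __name__ == "__main__":
    for name, value in storage_locations().items():
        print(f"${name:<13} {value if value else '(not set)'}")
```

Checking for None before use avoids accidentally writing output relative to the current directory when a variable is missing.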

3.2.2. Archive system

All our HPC systems have access to an online archival system, which provides long-term storage for users' files on a petascale robotic tape library system.

For information on using the archive server, see the MHPCC DSRC Archive Guide.

3.3. Computing environment

To ensure a consistent computing environment and user experience on all HPCMP HPC systems, all systems follow a standard configuration baseline. For more information on the policies defining the baseline configuration, see the Baseline Configuration Compliance Matrix. All systems run variants of the Linux operating system, but the computing environment varies by vendor and architecture due to vendor-specific enhancements.

3.3.1. Software

Each HPC system hosts a large variety of compiler environments, math libraries, programming tools, and third-party analysis applications which are available via loadable software modules. A list of software is available on the Software page, or for more up-to-date software information, use the module commands on the HPC systems. Specific details of the computing environment on each HPC system are discussed in the system user guides, available on the MHPCC DSRC Documentation page.

To request additional software or to request access to restricted software, please contact the HPC Help Desk at help@helpdesk.hpc.mil.

3.3.2. Bring your own code

While all HPCMP HPC systems offer a diversity of open source, commercial and government software, there are times when we don't support the application codes and tools needed for specific projects. The following information describes a convenient way to utilize your own software on our systems.

Our HPC systems provide you with adequate file space to store your codes. Data stored in your home directory ($HOME) is backed up on a periodic basis. If you need more home directory space, you may submit a request to the HPC Help Desk at help@helpdesk.hpc.mil. For more details on home directories, see the Baseline Configuration (BC) policy FY12-01 (Minimum Home Directory Size and Backup Schedule).

If you need to share an application among multiple users, BC policy FY10-07 (Common Location to Maintain Codes) explains how to create a common location on the $PROJECTS_HOME file system, to place applications and codes without using home directories or scrubbed scratch space. To request a new "project directory," please provide the following information to the HPC Help Desk:

  • Desired DSRC system where a project directory is being requested.
  • POC Information: Name of the sponsor of the project directory, username, and contact information.
  • Short Description of Project: Short summary of the project describing the need for a project directory.
  • Desired Directory Name: This is the name of the directory created under $PROJECTS_HOME.
  • Is the code/data in the project directory restricted (e.g., ITAR, etc.)?
  • Desired Directory Owner: The username to be assigned ownership of the directory.
  • Desired Directory Group: The group name to be assigned to the directory.
    (New group names must be eight characters or less)
  • Additional users to be added to the group.

If the POC for the project directory ceases being an account holder on the system, project directories are handled according to the user data retention policies of the center.

Once the project directory is created, you can install software (custom or open source) in this directory. Then, depending on requirements, you can set file and/or directory permissions to allow any combination of group read, write, and execute privileges. Since this directory is fully owned by the POC, the POC can even make use of different groups within subdirectories to provide finer granularity of permissions.
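
As an illustration of setting group privileges, the sketch below replaces only the group permission bits on a path and leaves the owner and other bits unchanged. It is a sketch under stated assumptions: the function name set_group_access is hypothetical, and the path is assumed to already belong to the intended group (changing group ownership is a separate step).

```python
import os
import stat


def set_group_access(path, read=False, write=False, execute=False):
    """Replace the group permission bits on path; return the new mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    mode &= ~stat.S_IRWXG          # clear any existing group bits
    if read:
        mode |= stat.S_IRGRP
    if write:
        mode |= stat.S_IWGRP
    if execute:
        mode |= stat.S_IXGRP       # on a directory, this permits traversal
    os.chmod(path, mode)
    return mode
```

For example, calling set_group_access(some_directory, read=True, execute=True) on a directory with mode 0o700 results in mode 0o750, which lets group members list and enter the directory without being able to modify it.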

Users are expected to ensure that any software or data placed on HPCMP systems is protected according to any external restrictions on the data. Users are also responsible for ensuring no unauthorized or malicious software is introduced to the HPCMP environment.

For installations involving restricted software, it is your responsibility to set up group permissions on the directories and protect the data. It is crucially important to note that there are users on the HPCMP systems who are not authorized to access restricted data. You may not run servers or use software that communicates to a remote system without prior authorization.

If you need help porting or installing your code, the HPC Help Desk provides a "Code Assist" team that specializes in helping users with installation and configuration issues for user-supplied codes. To get help, simply contact the HPC Help Desk and open a ticket.

Please contact the HPC Help Desk to discuss any special requirements.

3.3.3. Batch schedulers

Our HPC systems use various batch schedulers to manage user jobs and system resources. Basic instructions and examples for using the scheduler on each system can be found in the system user guides. More extensive information can be found in the Scheduler Guides. These documents are available on the MHPCC DSRC Documentation page.

Schedulers place user jobs into different queues based on the project associated with the user account. Most users only have access to the debug, standard, transfer, HIE, and background queues, but other queues may be available to you depending on your project. For more information about the queues on a system, see the Scheduler Guides.

3.3.4. Advance Reservation Service (ARS)

No MHPCC systems currently support ARS.

3.4. HPC Portal

The HPC Portal provides a suite of custom web applications, allowing you to access a command line, manage files, and submit and manage jobs from a browser. It also supports pre/post-processing and data visualization by making DSRC-hosted desktop applications accessible over the web. For more information about the HPC Portal, see the HPC Portal page.

3.5. Secure Remote Desktop (SRD)

No MHPCC systems currently support SRD.

3.6. Network connectivity

The MHPCC DSRC is a primary node on the Defense Research and Engineering Network (DREN), which provides up to 10-Gb/sec service to DoD HPCMP centers nationwide across a 100-Gb/sec backbone. We connect to the DREN via a 10-Gb/sec circuit linking us to the DREN backbone.

The DSRC's local network consists of a 40-Gb/sec fault-tolerant backbone with up to 10-Gb/sec connections to the HPC and archive systems.

4. How to access our systems

The HPCMP uses a network authentication protocol called Kerberos to authenticate user access to our HPC systems. Before you can log in, you must download and install an HPCMP Kerberos client kit on your local system. For information about downloading and using these kits, visit the Kerberos & Authentication page and click on the tab for your platform. There you will find instructions for downloading and installing the kit, getting a ticket, and logging in.

After installing and configuring a Kerberos client kit, you can access our HPC systems via standard Kerberized commands, such as ssh. File transfers between local and remote systems can be accomplished via the scp, mpscp, or scampi commands. For additional information on using the Kerberos tools, see the Kerberos User Guide or review the tutorial video on Logging into an HPC System. Instructions for logging into each system can be found in the system user guides on the MHPCC DSRC Documentation page.
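
For scripted transfers, the scp invocation can be assembled programmatically and reviewed before it is run. The sketch below only builds and prints the command. It assumes the Kerberized scp from the client kit is the scp found on your PATH, that it accepts the conventional "scp source user@host:destination" form, and that you already hold a valid Kerberos ticket. The function name and every value in the example (file name, username, host, and remote path) are placeholders, not real MHPCC names.

```python
import shlex


def build_scp_command(local_path, user, host, remote_path):
    """Build an argv list for copying one local file to a remote system."""
    return ["scp", local_path, f"{user}@{host}:{remote_path}"]


if __name__ == "__main__":
    # All four values below are placeholders, not real MHPCC names.
    argv = build_scp_command(
        "results.tar", "username", "hpc-system.example", "/path/on/remote"
    )
    # Print the command for review; pass argv to subprocess.run() to run it.
    print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.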

Another way to access the HPC systems is through the HPC Portal. For information on using the portal, visit the HPC Portal page. To log into the portal, click on the link for the center where your account is located.

5. How to get help

For almost any issue, the first place you should turn for help is the HPC Help Desk. You can email the HPC Help Desk at help@helpdesk.hpc.mil. You can also contact the HPC Help Desk via phone, DSN, or even traditional mail. Full contact information for the Help Desk is on the Technical and Customer Support page. The HPC Help Desk can assist with a wide array of technical issues related to your account and your use of our systems. The HPC Help Desk can also assist in connecting you with various special-purpose groups to address your particular need.

5.1. User Productivity Enhancement and Training (PET)

The PET initiative gives users access to computational experts in many HPC technology areas. These HPC application experts help HPC users become more productive using HPCMP supercomputers. The PET initiative also leverages the expertise of academia and industry experts in new technologies and provides training on HPC-related topics. Help is available in specific computational technology areas, covering a wide range of expertise that includes algorithm development and implementation, code porting and development, performance analysis, application and I/O optimization, accelerator programming, preprocessing and grid generation, workflows, in-situ visualization, and data analytics.

To learn more about PET, see the Advanced User Support page. To request PET assistance, send an email to PET@hpc.mil.

5.2. User Advocacy Group (UAG)

The UAG provides a forum for users of HPCMP resources to influence policies and practices of the Program; to facilitate the exchange of information between the user community and the HPCMP; to serve as an advocate for HPCMP users; and to advise the HPC Modernization Program Office on policy and operational matters related to the HPCMP.

To learn more about the UAG, see the User Advocacy Group page (PKI required). To contact the UAG, send an email to hpc-uag@hpc.mil.

5.3. Baseline Configuration Team (BCT)

The BCT defines a common set of capabilities and functions so users can work more productively and collaboratively when using the HPC resources at multiple computing centers. To accomplish this, the BCT passes policies which collectively create a configuration baseline for all HPC systems.

To learn more about the BCT and its policies, see the Baseline Configuration page. To contact the BCT, send an email to BCTinput@afrl.hpc.mil.

5.4. Computational Research and Engineering Acquisition Tools and Environments (CREATE)

The CREATE program provides tools to enhance the productivity of the DoD acquisition engineering workforce by providing high fidelity design and analysis tools with capabilities greater than today's tools, reducing the acquisition development and test process cycle. CREATE projects provide enhanced engineering design tools for the DoD HPC community.

To learn more about CREATE, visit the CREATE page or contact the CREATE Program Office at create@hpc.mil. You may also access the CREATE Community site (Registration and PKI required).

5.5. Data Analysis and Assessment Center (DAAC)

The DAAC serves the needs of DoD HPCMP scientists to analyze an ever-increasing volume and complexity of data. Its mission is to put visualization and analysis tools and services into the hands of every user.

For more information about DAAC, visit the DAAC website. To request assistance from DAAC, send an email to support@daac.hpc.mil.