HPCMP Cloud
User Guide

1. Introduction

1.1. Document Scope and Assumptions

This document provides an overview and introduction to the use of the Open Research Service, HPCMP Cloud, located in AWS GovCloud, along with a description of the specific computing environment on HPCMP Cloud. The intent of this guide is to provide information that will enable the average user to perform computational tasks on the system. To receive the most benefit from the information provided here, you should be proficient in the following areas:

  • Use of the UNIX operating system
  • Use of an editor (e.g., vi or emacs)
  • Remote usage of computer systems via network or modem access
  • A selected programming language and its related tools and libraries

1.2. Policies to Review

Users are expected to be aware of the following policies for working on the HPCMP Cloud environment.

1.2.1. Login Node Abuse Policy

The login nodes, cloud01-cloud02, provide login access for HPCMP Cloud and support such activities as the launching of a cluster and general interactive use by all users. Consequently, memory or CPU-intensive programs running on the login nodes can significantly affect all users of the system. Therefore, only small applications requiring less than 10 minutes of runtime and less than 8 GBytes of memory are allowed on the login nodes. Any job running on the login nodes that exceeds these limits may be unilaterally terminated.

1.2.2. Workspace Management Policy

A default $WORKDIR variable is not set up automatically. Each cluster will have a "scratch" file system that can be manually pointed to with the $WORKDIR variable. The $WORKDIR file system is not backed up. You are responsible for managing files in your $WORKDIR directories by backing up files to the login node and deleting unneeded files. Any $WORKDIR files that have not been saved will be deleted and permanently lost when the cluster is deleted.

Note: You WILL NOT be notified prior to scratch workspace deletion. You are responsible for monitoring your workspace to prevent data loss.
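For example, before deleting a cluster you can copy anything you still need from $WORKDIR back to your login-node home directory. A minimal sketch, run from the cluster's master node (the file name is illustrative, and SSH agent forwarding or a copied key is assumed):

scp $WORKDIR/important_results.tar.gz username@52.222.57.229:/home/username/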

1.3. Obtaining an Account

Please navigate to the following link: https://centers.hpc.mil/systems/ors.html#account

2. System Configuration

2.1. System Summary

HPCMP Cloud is an AWS HPC system. The login nodes are populated with AMD EPYC 7000 processors. HPCMP Cloud allows the user to create a compute cluster specific to their needs; the user optimizes the compute, network, and storage as required. The table below provides three different sample configurations, which users are welcome to modify further.

All clusters will be configured with compute nodes that share memory only on the node; memory is not shared across the nodes. Based on the I/O requirements for a given application, the cluster can be configured with the needed shared $WORKDIR.

HPCMP Cloud is intended to be used as a dedicated HPC system without the need for queuing jobs. Its login nodes are not to be used for computationally intensive work (e.g., high memory, heavy I/O, or long-running executions). All executions that require large amounts of system resources must be sent to the compute nodes by creating a cluster and submitting jobs from the master/head node.

Node Configuration

Login Nodes
  Instance Type:           t3a.xlarge
  Total Cores | Nodes:     2 | 1
  Operating System:        Amazon Linux 2
  Cores/Node:              2
  Core Type:               AMD EPYC 7000
  Core Speed:              2.5 GHz
  Memory/Node:             16 GBytes
  Accessible Memory/Node:  16 GBytes

Compute Nodes - High Compute
  Instance Type:           c5n.18xlarge
  Total Cores | Nodes:     4,104 | 114
  Operating System:        Amazon Linux 2
  Cores/Node:              36
  Core Type:               Intel Advanced Vector Extension 512
  Core Speed:              3.0 - 3.5 GHz
  Memory/Node:             192 GBytes
  Accessible Memory/Node:  192 GBytes

Compute Nodes - Moderate Compute
  Instance Type:           c5n.4xlarge
  Total Cores | Nodes:     4,096 | 512
  Operating System:        Amazon Linux 2
  Cores/Node:              8
  Core Type:               Intel Advanced Vector Extension 512
  Core Speed:              3.0 - 3.5 GHz
  Memory/Node:             42 GBytes
  Accessible Memory/Node:  42 GBytes

Compute Nodes - GPU Accelerated
  Instance Type:           p3.16xlarge
  Total Cores | Nodes:     4,096 | 128
  Operating System:        Amazon Linux 2
  Cores/Node:              32
  Core Type:               Intel Xeon E5-2686 v4
  Core Speed:              2.5 GHz
  Memory/Node:             488 GBytes
  Accessible Memory/Node:  488 GBytes

All node types: Memory Model - shared on node; Interconnect Type - Ethernet.

File Systems on AWS GovCloud
  Path               Capacity      Type
  /p/home ($HOME)    500 GBytes    EFS

2.2. Operating System

The operating system on HPCMP Cloud's login and compute nodes is Amazon Linux 2. The compute nodes can provide access to dynamically shared objects and most of the typical Linux commands and basic functionality.

2.3. File Systems

HPCMP Cloud has the following file systems available:

2.3.1. /home/username/

This file system is an AWS EFS (Elastic File System), which is the equivalent of an NFS, locally mounted from HPCMP Cloud's login node. It has virtually unlimited storage capacity. All users have a home directory on this EFS which may be referenced by the environment variable $HOME.

Note: The home directory on your login node is not visible to the master node in the cluster. Files must be manually moved onto the master node from your login node.
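For example, a file can be pushed from your login-node home directory to the cluster with scp. A minimal sketch (the file name is illustrative; SSH agent forwarding from the login node is assumed):

scp $HOME/test.txt ec2-user@<master node ip address>:~/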

2.3.2. Working Directory

This file system is locally mounted on the compute nodes. Each user must manually point to a working directory, which may be referenced, for example, by the environment variable $WORKDIR. This directory may be one of the types listed below.

2.3.3. Elastic Block Storage (EBS)

The ParallelCluster utility (see the Launching a Cluster section) automatically creates and mounts a 20 GiB EBS volume under the "/shared" directory that is accessible to the entire cluster (master/head node as well as compute nodes).

The default name may be changed by modifying the appropriate parameter in the Login Node $HOME/.parallelcluster/config file under the EBS section. The following example shows a 50 GiB shared Amazon EBS volume mounted at /myshared:

shared_dir = myshared 
volume_size = 50
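For reference, a minimal sketch of how this might look as a complete section in a ParallelCluster 2.x config file; the section name "custom" is illustrative, and the ebs_settings line in [cluster default] is what ties the section to your cluster:

[cluster default]
ebs_settings = custom      # points the cluster at the [ebs custom] section below

[ebs custom]
shared_dir = myshared      # mounted cluster-wide at /myshared
volume_size = 50           # volume size in GiB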
2.3.4. Elastic File System (EFS)

Alternatively, if an undefined amount of scratch space will be needed by the job, then it is recommended to use an EFS shared file system. Amazon's EFS expands as data is being written to it and therefore the capacity is virtually unlimited. An EFS may be provisioned by modifying the appropriate parameter in the Login Node $HOME/.parallelcluster/config file under the EFS section. The following example mounts Amazon EFS at /efs:

shared_dir = efs
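For reference, a minimal sketch of a complete EFS section in a ParallelCluster 2.x config file; the section name "customfs" is illustrative, and the efs_settings line in [cluster default] ties the section to your cluster:

[cluster default]
efs_settings = customfs    # points the cluster at the [efs customfs] section below

[efs customfs]
shared_dir = efs           # mounted cluster-wide at /efs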

For further information on optimizing the shared scratch space for your cluster, refer to the AWS ParallelCluster User Guide.

3. Accessing the System

For instructions on setting up your login node the first time you log in, please refer to the Signing onto the Login Node section below.

HPCMP AWS GovCloud High Level Architecture

3.1. Signing onto the Login Node

3.1.1. Download the SSH Key

Navigate to the following location (S3 buckets list within AWS) and find the bucket belonging to you. This bucket will be named in 'username-bucket' format. Locate your bucket and click on it. It will contain a private key named 'username.pem'. Only you have access to this bucket and can download this key.

https://console.amazonaws-us-gov.com/s3/buckets/dev-keypairhpcmp/?region=us-gov-east-1&tab=overview

Download the .pem file to use a UNIX terminal for login.

Note: This is a personal key-pair that only the user is meant to have access to.

The corresponding public key will be available in AWS and can be used during cluster creation as will be seen below.

Alternatively, a user may create other SSH key-pairs as needed (see the Importing Your Own SSH Key Pair section).

Your username will be the first part of your email address used to create the AWS account. For example, if your email used was username@example.com then the username will be username. The IP address for the login node is 52.222.57.229.

3.2. PuTTY

3.2.1. Using PuTTY

Download PuTTY. PuTTY is a popular SSH (Secure Shell) client for Windows. It is typically used for remote access to server computers over a network using the SSH protocol.

Connect to server using private key tutorial - https://devops.ionos.com/tutorials/use-ssh-keys-with-putty-on-windows/#use-existing-public-and-private-keys

3.2.2. Converting .pem to .ppk on Windows
  1. Click on Start menu > All Programs > PuTTY > PuTTYgen.
  2. In the following window select the option 'RSA' under 'Type of key to generate'.

    PuTTY Key Generator options
  3. Next, click 'Load.' Because PuTTYgen defaults to its native file format, it will only show files with the .ppk extension. Choose the 'All Files' option from the drop-down bar to display all key files, including the .pem file.
  4. Now, select the .pem file that you want to convert and click 'Open.' To confirm, click 'OK.'

    Select specific .pem file

    Successful import key message
  5. In the resultant window, click on 'Save private key' which will convert and save the key file in PuTTY compatible format.
  6. PuTTYgen will warn about saving the key without a passphrase. Click 'Yes'.

    Save private key screen
  7. Name your file and PuTTYgen will automatically add the .ppk file extension to it.
3.2.3. Adding the Key to Pageant
  1. Start Pageant from the PuTTY folder: Start-Menu > All Programs > PuTTY > Pageant

  2. Pageant starts by default minimized in the system tray. To begin adding your SSH keys, right-click the Pageant icon, and then the following context menu will appear:

  3. Click Add Key from the menu, or View Keys, to open the Pageant Key List window. Here you can view, add, and remove keys:

  4. Click the Add Key button. This will open the file explorer, where you can choose one or more keys at a time to load. You should select files with the .ppk extension:

  5. Click the Open button to load the keys with Pageant. After successfully adding a key, you can now see it listed:

  6. Once the key has been added to Pageant:

    1. Enter the remote server Host Name or IP address under Session.
    2. Navigate to Connection > SSH > Auth.
    3. Click Browse... under Authentication parameters / Private key file for authentication.
    4. Locate the private key and click Open.
    5. Make sure to check "Allow agent forwarding".
    6. Finally, click Open again to log into the remote server with key pair authentication.

3.2.4. Checking the Path

      Once on the Login Node, run the following command:

      echo $PATH

      The path should include /home/"username"/.local/bin
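If that directory is missing from your path, you can add it yourself. A minimal sketch, assuming the default ~/.bashrc is in use:

echo 'export PATH=$HOME/.local/bin:$PATH' >> ~/.bashrc   # persist for future logins
source ~/.bashrc                                         # apply to the current shell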

      3.3. Using a Linux Shell

Place the key where you will be using it from and change its permissions so that it is accessible only by you; chmod 400 <key-file> will work.

      Before logging in through SSH, you can modify your .ssh/config file to configure your session and enable AgentForwarding. This allows you to SSH from your local machine to the Login Node, and once you create your cluster, seamlessly SSH onto your cluster's Master Node.

      Alternatively, you may copy your private key over to your account on the Login Node and SSH from there.

      Your config file may contain multiple entries. Add one for the Login Node:

      Host devlog # you can pick a name of your choosing
      HostName 52.222.57.229
      User <your-username>
      ForwardAgent yes
      IdentityFile ~/.ssh/DevLoginNode.pem #point to your key file here

In order to enable Agent Forwarding, you must add your key to the ssh-agent by running the following command each time you start your computer.

      ssh-add <your-key>

      Then to login:

      ssh devlog

      Or SSH explicitly using the following command:

      ssh -i /path_to_key_pair/my-key-pair.pem username@52.222.57.229

      3.4. Logging In

      ssh -i /path_to_key_pair/my-key-pair.pem username@52.222.57.229

      Your username will be the first part of your email address used to create the AWS account. For example, if your email used was username@example.com then the username will be username.

      3.5. File Transfers

      3.5.1. On Linux

File transfers to the login node may be performed using the following tools: scp, sftp, psftp, and pscp. The command below uses secure copy (scp) to copy a single local file into a destination directory on the HPCMP login node.

      scp -i mykeypair.pem test.txt username@52.222.57.229:/home/username

Alternatively, the example below uses the secure file transfer protocol (sftp) to connect to the HPCMP login node, then uses the cd and put commands to change to the destination directory and copy a local file there. The quit command ends the sftp session. Use the help command to see a list of all sftp commands.

      sftp -i mykeypair.pem user@52.222.57.229
      sftp> cd target_dir
      sftp> put local_file
      sftp> quit
      3.5.2. On Windows

Download the PSCP utility from PuTTY.org by clicking the file name link and saving it to your computer. Open a Command Prompt window (Start menu > Run) and, if necessary, set up your path variable.

      pscp -i mykeypair.ppk c:\documents\info.txt username@server.example.com:/home/username/info.txt

      4. User Environment

      The following user directories are provided for all users on HPCMP Cloud.

      4.1. User Directories

      4.1.1. Home Directory

When you log on to the login node, you will be placed in your home directory, /home/username. The environment variable $HOME is automatically set for you on the login node and refers to this directory. $HOME is visible only to the login node and may be used to store all user files.

      Note: On the master node of each cluster, $HOME will point to the /home/ec2-user directory, which is not the same as or linked to the /home/username directory on the login node.

      4.1.2. Work Directory


      You may manually set the $WORKDIR variable to point to the "scratch" space or working directory of your choice on the compute cluster. For example, if you want to set the "/shared" directory as your work directory, do so as shown below:

      WORKDIR=/shared
      echo $WORKDIR
      /shared

      Note: All of your jobs execute from your $WORKDIR directory on the compute cluster, not $HOME of the login node. Copy any important files to $HOME on the login node from the $WORKDIR of the compute cluster before deleting the cluster.
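Because $WORKDIR is set manually, you may want to export it so that batch scripts and child processes inherit it. A minimal sketch, run on the cluster's master node (adding it to ~/.bashrc is optional):

export WORKDIR=/shared                          # make the variable visible to child processes
echo 'export WORKDIR=/shared' >> ~/.bashrc      # persist for future logins on the master node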

      4.2. Shells

      The following shells are available on HPCMP Cloud: csh, bash, tcsh, and sh. You may use the chsh command to change your default shell as you please.
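For example, a minimal sketch of switching your default shell to bash (the change takes effect at your next login, and chsh may prompt for a password depending on system configuration):

chsh -s /bin/bash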

      4.3. Modules

      Software modules are a convenient way to set needed environment variables and include necessary directories in your path so that commands for particular applications can be found.

      Note: Currently no modules are loaded by default. The user must set up their modules manually.
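If the Environment Modules package is available on your cluster image, typical usage looks like the sketch below; the module name is illustrative:

module avail             # list modules available on the node
module load intelmpi     # load a module by name (illustrative name)
module list              # show currently loaded modules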

      4.4. Archive Usage

No archive process is in place. Users may store their files in $HOME on the Login Node.

      5. Program Development

      5.1. Programming Models

      Intel MPI

      5.2. Available Compilers

      GNU Compiler Collection

      The GNU Programming Environment provides a large number of options that are the same for all compilers in the suite. The following table lists some of the more common options that you may use:

      5.2.1. GNU Compiler Options
GNU Compiler Options

Option                 Purpose
-c                     Generate intermediate object file but do not attempt to link.
-I directory           Search in directory for include or module files.
-L directory           Search in directory for libraries.
-o outfile             Name executable "outfile" rather than the default "a.out".
-Olevel                Set the optimization level. For more information on optimization, see the section on Profiling and Optimization below.
-g                     Generate symbolic debug information.
-fconvert=big-endian   Handle big-endian files; the default is little-endian.
-Wextra, -Wall         Turn on increased error reporting.
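For example, a minimal sketch of a compile line combining several of these options (file and directory names are illustrative):

gcc -O2 -Wall -Wextra -I ./include -o myprog myprog.c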

      5.3. Debuggers, Code Profiling, and Optimization

      No licensed utilities are available at this time. The GNU Debugger (gdb) is available.

      5.3.1. Compiler Optimization Options

      The "-Olevel" option enables code optimization when compiling. The level that you choose (0-4) will determine how aggressive the optimization will be. Increasing levels of optimization may increase performance significantly, but you should note that a loss of precision may also occur. There are also additional options that may enable further optimizations. The following table contains the most commonly used options.

Compiler Optimization Options

Option   Description
-O0      No optimization. (default in GNU)
-O1      Scheduling within extended basic blocks is performed. Some register allocation is performed. No global optimization.
-O2      Level 1 plus traditional scalar optimizations such as induction recognition and loop invariant motion are performed by the global optimizer. Generally safe and beneficial. (default in PGI, Cray, & Intel)
-O3      Levels 1 and 2 plus more aggressive code hoisting and scalar replacement optimizations that may or may not be profitable. Generally beneficial.
      5.3.2. Performance Optimization Methods

      Optimization generally increases compilation time and executable size, and may make debugging difficult. However, it usually produces code that runs significantly faster. The optimizations that you can use will vary depending on your code and the system on which you are running.

      Before considering optimization, you should always ensure that your code runs correctly and produces valid output.

      In general, there are four main categories of optimization:

      • Global Optimization
      • Loop Optimization
• Interprocedural Analysis and Optimization (IPA)
      • Function Inlining

      6. Job Scheduling

      HPCMP Cloud is intended as an elastic, scalable and on-demand cloud infrastructure to run your HPC applications. A dedicated cluster is stood up by the user for the duration of their job(s) and then taken down once the job has completed.

      The Login Node serves as the single point of entry into the environment, and this is where the user performs cluster creation and deletion with the ParallelCluster utility available on the login node (described in detail below). ParallelCluster creates a master or head node along with a fleet of compute nodes as specified by the user. The user submits and monitors jobs from the master node.

      7. Setting Up MFA Access

      Your permissions within the AWS environment are limited by default for enhanced security. To gain elevated privileges needed to launch a cluster and perform other related functions, you must obtain temporary credentials that authenticate you using your Multi-factor Authentication (MFA) token (shown below) for 12 hours (or the length of your session).

      MFA Token device

      If your SSH session is terminated, you must run the request_session.sh shell script again to re-authenticate with MFA.

      For a first-time user, the following configuration is needed from the Login Node:

      vi request_session.sh #or text editor of choice

      Edit the snumber parameter to equal the Serial Number on the back of your MFA token, save and quit.

      #!/bin/bash
      
      # Store your MFA Token's Serial Number
      snumber=SPBT<your-serial-number>
      
      ...

      Now run the script as follows:

      . ./request_session.sh
      Please enter your token code (obtained from your MFA token):
      <enter your 6 digit token code>
      Authentication Successful! Your session will expire in 12 hours.

      You are now authenticated with elevated permissions.
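For reference, a script such as request_session.sh typically wraps the AWS STS call sketched below. This outline is illustrative only, not the exact contents of the provided script; variable names and output parsing are assumptions:

#!/bin/bash
# Illustrative sketch: request 12-hour temporary credentials using an MFA token code
snumber=SPBT<your-serial-number>
read -p "Please enter your token code: " tcode

creds=$(aws sts get-session-token \
          --serial-number "$snumber" \
          --token-code "$tcode" \
          --duration-seconds 43200 \
          --output text \
          --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]')

export AWS_ACCESS_KEY_ID=$(echo "$creds" | awk '{print $1}')
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | awk '{print $2}')
export AWS_SESSION_TOKEN=$(echo "$creds" | awk '{print $3}')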

      8. Importing Your Own SSH Key Pair

      AWS EC2 key pairs are required to access EC2 instances via SSH.

      8.1. From the AWS CLI (from a local Linux system)

Running the following command locally saves the private key with a name you specify, for example, MyKeyPair.pem.

      aws ec2 create-key-pair --key-name MyKeyPair --query 'KeyMaterial' --output text > MyKeyPair.pem

      Change the permissions on the key to prevent tampering

      chmod 400 MyKeyPair.pem

      8.2. From the AWS Console (from a local Windows system)

      1. Open the Amazon EC2 console at https://console.amazonaws-us-gov.com/ec2/home?region=us-gov-west-1#KeyPairs
      2. Choose Create key pair.
      3. For Name, enter a descriptive name for the key pair, for example, MyKeyPair. The name can include up to 255 ASCII characters. It cannot include leading or trailing spaces.
      4. For File format, choose the format in which to save the private key. To save the private key in a format that can be used with OpenSSH, choose pem. To save the private key in a format that can be used with PuTTY, choose ppk.
      5. Choose Create key pair.
      6. The private key should download.

      8.3. Retrieving the public key for your key pair on Linux

      On your local Linux or macOS computer, you can use the ssh-keygen command to retrieve the public key for your key pair. Specify the path where you downloaded your private key (the .pem file).

      ssh-keygen -y -f /path_to_key_pair/my-key-pair.pem

      The command returns the public key, as shown in the following example.

      ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQClKsfkNkuSevGj3eYhCe53pcjqP3maAhDFcvBS7O6V
      hz2ItxCih+PnDSUaw+WNQn/mZphTk/a/gU8jEzoOWbkM4yxyb/wB96xbiFveSFJuOp/d6RJhJOI0iBXr
      lsLnBItntckiJ7FbtxJMXLvvwJryDUilBMTjYtwB+QhYXUMOzce5Pjz5/i8SeJtjnV3iAoG/cQk+0FzZ
      qaeJAAHco+CY/5WrUBkrHmFJr6HcXkvJdWPkYQS3xqC0+FmUZofz221CBt5IMucxXPkX4rWi+z7wB3Rb
      BQoQzd8v7yeb7OzlPnWOyN0qFU0XA246RA8QFYiCNYwI3f05p6KLxEXAMPLE

      If the command fails, run the following command to ensure that you've changed the permissions on your key pair file so that only you can view it.

      chmod 400 my-key-pair.pem

      These steps can also be performed on Windows using PuTTYgen.

      8.4. Importing keys into AWS

If you created a key in the 'west' region, for example, you can import it into the 'east' region as follows:
1. Navigate to the east region Key Pairs page - https://us-gov-east-1.console.amazonaws-us-gov.com/ec2/home?region=us-gov-east-1#KeyPairs.
2. Click on Actions, then Import key pair.

  Screenshot of the Import key pair action
3. Name your key. It can be the same as for the west region. Then paste in the public key information that was obtained in the previous section, and click Import key pair. Your key is now available in the east region.

        Screenshot of Import Key Pair page

      8.5. Replacing your login node key pair

      1. Create a new key pair using the Amazon EC2 console or a third-party tool.
      2. Retrieve the public key from your new key pair as shown above.
      3. Connect to your instance using your existing private key file.
      4. Using a text editor of your choice, open the .ssh/authorized_keys file on the instance. Paste the public key information from your new key pair underneath the existing public key information. Save the file.
      5. Disconnect from your instance, and test that you can connect to your instance using the new private key file.
      6. When you're replacing an existing key pair, connect to your instance and delete the public key information for the original key pair from the .ssh/authorized_keys file.
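A minimal command-line sketch of steps 2 and 4 above; the key path is illustrative, and the append can equally be done in a text editor:

ssh-keygen -y -f /path_to_key_pair/my-new-key-pair.pem                   # step 2: print the new public key (run locally)
echo "ssh-rsa AAAA...your-new-public-key..." >> ~/.ssh/authorized_keys   # step 4: append it on the instance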

      9. Configuring AWS CLI

      The AWS CLI has already been installed on the login node. You can configure it using your credentials. The AWS Access Key ID and AWS Secret Access Key are your AWS credentials. They are associated with an AWS Identity and Access Management (IAM) user or role that determines what permissions you have.

      Access keys consist of an access key ID and secret access key, which are used to sign programmatic requests that you make to AWS. If you don't have access keys, you can create them from the AWS Management Console.

      The only time that you can view or download the secret access key is when you create the keys. You cannot recover them later. However, you can create new access keys at any time.

Note: Your account may already have an Access Key created and shown from the time of account creation. If your account lists an access key whose secret access key you do not have, delete that key and create a new one as shown below.

      To create access keys for an IAM user:

      1. Sign in to the AWS Management Console and open the IAM console by clicking your username, then "My Security Credentials."
      2. In the Access keys section, choose Create access key.
      3. To view the new access key pair, choose Show. You will not have access to the secret access key again after this dialog box closes. Your credentials will look something like this:
        1. Access key ID: AKIAIOSFODNN7EXAMPLE
        2. Secret access key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
      4. To download the key pair, choose Download .csv file. Store the keys in a secure location. You will not have access to the secret access key again after this dialog box closes.
      5. Keep the keys confidential in order to protect your AWS account and never email them. Do not share them outside your organization, even if an inquiry appears to come from AWS or Amazon.com. No one who legitimately represents Amazon will ever ask you for your secret key.
      6. After you download the .csv file, choose Close. When you create an access key, the key pair is active by default, and you can use the pair right away.
      7. Set up your AWS credentials using the AWS Key that you just created. Make sure you enter the correct region name.
        aws configure
        AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
        AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
        Default region name [us-east-1]: us-gov-west-1
        Default output format [None]:
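You can verify that the credentials are working with a quick identity check:

aws sts get-caller-identity    # returns your account, user ID, and ARN if the CLI is configured correctly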

      10. Launching a Cluster

Below is a sample configuration. Your cloud admin (hpcaceadmin@jaspersolutions.com) will provide you with the appropriate login node $HOME/.parallelcluster/config file to use based on your use case.

      Note: The following tags are required for launching the cluster: hpcmp:user-name, hpcmp:group-name, hpcmp:project-name. They must be included in the [cluster default] section of your config file, as follows:

      tags = {"hpcmp:user-name": "CLDproj00XX", "hpcmp:group-name": "dev", "hpcmp:project-name": "test"}

      10.1. Configure ParallelCluster

      pcluster configure

From the list of valid AWS Region identifiers, choose the Region in which you want your cluster to run (us-gov-west-1 in this example).

      Allowed values for the AWS Region ID:
      1. us-gov-east-1
      2. us-gov-west-1
      AWS Region ID [us-gov-west-1]:

      Pick the EC2 Key Pair that you wish to use. Look at the section "Importing Your Own SSH Key Pair" (above) to create your own key pair.

      Allowed values for EC2 Key Pair Name:
      1. MyKeyPair
      EC2 Key Pair Name [MyKeyPair]:

      Choose the scheduler to use with your cluster. SLURM works well with PBS scripts.

      Allowed values for Scheduler:
      1. sge
      2. torque
      3. slurm
      4. awsbatch
      Scheduler [sge]:

      Choose the operating system. CentOS is not supported in AWS GovCloud.

      Allowed values for Operating System:
      1. alinux
      2. alinux2
      3. ubuntu1604
      4. ubuntu1804
      Operating System [alinux]:

Enter the minimum and maximum size of the cluster of compute nodes, measured in number of instances.

      Minimum cluster size (instances) [0]:
      Maximum cluster size (instances) [10]:

Enter the master and compute node instance types.

      Master instance type [t2.micro]:
      Compute instance type [t2.micro]:

      Do not create a new VPC. Please select the existing VPC-Test.

      Automate VPC creation? (y/n) [n]: n
      Allowed values for VPC ID:
      1. vpc-90bba9f4 | VPC-Test | 2 subnets inside
      2. vpc-ad9e8cc9 | VPC-JasperDev | 1 subnets inside
      3. vpc-d7090db3 | Default | 0 subnets inside
      VPC ID [vpc-0b4ad9c4678d3c7ad]: 1

      After the VPC has been selected, decide whether to use existing subnets or create new ones. Please use the existing subnets in the Test VPC. Pick the public subnet for the master instance and the private subnet for the compute instances.

      Automate Subnet creation? (y/n) [y]: n
      Allowed values for Master Subnet ID:
      1. subnet-8524d6e2 | Subnet-TestPrivate | Subnet size: 16384
2. subnet-f23fcd95 | Subnet-TestPublic | Subnet size: 1024
Master Subnet ID [subnet-027125c6a81d73006]: 2
      Allowed values for Compute Subnet ID:
      1. subnet-8524d6e2 | Subnet-TestPrivate | Subnet size: 16384
      2. subnet-f23fcd95 | Subnet-TestPublic | Subnet size: 1024
      Compute Subnet ID [subnet-093c3f1589a870ff0]: 1

      Configuration is complete.

      Configuration file written to /home/"user"/.parallelcluster/config
      You can edit your configuration file or simply run 'pcluster create -c /home/"user"/.parallelcluster/config cluster-name' to create your cluster

      10.2. Launch Cluster Command

      When all settings contain valid values, you can launch the cluster by running the create command where 'mycluster' is the cluster name.

      $ pcluster create mycluster

      After the cluster reaches the "CREATE_COMPLETE" status, you can connect to it by using your normal SSH client settings.

$ pcluster ssh mycluster -i ~/.ssh/mysshkey.pem

      OR

$ ssh -i MyKeyPair.pem ec2-user@<Master server public IP>

      10.3. AMI List and Config Files

      Each config file will create a cluster with the associated Amazon Machine Image (AMI), wherein the master node of the cluster will have 50 GiB storage by default, and a "/shared" EBS volume that can be set as the $WORKDIR.

      These config files are intended to serve as templates to get the user started and also allow the user to make any further customizations. The user is required to fill in the tags in the config file. They're located at:

      https://console.amazonaws-us-gov.com/s3/buckets/dev-parallelcluster-config-files/?region=us-gov-west-1&tab=overview

Below is a catalog of AMIs, as well as the packages each AMI contains. You can pick the appropriate AMI from the list; total cores = number of nodes * number of cores per node.

AMI Use Cases

Use Case | AMI ID | Region | Packages | Config File(s) | Resources per Node
baseline | ami-74c4f215 | West | Base alinux2 created by AWS, used to build other AMIs | N/A | -
baseline_al2_EFA_w | ami-05284c732a36a5e4a | West | gcc, gmake, cmake, HDF5, LAPACK, git, netcdf, intel MPI and EFA | baseline_moderate_west, baseline_moderate_efs_west | 8 cores
avfleslie_al2_w | ami-0240f2a3ae87bb116 | West | baseline + avfleslie | avfleslie_west | 36 cores
FFTW_al2_EFA_w | ami-06b77e929540b0ceb | West | baseline + FFTW | FFTW_west | 36 cores
hycom_al2_EFA_w | ami-0dc68ef211321e106 | West | baseline + hycom | hycom_efs_west | 36 cores
tensorflow_cuda_al2_w | ami-0eefb36487bd23928 | West | cuda + anaconda + TensorFlow | tensorflow_efs_west | 8 cores, 1 GPU
baseline_al2_EFA_e | ami-03ce8abecc3ef1ad8 | East | gcc, gmake, cmake, HDF5, LAPACK, git, netcdf, intel MPI and EFA | baseline_moderate_east, baseline_moderate_efs_east | 8 cores
emacs_al2_EFA_e | ami-0b8b9c58bd9d22181 | East | baseline EFA with emacs | emacs_baseline_moderate_east | 8 cores
avfleslie_al2_e | ami-0e16badb17352c7c7 | East | baseline + avfleslie | avfleslie_east | 36 cores
FFTW_al2_EFA_e | ami-0a826ce9427a5a884 | East | baseline + FFTW | FFTW_east | 36 cores
hycom_al2_EFA_e | ami-0aeb3ec0a320e89f2 | East | baseline + hycom | hycom_efs_east | 36 cores
tensorflow_cuda_al2_e | ami-06a6d709837066ea8 | East | baseline + cuda + tensorflow | tensorflow_efs_east | 8 cores, 1 GPU
      10.3.1. AMI Tree

      Listing of available Amazon Machine Images

      11. Running a Job

To verify the current state of your cluster, enter the following command (with the Slurm scheduler). You will see the partitions and the state of their nodes. For example:

      $ sinfo
      PARTITION AVAIL TIMELIMIT NODES STATE  NODELIST
      batch     up     infinite     2 alloc  adev[8-9]
      batch     up     infinite     6 idle   adev[10-15]
      debug*    up        30:00     8 idle   adev[0-7]

      Set up the working directory variable to point to the shared EBS drive:

      WORKDIR=/shared
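A minimal sketch of a Slurm batch script for this environment; the node counts, task counts, launcher, and executable name are illustrative and should be adjusted to your application:

#!/bin/bash
#SBATCH --job-name=myjob        # job name (illustrative)
#SBATCH --nodes=2               # number of compute nodes
#SBATCH --ntasks-per-node=8     # MPI ranks per node (match your instance type)
#SBATCH --output=myjob.%j.out   # output file; %j expands to the job ID

cd /shared                      # run from the shared work directory ($WORKDIR)
mpirun ./my_mpi_app             # launch the application (illustrative launcher and executable)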

      Submit the job:

      sbatch <script-name>

      Monitor the job:

      squeue

      12. Deleting the Cluster

      Once a job has completed, the cluster and all of its associated resources can be deleted with the following command issued from the Login Node.

      pcluster delete mycluster

      Note: All data on the cluster that hasn't been moved to the Login Node will be deleted along with the cluster.

      13. Batch Scheduling

      13.1. Scheduler

Slurm (Simple Linux Utility for Resource Management) is currently running on the HPCMP Cloud master node. It is a software package for submitting, scheduling, and monitoring jobs on large compute clusters. Slurm is automatically loaded for you when you log in.

      13.2. Interactive Logins

When you log in to the compute cluster, you will be running in an interactive shell on the master node. You can run Slurm in interactive mode, for example:

      srun --pty bash -i

      Srun's --pty option runs task zero in pseudo terminal mode. Bash's -i option tells it to run in interactive mode (with prompts).

      14. Recommended Workflow

The process below gives an overview of the Login Node and the compute cluster, with a sample workflow that a user might employ to run their job.

      Note: It is assumed that the user already has an account on AWS and has performed the initial first-time login steps necessary to set up their accounts on the Login Node.

Transfer your files into your home directory on the login node, replacing username with your own.

      scp -i <your-key> test.txt username@52.222.57.229:/home/username

      Sign onto the Login Node.

      ssh -A -i <your-key> username@52.222.57.229

Obtain temporary elevated credentials.

      . ./request_session.sh

      Download a pre-configured config file to use with ParallelCluster from the designated S3 bucket.

      aws s3 cp s3://dev-parallelcluster-config-files/East/Baseline/baseline_moderate_east $HOME/.parallelcluster/

      Modify the config file to your specifications.

      vi .parallelcluster/baseline_moderate_east 
      [aws]
      aws_region_name = us-gov-east-1
      
      [global]
      cluster_template = default
      update_check = true
      sanity_check = true
      
      [aliases]
      ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
      
      [cluster default]
      key_name = MyKeyPairEast
      base_os = alinux2
      scheduler = slurm
      initial_queue_size = 1
      max_queue_size = 10
      master_instance_type = t3.micro
      compute_instance_type = c5n.4xlarge
      disable_hyperthreading = true
      placement_group = DYNAMIC
      custom_ami = ami-03ce8abecc3ef1ad8
      master_root_volume_size = 50
      tags = {"hpcmp:user-name": "CLDproj00  ", "hpcmp:group-name": " ", "hpcmp:project-name": " "}
      
      [vpc default]
      vpc_id = vpc-0c0abc0cb9540daaa
      master_subnet_id = subnet-0b37638f9ca65ea1e
      compute_subnet_id = subnet-00cf48ac5bec8c64b

      Launch the cluster. Be sure to point to the config file that you edited.

      pcluster create <cluster-name-of-choice> -c  ~/.parallelcluster/baseline_moderate_east

      Copy necessary files, code, and scripts to the master node.

scp -i <your-key> test.txt ec2-user@<master node ip address>:~/

      SSH to the master node.

      ssh ec2-user@<master node ip address>

Set up the working directory. A 20 GiB EBS volume is automatically attached to the cluster. Designate this EBS as $WORKDIR.

      WORKDIR=/shared

      Compile your code.

Set up your job script.

      Submit your job.

      sbatch <script-name>

      Monitor your job.

      squeue

      Move output files from the master node to the Login Node.

      scp <output-files> username@52.222.57.229:/home/username

      Delete the cluster.

      pcluster delete <cluster-name-of-choice>