Systems News

The ARL DSRC Cray XC-40 system, Excalibur, debuted at #19 on the TOP500 list of the world's most powerful supercomputers on November 17, 2014. ARL recently took delivery of the 101,184-core system, which was acquired through the DOD High Performance Computing Modernization Program and will complement the other HPC resources at the ARL DOD Supercomputing Resource Center (DSRC). One of the largest systems fielded to date in the DOD HPC Program, Excalibur's 101,184 cores are augmented with 32 NVIDIA Tesla K40 GPGPUs, and the system has a theoretical peak of over 3.7 petaflops, 400 TBytes of memory, and 122 TBytes of solid-state disk (or 'flash' storage). Excalibur will serve as a key HPC resource for the DOD Research, Development, Test and Evaluation communities.

[Image of Excalibur]

ARL's newest HPC system, Excalibur, has over 101,000 cores and a theoretical peak speed of 3.7 petaflops.

Beginning last week, PBS on Lightning experienced communications issues that killed jobs. The problem is ongoing, and jobs may fail to start or may be killed at random. Our team is working with Cray and PBS to resolve the problem.

Users inappropriately running jobs on Batch nodes are contributing to this problem. Users must use "aprun" or "ccmrun" to properly place jobs on the compute nodes; jobs running directly on a Batch node overload that node's resources, causing it to crash and killing all user work tied to it. Please see the Lightning User's Guide for guidance on proper job scripts (www.afrl.hpc.mil/docs/lightningUserGuide.html#ccm and www.afrl.hpc.mil/docs/lightningUserGuide.html#launchCom). Additionally, you can log on to Lightning and look under $SAMPLES_HOME for examples (found in Workload_Management and Parallel_Programming). A sketch of a compute-node job script appears below.
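
For illustration only, here is a minimal sketch of a PBS job script that launches work on the compute nodes with aprun. The executable name (my_mpi_app), the Project_ID, the queue name, and the select/ncpus/mpiprocs values are placeholders rather than Lightning-specific settings; consult the Lightning User's Guide and the $SAMPLES_HOME examples for the correct values.

#!/bin/bash
#PBS -A Project_ID                      # placeholder project/allocation ID
#PBS -q standard                        # placeholder queue name
#PBS -l select=2:ncpus=24:mpiprocs=24   # placeholder node and core counts
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR                       # start in the directory the job was submitted from
aprun -n 48 ./my_mpi_app                # aprun places the 48 MPI ranks on compute nodes,
                                        # keeping the work off the Batch node

CCM jobs are launched the same way with ccmrun in place of aprun (see the #ccm section of the User's Guide linked above). Running ./my_mpi_app directly, without aprun or ccmrun, is what leaves the work on the Batch node.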

The AFRL DSRC reserves the right to kill any jobs running directly on a Batch node, as they can directly impact all users running jobs on Lightning.

If you have any questions, please contact CCAC by email at help@ccac.hpc.mil or by phone at 1-877-222-2039.

As of December 2, 2014, Copper will have a round-robin login configuration, meaning that one need only use the hostname copper.ors.hpc.mil when logging in. However, this change will initially produce the following warning when logging in from a Linux system:

[user@user ~]$ ssh copper.ors.hpc.mil
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
f8:b5:46:2c:6c:f6:5d:98:9e:01:28:19:ca:cd:2e:df.
Please contact your system administrator.
Add correct host key in /home/user/.ssh/known_hosts to get rid of this message.
Offending RSA key in /home/user/.ssh/known_hosts:10
RSA host key for copper01 has changed and you have requested strict checking.
Host key verification failed.

This warning can be eliminated by editing the known_hosts file in the .ssh directory of the home directory on the system from which one is logging in and deleting the entries for Copper. For example, here are two such entries, for copper01 and copper02:


copper01.ors.hpc.mil,IP_ADDRESS ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQDPmFsykLAmJ/uS+RWMN
Xf4zRGW3m7kk3a2rwxEAjMx2o7AIWwRjtaG2J07aX5sgtcRdZsilUcncff5+6uH1gSzO3pnTmAVuielJ
6QI9XRyCC5HJG6WaYuki38fSMHlGfsOe3Y7DnsUKhEvriWl7K9IoeUhJbFMpgQVa/mzTZShhQ==
copper02,IP_ADDRESS ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQDPmFsykLAmJ/uS+RWMN
Xf4zRGW3m7kk3a2rwxEAjMx2o7AIWwRjtaG2J07aX5sgtcRdZsilUcncff5+6uH1gSzO3pnTmAVuielJ
6QI9XRyCC5HJG6WaYuki38fSMHlGfsOe3Y7DnsUKhEvriWl7K9IoeUhJbFMpgQVa/mzTZShhQ==

The lines for both copper01 and copper02 need to be deleted and the known_hosts file saved. Then, on the next login to Copper, the following prompt will require an answer:


The authenticity of host 'copper02 (IP_ADDRESS)' can't be established.
RSA key fingerprint is f8:b5:46:2c:6c:f6:5d:98:9e:01:28:19:ca:cd:2e:df.
Are you sure you want to continue connecting (yes/no)?

Answering "yes" will populate the known_hosts file with an updated copper entry and log one into the system.

*************************************************************************

If logging in from a Windows system using PuTTY, the following message will be displayed:


WARNING - POTENTIAL SECURITY BREACH!
The server's host key does not match the one PuTTY has cached in the registry. This means that either the server administrator has changed the host key, or you have actually connected to another computer pretending to be the server.
The new rsa2 key fingerprint is: ssh-rsa 1024 cc:83:cc:83:d6:h7:94:3f:19:8b:b0:e3:4c:36:e7:cf
If you were expecting this change and trust the new key, hit Yes to update PuTTY's cache and continue connecting. If you want to carry on connecting but without updating the cache, hit No.
If you want to abandon the connection completely, hit Cancel. Hitting Cancel is the ONLY guaranteed safe choice.

One should simply hit Yes and proceed.

The HPC Portal is now available for your use at https://portal.hpc.mil to access Garnet and the Utility Server at the ERDC DSRC. The HPC Portal provides supercomputing and visualization capabilities from a web browser and requires no user-installed software on the local workstation.

What capabilities does the HPC Portal provide?

The HPC Portal enables access to supercomputing with a web browser. No software installation is required to use the HPC Portal, only a browser with CAC or YubiKey authentication, which enables access from desktops that do not permit user-installed software. The HPC Portal provides file management, an X-terminal, Ensight, ParaView, Tecplot, FieldView, Pointwise, distributed MATLAB, a job status dashboard, and, for approved users, the CREATE analysis codes Kestrel, Helios, SENTRi, Capstone, and DaVinci, as well as the MATLAB IDE, among other applications.

How do I access the HPC Portal?

Browse to the site https://portal.hpc.mil, where you'll be able to select a DSRC where you have an allocation (currently ERDC, NAVY, AFRL, and MHPCC are available via the HPC Portal). For convenience, all users of Garnet have been enabled for access to the HPC Portal without any additional requirements. At this site, you will authenticate via the HPCMP OpenID capability using your CAC or YubiKey. No downloaded software or Kerberos kit is required.

How do I get help?

Contact CCAC at http://centers.hpc.mil/users > Help. The HPC Portal also provides tutorials, videos, and a community forum.

Where can I read more about the HPC Portal (including Browser Compatibility)?

The HPC Portal uses HTML5 and WebGL technology, available in the newest generation of browsers (IE, Chrome, and Firefox), to allow interactive visualization of large science and engineering datasets. Users with older browsers may also use the portal with some limitations on performance. Read more about the development of the HPC Portal at http://www.mhpcc.hpc.mil/portal.

What if I want to suggest a feature or develop an application for the HPC Portal?

The HPC Portal Team is interested in your thoughts and feedback, including any suggestions for new applications. You may submit feedback by contacting CCAC. The portal team has created a Software Development Kit (SDK) to provide developers the ability to enable applications for use in the HPC Portal. For more information, check the Support Forum at the HPC Portal.