Topic: HPCMP Baseline Configuration Survey
Date Received: March 20, 2007

OVERALL

Of all the MSRCs, ASC has the best setup. I would recommend that all the other MSRCs (ARL, ERDC are the other two I have used) move to a similar setup. I see the biggest differences in:

  1. environment variables + batch environment (e.g. LSF)
  2. archival retrieval
  3. home directory (storage mechanism, quota should be at least 1GB, etc.)
  4. open-source math libraries

This is more of an observation, and I have no real solution to offer. I believe that there might be an over-allocation of users (regular vs. Challenge) on some of these MSRC machines. On some machines, there are too many users of one type or the other, so one type of user is inevitably at a disadvantage.

It would also be helpful if there were a central place where a user could get a suggestion on which machine to run on, given his/her needs. For example, it would ask for memory per process/thread, network (fast/slow), file storage, and job size and length, and then recommend an MSRC and a specific machine name. For example, my Challenge project requires no fewer than 256 processors per job. But on some machines (particularly JVN), the network is a bottleneck, so the code cannot scale beyond 128 processors.

BC Team Feedback
Reply Date: May 30, 2007

In your 20 February 2007 e-mail to Mr. Jeff Graham, Team Lead of the DoD HPCMP Baseline Configuration (BC) Initiative, you recommended a "Common" setup for the HPCMP HPC Centers. Areas of interest included:

  1. Environment variables and batch environment (e.g. LSF),
  2. Archival retrieval,
  3. Home directory (storage mechanism, quota should be at least 1GB, etc.),
  4. Open source math libraries,
  5. "Central Location" where a user can get advice on "which center" and "which machine" to run for a specific job.

The purpose of this note is to seek from you more details and information on the above to help us address your valued needs in the immediate future.

  1. As you may already know, the BC team has established a policy called "Common Set of Environment Variables (FY05-04)" in which a core set of environment variables is defined to represent the same thing at each of the HPCMP participating sites. The environment variables included in FY05-04 are: WORKDIR, ARCHIVE_HOME, ARCHIVE_HOST, PET_HOME and JAVA_HOME. The BC team is very eager to know of other environment variables that you would like to see included in our policy FY05-04.
  2. With regard to your request for a common archival and retrieval system, we would appreciate getting more details on the common archival and retrieval features and capabilities that are essential in your daily work.
  3. Please elaborate on the storage mechanism features that you are interested in. Your request for a quota of 1 GB minimum is a topic outside the BC scope. Separately, we will be forwarding your request to the User Advocacy Group (UAG), and kindly ask them to include the subject as an agenda item at their next UAG meeting.
  4. In our current list of 15 completed BC policies, the BC team has established a common set of open source math libraries (FY06-01) across all participating HPCMP sites. The latest release of policy FY06-01 includes the following math libraries: ARPACK, FFTW (both MPI and non-MPI versions), PETSc, SuperLU, ScaLAPACK (ATLAS (BLAS), LAPACK and BLACS), SPRNG and GSL. We would like to know if there are other open source math libraries that you make use of at more than one site.
  5. Your request for a "Central Location" to advise users on which machine to utilize for a given job is outside the BC scope. The subject, however, is very interesting and worthy of further consideration. If you are attending the 2007 HPCMP Users Group Conference, there will be a number of presentations on Tuesday afternoon, June 19, 2007, in room D, related to this subject. The talk entitled "Targeting CTA Based Computing to Specific Architectures Based upon HPCMP Systems Assessment," by Paul Bennett, is particularly relevant.
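
As an illustration of item 1, a user script could rely on the FY05-04 variables in the same way at every participating site. The following is only a minimal Python sketch and is not part of the policy: the variable names are those listed in FY05-04, while the helper names (missing_bc_variables, scratch_dir) and the per-job directory layout are assumptions made for illustration.

```python
import os

# Variable names as listed in policy FY05-04.
BC_VARIABLES = ("WORKDIR", "ARCHIVE_HOME", "ARCHIVE_HOST", "PET_HOME", "JAVA_HOME")


def missing_bc_variables(environ=None):
    """Return the FY05-04 variables that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in BC_VARIABLES if not env.get(name)]


def scratch_dir(job_name, environ=None):
    """Build a per-job scratch path under $WORKDIR.

    The <WORKDIR>/<job_name> layout is hypothetical, not defined by the policy.
    """
    env = os.environ if environ is None else environ
    workdir = env.get("WORKDIR")
    if not workdir:
        raise RuntimeError("WORKDIR is not set in this environment")
    return os.path.join(workdir, job_name)
```

A job script might call missing_bc_variables() at startup and stop early if the list is non-empty, so that the same script behaves identically at each site.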

The BC team looks forward to your valued input.