AFRL DSRC Archive Guide

1. Archival Basics

1.1. Why do I need to archive my data?

The short answer is to free up system resources and protect your data.

Your work directory, $WORKDIR, resides on a large temporary file system that is shared with other users. This file system is intended to temporarily hold data that is needed or generated by your jobs. Since there is no quota on these directories and since user jobs often generate a lot of data, the file system would fill up very quickly if everyone were allowed to leave their files there indefinitely. This would negatively impact everyone and make the system unusable. To protect the system, an automated purge cycle may run to free up disk space by deleting older or unused files. And, if file space becomes critically low, ALL FILES, regardless of age, are subject to deletion. To avoid this, we strongly encourage you to archive the data you want and keep your $WORKDIR clean by removing unnecessary files. Remember, your $WORKDIR is not backed up, so if your files are purged and you didn't archive them, they are gone forever!

1.2. How does archival work?

The archive system ($ARCHIVE_HOST) provides a long-term storage area for your important data. It is extremely large, and your personal archive directory ($ARCHIVE_HOME) has no quota. Even so, you probably don't want to archive everything you generate.

When you archive a file, it's copied to your $ARCHIVE_HOME directory on the archive server's disk cache, where it waits to be written to tape by the system. The disk cache is a large temporary storage area for files moving to and from tape. A file in the cache is said to be "online," while a file on tape is "offline." Once your file is written to tape, it may remain "online" for a short time, but eventually it is removed from the disk cache to make room for other files in transit. Both online and offline files show up in a directory listing, but offline files need to be retrieved from tape before you can use them.

Retrieval from tape can take a while, so be patient; there's a lot going on in the background. First, the system must determine on which tape (or tapes) your data resides. These are then robotically pulled from the tape library, mounted in one of the limited number of tape drives (assuming not all of them are busy), and wound into position before retrieval can begin. Your wait time depends on how big your file is, how many tapes it is spread across, and how many other archival jobs are running. After a delay, your file is retrieved from tape and available for use.

1.2.1. Is there any way to estimate retrieval time?

Transfer or wait times depend on many parameters, ranging from the size and number of files, to the number of tapes involved, to network load (tape-to-cache and cache-to-HPC), and others too numerous to list. Since most of these parameters vary constantly throughout the day, estimating transfer times can be very difficult. To assist you with estimating transfer times, the following table lists observed transfer times at AFRL DSRC for files of various sizes:

Observed File Retrieval Times from Archive to HPC
File Size   Sample Size   Average Time   Median Time   80% Finish Within   90% Finish Within
10 GB       640           64 Sec.        84 Sec.       10 Min.             12.5 Min.
100 GB      127           12 Min.        11 Min.       15 Min.             16 Min.
200 GB      63            15 Min.        15 Min.       25 Min.             26 Min.
500 GB      9             26 Min.        30 Min.       49 Min.             50 Min.
1 TB        16            1 Hour         50 Min.       95 Min.             100 Min.

1.3. What are the archival configurations?

When using the standardized archival commands (see Section 3 below), the details of the archival configuration at a center are unimportant. Some archival functions, however, can't be done with the archive command, so it's helpful to understand the archival setup wherever you're working.

There are two main archival processes currently in use across the HPCMP, but each DSRC has minor variations that affect how you access the archive server. Some centers use NFS to mount their archive system on their HPC systems, so it appears as a directory on the local machine. This is very convenient but slightly slower. Other centers provide access via remote commands, such as scp or rsh. This is a little less convenient but slightly faster. Still other centers do both, and at those centers, you can choose which method to use. In addition, some centers allow direct login to their archive server, allowing you to easily manage archived files.

The table below shows the access methods in use at each center.

Archival Processes at the DSRCs
Center   NFS Mount   Remote Commands             Direct Login
AFRL     x           x                           x
ARL      x
ERDC     x           x (from HPCMP resources)    x (from HPCMP resources)
MHPCC    x
Navy     x           x                           x

1.3.1. NFS Mount (All Sites)

An NFS-mounted archive file system provides perhaps the most familiar environment for interacting with archived data. Mounted file systems appear as local directories and are accessible via standard Linux commands, such as cd, mkdir, chmod, etc. Files can be archived/retrieved simply by copying them to/from $ARCHIVE_HOME. This approach is extremely convenient and has virtually no learning curve but can result in slightly slower transfer speeds, which may be more evident with larger files. It's also not portable if used in job scripts. For portability, we recommend you use the archive command discussed in Section 3.2. The $ARCHIVE_HOST environment variable is irrelevant for NFS-mounted file systems.
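On a center with an NFS-mounted archive, archiving amounts to an ordinary copy. The following is a minimal sketch of that idea, assuming $ARCHIVE_HOME points at the mounted archive directory. The function name nfs_archive_put is hypothetical and is not part of any HPCMP tool.

```shell
#!/bin/sh
# Hypothetical helper: copy a file into an NFS-mounted $ARCHIVE_HOME.
# Assumes ARCHIVE_HOME is set and the archive file system is mounted.
nfs_archive_put() {
  file=$1
  if [ -z "${ARCHIVE_HOME:-}" ] || [ ! -d "$ARCHIVE_HOME" ]; then
    echo "ARCHIVE_HOME is not set or not mounted" >&2
    return 1
  fi
  # -p preserves the modification time and permissions of the source file.
  cp -p "$file" "$ARCHIVE_HOME/" || return 1
  ls -l "$ARCHIVE_HOME/$(basename "$file")"
}
```

For example, nfs_archive_put my_output_data.tar.gz copies the tar file and lists the archived copy. In job scripts, the archive command remains the portable choice.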

1.3.2. Remote Commands (AFRL, ERDC, and Navy)

Remote access to the archive servers, while not quite as convenient as an NFS mount, provides a slightly faster transfer speed due to lower network overhead. Files can be archived/retrieved using commands such as scp, rcp, or mpscp, and other functions (such as chmod, mkdir, etc.) can be performed via the remote shell commands, ssh and rsh. Remote commands are available in Kerberized and non-Kerberized variants, and each center may support a slightly different set of commands. In addition, Kerberized commands generally don't work in transfer queue jobs. All these commands can make use of the $ARCHIVE_HOST and $ARCHIVE_HOME environment variables. Some remote commands are demonstrated in Section 3.3 (below). Note that scripts using remote commands may not be portable. For portability, we recommend you use the archive command discussed in Section 3.2.
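To make the remote approach concrete, here is a dry-run sketch that only prints the commands it would run. It assumes $ARCHIVE_HOST and $ARCHIVE_HOME are set by the system and uses plain ssh and scp for illustration; the function name show_remote_put is hypothetical, and the variants actually supported (Kerberized or not) may differ at your center.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the remote commands that would archive
# a file into a subdirectory of $ARCHIVE_HOME, without executing them.
# Assumes ARCHIVE_HOST and ARCHIVE_HOME are set in the environment.
show_remote_put() {
  file=$1
  subdir=$2
  # Create the target directory on the archive server.
  echo "ssh ${ARCHIVE_HOST} mkdir -p ${ARCHIVE_HOME}/${subdir}"
  # Copy the file into that directory.
  echo "scp ${file} ${ARCHIVE_HOST}:${ARCHIVE_HOME}/${subdir}/"
}
```

Reviewing the printed commands before running them by hand is a cheap way to confirm that both variables expand to what you expect.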

1.3.3. Direct Login (AFRL, ERDC, and Navy)

Direct login to the archive server provides a standard Linux environment with access to all the familiar commands, such as cp, mv, mkdir, rmdir, chmod, etc. This allows you to organize your archived content, set permissions, or delete content that is no longer needed easily and efficiently. Most commands can be run without causing the retrieval of anything from tape. However, actions requiring access to the contents of a file automatically retrieve the file and take longer to complete. Examples include copying or editing a file already on tape, tar operations, and compression operations.

1.4. What is data staging?

Data staging is the process of making sure your data is in the right place at the right time. Related terms are "staging in" or "pre-staging" and "staging out" or "post-job archival." Before a job can run, the input data needs to be "staged in" or "pre-staged." This simply means the data is copied from the archive server (or some other source) into a directory accessible by the job script. Archiving your output data after the job completes is called "post-job archival" or "staging out." "Staging out" may also refer to moving your output data to another location, like the Center-Wide File System ($CENTER) for further processing.

Staging may be performed manually or via a batch script. However, since retrieving a file (especially a large file) from tape may take a while, ensuring your input data is in place before your job runs, and stays there until it runs, isn't always as simple as it sounds. To help with this, every HPC system has a transfer queue just for handling file transfers. For more about manual staging, see Section 3 (below). For more about batch staging with the transfer queue, see Section 5 (below).

2. Important Guidelines

These guidelines are important to help safeguard stability of the archive server and to minimize negative impact to all users. Failure to observe these guidelines may result in loss of archival privileges.

2.1. Use compressed tar files

There are two factors that make archival using compressed tar files a good idea: overhead and size.

First, let's look at overhead. Every time you archive or retrieve a file, a complex set of time-consuming actions occurs. Some of these actions are described in Section 1.2, but there are others as well. So, if you archive 100 individual files, those time-consuming actions must be performed 100 times. This can really add up. But if you combine those 100 files into a single tar file, those time-consuming actions happen only once. Also note that NOT using tar files can adversely impact the performance of the archive server for all users.

Now let's look at size. By compressing a tar file, you not only save space on the archive server (which benefits everyone), but you also increase the likelihood your file can fit entirely on a single tape, eliminating the need to pull and mount multiple tapes and decreasing the chance of file corruption. It also reduces the transfer time when moving the file to or from the archive server. Note: Always remember to tar/gzip your files before transferring them.

There is, however, one gotcha you need to watch for when using tar files. Do not make them too big. While the optimal tar file size may vary between sites, a maximum tar file size of about 500 GB is a good rule of thumb. At that size, the time required for file transfer and tape I/O is still reasonable. Files larger than 1 TB are far more likely to span tapes, greatly increasing archival and retrieval times, as well as the chance that a portion of the file could become unusable. The following table shows the maximum recommended tar file sizes at each of the centers.

Recommended Maximum Tar File Size
Center Recommended Size
AFRL 500 GB
ARL 200 GB
ERDC 500 GB
MHPCC 200 GB
Navy 500 GB

There is one final caveat to address. If your files are mostly binary data, compressing them does little good and could possibly cost more time than would be saved. If this is true of your data, you should forgo compression, though we still recommend combining multiple files into a single tar file.
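The packing advice above can be sketched as a small helper. This is only an illustration: the function name pack_for_archive is hypothetical, the default limit of 500 GB is the AFRL value from the table above, and 1 GB is assumed to mean 1024^3 bytes.

```shell
#!/bin/sh
# Hypothetical helper: pack a directory into a compressed tar file and warn
# if the result exceeds a size limit given in GB (default 500).
pack_for_archive() {
  src_dir=$1
  limit_gb=${2:-500}
  out="${src_dir%/}.tar.gz"
  tar czf "$out" "$src_dir" || return 1
  # Arithmetic expansion strips any padding that wc adds around the number.
  size_bytes=$(($(wc -c < "$out")))
  limit_bytes=$((limit_gb * 1024 * 1024 * 1024))
  if [ "$size_bytes" -gt "$limit_bytes" ]; then
    echo "WARNING: $out is larger than ${limit_gb} GB; consider splitting it" >&2
    return 2
  fi
  # Print the name of the tar file so callers can pass it on.
  echo "$out"
}
```

A call such as pack_for_archive my_output_data prints my_output_data.tar.gz on success. For data that does not compress well, replacing "tar czf" with "tar cf" (and the .tar.gz suffix with .tar) skips compression while still producing a single file.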

2.2. Do not overwhelm the archive system

Although the archive system provides enormous capacity, it is, in fact, limited in two important ways. The most significant limit is the number of tape drives, which determines the number of tapes that can be read from or written to at once. The second limit is the size of the disk cache, which determines how much data can be online at once.

Attempting to archive or retrieve too many files at once can fill up the disk cache on the archive server, halting archival and staging for all users. Even if the cache does not reach capacity, it could still tie up all available tape drives, impacting other users. To avoid this possibility, if you need to retrieve more than about 10 TB of data or more than about 300 files at once, please contact the HPC Help Desk for assistance.
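One way to respect the file-count guideline is to check a retrieval list before acting on it. The sketch below assumes you keep the names of the files you intend to retrieve in a plain text file, one per line; the function name check_retrieval_batch is hypothetical, and the default of 300 simply mirrors the approximate figure above. It does not check the 10 TB guideline, since file sizes are not known from a list of names.

```shell
#!/bin/sh
# Hypothetical pre-flight check: count the file names in a list (one per
# line) and refuse if the count exceeds a limit (default 300).
check_retrieval_batch() {
  list_file=$1
  max_files=${2:-300}
  if [ ! -r "$list_file" ]; then
    echo "Cannot read list file: $list_file" >&2
    return 2
  fi
  # Count non-empty lines; grep exits non-zero when the count is 0.
  count=$(grep -c . "$list_file" || true)
  if [ "$count" -gt "$max_files" ]; then
    echo "Refusing: $count files exceeds $max_files; contact the HPC Help Desk" >&2
    return 1
  fi
  echo "$count"
}
```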

2.3. Do not use files in the archive directly

This is a common mistake for users who are logged into the archive server directly or who use an NFS-mounted archive partition. The important thing to realize is that although files appear to be on disk, they're actually on tape. Any attempt to use those files (for instance with commands like tar, vi, more, less, or grep) begins the time-consuming process of retrieving the file from tape. Imagine the result of the following:

    zcat *.tar.gz | tar -tv | grep search_term

The intent of this command would be to grep through the content listings of multiple compressed tar files for a search term. On a normal file system, this would be no big deal. But on an archive file system, this requires the retrieval of every one of the compressed tar files (possibly thousands of files), which could potentially overwhelm the disk cache on the archive server. This is undesirable.

If you find you have inadvertently done something like this, cancel the command immediately and contact the HPC Help Desk.
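A safer pattern is to retrieve only the tar files you need into $WORKDIR (for example with archive get, described in Section 3.2.3) and then search the local copies. The following sketch shows the local search step; the function name search_local_tars is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the contents of each LOCAL compressed tar file
# and print matching entries, prefixed with the tar file they came from.
# Run this on copies in $WORKDIR, never on files in the archive itself.
search_local_tars() {
  term=$1
  shift
  for t in "$@"; do
    tar -tzf "$t" | grep -- "$term" | while IFS= read -r entry; do
      printf '%s: %s\n' "$t" "$entry"
    done
  done
}
```

For example, search_local_tars search_term run1.tar.gz run2.tar.gz searches two retrieved tar files without touching the archive server.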

2.4. Treat important data with appropriate caution

The AFRL DSRC cannot guarantee against unexpected data loss or corruption. Especially important or irreplaceable data should be stored in a second location at your local facility.

3. Archival from the Command Line (Manual Staging)

3.1. Why might I choose to manually stage my data?

Manual staging is simply staging from the command line without using the transfer queue. For many users, this is the simplest way to do staging because small data sets can usually be transferred while you wait. (Your mileage may vary based on system load.) There are, however, a few things to consider before deciding to stage data manually.

  • Check the size of your data first - if your data exceeds 500 GB, you may want to consider staging via the transfer queue. See Section 5 for additional details.
  • Start with a fresh Kerberos ticket - if your transfer time exceeds the lifetime of your Kerberos ticket, your transfer could fail. To help avoid this, get a new ticket before beginning your transfer.
  • Start with a fresh login shell - due to security considerations, your login shell may be automatically terminated after 24 hours. If you start with a fresh shell, your transfer has a full 24 hours to complete.
  • Consider "backgrounding" your transfer - by placing your running transfer into the background, it continues to run, even if your shell doesn't. For example:

    nohup archive get myfile.tar.gz &
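The first consideration above, checking the size of your data, can be scripted. The sketch below is one way to do it, assuming the 500 GB threshold mentioned in the list and treating 1 GB as 1024 x 1024 KB; the function name suggest_staging_method is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file or directory is small enough
# to stage manually or large enough to suggest the transfer queue.
suggest_staging_method() {
  path=$1
  limit_gb=${2:-500}
  # du -sk prints the total size in kilobytes, a tab, then the path.
  size_kb=$(du -sk "$path" | cut -f 1)
  limit_kb=$((limit_gb * 1024 * 1024))
  if [ "$size_kb" -gt "$limit_kb" ]; then
    echo "transfer-queue"
  else
    echo "manual"
  fi
}
```

Note that du measures data already on a local file system, so this is most useful before archiving. For retrievals, the size shown by a long listing of the archived file serves the same purpose.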

3.2. Standardized Archive Command

The archive command is available on all HPC systems, allowing you to use the same commands to perform common archival tasks regardless of where you're running or how the local archive server is configured. The archive command can use wildcards when listing, archiving, or retrieving files, and works the same way in transfer queue job scripts as in an interactive login shell. The archive command uses $ARCHIVE_HOME as its default target directory on the archive server unless an alternative path is specified with the -C path option. For operations within $ARCHIVE_HOME, -C path may be omitted. For complete information on the archive command, see the archive man page on the systems.

Functions covered by the archive command are demonstrated below.

3.2.1. Listing files

To list files on the archive server, use the following command:

    archive ls -al [-C path]

3.2.2. Archiving files

To send one or more files to the archive server, use the following command:

    archive put [-C path] file1.tar.gz file2.tar.gz ...

3.2.3. Retrieving files

To retrieve a single file from the archive server, use the following command:

    archive get [-C path] file1.tar.gz

Multiple files can be retrieved by listing them in sequence or by using wildcards. However, wildcard strings must be enclosed in double quotes, as shown below:

    archive get [-C path] "file*"
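The quoting requirement is presumably there so that the pattern reaches the archive command unexpanded, rather than being matched against files in your current local directory; that rationale is an inference, not something stated here. The effect of quoting on what a command receives can be seen with a small stand-in function (count_args is hypothetical and simply reports how many arguments it was given):

```shell
#!/bin/sh
# count_args prints how many arguments it received.
count_args() {
  echo "$#"
}

# Demonstrate in a scratch directory containing three matching files.
demo_dir=$(mktemp -d)
touch "$demo_dir/file1" "$demo_dir/file2" "$demo_dir/file3"
(
  cd "$demo_dir" || exit 1
  count_args file*     # unquoted: the local shell expands the pattern; prints 3
  count_args "file*"   # quoted: the literal pattern is passed through; prints 1
)
rm -rf "$demo_dir"
```

With the quotes in place, the command receives the single string file* and can match it against names on the archive server.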

3.2.4. Making directories

To create a directory on the archive server, use the following command:

    archive mkdir [-C path] [-m mode] [-p] dir1 dir2 ...

The -m mode option sets permissions on the newly created directory. It is equivalent to executing chmod on the directory using numeric mode specifiers, for instance, -m 750.

The -p option creates necessary intermediate directories in a path if they don't already exist.

3.2.5. Checking server status

Before performing an archive operation, it's always a good idea to check that the archive server is up and available. To check the server status, use the following command:

    archive stat
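The status check can be combined with an archive operation so that nothing is attempted while the server is down. The sketch below assumes, as the batch scripts in Section 5 do, that the output of archive stat contains the string "on-line" when the server is available; the function names archive_is_online and safe_archive_put are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around "archive stat". Assumes the status output
# contains the string "on-line" when the server is available.
archive_is_online() {
  archive stat -retry 1 | grep -q 'on-line'
}

# Run "archive put" with the given arguments only if the server is up.
safe_archive_put() {
  if archive_is_online; then
    archive put "$@"
  else
    echo "Archive system not on-line; nothing was archived." >&2
    return 1
  fi
}
```

For example, safe_archive_put -C my_results my_output_data.tar.gz behaves like the corresponding archive put command but returns a non-zero status without transferring anything if the server is unavailable.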

3.3. Non-standardized Archival Commands

There are, unfortunately, several functions not currently covered by the standardized archive command. If you need to chmod, rm, or mv a file or directory on the archive server, there's currently no standardized way to do it, so you'll have to rely on methods that may differ from center to center. For the AFRL DSRC, the following commands are recommended:

3.3.1. Deleting a file

To delete a file on the archive server, use the following command:

    rm $ARCHIVE_HOME/file

3.3.2. Deleting a directory

To delete a directory on the archive server, use the following command:

    rmdir $ARCHIVE_HOME/directory

3.3.3. Moving or renaming a file or directory

To move or rename a file or directory on the archive server, use the following command:

    mv $ARCHIVE_HOME/file $ARCHIVE_HOME/file-new

3.3.4. Changing the permissions of a file or directory

To change the permissions of a file or directory on the archive server, use the following command:

    chmod [-R] permission $ARCHIVE_HOME/file

The -R option recursively changes the permissions of all matching directories and files beneath the specified directory.
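Because the commands in this section build paths from $ARCHIVE_HOME, an unset or empty variable would turn $ARCHIVE_HOME/file into /file. When using these commands in scripts, a small guard can catch that case first. This is only a sketch, and the function name archive_path is hypothetical.

```shell
#!/bin/sh
# Hypothetical guard: build a path under $ARCHIVE_HOME, failing if the
# variable is unset or empty so that a command such as rm never receives
# a path like "/file" by accident.
archive_path() {
  if [ -z "${ARCHIVE_HOME:-}" ]; then
    echo "ARCHIVE_HOME is not set" >&2
    return 1
  fi
  printf '%s/%s\n' "$ARCHIVE_HOME" "$1"
}
```

A script might then use: target=$(archive_path old_results.tar.gz) && rm "$target"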

4. Archival in Compute Jobs

Archival and retrieval operations within a batch script running in a compute queue are generally a really bad idea and are strongly discouraged. While your data is being transferred, the cores reserved by your compute job sit idle and are unavailable to other jobs but continue to accrue time, wasting your allocation. In addition, archival access (and possibly even the archive command) is not available from compute queues at all centers, and compute job scripts attempting to perform archival operations may fail.

5. Archival in Transfer Queue Jobs (Batch Staging)

5.1. When should I batch stage my data?

If any of the following apply to you, use batch staging:

  • You don't have time to wait for your data to stage
  • You want to submit a job as soon as the input data is staged
  • You want to archive your data as soon as a job completes

5.2. What is the transfer queue?

The transfer queue is a special-purpose queue for transferring or archiving files. It has access to $HOME, $ARCHIVE_HOME, $WORKDIR, and $CENTER. Jobs running in the transfer queue use non-computational cores and do not accrue time against your allocation.

5.3. Archival Commands

The archival functions listed in Section 3 work the same way in transfer queue jobs as in interactive login shells, so the command examples in Sections 3.2 and 3.3 apply to transfer queue jobs as well. For more information on specific commands, see the associated man pages on the systems. Additional transfer queue examples are also found in the Sample Code Repositories ($SAMPLES_HOME) on the systems.

5.4. Staging in via the transfer queue (Pre-staging)

By pre-staging your data in a transfer queue job, you don't have to sit around and wait for your data to be staged before submitting your computational job. The following standalone script demonstrates retrieval of archived data from the archive server, placing it in a newly created directory in your $WORKDIR, whose name is based on the JOBID. Let's call this a "pre-staging job."

#!/bin/sh
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -A Project_ID

# Create a directory for this job in $WORKDIR and cd into it.
cd $WORKDIR
JOBID=`echo $PBS_JOBID | cut -d . -f 1`
mkdir my_job.$JOBID
cd my_job.$JOBID

# If the archive server is available, get the data. Otherwise, exit.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ $STATUS -eq 0 ]; then
  echo "Archive system not on-line!!"
  echo "Exiting: `date`"
  exit 1
fi
echo "Archive system is on-line; retrieving job files."
archive get my_input_data.tar.gz

echo "Input data files retrieved: `date`"
echo "Unpacking input tar file"
tar xvzf my_input_data.tar.gz

echo "Directory contents:"
ls

An additional example of this script is also found in the Sample Code Repositories ($SAMPLES_HOME) on the systems.
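The script above derives a short job ID by cutting $PBS_JOBID at the first period. The same result can be obtained with shell parameter expansion, which avoids starting extra processes. The sketch below assumes $PBS_JOBID has the usual number.server form; the function name short_jobid and the server name in the example are hypothetical.

```shell
#!/bin/sh
# Strip everything from the first "." onward, leaving the leading job number.
# Equivalent to: echo "$1" | cut -d . -f 1
short_jobid() {
  printf '%s\n' "${1%%.*}"
}

short_jobid "12345.pbsserver"   # prints 12345
```

In a job script, this would be called as JOBID=$(short_jobid "$PBS_JOBID").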

5.5. Staging out via the transfer queue

The term "staging out" refers to the process of dealing with the data that's left in your $WORKDIR after your computational job completes. This generally entails deletion of unneeded files and archival or transfer of important data, which can be time-consuming. Because of this, users can benefit from using the transfer queue for these activities. (Remember that jobs in the transfer queue do not consume allocation.) The following standalone script demonstrates archival of output data to the archive server via the transfer queue. Let's call this a "stage out job."

#!/bin/sh
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -A Project_ID

# cd to wherever your data is located
cd $WORKDIR
echo "Packing data for archiving:"
tar cvzf my_output_data.tar.gz my_output_data

echo "Storing data from computation job:`date`"

# Check to see if archive server is on-line.  If so, run archive task.
# If not, say so, and indicate where the output data is stored for later
# retrieval.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ $STATUS -eq 0 ]; then
  echo "Archive system not on-line!!"
  echo "Job data files cannot be stored."
  echo "Retrieve them in `pwd` in my_output_data.tar.gz"
  echo "Exiting"
  echo `date`
  exit 2
fi
JOBID=`echo $PBS_JOBID | cut -d. -f 1`
archive mkdir my_job.$JOBID
archive put -C my_job.$JOBID my_output_data.tar.gz
archive ls my_job.$JOBID

date
exit

An additional example of this script is also found in the Sample Code Repositories ($SAMPLES_HOME) on the systems.

5.6. Tying it all together

While the previous examples were standalone examples, the following technique creates a 3-step job chain that runs from stage-in to stage-out without any involvement from you. This can be advantageous if your workflow is already well-defined and proven and does not require you to personally analyze your output prior to staging out.

If, however, your workflow does require an eyes-on analysis of the output data or if it requires post processing prior to analysis, you may want to use the stage out job instead to transfer your data to $CENTER, as demonstrated in Section 5.6.4 (below). You may still submit a transfer queue job later to archive data you want to keep.

For the purposes of this demonstration, we'll assume the following scripts are saved as prestaging.pbs, computation.pbs, and outstaging.pbs. Additional examples of these scripts are also found in the Sample Code Repositories ($SAMPLES_HOME) on the systems.

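Note the use of the $PBS_O_WORKDIR environment variable in script 2 and script 3 (below). PBS automatically sets this variable to the directory from which qsub was executed. Because script 1 runs qsub from the job directory it creates in $WORKDIR, and script 2 runs qsub after changing to that same directory, scripts 2 and 3 both cd to the job directory before doing their work.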
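Scripts 1 and 2 submit the next link in the chain as ${WORKDIR}/computation.pbs and ${WORKDIR}/outstaging.pbs, so all three scripts need to be in place before the first one is submitted. The following pre-flight check is a sketch of one way to confirm that; the function name check_chain_scripts is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm the three chain scripts exist in
# the given directory (default: $WORKDIR) before submitting the first one.
check_chain_scripts() {
  dir=${1:-$WORKDIR}
  missing=0
  for script in prestaging.pbs computation.pbs outstaging.pbs; do
    if [ ! -f "$dir/$script" ]; then
      echo "Missing: $dir/$script" >&2
      missing=1
    fi
  done
  # Return 0 only if every script was found.
  return "$missing"
}
```

The chain could then be started with: check_chain_scripts "$WORKDIR" && qsub "$WORKDIR/prestaging.pbs"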

5.6.1. Script 1 of 3 (Pre-staging)

This script contains the pre-staging job and launches the computation job.

#!/bin/sh
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -A Project_ID

# Create a directory for this job in $WORKDIR and cd into it.
cd $WORKDIR
JOBID=`echo $PBS_JOBID | cut -d . -f 1`
mkdir my_job.$JOBID
cd my_job.$JOBID

# If the archive server is available, get the data. Otherwise, exit.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ $STATUS -eq 0 ] ; then
  echo "Archive system not on-line!!"
  echo "Exiting: `date`"
  exit 1
fi

echo "Archive system is on-line; retrieving job files."
archive get my_input_data.tar.gz

echo "Input data files retrieved: `date`"
echo "Unpacking input tar file"
tar xvzf my_input_data.tar.gz
rm my_input_data.tar.gz

echo "Directory contents:"
ls

echo "Submitting computational job"
qsub -W depend=afterok:${JOBID} ${WORKDIR}/computation.pbs
exit

5.6.2. Script 2 of 3 (Computation)

This script contains the computational job and launches the stage-out job.

#!/bin/sh
#PBS -l walltime=00:30:00
#PBS -j oe
#PBS -q debug
## The following lines show PBS select statements for the systems
## at this center. Uncomment the line for the system you're running on.
## Cray select statement
##PBS -l select=2:ncpus=32:mpiprocs=32
## SGI select statement
##PBS -l select=4:ncpus=16:mpiprocs=16
##PBS -l place=scatter:excl
#PBS -A Project_ID
#PBS -r n

cd $PBS_O_WORKDIR

echo "Executing computation"
## The following lines show launch commands for the systems at this center.
## Uncomment the line for the system you're running on.
## Cray launch command
# aprun -n 64 ./my_executable | tee my_output_data
## SGI launch command
# mpiexec_mpt -n 64 ./my_executable | tee my_output_data

echo "Computation finished, submitting job to pack and archive data"
COMP_JOB=`echo $PBS_JOBID | cut -d. -f 1`

if [ -f ${WORKDIR}/outstaging.pbs ] ; then
  echo "Submitting archive job to transfer queue: `date`"
  qsub -W depend=afterok:${COMP_JOB} ${WORKDIR}/outstaging.pbs
else
  echo "Post archival script is missing!!!"
  echo "Archive step to store data cannot be performed."
  echo "Exiting."
  exit 1
fi
exit

5.6.3. Script 3 of 3 (Stage out to $ARCHIVE_HOME)

This script contains the stage-out job and is launched by the computation script.

#!/bin/sh
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -A Project_ID
#
cd $PBS_O_WORKDIR
echo "Packing data for archiving:"
tar cvzf my_output_data.tar.gz my_output_data

echo "Storing data from computation job:`date`"

# Check to see if archive server is on-line.  If so, run archive task.
# If not, say so, and indicate where the output data is stored for later
# retrieval.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ $STATUS -eq 0 ] ; then
  echo "Archive system not on-line!!"
  echo "Job data files cannot be stored."
  echo "Retrieve them in `pwd` in my_output_data.tar.gz"
  echo "Exiting"
  echo `date`
  exit 2
fi
JOBID=`echo $PBS_JOBID | cut -d. -f 1`
archive mkdir my_job.$JOBID
archive put -C my_job.$JOBID my_output_data.tar.gz
archive ls my_job.$JOBID

date
exit

5.6.4. Alternate Script 3 of 3 (Stage out to $CENTER)

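This alternate stage-out script copies the output data to $CENTER instead of the archive server. Like Script 3, it is launched by the computation script, so for this demonstration it would be saved as outstaging.pbs in place of that script.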

#!/bin/sh
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -A Project_ID
#
cd $PBS_O_WORKDIR
echo "Packing data for archiving:"
tar cvzf my_output_data.tar.gz my_output_data

echo "Storing data from computation job:`date`"

# Check to see if $CENTER is on-line.  If so, copy the files.
# If not, say so, and indicate where the output data is stored for later
# retrieval.
if [ ! -d "$CENTER" ] ; then
  echo "$CENTER is not available!!"
  echo "Job data files cannot be stored."
  echo "Retrieve them in `pwd` in my_output_data.tar.gz"
  echo "Exiting"
  echo `date`
  exit 2
fi
JOBID=`echo $PBS_JOBID | cut -d. -f 1`
mkdir $CENTER/my_job.$JOBID
cp my_output_data.tar.gz $CENTER/my_job.$JOBID
ls $CENTER/my_job.$JOBID
date
exit