ARL DSRC Archive Guide

How to Use This Document

Archiving your code, data, intermediate products, and results is an essential part of operating at the DSRCs. The archive system is your only means of accessible long-term storage while using DSRC supercomputing resources. Some file systems are never backed up (e.g., $WORKDIR), while others have size limits or are not backed up as often as you need (e.g., $HOME). This guide covers how to best make use of archival capabilities in a variety of situations to ensure their efficiency and availability to all users.

Section 1 of this document provides basic information about the archive server, the process of archiving your data, and why this capability is important to you.

Section 2 details important guidelines and precautions for the proper and efficient use of the archive server to ensure maximum availability for all users.

Section 3 demonstrates the use of the archive command, the preferred tool for archiving and retrieving data.

Finally, Section 4 describes methods for building automated processes to tie together your compute jobs with their data retrieval and archival requirements.

1. Archival Basics

1.1. Why do I need to archive my data?

The short answer is to free up system resources and protect your data.

Your work directory, $WORKDIR, resides on a large temporary file system that is shared with other users. This file system is intended to temporarily hold data that is needed or generated by your jobs. Since user jobs often generate a lot of data, the file system would fill up very quickly if everyone just left their files in $WORKDIR indefinitely. This would negatively impact everyone and make the system unusable. To protect the system, an automated purge cycle may run to free up disk space by deleting older or unused files. And, if file space becomes critically low, ALL FILES, regardless of age, are subject to deletion. To avoid this, we strongly encourage you to archive the data you want and keep your $WORKDIR clean by removing unnecessary files. Remember, your $WORKDIR is not backed up, so if your files are purged and you didn't archive them, they are gone forever!

1.2. How does archival work?

The archive system ($ARCHIVE_HOST) provides a long-term storage area for your important data. It is extremely large, and your personal archive directory ($ARCHIVE_HOME) has no quota. Even so, you probably don't want to archive everything you generate.

When you archive a file, it's copied to your $ARCHIVE_HOME directory on the archive server's disk cache, where it waits to be written to tape by the system. The disk cache is a large temporary storage area for files moving to and from tape. A file in the cache is said to be "online," while a file on tape is "offline." Once your file is written to tape, it may remain "online" for a short time, but eventually it is removed from the disk cache to make room for other files in transit. Both online and offline files show up in a directory listing, but offline files need to be retrieved from tape before you can use them.

Retrieval from tape can take a while, so be patient; there's a lot going on in the background. First, the system must determine on which tape (or tapes) your data resides. These are then robotically pulled from the tape library, mounted in one of the limited number of tape drives (assuming not all of them are busy), and wound into position before retrieval can begin. Your wait time depends on how many files you are retrieving, how big they are, how many tapes they're on, how full the disk cache is, how many other archival jobs are running, the network load (tape-to-cache and cache-to-HPC), and many other factors. After a delay, your data is retrieved from tape and available for use.

1.3. Accessing the archive file system

The archive file system is NFS-mounted to each HPC system, allowing you to perform archival tasks in a familiar Linux environment using standard commands, such as cp, mkdir, chmod, etc. Files can be archived or retrieved simply by copying them to or from $ARCHIVE_HOME. This approach is extremely convenient and has virtually no learning curve. As a result, many users prefer to simply use Linux commands, which is fine in some circumstances.

That said, we also support the archive command, which provides benefits the Linux commands cannot. This command was developed specifically for interactions with the archive server and has several useful features that make it more robust than the standard Linux commands in many circumstances. Since most users are already familiar with the standard Linux commands, the remainder of this guide focuses on the archive command. Its capabilities are described in detail in Section 3.

Important! When interacting with archived files from the command line, please remember, while files may appear to be readily accessible on disk, they may actually be on tape. If so, any action which reads or modifies a file automatically triggers the retrieval of the file from tape, causing the action to take much longer to complete than normal. Such actions include opening, copying, moving, removing, editing, compressing, or tarring a file that is already on tape. See Section 2.3 for more information.

Because the archive file system is a critical resource for all users, please immediately report any errors generated by the unclassified archive system to the HPC Help Desk and any errors generated by the classified archive system to the ARL DSRC Help Desk, as such errors may indicate an issue requiring administrative intervention to resolve.

2. Important Guidelines

These guidelines are important to help safeguard the stability of the archive server and to minimize negative impacts on all users. Failure to observe these guidelines may result in loss of archival privileges.

2.1. Use compressed tar files

Always tar and compress your files before archiving them. This reduces archival overhead and file size and shortens archival time. The sole exception to this general rule is binary data, which does not always compress well. If you have binary data, you should still combine multiple files using tar, but compression may not be advantageous.

Archival overhead refers to the complex set of time-consuming actions that occur every time you archive or retrieve a file. Some of these actions are described in Section 1.2, but there are others as well. So, if you archive 100 individual files, those time-consuming actions must be performed 100 times. This can really add up. But if you combine those 100 files into a single tar file, those time-consuming actions happen only once.
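
For example, a minimal sketch (assuming a hypothetical output directory named run_001 in your $WORKDIR) that combines an entire directory of files into a single compressed tar file before archiving it:

cd ${WORKDIR}
tar -czf run_001.tgz run_001/    # Combine and compress the whole directory into one file
archive put run_001.tgz          # Archive the single compressed tar file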

Technically, the only limit on the size of an archived file is the size of the archive server's disk cache. That said, the size of your archival file matters for three important reasons. The larger the file, (1) the higher the likelihood the entire file will be lost if a tape error occurs, (2) the longer it will take to stage back from tape, and (3) the more quickly the file will be removed from the disk cache after staging.

Be careful not to make your files too big. The optimal tar file size at the ARL DSRC is between 200 GB and 2 TB. At that size, the time required for file transfer and tape I/O is still reasonable. Files larger than this are more likely to require the library to load a tape with more free space, greatly increasing archival and retrieval times. If your file is larger than this threshold, you should consider splitting it into multiple smaller files. The maximum recommended file size is 17 TB. You are strongly encouraged not to archive files at or larger than this size.
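
If a single data set would exceed this range, one approach (a sketch assuming your output is already organized into hypothetical per-run subdirectories) is to create one compressed tar file per subdirectory rather than one enormous file:

for dir in run_*/ ; do
  tar -czf "${dir%/}.tgz" "${dir}"    # One compressed tar file per subdirectory
done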

Also note, using compressed tar files can improve the performance of the archive server for all users because they consume less space on the archive server's disk cache, which benefits everybody. They also reduce your transfer time when moving data to or from the disk cache.

2.2. Do not overwhelm the archive system

Although the archive system provides enormous capacity, it is, in fact, limited in two important ways. The most significant limit is the number of tapes that can be accessed at once. The second limit is the size of the disk cache, which determines how much data can be online at once.

Attempting to archive or retrieve too many files at once can fill up the disk cache on the archive server, halting archival and retrieval for all users. Even if the cache does not reach capacity, it could still tie up all available tape drives, impacting all archival operations at the center.

2.3. Do not directly use files in the archive

A common mistake on systems with NFS-mounted archive file systems is that users attempt to access the contents of a file, forgetting that it may actually be on tape. Any attempt to read or use a file that is on tape (e.g., with commands like tar, vi, more, less, or grep) must first retrieve the file, which consumes time and space on the disk cache. Imagine the result of the following command if the targeted files are on tape:

zcat *.tar.gz | tar -tv | grep search_term   # DON'T EVER DO THIS!!!

The intent of this command would be to grep through the content listings of multiple compressed tar files for a search term. On a normal file system, this is no big deal. But on an archive file system, since the zcat command reads the contents of compressed files, it requires the retrieval of every one of the compressed tar files (possibly many files), which could overwhelm the disk cache on the archive server.

If you inadvertently do something like this on an unclassified system, cancel the command immediately and contact the HPC Help Desk. If this occurs on a classified system, cancel the command immediately and contact the ARL DSRC Help Desk.

2.4. Use manifests

If you frequently need to search the contents of many tar files, creating and using a manifest file is easier on the archive system and faster for you. To create a manifest for a tar file:

tar -tf file.tar > file.manifest

This can be searched as follows:

grep search_term file.manifest  # Search within that file
# or
grep search_term *.manifest  # Search all manifest files in this directory

While it may be convenient to keep the manifest files in the same location as the tar files on the archive server, you could keep them in your home or permanent project directory. This would improve performance even further because you would not need to wait for the manifest files to migrate from tape. Since the manifest files are relatively small, they should not occupy too much space in these locations; however, it is advisable to back up your manifest files periodically.
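
For example, a minimal sketch (using a hypothetical tar file name) that builds a compressed tar file, records its manifest in your home directory, and archives only the tar file:

tar -czf my_data.tgz my_data/                     # Build the compressed tar file
tar -tzf my_data.tgz > ${HOME}/my_data.manifest   # Record its contents locally
archive put my_data.tgz                           # Archive the tar file itself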

2.5. Treat important data with appropriate caution

The ARL DSRC cannot guarantee absolute protection from unexpected data loss or corruption. Especially important or irreplaceable data should be stored in a second location at your local facility.

3. The Archive Command

The archive command is the preferred method for performing archival operations on HPC systems within the program. It is available on all HPC systems and is specifically designed to handle interactions with the archive server. Its commands, while more complex, are more robust than standard Linux commands. It also provides a standard interface, allowing you to perform common archival tasks with the same commands regardless of where you're running or how the local archive server is configured. It works the same way in transfer queue jobs as in an interactive login shell.

3.1. Common archive command features

3.1.1. Auto-Retry

All archive commands begin by automatically querying the status of the archive server to determine if the command can proceed. If the archive server is unavailable, the command waits five seconds and retries. The wait time increases by two seconds with each attempt until either the command can proceed or the maximum number of retries is met.

3.1.2. Targets $ARCHIVE_HOME

The archive command uses $ARCHIVE_HOME as its default target directory on the archive server unless you specify an alternative path with the -C path option.

Note: Most archive commands can utilize wildcards, but those resolving to targets on the archive server must be enclosed in double quotes, as follows:

archive get [options] "file*"
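
For example, a sketch (assuming a hypothetical directory named my_project under $ARCHIVE_HOME) that combines the -C option with a quoted wildcard:

archive ls -C my_project "run_*.tgz"
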
3.1.3. Common command options

The following options are available for most archive commands. These options are demonstrated in the example scripts in Section 4.4 below.

Common Archive Command Options
Option     Description
-C path    Manage files in the archival storage directory path.
-retry N   Limit the number of retries to N.
-s         Run in silent mode. Suppress all messages except usage errors.

For complete information on the archive command, see the archive man page on each system.

3.2. Supported archive capabilities and options

The following capabilities and their associated options are supported by the archive command on all HPC systems.

3.2.1. Checking server status

As mentioned above, most archive commands check the archive server status automatically. However, in some circumstances, you may wish to check the server status yourself using the archive stat command. The example below is from our unclassified systems; expect the output of the archive stat command to differ between classification levels.

By default, the archive stat command returns a message indicating the archive server is either "on-line" or "unavailable", as follows:

%> archive stat
07:16:12 05/14/2025 msas##.arl.hpc.mil on-line          (where ## varies based on the user's archive location)

or

%> archive stat
07:16:12 05/14/2025 msas##.arl.hpc.mil unavailable      (where ## varies based on the user's archive location)

Note: Users will not be able to ssh/putty directly into the archive servers.

Warning: While the archive stat command is also designed to return an exit status like most Linux commands, the exit status is currently meaningless since a 0 is returned regardless of the archive server status. An alternate method of testing archive server availability using archive stat is demonstrated in the example scripts in Section 4.4 below.

3.2.2. Listing files

To list files on the archive server, use the following command:

archive ls [options]

where options can be any option supported by the standard Linux ls command.
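
For example, a sketch (assuming hypothetical archived file names) that produces a long listing of the tar files in $ARCHIVE_HOME; note the quoted wildcard because it resolves on the archive server:

archive ls -l "*.tgz"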

3.2.3. Archiving files

To copy one or more files to the archive server, use the following command:

archive put [options] file1 [file2 ...]

The following additional options can be used to modify your command.

Archive put Options
Option        Description
-t tar_file   Combine your files into a temporary tar_file, which is then sent to the archive server and deleted after the transfer completes.
-S            In conjunction with -t, save the temporary tar_file after the transfer completes.
-z            Compress files with gzip before transfer, creating .gz files. With -t, create .tgz files.

Caution: If your data contains symbolic links, they will not be dereferenced by the -t option. If you wish to dereference symbolic links in your data, create your own tar file using the -h option to the tar command.
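
For example, a sketch (assuming hypothetical output files in the current directory) that combines the files into a single compressed tar file, archives it, and keeps the local copy:

archive put -S -z -t my_output_data.tgz my_output_data/*.dat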

3.2.4. Retrieving files

To retrieve one or more files from the archive server, use the following command:

archive get [options] file1 [file2 ...]   

The following additional options can be used to modify your command.

Archive get Options
Option   Description
-x       Extract the contents of a tar file after retrieval and delete the tar file.
-S       In conjunction with -x, save the temporary tar_file after the transfer completes.
-z       Unzip gzip-compressed files (e.g., .gz, .tgz).
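
For example, a sketch (assuming a hypothetical archived file) that retrieves a compressed tar file and extracts its contents into the current directory:

archive get -x my_input_data.tgz
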
3.2.5. Making directories

To create a directory on the archive server, use the following command:

archive mkdir [options] dir1 [dir2 ...]  

The following additional options can be used to modify your command.

Archive mkdir Options
Option    Description
-m mode   Set permissions on the newly created directory. It is equivalent to executing chmod on the directory using numeric mode specifiers (e.g., -m 750).
-p        Create necessary intermediate directories in a path if they don't already exist.
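
For example, a sketch (using hypothetical directory names) that creates a nested directory on the archive server with restricted permissions:

archive mkdir -p -m 750 my_project/run_001
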
3.2.6. Deleting directories

As with Linux commands, deleting an empty directory is different from deleting a non-empty directory. To delete an empty directory on the archive server, use archive rmdir. To delete a non-empty directory and its contents, use archive rm -r. For example:

archive rmdir [options] dir1 [dir2 ...]
archive rm -r [options] dir1 [dir2 ...]

To delete the entire path to an empty directory, use the -p option. For example:

archive rmdir -p [options] /path/to/dir1

This would delete dir1, then /path/to, and finally /path assuming each directory is empty.

3.2.7. Deleting files

To delete one or more files on the archive server, use the following command:

archive rm [options] [-f] file1 [file2 ...]

The -f option prevents the rm command from prompting the user for approval.
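
For example, a sketch (assuming hypothetical archived file names) that deletes old tar files without prompting; note the quoted wildcard because it resolves on the archive server:

archive rm -f "old_run_*.tgz"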

3.2.8. Moving or renaming files or directories

To move or rename a file or directory on the archive server, use the archive mv command, which works just like the Linux mv command. For example:

archive mv [options] [-f] file1 target                            #rename
archive mv [options] [-f] "file1 [file2 ...]" target_directory    #move  

The -f option prevents the mv command from prompting the user for approval.

The archive mv command represents a special case in the use of the -C path option. While this option is available, it is far more intuitive to simply specify the target_directory as shown above instead of using -C path.

3.2.9. Changing the permissions of files or directories

To change the permissions of a file or directory on the archive server, use the following command:

archive chmod [options] [-R] mode file1 [file2 ...] 

The -R option changes permissions recursively within a specified directory.

The mode may be stated in numeric or symbolic form, as follows:

archive chmod 750 file1
archive chmod o-rwx file1

3.2.10. Changing group ownership of files or directories

To change the group ownership of a file or directory on the archive server, use the following command:

archive chgrp [options] group_name file1 [file2 ...] 

The following additional options can be used to modify your command.

Archive chgrp Options
Option   Description
-R       Execute the command recursively in all directories and subdirectories in the argument list.
-h       Affect the group ownership of symbolic links, not the referenced files.
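
For example, a sketch (using a hypothetical group name and directory) that recursively changes the group ownership of an archived project directory:

archive chgrp -R my_group my_project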

4. Data Staging

4.1. What is data staging?

Data staging is the process of ensuring your data is in the right place at the right time. Related terms are "staging in" or "pre-staging" and "staging out" or "post-job archival." Before a job can run, the input data must be "staged in," "pre-retrieved," or "pre-staged." This simply means the data is copied from the archive server (or some other source) into a directory accessible by the job script. Archiving your output data after the job completes is called "post-job archival" or "staging out." "Staging out" may also refer to moving your output data to another location, like the Center-Wide File System ($CENTER), for further processing.

Staging may be performed manually, but since retrieving a file (especially a large file) from tape may take a while, ensuring your input data is in place before your job runs, and that it stays there until the job starts, isn't always as simple as it sounds. The following sections describe different approaches for staging your data.

4.2. Staging in Compute Queues (Not Supported)

WARNING! DO NOT attempt to stage data to or from the archive server in a job running in any compute queue. The staging attempt WILL FAIL and may consume a significant amount of your allocation before it does. Additionally, failed stage-out attempts may leave your data at risk.

Staging in batch jobs should only be performed in the transfer queue.

4.3. Staging from the Command Line (Manual Staging)

Manual staging is simply staging from the command line without using the transfer queue. For many users, this is the simplest way to do staging because small data sets can usually be transferred while you wait. (Your mileage may vary based on system load.) There are, however, a couple of things to consider before deciding to stage data manually.

  • Check the size of your data first (see the sketch below). If your data exceeds 50 GB (approximately 4 hours of transfer time), you may want to consider staging via the transfer queue. See Section 4.4 for additional details.
  • If your login shell terminates before your transfer completes, your transfer will die. If you do not plan to have an active terminal for the entire time your transfer is running, use a transfer job (Section 4.4). Note: DO NOT run manual transfers as background processes. This can overwhelm the archive system.
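
A minimal sketch for checking the size of your data before deciding how to stage it (assuming a hypothetical output directory in $WORKDIR):

cd ${WORKDIR}
du -sh my_output_data/     # Report the total size of the directory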

4.4. Staging in Transfer Queue Jobs (Batch Staging)

If any of the following apply, you should batch stage your data in the transfer queue:

  • If you don't have time to wait for your data to stage
  • If you want to submit a job as soon as the input data is staged
  • If you want to archive your data as soon as a job completes

Note: Additional examples of all scripts in this guide are also found in the Sample Code Repositories ($SAMPLES_HOME/Data_Management) on the systems.

4.4.1. What is the transfer queue?

The transfer queue is a special-purpose queue for transferring or archiving files. It has access to $HOME, $ARCHIVE_HOME, $WORKDIR, and $CENTER. Jobs running in the transfer queue use non-computational cores and do not accrue time against your allocation.

4.4.2. Staging-in via the transfer queue (Pre-staging)

By pre-staging your data in a transfer queue job, you don't have to sit around and wait for your data to be staged before submitting your computational job. The following standalone script demonstrates retrieval of archived data from the archive server, placing it in a newly created directory in your $WORKDIR, whose name is based on the JOBID. Let's call this a "pre-staging job" (or "pre-retrieve").

Note, all transfer queues are first-come, first-served, regardless of walltime. While you must supply a walltime, it is not used to schedule your transfer queue job. If you set it too low, however, your transfer may die before completion, so you should always use the maximum walltime for transfer queue jobs.

PBS Example Script

#!/bin/bash
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -l walltime=48:00:00
#PBS -A Project_ID

JOBID=`echo ${PBS_JOBID} | cut -d . -f 1`      # Extract numeric portion of $PBS_JOBID
mkdir ${WORKDIR}/my_job.${JOBID}               # Create unique Job directory
cd ${WORKDIR}/my_job.${JOBID}                  # cd to unique Job directory

# Exit if the archive server is unavailable.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ ${STATUS} -eq 0 ]; then
  echo "Exiting: `date` - Archive system not on-line!!"
  exit
fi
archive get -x my_input_data.tgz        # Retrieve data from archive and extract

Slurm Example Script

#!/bin/bash
#SBATCH -p transfer
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -o filename.out
#SBATCH -e filename.err
#SBATCH -t 48:00:00
#SBATCH -A Project_ID

mkdir ${WORKDIR}/my_job.${SLURM_JOB_ID}         # Create unique Job directory
cd ${WORKDIR}/my_job.${SLURM_JOB_ID}            # cd to unique Job directory

# Exit if the archive server is unavailable.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ ${STATUS} -eq 0 ]; then
  echo "Exiting: `date` - Archive system not on-line!!"
  exit
fi
archive get -x my_input_data.tgz        # Retrieve data from archive and extract

Additional examples of the PBS and Slurm scripts are located in the Sample Code Repositories ($SAMPLES_HOME/Data_Management/Transfer_Queue_with_Archive_Commands) on the respective PBS and Slurm systems.

4.4.3. Staging-out via the transfer queue

The term "staging out" or "post-archive" refers to the process of dealing with the data that's left in your $WORKDIR after your computational job completes. This generally entails deletion of unneeded files and archival or transfer of important data, which can be time-consuming. Because of this, users can benefit from using the transfer queue for these activities. (Remember jobs in the transfer queue do not consume allocation.) The following standalone scripts demonstrate archival of output data to the archive server via the transfer queue.

PBS Example Script

#!/bin/bash
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -l walltime=48:00:00
#PBS -A Project_ID

cd ${WORKDIR}                               # cd to wherever your data is 

# Exit if the archive server is unavailable.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ ${STATUS} -eq 0 ]; then
  echo "Exiting: `date` - Archive system not on-line!!"
  exit 1
fi

JOBID=`echo ${PBS_JOBID} | cut -d . -f 1`   # Extract numeric portion of $PBS_JOBID
archive mkdir my_job.${JOBID}               # Create unique Job directory
# Tar, zip, and archive data.
archive put -S -s -t my_output_data.tgz -z -C my_job.${JOBID} my_output_data

Slurm Example Script

#!/bin/bash
#SBATCH -p transfer
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -o filename.out
#SBATCH -e filename.err
#SBATCH -t 48:00:00
#SBATCH -A Project_ID

cd ${WORKDIR}                               # cd to wherever your data is 

# Exit if the archive server is unavailable.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ ${STATUS} -eq 0 ]; then
  echo "Exiting: `date` - Archive system not on-line!!"
  exit 1
fi

archive mkdir my_job.${SLURM_JOB_ID}        # Create unique Job directory
# Tar, zip, and archive data.
archive put -S -s -t my_output_data.tgz -z -C my_job.${SLURM_JOB_ID} my_output_data

4.4.4. Tying it all together

While the previous examples were standalone examples, the following technique creates a 3-step job chain that runs from stage-in to stage-out without any involvement from you. This can be advantageous if your workflow is already well-defined and proven and does not require you to personally analyze your output prior to staging out.

Conceptually, the process looks like this:

Picture of three boxes labeled Stage-in, Compute, and Stage-out.

If your workflow requires an eyes-on analysis of the output data, or if it requires post-processing prior to analysis, you may want to use the stage-out job instead to transfer your data to $CENTER, as demonstrated in Section 4.4.4.4 (below). You may still submit a transfer queue job later to archive data you want to keep.

Important! The examples below are for our unclassified systems only. For ALL systems, working examples of the "tying it all together" method can be located on each system in $SAMPLES_HOME/Data_Management.

4.4.4.1. Script 1 of 3 (Stage-In)

PBS Example Script

This script contains the stage-in (or "pre-retrieve") job and launches the compute job.

#!/bin/bash
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -l walltime=48:00:00
#PBS -A Project_ID

JOBID=`echo ${PBS_JOBID} | cut -d . -f 1`    # Extract numeric portion of $PBS_JOBID
mkdir ${WORKDIR}/my_job.${JOBID}             # Create unique Job directory
cd ${WORKDIR}/my_job.${JOBID}                # cd to unique Job directory

# Exit if the archive server is unavailable.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ ${STATUS} -eq 0 ]; then
  echo "Exiting: `date` - Archive system not on-line!!"
  exit 1
fi
archive get -x my_input_data.tgz        # Retrieve data from archive and extract
# Submit computational job
qsub ${WORKDIR}/my_compute_script
exit

Slurm Example Script

This script contains the stage-in (or "pre-retrieve") job and launches the compute job.

#!/bin/bash
#SBATCH -p transfer
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -o filename.out
#SBATCH -e filename.err
#SBATCH -t 48:00:00
#SBATCH -A Project_ID

mkdir ${WORKDIR}/my_job.${SLURM_JOB_ID}         # Create unique Job directory
cd ${WORKDIR}/my_job.${SLURM_JOB_ID}            # cd to unique Job directory

# Exit if the archive server is unavailable.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ ${STATUS} -eq 0 ]; then
  echo "Exiting: `date` - Archive system not on-line!!"
  exit 1
fi
archive get -x my_input_data.tgz        # Retrieve data from archive and extract
# Submit compute job
sbatch ${WORKDIR}/my_compute_script
exit

4.4.4.2. Script 2 of 3 (Compute)

PBS Example Script

This script contains the compute (or "processing") job and launches the stage-out (or "post-archive") job. Note the use of the $PBS_O_WORKDIR environment variable. This variable is automatically set to the directory in which qsub is executed in script 1. This script then cd's to that directory before launching its job.

Important! The example below is for an unclassified system only. $SAMPLES_HOME/Application/picalc will contain an accurate example of a basic processing/compute script on each system. For all systems, working examples of the "tying it all together" method can be located in $SAMPLES_HOME/Data_Management.

#!/bin/bash
#PBS -l walltime=96:00:00
#PBS -j oe
#PBS -q standard
#PBS -A Project_ID
#PBS -r n
#PBS -l select=2:ncpus=192:mpiprocs=192  

# cd to the job directory that was created in the stage-in script (Script 1)
cd ${PBS_O_WORKDIR}

## The following lines show launch commands for the PBS systems at this center.
## Keep only the line for the system you're running on.

mpiexec -V -n 96 ./my_executable | tee my_output_data

# Submit stage-out job
qsub ${WORKDIR}/my_stage-out_script
exit

Slurm Example Script

This script contains the compute (or "processing") job and launches the stage-out (or "post-archive") job. Note the use of the $SLURM_SUBMIT_DIR environment variable. This variable is automatically set to the directory in which sbatch is executed in script 1. This script then cd's to that directory before launching its job.

Important! The example below is for an unclassified system only. $SAMPLES_HOME/Application/picalc will contain an accurate example of a basic processing/compute script on each system. For all systems, working examples of the "tying it all together" method can be located in $SAMPLES_HOME/Data_Management.

#!/bin/bash
#SBATCH -p standard
#SBATCH -o filename.out
#SBATCH -e filename.err
#SBATCH -t 96:00:00
#SBATCH -A Project_ID
#SBATCH --no-requeue
#SBATCH -N 2
#SBATCH -n 184
##SBATCH --ntasks-per-node=92

# cd to the job directory that was created in the stage-in script (Script 1)
cd ${SLURM_SUBMIT_DIR}

## The following lines show launch commands for the Slurm systems at this center.
## Keep only the line for the system you're running on.

mpirun -ppn 92 -genvall -bootstrap slurm ./my_executable | tee my_output_data 

# Computation finished. Submit job to pack and archive data
sbatch ${WORKDIR}/my_stage-out_script
exit

4.4.4.3. Script 3 of 3 (Stage-out to $ARCHIVE_HOME)

PBS Example Script

This script contains the stage-out (or "post-archive") job launched by Script 2. Note the use of the $PBS_O_WORKDIR environment variable. This variable is automatically set to the directory in which qsub is executed in script 1. This script then cd's to that directory before attempting to stage data to $ARCHIVE_HOME.

#!/bin/bash
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -l walltime=48:00:00
#PBS -A Project_ID

# cd to the job directory that was created in the stage-in script (Script 1)
cd ${PBS_O_WORKDIR}

# Exit if the archive server is unavailable.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ ${STATUS} -eq 0 ]; then
  echo "Exiting: `date` - Archive system not on-line!!"
  exit 3
fi

JOBID=`echo ${PBS_JOBID} | cut -d . -f 1`   # Extract numeric portion of $PBS_JOBID
archive mkdir my_job.${JOBID}               # Create unique Job directory
# Tar, zip, and archive data.
archive put -S -s -t my_output_data.tgz -z -C my_job.${JOBID} my_output_data
exit

Slurm Example Script

This script contains the stage-out (or "post-archive") job launched by Script 2. Note the use of the $SLURM_SUBMIT_DIR environment variable. This variable is automatically set to the directory in which sbatch is executed in script 1. This script then cd's to that directory before attempting to stage data to $ARCHIVE_HOME.

#!/bin/bash
#SBATCH -p transfer
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -o filename.out
#SBATCH -e filename.err
#SBATCH -t 48:00:00
#SBATCH -A Project_ID

# cd to the job directory that was created in the stage-in script (Script 1)
cd ${SLURM_SUBMIT_DIR}

# Exit if the archive server is unavailable.
STATUS=`archive stat -retry 1 | grep 'on-line' | wc -l`
if [ ${STATUS} -eq 0 ]; then
  echo "Exiting: `date` - Archive system not on-line!!"
  exit 3
fi

archive mkdir my_job.${SLURM_JOB_ID}        # Create unique Job directory
# Tar, zip, and archive data.
archive put -S -s -t my_output_data.tgz -z -C my_job.${SLURM_JOB_ID} my_output_data
exit

4.4.4.4. Alternate Script 3 of 3 (Stage-out to $CENTER)

PBS Example Script

This script contains the stage-out (or "post-archive") job launched by Script 2. Note the use of the $PBS_O_WORKDIR environment variable. This variable is automatically set to the directory in which qsub is executed in script 1. This script then cd's to that directory before attempting to stage data to $CENTER.

#!/bin/bash
#PBS -q transfer
#PBS -l select=1:ncpus=1
#PBS -j oe
#PBS -l walltime=48:00:00
#PBS -A Project_ID

# cd to the job directory that was created in the stage-in script (Script 1)
cd ${PBS_O_WORKDIR}

# Exit if the Center-Wide file system is unavailable.
if [ ! -d ${CENTER} ] ; then
  echo "Exiting: `date` - The Center-Wide file system is unavailable!!"
  exit 3
fi

tar cvzf my_output_data.tar.gz my_output_data

JOBID=`echo ${PBS_JOBID} | cut -d . -f 1`   # Extract numeric portion of $PBS_JOBID
mkdir ${CENTER}/my_job.${JOBID}
cp my_output_data.tar.gz ${CENTER}/my_job.${JOBID}
exit

Slurm Example Script

This script contains the stage-out (or "post-archive") job launched by Script 2. Note the use of the $SLURM_SUBMIT_DIR environment variable. This variable is automatically set to the directory in which sbatch is executed in script 1. This script then cd's to that directory before attempting to stage data to $CENTER.

#!/bin/bash
#SBATCH -p transfer
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -o filename.out
#SBATCH -e filename.err
#SBATCH -t 48:00:00
#SBATCH -A Project_ID

# cd to the job directory that was created in the stage-in script (Script 1)
cd ${SLURM_SUBMIT_DIR}

# Exit if the Center-Wide file system is unavailable.
if [ ! -d ${CENTER} ] ; then
  echo "Exiting: `date` - The Center-Wide file system is unavailable!!"
  exit 3
fi

tar cvzf my_output_data.tar.gz my_output_data
mkdir ${CENTER}/my_job.${SLURM_JOB_ID}        # Create unique Job directory
cp my_output_data.tar.gz ${CENTER}/my_job.${SLURM_JOB_ID}
exit