The Research Computing Center (RCC) hosts and maintains a number of storage systems. RCC users have access both to persistent, high-capacity storage, which can be shared among a research group or kept private to an individual user, and to high-performance storage for data that must be staged temporarily and accessed quickly.
Persistent and High-Capacity Storage
Persistent storage is accessible from all compute nodes on Midway. It can also be reached from outside the RCC's compute environment in several ways, such as mounting directories as network drives on your personal computer or accessing data through a Globus Online endpoint. The RCC takes snapshots of all home directories (users' private storage space) at regular intervals, so any data that is lost or corrupted can easily be recovered.
Visit the data sharing services page to learn how the RCC can help researchers customize data access, and the data management page to learn how the RCC supports writing data management plans for grants and other sources of funding.
Each RCC user has a home directory for storing small, frequently used items such as source code, binaries, and scripts. By default, a home directory is only accessible by its owner and is suitable for storing files that do not need to be shared with others. The data in the home directory is accessible from Midway as well as remotely via different protocols. Please see our guide to data transfer for more details.
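On a POSIX filesystem, the owner-only default roughly corresponds to a directory mode that grants no permission bits to the group or to others (for example, mode 700). The Python sketch below is a rough self-check built on that assumption. It does not inspect ACLs or any other access controls the RCC may apply, so a True result is only indicative. The function name owner_only is hypothetical.

```python
import os
import stat


def owner_only(path: str) -> bool:
    """True if `path` grants no permission bits to its group or to others.

    Only the classic mode bits are checked; ACLs are not inspected.
    """
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

For example, owner_only(os.path.expanduser("~")) checks your own home directory.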
Principal investigators may request a project space for their research group. These directories are generally used for longer-term storage of data and files shared by members of a research group or project, and they are accessible from all RCC compute systems as well as remotely.
High-Performance Scratch Space
Scratch space is hosted on the RCC’s high-performance storage system and is intended for staging data that is required or generated by computational processes running on the cluster. Unlike home and project directories, scratch space is neither snapshotted nor backed up, and it may be periodically purged. Users are therefore responsible for copying any important data in scratch space to persistent storage, such as their home or project directories.
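Copying results out of scratch can be as simple as a recursive copy followed by a check that the copies match. The Python sketch below shows one way to do this; it is not an RCC-provided tool. The names replicate and sha256_of are hypothetical, and the source and destination are placeholders for your own scratch directory and project (or home) directory. The dirs_exist_ok argument to shutil.copytree requires Python 3.8 or later.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(src_dir: str, dest_dir: str) -> list[str]:
    """Copy src_dir to dest_dir, then verify every file by checksum.

    Returns the relative paths of files whose copies do not match;
    an empty list means every file was copied intact.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    # dirs_exist_ok=True lets a re-run refresh an earlier copy in place;
    # missing parent directories of dest are created automatically.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    mismatches = []
    for f in src.rglob("*"):
        if f.is_file():
            rel = f.relative_to(src)
            if sha256_of(f) != sha256_of(dest / rel):
                mismatches.append(str(rel))
    return mismatches
```

For example, replicate("/path/to/scratch/run42", "/path/to/project/run42") copies one run's output and returns an empty list if every file verified. Both paths are placeholders. Hashing rereads every file on both sides, which costs extra I/O but confirms the copy before the scratch data is purged.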
Backup and Data Recovery
The RCC maintains filesystem snapshots for quick and easy data recovery. In the event of a more serious storage failure, archival tape backups can be used to recover data from persistent storage locations.
Automated snapshots of the home, project, and cds directories are available in case of accidental file deletion or other problems. In general, the following snapshots are retained:
7 daily snapshots
4 weekly snapshots
Because snapshots consume additional storage space, the RCC may reduce the number of retained snapshots when space usage becomes very high, so that the system can continue to operate normally.
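The Python sketch below makes the retention policy concrete. It lists which snapshot dates would exist on a given day under a policy of 7 daily and 4 weekly snapshots, and it picks the newest snapshot taken before a file was damaged. It relies on several assumptions that the RCC does not state: one snapshot per calendar day, weekly snapshots taken on Sundays (WEEKLY_WEEKDAY is a hypothetical constant), and no reduction in the snapshot count. Because the time of day of each snapshot is unknown, the sketch conservatively ignores any snapshot from the day the damage occurred.

```python
from __future__ import annotations

from datetime import date, timedelta

DAILY_KEEP = 7       # daily snapshots retained (from the policy above)
WEEKLY_KEEP = 4      # weekly snapshots retained (from the policy above)
WEEKLY_WEEKDAY = 6   # ASSUMPTION: weekly snapshot taken on Sunday (Mon=0 ... Sun=6)


def retained_snapshots(today: date) -> list[date]:
    """Snapshot dates assumed to exist on `today`, newest first."""
    daily = {today - timedelta(days=i) for i in range(DAILY_KEEP)}
    # Most recent weekly-snapshot day on or before today.
    last_weekly = today - timedelta(days=(today.weekday() - WEEKLY_WEEKDAY) % 7)
    weekly = {last_weekly - timedelta(weeks=i) for i in range(WEEKLY_KEEP)}
    # A day can hold both a daily and a weekly snapshot; the set keeps it once.
    return sorted(daily | weekly, reverse=True)


def best_snapshot(today: date, damaged_on: date) -> date | None:
    """Newest retained snapshot from a day strictly before the damage, if any."""
    for snap in retained_snapshots(today):
        if snap < damaged_on:
            return snap
    return None
```

For example, on Wednesday 13 March 2024 this model retains ten distinct snapshots reaching back to Sunday 18 February. A file damaged on 5 March could therefore be recovered from the 3 March weekly snapshot.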
Backups are performed nightly to a tape machine located in a different data center from the main storage system. These backups safeguard against events such as hardware failure or disasters that could result in the complete loss of the RCC’s primary data center. During periods of high activity, the nightly tape backup may take longer than 24 hours to complete, so the tape backup can occasionally be a few days out of date. Users should recover files from the snapshots described above, because tape backup is intended for disaster recovery only. Users should also be aware that our backup system does not support special characters in filenames, so filenames should be restricted to alphanumeric characters, dashes, underscores, and periods.
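Before files land in backed-up storage, you can scan for names that use characters outside the permitted set. The Python sketch below interprets "alphanumeric" as the ASCII letters A-Z and a-z plus the digits 0-9, which is an assumption. The names is_backup_safe and unsafe_paths are hypothetical and are not part of any RCC tool.

```python
from __future__ import annotations

import re
from pathlib import Path

# Allowed characters per the backup note above: ASCII letters and digits
# (assumed meaning of "alphanumeric"), dash, underscore, and period.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_backup_safe(name: str) -> bool:
    """True if a single, non-empty filename uses only allowed characters."""
    return SAFE_NAME.fullmatch(name) is not None


def unsafe_paths(root: str) -> list[str]:
    """Return every path under `root` whose final name component is unsafe."""
    return sorted(
        str(p) for p in Path(root).rglob("*") if not is_backup_safe(p.name)
    )
```

For example, unsafe_paths(".") lists every file and directory under the current directory whose name contains a space, an accented letter, or any other character outside that set.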