Data Storage

RCC provides a high-performance GPFS shared file system which is used for users’ home directories, shared project spaces, and high-throughput scratch space. Most compute nodes on Midway1 also have local disk storage that can be used for temporary scratch space if necessary. Midway2 compute nodes are all diskless.

In addition to the high-performance GPFS file system, RCC also offers Cost-effective Data Storage (CDS) through the Cluster Partnership Program for long-term data storage. CDS is only available from login nodes and is meant to be used as storage for less frequently used data. Before performing any computation on data stored on CDS, the data first needs to be copied to the GPFS file system.
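For example, a minimal sketch of staging a dataset from CDS into a project directory before computing on it. The /cds/<PI CNetID> path and directory names are placeholders rather than a documented mount point, so substitute the actual location of your group's CDS allocation:

$ # run on a login node, where CDS is available (paths are illustrative)
$ rsync -av /cds/<PI CNetID>/my_dataset/ /project/<PI CNetID>/my_dataset/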

Quotas

The amount of data that can be stored in home directories, project directories, and shared scratch directories is controlled by quota. RCC enforces hard and soft limits on quotas. A soft quota can be exceeded for a short period of time called a grace period. The hard quota cannot be exceeded under any circumstances.

Additional storage is available through the Cluster Partnership Program, a Research I Allocation, a Research II Allocation, or, in certain circumstances, a Special Allocation.

Checking available space

To check your current quotas, use the quota command. Typical output may look like this:

[Screenshot: typical output of the quota command]

The output can have up to three sections. The top section displays information about the home directory and the scratch space on both Midway1 (scratch) and Midway2 (scratch2). The middle section displays information about the project space on Midway1. The bottom section displays information about the project space on Midway2. Depending on how many groups you are part of, you may see multiple lines in the middle and bottom sections.

Descriptions of the fields:

fileset

This is the file set or file system where this quota is valid.

type

This is the type of quota. It can be blocks, for the amount of consumed disk space, or files, for the number of files in a directory. Both blocks and files quotas can be set at the user or group level. The quotas on the home directory and the scratch space are set on a per-user basis, and the quota on the project space is set on a per-group basis.

used

This is the amount of disk space consumed or the number of files in the specified location.

quota

This is the soft quota (storage space or file count) associated with the specified location. Usage may exceed the soft quota, but only for the duration of the grace period and only up to the hard limit.

limit

This is the hard quota (storage space or file count) associated with the specified location. When your usage exceeds this limit, you will NOT be able to write to that filesystem.

grace

This is the grace period, or the amount of time remaining that the soft quota can be exceeded. The value none means that the quota is not exceeded. After a soft quota has been exceeded for longer than the grace period, it will no longer be possible to create new files.
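To show how these fields fit together, here is a hedged sketch of one section of quota output (the home directory section). The values are placeholders for illustration only and are not the actual limits applied to your account:

fileset    type     used     quota     limit     grace
home       blocks   10G      30G       35G       none
home       files    20000    1000000   1100000   none

In this sketch the blocks row shows 10G of disk space used against a 30G soft quota and a 35G hard limit, and grace is none because neither soft quota has been exceeded.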

Persistent Space

Persistent space is appropriate for long-term storage. The two locations for persistent space are the home and project directories. Both are protected by file system snapshots and tape backup.

Home Directories

Every RCC user has a home directory located at /home/<CNetID>. The HOME environment variable points to this location. The home directory is accessible from all RCC compute systems and is generally used for storing frequently used items such as source code, binaries, and scripts. By default, a home directory is only accessible by its owner (mode 0700) and is suitable for storing files which do not need to be shared with others.
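For example, you can confirm where your home directory lives and that it has the default owner-only permissions. The output below is illustrative:

$ echo $HOME
/home/<CNetID>
$ ls -ld $HOME
drwx------ 25 <CNetID> <CNetID> 32768 2013-01-15 10:51 /home/<CNetID>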

Project Directories

All RCC PI Groups are allocated a Project Directory located at /project/<PI CNetID> or /project2/<PI CNetID>, where <PI CNetID> is the CNetID of your RCC PI account holder. These directories are accessible by all members of the PI Group and are generally used for storing files which need to be shared by members of the group. Additional storage in project directories is available through the Cluster Partnership Program, a Research I Allocation, a Research II Allocation, or, in certain circumstances, a Special Allocation.

The default permissions for files and directories created in a project directory allow group read/write with the group sticky bit set (mode 2770). The group ownership is set to the PI group.

Scratch Space

Shared Scratch Space

High performance shared scratch space can be accessed using the SCRATCH environment variable. This variable points to the correct path on both Midway1 and Midway2. This scratch space is intended to be used for reading or writing data required by jobs running on the cluster. Scratch space is neither snapshotted nor backed up.

Note

It is the responsibility of the user to ensure any important data in scratch space is moved to persistent storage. Scratch space is meant to be used for temporary, short-term storage only.

The default permissions for scratch space allow access only by its owner (mode 0700). The standard quota for the high-performance scratch directory is 5 TB, with a 100 GB soft limit. For shared scratch space, the grace period during which the soft limit may be exceeded is 30 days.
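As a hedged sketch of the intended workflow, the commands below copy a job's results out of scratch into a project directory and then remove the scratch copy. The directory names are placeholders for your own paths:

$ # move results to persistent storage once the job has finished
$ rsync -av $SCRATCH/my_job/results/ /project/<PI CNetID>/my_job/results/
$ # remove the scratch copy only after verifying the transfer
$ rm -r $SCRATCH/my_job/results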

Local Scratch Space

Most Midway1 compute nodes have a local hard disk available for scratch space for situations where that would be more appropriate. It is available in /scratch/local. Users should create a sub-directory in this location and use that directory for scratch space. All files in /scratch/local are deleted when the node is rebooted.
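A minimal sketch of that pattern inside a batch job script is shown below, assuming the job runs under Slurm on a Midway1 node with a local disk. The directory layout is illustrative, and $SLURM_JOB_ID is used only to give the sub-directory a unique name:

# create a unique working directory on the node-local disk (Midway1 only)
mkdir -p /scratch/local/$USER/$SLURM_JOB_ID
cd /scratch/local/$USER/$SLURM_JOB_ID

# ... run the computation, writing temporary files here ...

# copy anything worth keeping to persistent storage before the job ends
cp results.dat /project/<PI CNetID>/
cd && rm -rf /scratch/local/$USER/$SLURM_JOB_ID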

Note

Midway2 compute nodes are all diskless and do not have local scratch space. Please use the SCRATCH environment variable to use the shared scratch space.

File System Permissions

Let’s summarize the default file system permissions:

Directory               Permissions
$HOME                   0700 – Accessible only to the owner
$SCRATCH                0700 – Accessible only to the owner
/project/<PI CNetID>    2770 – Read/write for the project group
/project2/<PI CNetID>   2770 – Read/write for the project group

The default umask is 002. When new files or directories are created, the umask influences the default permissions of those files and directories. With the umask set to 002 all files and directories will be group readable and writable by default. In your home directory, the group ownership will be set to your personal group, which is the same as your CNetID, so you will still be the only user that can access your files and directories. In the project directories, the group sticky bit causes the group ownership to be the same as the directory. This means files created in a project directory will be readable and writable by the project group, which is typically what is wanted in those directories.

Here is an example of what this means in practice:

$ ls -ld $HOME /project/rcc
drwx------ 108 wettstein wettstein 32768 2013-01-15 10:51 /home/wettstein
drwxrws---  24 root      rcc-staff 32768 2013-01-15 10:48 /project/rcc
$ touch $HOME/newfile /project/rcc/newfile
$ ls -l /project/rcc/newfile $HOME/newfile
-rw-rw-r-- 1 wettstein wettstein 0 2013-01-15 10:48 /home/wettstein/newfile
-rw-rw-r-- 1 wettstein rcc-staff 0 2013-01-15 10:48 /project/rcc/newfile

Both files are readable and writable by the group owner due to the default umask, but the group owner differs because the group sticky bit is set on /project/rcc.

Note

This applies only to newly created files and directories. If files or directories are copied from elsewhere, the ownership and permission may not work like this. Contact RCC help if you need assistance with setting filesystem permissions.
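If you do need to repair group access on data copied into a project directory yourself, a hedged example is below. The group name pi-<PI CNetID> is an assumption (check the actual group with ls -ld /project/<PI CNetID>), and the path is a placeholder:

$ # set the group ownership to the project group (group name is an assumption)
$ chgrp -R pi-<PI CNetID> /project/<PI CNetID>/copied_data
$ # grant group read/write, and execute only on directories and executables
$ chmod -R g+rwX /project/<PI CNetID>/copied_data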

Data Recovery and Backups

Snapshots

Automated snapshots of home and project directories are available in case of accidental file deletion or other problems. Currently snapshots are available for these time periods:

Directory                Snapshot intervals     Snapshot path
$HOME                    7 daily and 4 weekly   /snapshots/home/<SNAPSHOT>/home/<CNetID>
/project/<any_folder>    7 daily and 2 weekly   /snapshots/project/<SNAPSHOT>/project/<any_folder>
/project2/<any_folder>   7 daily and 4 weekly   /snapshots/project2/<SNAPSHOT>/project2/<any_folder>

The snapshots for the home directory are available from both Midway1 and Midway2 login nodes. The snapshots for the /project/ directory are available only from Midway1 login nodes, and the snapshots for the /project2/ directory are available only from Midway2 login nodes. <SNAPSHOT> refers to the time of the backup, e.g. daily-YYYY-MM-DD.05h30 or weekly-YYYY-MM-DD.05h30. To view the available snapshots of the home directory, for example, use the command ls /snapshots/home.

To restore a file from a snapshot, simply copy the file to where you want it with either cp or rsync.
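For example, a sketch of restoring a single file from a home-directory snapshot; the snapshot name and file name are placeholders:

$ # list the available snapshots and pick one
$ ls /snapshots/home
$ # copy the file back into your home directory
$ cp /snapshots/home/<SNAPSHOT>/home/<CNetID>/important_file.txt $HOME/important_file.txt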