Introduction to RCC for FINM32950

With your enrollment in the FINM32950 course you have been given access to an education allocation on the Research Computing Center (RCC) Midway compute cluster. Some useful information pertaining to the Midway compute environment is listed below.

Where to go for help

For technical questions (help logging in, etc.), send a help request to help@rcc.uchicago.edu.

The User Guide is available at http://docs.rcc.uchicago.edu

Logging into Midway

Access to RCC is provided via secure shell (SSH) login.

Your RCC account credentials are your CNetID and password:

Username  CNetID
Password  CNetID password
Hostname  midway.rcc.uchicago.edu   Midway login nodes
Hostname  midway2.rcc.uchicago.edu  Midway2 login nodes

Note

When accessing the RCC Midway HPC resources, users in this class should preferably connect through the Midway2 login nodes (midway2.rcc.uchicago.edu), since the class will make use of the Midway2 partitions.

Most UNIX-like operating systems (Mac OS X, Linux, etc.) provide an SSH utility by default that can be accessed by typing the command ssh in a terminal. To log in to Midway from a Linux/Mac computer, open a terminal and at the command line enter:

ssh <username>@midway.rcc.uchicago.edu
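Because this class is asked to use the Midway2 login nodes, the same login can be sketched against that hostname. In the snippet below, the CNETID value is a placeholder (an assumption, not a real account), and -Y is the standard OpenSSH option for trusted X11 forwarding, which is only needed if you plan to open graphical windows.

```shell
# Placeholder: replace with your own CNetID (hypothetical value).
CNETID="your_cnetid"
# Midway2 login nodes, the hostname recommended for this class.
MIDWAY_HOST="midway2.rcc.uchicago.edu"

# Build the login command; -Y requests trusted X11 forwarding (optional).
LOGIN_CMD="ssh -Y ${CNETID}@${MIDWAY_HOST}"

# Print the command so it can be checked, then pasted into a terminal.
echo "$LOGIN_CMD"
```

Dropping -Y gives a plain text-only session, which is enough for submitting jobs.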

Note

RCC does not store your CNet password and we are unable to reset your password. If you need to reset your CNetID password you can do so through the following UChicago IT Services page: https://cnet.uchicago.edu/recertify/.

Windows users will first need to download an SSH client. We recommend MobaXterm, which allows you to connect over SSH to the remote Unix server and permits X11 forwarding for remote visualization. Use the hostname midway2.rcc.uchicago.edu and your CNetID username and password to access Midway through MobaXterm.

Accessing Software on Midway2

When you first log into Midway, you will be placed in a very bare-bones user environment with minimal software available.

The module system is a script-based system that configures your environment so that selected software packages become available. To access software that is installed on Midway, use the module system to load the corresponding software module into your environment.

Basic module commands:

Command                Description
module avail           lists all available software modules
module avail [name]    lists modules matching [name]
module load [name]     loads the named module
module unload [name]   unloads the named module
module list            lists the modules currently loaded for the user

Examples

Obtain a list of the currently loaded modules:

$ module list

Currently Loaded Modulefiles:
 1) serf/1.3.9        3) env/rcc       5) slurm/current
 2) subversion/1.9.4  4) git/2.10

Obtain a list of ALL available modules:

$ module avail
------------------------------------------- /software/modulefiles2 --------------------------------------------
Anaconda2/4.3.0(default)                          intelmpi/4.0+intel-12.1
Anaconda3/4.1.1                                   intelmpi/5.0+intel-15.0
Anaconda3/4.3.0(default)                          intelmpi/5.1+intel-16.0(default)
MACS/1.4(default)                                 interproscan/5(default)
MEME/4.11(default)                                jags/4.2.0(default)
Minuit2/5.34(default)                             jasper/1.900(default)
PerformanceReports/7.0                            java/1.7
PyGMO/current                                     java/1.8(default)
PyGMO/current+gcc-6.2                             julia/0.4
PyGMO/current+intelmpi-5.1+intel-16.0(default)    julia/0.5
R/2.15                                            julia/0.5+intel-16.0
 ...
 ...
---------------------------------------------- /etc/modulefiles -----------------------------------------------
condor/7.8(default)    midway2                slurm/2.4              use.own
env/rcc                module-info            slurm/2.5
midway1                samba/3.6              slurm/current(default)
-------------------------------------------------- Aliases ----------------------------------------------------
-------------------------------------------------- Versions ---------------------------------------------------

This list is quite large. If you know the name of the software package you would like to use you can list all available versions of it by passing the software name as an argument. For example, to list all available cuda modules:

$ module avail cuda

-------------------------------- /software/modulefiles2 ---------------------------------
cuda/6.5(default) cuda/7.5          cuda/8.0
-------------------------------- /etc/modulefiles ---------------------------------------

Similarly, one can obtain a list of all available python modules:

$ module avail python

-------------------------------- /software/modulefiles2 --------------------------------------------------------------------------------------------
python/2.7.12(default)   python/2.7.12+intel-16.0 python/2.7.12-nompi      python/2.7.13+gcc-6.2    python/3.5.2             python/3.5.2+intel-16.0
-------------------------------- /etc/modulefiles --------------------------------------------------------------------------------------------------

To load a module, you need to issue the load command. For example, to load the default python version:

$ module load python

$ python --version
Python 2.7.12

List the currently loaded modules:

$ module list

Currently Loaded Modulefiles:
1) serf/1.3.9        3) git/2.10          5) mkl/11.3          7) python/2.7.12
2) subversion/1.9.4  4) env/rcc           6) openmpi/2.0.1     8) slurm/current

To unload the python module:

$ module unload python

Load a specific non-default python module (e.g. python/3.5.2+intel-16.0):

$ module load python/3.5.2+intel-16.0
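When module commands are placed in a script, it can help to check that the module command actually exists in the current shell before relying on it. The following is a minimal sketch, not an RCC-provided tool: load_module is a hypothetical helper name introduced here for illustration.

```shell
# load_module is a hypothetical helper, not part of the module system.
# It loads the named module only if the module command is available.
load_module() {
    if command -v module >/dev/null 2>&1; then
        module load "$1" && echo "loaded $1"
    else
        echo "module command not found (are you on Midway?)" >&2
        return 1
    fi
}

# Example: request the non-default Python build used in this section.
load_module "python/3.5.2+intel-16.0" || echo "skipped"
```

On a machine without the module system, the helper prints a warning and the script continues instead of failing later with a confusing error.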

The Midway Cluster Environment

The Research Computing Center has two heterogeneous Linux clusters (Midway1 and Midway2). Midway1 is the original Linux cluster with approximately 5,300 communal CPU cores and 1.8PB of storage. Midway2 has approximately 10,000 communal CPU cores and 1.6PB of storage. Both clusters contain several different types of nodes, with more memory, an accelerator (GPU), or a different CPU microarchitecture. For example, Midway2 has the following community shared partitions: broadwl, broadwl-lc, bigmem2, and gpu2. Midway1 and Midway2 are shared resources used by the entire University community. Sharing computational resources creates unique challenges:

  • Jobs must be scheduled in a fair manner.
  • Resource consumption needs to be accounted for.
  • Access needs to be controlled.

Thus, a scheduler is used to manage job submissions to the cluster. RCC uses the Slurm resource manager to schedule jobs and provide interactive access to compute nodes.

When you first log into Midway2 you will be connected to a login node (midway2-login1 or midway2-login2). Login nodes are not intended to be used for computationally intensive work. Instead, login nodes should be used for managing files, submitting jobs, etc. If you are going to be running a computationally intensive program, you must do this work on a compute node by either obtaining an interactive session or submitting a job through the scheduler. However, you are free to run very short, non-computationally intensive jobs on the login nodes, as is often necessary when you are working on and debugging your code. If you are unsure whether your job will be computationally intensive (large memory or CPU usage, long running time, etc.), get a session on a compute node and work there.

There are two ways to send your work to a Midway compute node:

  1. sinteractive - Request access to a compute node and log into it
  2. sbatch - Write a script that defines the commands to be executed and let Slurm run them on your behalf. This is generally what will be done for this course.

Working interactively on a compute node

To request an interactive session on a compute node use the sinteractive command:

sinteractive

When this command is executed, you will be connected to one of Midway’s compute nodes, where you can then run your programs. By default, the sinteractive command gives you access for 2 hours to a compute node with 1 CPU and 2GB of memory. The sinteractive command provides many more options for configuring your session. For example, if you want to get access to a compute node with 1 CPU and 4GB of memory for 3 hours, use the command:

sinteractive --account=finm32950 --cpus-per-task=1 --mem-per-cpu=4096 --time=03:00:00
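Since --mem-per-cpu is given in megabytes per CPU, the total memory of a session is the number of CPUs multiplied by that value. A small sketch of the arithmetic, using a made-up request of 4 CPUs with 2048 MB each for 1 hour (these numbers are illustrative assumptions, not course requirements):

```shell
# Hypothetical request: 4 CPUs with 2048 MB each for 1 hour.
CPUS=4
MEM_PER_CPU_MB=2048

# Total memory is CPUs times memory per CPU.
TOTAL_MB=$((CPUS * MEM_PER_CPU_MB))
echo "Total memory requested: ${TOTAL_MB} MB"

# Print the matching command, to be run on a Midway login node.
echo "sinteractive --account=finm32950 --cpus-per-task=${CPUS} --mem-per-cpu=${MEM_PER_CPU_MB} --time=01:00:00"
```

Here the session would hold 8192 MB (8GB) in total, so raising the CPU count also raises the memory you are asking the scheduler for.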

It may take 60 seconds or more for your interactive session to be initialized (assuming there is an available resource that meets your specified requirements).

Submitting a job to the scheduler

An alternative to working interactively on a compute node is to submit the work you want carried out to the scheduler through an sbatch script. An example sbatch script is shown below that schedules the use of a node in the broadwl partition:

#!/bin/bash
#SBATCH --job-name=h2o
#SBATCH --output=test-%j.out
#SBATCH --error=test-%j.err
#SBATCH --time=0:10:00
#SBATCH --account=finm32950
#SBATCH --partition=broadwl
#SBATCH --ntasks-per-node=28 # number of cores to use -- broadwl has 28 maximum
#SBATCH --nodes=1

file=test
input="$file.i"
output="$file.r"

# load your modules here
module load intelmpi

# execute your job here, using all 28 tasks requested above
mpirun -n 28 my-code.x < "$input" > "$output"

Sbatch scripts contain two major elements. After the #!/bin/bash line comes a series of #SBATCH parameters. These are read by the scheduler, Slurm, and tell it what specific hardware is required to execute the job, for how long that hardware is required, and where the output and error (stdout and stderr) streams should be written. If resources are available, the job may start less than one second after submission. When the queue is busy and the resource request is substantial, the job may be placed in line behind other jobs awaiting execution.

The %j wildcard included in the output and error file names causes Slurm to substitute the job's unique ID number into each file name. This prevents your output and error files from being overwritten if this script is run multiple times in the same directory.
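To make the effect of %j concrete, the sketch below mimics the substitution outside of Slurm. The helper name expand_job_pattern and the two job IDs are made up for illustration; Slurm performs this replacement itself when the job starts.

```shell
# expand_job_pattern is a hypothetical helper that mimics how Slurm
# substitutes the job ID for %j in an output file name.
expand_job_pattern() {
    # $1 = file name pattern, $2 = job ID
    printf '%s\n' "$1" | sed "s/%j/$2/g"
}

# Two submissions of the same script get different job IDs (made up here),
# so their output files do not collide.
expand_job_pattern "test-%j.out" 30349874
expand_job_pattern "test-%j.out" 30349875
```

Each run therefore leaves its own pair of .out and .err files in the submission directory.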

The second major element of an sbatch script is the user defined commands. When the resource request is granted the script is executed just as if it were run interactively (i.e. if you had typed in the commands one after the next at the command line).

Sbatch scripts execute in the directory from which they were submitted. In the above example, we are assuming that this script is located in the same directory where my-code.x is located.
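To hand a script to the scheduler, pass its file name to the sbatch command on a login node. On success, sbatch prints a line of the form "Submitted batch job" followed by the job ID. The sketch below assumes the example script was saved as my-job.sbatch (a hypothetical file name) and uses a hypothetical helper, job_id_from_sbatch, to pull the ID out of that line.

```shell
# job_id_from_sbatch is a hypothetical helper: it reads sbatch's
# "Submitted batch job <JobID>" line on stdin and prints only the ID.
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# my-job.sbatch is an assumed file name for the example script.
if command -v sbatch >/dev/null 2>&1 && [ -f my-job.sbatch ]; then
    JOB_ID=$(sbatch my-job.sbatch | job_id_from_sbatch)
    echo "Submitted job ${JOB_ID}; output will go to test-${JOB_ID}.out"
else
    echo "sbatch or my-job.sbatch not found; run this on a Midway login node" >&2
fi
```

Keeping the job ID in a variable makes it easy to find the matching output files or to cancel the job later.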

Interact With Your Submitted Jobs

The status of submitted jobs can be viewed and altered by several means. The primary Slurm command for monitoring jobs is squeue.

For example, if you run squeue without any options, a list of all pending jobs on Midway1 and Midway2 is shown, followed by all running jobs.

squeue

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
30349874   xenon1t cax_4843 mklinton CF       0:01      1 midway2-0420
30347167      cron anita_ci   cozzyd PD       0:00      1 (BeginTime)
30291771      cron cron_Hea rwilliam PD       0:00      1 (BeginTime)
30321986      cron cron_api rwilliam PD       0:00      1 (BeginTime)
30237421      cron cron_api rwilliam PD       0:00      1 (BeginTime)
30346634      cron cron_api rwilliam PD       0:00      1 (BeginTime)
30349494      cron cron_chi rwilliam PD       0:00      1 (BeginTime)
30349498      cron cron_rcc rwilliam PD       0:00      1 (BeginTime)
30349643      cron wait_til nbrawand PD       0:00      1 (BeginTime)
...
...
30343979 weare-din     auto         ejli  R    1:03:22      4 midway2-[0473-0476]
30095029_9 weare-din contract   simonfre  R 4-14:36:47      1 midway2-0483
30095029_10 weare-din contract  simonfre  R 4-14:36:47      1 midway2-0483
30095029_11 weare-din contract  simonfre  R 4-14:36:47      1 midway2-0483
30198160_0 weare-din contract   simonfre  R 3-13:26:50      1 midway2-0483
30198160_1 weare-din contract   simonfre  R 3-13:26:50      1 midway2-0483
30198160_4 weare-din contract   simonfre  R 3-13:26:50      1 midway2-0483
30198160_5 weare-din contract   simonfre  R 3-13:26:50      1 midway2-0483
30344980   kicp-ht _interac     manzotti  R      53:04      1 midway152
30330522   kicp-ht  MI+_w_1      motloch  R    3:23:10      1 midway151

The above tells us:

Name              Description
JOBID             Job ID #, unique reference number for each job
PARTITION         Type of node the job is running/will run on
NAME              Name for the job, defaults to the name of the batch script
USER              User who submitted the job
ST                State of the job: CF (configuring), PD (pending), R (running)
TIME              Time used by the job in D-HH:MM:SS
NODES             Number of nodes utilized
NODELIST(REASON)  List of nodes in use, or reason the job has not started running

As there are usually a very large number of jobs in the queue, the output of squeue must often be filtered to show you only specific jobs that are of interest to you. To view only the jobs that you have submitted use the command:

squeue -u $USER

Alternatively, you can use the following command, which additionally lists your job's rank in the queue:

rcchelp squeue
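When you have many jobs, a summary by state can be easier to read than the full listing. The following is a minimal sketch that assumes the default squeue column layout shown in this section, where ST is the fifth column; count_states is a hypothetical helper name, not a Slurm command.

```shell
# count_states is a hypothetical helper: it reads squeue output on stdin,
# skips the header line, and counts jobs by the ST (state) column.
count_states() {
    awk 'NR > 1 { n[$5]++ } END { for (s in n) print s, n[s] }' | sort
}

# On Midway, pipe the listing of your own jobs through the helper.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER" | count_states
fi
```

The result is one line per state, for example the number of PD (pending) jobs followed by the number of R (running) jobs.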

To cancel a job that you have submitted, first obtain the job’s JobID number by using the squeue -u $USER command. Then issue the command:

scancel <JobID>

or you can cancel ALL of your jobs at the same time (be sure you really want to do this!) with the command:

scancel -u <yourCNetID>
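Because scancel acts immediately, it can be worth guarding against a mistyped argument. The sketch below wraps scancel in a hypothetical helper, cancel_job, that only passes along arguments made up of digits and underscores (the characters that appear in the JOBID column of squeue).

```shell
# cancel_job is a hypothetical wrapper around scancel. It rejects an empty
# argument or one containing anything other than digits and underscores.
cancel_job() {
    case "$1" in
        ''|*[!0-9_]*)
            echo "not a job ID: '$1'" >&2
            return 1
            ;;
    esac
    scancel "$1"
}

# Example (run on Midway with one of your own job IDs):
# cancel_job 30349874
```

If you only want to remove jobs that have not started yet, scancel also accepts a state filter, e.g. scancel -u $USER --state=PENDING, which leaves your running jobs untouched.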

Accessing and Transferring Files

RCC provides a number of methods for transferring data in/out of Midway. For relatively small amounts of data, we recommend the scp command. For non-trivial file transfers, we recommend using Globus Online for fast, secure and reliable transfers. When working on the UChicago network it is also possible to mount the Midway file systems using Samba.

Command Line - SCP

Most UNIX-like operating systems (Mac OS X, Linux, etc.) provide an scp command which can be accessed from the command line. To transfer files from your local computer to your home directory on Midway, open a terminal window and issue the command:

Single files: $ scp file1 ... <CNetID>@midway.rcc.uchicago.edu:
Directories:  $ scp -r dir1 ... <CNetID>@midway.rcc.uchicago.edu:

When prompted, enter your CNet password.
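The same command works in the other direction, for example to bring results back to your local computer: put the remote location first and a local destination last. In the sketch below, remote_path is a hypothetical helper, and the CNetID and file name are placeholders to be replaced with your own.

```shell
# remote_path is a hypothetical helper that builds the user@host:path
# form that scp expects for a location on Midway.
remote_path() {
    # $1 = CNetID, $2 = path on Midway (relative paths start at your home)
    printf '%s@midway.rcc.uchicago.edu:%s\n' "$1" "$2"
}

# Placeholder CNetID and file name; replace with your own.
SRC=$(remote_path "your_cnetid" "results/test.r")

# Print the download command: copy the remote file into the current directory.
echo "scp $SRC ."
```

Run the printed command from a terminal on your local computer, not from Midway.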

Windows users will need to download an SCP client such as WinSCP, which provides a graphical interface for transferring files via scp.

Windows GUI - WinSCP

WinSCP is an SCP client that can be used to move files between Midway and a Windows machine. WinSCP can be obtained from http://www.winscp.net.

Use the hostname midway.rcc.uchicago.edu and your CNet credentials when connecting.

[Figure: WinSCP login window]

If prompted to accept the server’s host key, select “yes.”

The main WinSCP window allows you to move files from your local machine (left side) to Midway (right side).

[Figure: WinSCP main window]

Mac GUI - SFTP Clients

There are a number of graphical SFTP clients available for Mac. FileZilla, for example, is a freely available SFTP client (https://filezilla-project.org/).

Use the hostname midway.rcc.uchicago.edu and your CNet credentials when connecting.

Samba

Samba allows users to connect to (or “mount”) their home directory on their local computer so that the file system on Midway appears as if it were directly connected to the local machine. This method of accessing your RCC home and project space is only available from within the UChicago campus network. From off-campus you will need to connect through the UChicago virtual private network.

Your Samba account credentials are your CNetID and password:

Username: ADLOCAL\<CNetID>
Password: CNet password
Hostname: midwaysmb.rcc.uchicago.edu

Note

Make sure to prefix your username with ADLOCAL\

On a Windows computer, use the “Map Network Drive” functionality and the following UNC paths:

Home:    \\midwaysmb.rcc.uchicago.edu\homes
Project: \\midwaysmb.rcc.uchicago.edu\project

On Mac OS X, use these URLs to connect:

Home:    smb://midwaysmb.rcc.uchicago.edu/homes
Project: smb://midwaysmb.rcc.uchicago.edu/project

To connect on a Mac OS X computer:

  • Use the Connect to Server utility in Finder
    [Figure: Connect to Server dialog in Finder]
  • Enter one of the URLs from above in the input box for Server Address.
  • When prompted for a username and password, select Registered User.
  • Enter ADLOCAL\YourCNetID for the username and enter your CNet password.