Running Interactive and Batch Jobs at the Research Computing Center
Abstract
Slurm is a widely used job scheduler on high-performance computing systems. Efficient resource management is crucial to productivity in the complex environment of the RCC computing clusters. This workshop aims to give users a clear understanding of the compute partitions available at the RCC and of how to identify the right resource for a specific job, configure and submit a Slurm job, use Slurm commands, and avoid common mistakes that can leave a job waiting in the queue for a long time or prevent it from running altogether.
Objectives:
By the end of the workshop, attendees will:
- learn about the Midway resources and partitions available for running jobs
- learn Slurm commands, how to create a Slurm batch script, and how to submit batch jobs
- acquire a good understanding of the RCC software module system and runtime environments
- learn how to submit serial (single-processor) and parallel (OpenMP and MPI) multi-processor jobs
- learn how to submit GPU jobs
- learn how to submit a job that runs the same code several times, with each run differing only in the initial value of a high-level parameter (Slurm job array)
- learn how to pack jobs and schedule independent processes inside a Slurm job allocation
- learn how to submit message-passing (MPI), multithreaded (OpenMP), and hybrid jobs
- learn how to request a Slurm interactive session
- learn best practices and how to debug a Slurm script
Duration: 2 hours
Level: Introductory
Prerequisites: Knowledge of Slurm is helpful. An RCC account is required.
Thursday, February 15, 2024 - 14:00 to 16:00