Scheduling via SLURM

All jobs on the general purpose cluster request resources via SLURM. SLURM is open source software that allocates resources to users for their computations, provides a framework for starting, executing, and monitoring compute jobs, and arbitrates contention for resources by managing a queue of pending work. SLURM is widely used in high performance computing (HPC), so you are likely to encounter it outside of our systems.

General Purpose Computing

All jobs on the general purpose cluster are submitted through the SLURM scheduler. For more information, see the Frequently asked questions section. Jobs can be submitted from the following headnodes:

head.arcc.albany.edu

headnode7.rit.albany.edu

Or from the large memory machine:

lmm.rit.albany.edu

Resource information

All users have access to the "batch" partition for general purpose computing.

The batch partition consists of 21 compute nodes with a total of 544 CPUs. Note that a job can request at most 3 nodes and can run for at most 14 days. If you need an exception to these limits, please contact arcc@albany.edu.

```
$ sinfo -p batch -o "%P, %n, %c, %m" | sort

PARTITION, HOSTNAMES, CPUS, MEMORY
batch*, rhea-01, 24, 64133
batch*, rhea-02, 24, 64133
batch*, rhea-03, 24, 64133
batch*, rhea-04, 32, 96411
batch*, rhea-05, 32, 96411
batch*, rhea-06, 32, 96411
batch*, rhea-07, 40, 128619
batch*, rhea-08, 40, 128619
batch*, rhea-09, 48, 257627
batch*, rhea-10, 48, 257566
batch*, uagc12-01, 12, 64166
batch*, uagc12-02, 12, 64166
batch*, uagc12-03, 12, 64166
batch*, uagc12-04, 12, 64166
batch*, uagc12-05, 32, 128703
batch*, uagc19-01, 20, 94956
batch*, uagc19-02, 20, 94956
batch*, uagc19-03, 20, 94956
batch*, uagc19-04, 20, 94956
batch*, uagc19-05, 20, 94956
batch*, uagc19-06, 20, 94956
```

CPUS is the number of cores on each node, and MEMORY is the node's memory in megabytes.
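If you want to check the partition totals (21 nodes, 544 CPUs) yourself, the per-node listing can be summed in the shell. This is a minimal sketch: it assumes one line per node of the form "hostname cpus", which is what sinfo -p batch -o "%n %c" --noheader is expected to print, and summarize_nodes is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Minimal sketch: count nodes and total cores in a partition.
# Expects lines of the form "<hostname> <cpus>" on standard input.
# summarize_nodes is a hypothetical helper name.

summarize_nodes() {
  local host cpus nodes=0 total=0
  while read -r host cpus; do
    [[ -z "$host" ]] && continue      # ignore blank lines
    nodes=$((nodes + 1))
    total=$((total + cpus))
  done
  echo "nodes=${nodes} cpus=${total}"
}

# Query the live cluster only where the SLURM client tools are installed.
if command -v sinfo >/dev/null 2>&1; then
  sinfo -p batch -o "%n %c" --noheader | summarize_nodes
fi
```

For the node list above, this should print nodes=21 cpus=544.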

Frequently asked questions

Full SLURM documentation is available on the SLURM website (https://slurm.schedmd.com). Below are answers to frequently asked questions that demonstrate several useful SLURM commands.

How can I view the current status and available resources of batch nodes?

sinfo is commonly used to view the status of a given partition or node, including how many resources are available to schedule.

Viewing available resources

```
$ sinfo -p batch -o "%n, %a, %C, %e, %O"

HOSTNAMES, AVAIL, CPUS(A/I/O/T), FREE_MEM, CPU_LOAD
rhea-01, up, 1/23/0/24, 47457, 1.02
rhea-07, up, 8/32/0/40, 106761, 8.03
rhea-08, up, 8/32/0/40, 111833, 8.07
rhea-10, up, 8/40/0/48, 238471, 8.01
rhea-09, up, 48/0/0/48, 243033, 45.69
rhea-04, up, 32/0/0/32, 50843, 20.25
rhea-02, up, 0/24/0/24, 61907, 0.00
rhea-03, up, 0/24/0/24, 61530, 0.00
rhea-05, up, 0/32/0/32, 94105, 0.02
rhea-06, up, 0/32/0/32, 93951, 0.00
uagc12-01, up, 0/12/0/12, 62691, 0.00
uagc12-02, up, 0/12/0/12, 62672, 0.00
uagc12-03, up, 0/12/0/12, 62867, 0.05
uagc12-04, up, 0/12/0/12, 62862, 0.00
uagc12-05, up, 0/32/0/32, 127211, 0.00
uagc19-01, up, 0/20/0/20, 93496, 0.03
uagc19-02, up, 0/20/0/20, 93489, 0.00
uagc19-03, up, 0/20/0/20, 93482, 0.00
uagc19-04, up, 0/20/0/20, 93570, 0.00
uagc19-05, up, 0/20/0/20, 93579, 0.00
uagc19-06, up, 0/20/0/20, 93583, 0.00
```

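Note that %C reports CPUs as allocated/idle/other/total (A/I/O/T). In this example, rhea-09 has all of its cores allocated (48 out of 48) and is showing a CPU load of 45.69, meaning roughly 46 cores' worth of work is running on it. Many of the other nodes have little or no load. Use this information to decide how many resources to request: in this snapshot, for example, a job asking for 32 cores on a single node needs a node showing at least 32 idle CPUs, such as rhea-05, rhea-06, rhea-07, rhea-08, rhea-10, or uagc12-05.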
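The same check can be automated. The sketch below reads lines of the form "hostname A/I/O/T" (as printed by sinfo -p batch -o "%n %C" --noheader) and lists nodes with at least the requested number of idle CPUs. nodes_with_idle is a hypothetical helper name; it does not look at free memory, and a node that is idle now may be busy by the time your job is scheduled.

```bash
#!/usr/bin/env bash
# Minimal sketch: list nodes with at least N idle CPUs.
# Input lines look like "<hostname> <alloc>/<idle>/<other>/<total>".
# nodes_with_idle is a hypothetical helper name.

nodes_with_idle() {
  local want=$1 host cpus idle
  while read -r host cpus; do
    [[ -z "$host" ]] && continue
    # Split the A/I/O/T field on "/" and keep only the idle count.
    IFS=/ read -r _ idle _ <<< "$cpus"
    if (( idle >= want )); then
      echo "$host"
    fi
  done
}

if command -v sinfo >/dev/null 2>&1; then
  sinfo -p batch -o "%n %C" --noheader | nodes_with_idle 32
fi
```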

How can I view jobs currently running, and waiting in queue?

squeue shows jobs that are currently running or waiting in the queue, for all partitions you have access to.

Viewing the job queue

```
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            140574     batch  g.slurm  [netid] PD       0:00      1 (Resources)
            140486     batch  g.slurm  [netid]  R   21:53:54      1 rhea-04
            140290     batch   run.sh  [netid]  R 2-19:09:35      1 rhea-01
            140216     batch shell1_5  [netid]  R 3-08:48:18      1 rhea-09
            135093     batch     g.sh  [netid]  R 28-19:56:31     1 rhea-08
            135087     batch     g.sh  [netid]  R 28-20:43:49     1 rhea-10
            135090     batch     g.sh  [netid]  R 28-20:49:42     1 rhea-07
```

At the time this command was run, there were 7 jobs running or waiting in the queue. JOBID 140574 is pending (ST=PD) because not enough resources were available (Resources), while the other six jobs are running (ST=R) and had been running for between about 22 hours and more than 28 days.
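squeue can also be narrowed down: -u shows a single user's jobs and -t filters by state (for example PENDING). The TIME column uses SLURM's [days-]hours:minutes:seconds format; the hypothetical helper slurm_time_to_seconds below is a minimal sketch that converts it to seconds so run times can be compared or sorted.

```bash
#!/usr/bin/env bash
# Minimal sketch: convert a squeue TIME value (M:SS, H:MM:SS or D-HH:MM:SS)
# to seconds. slurm_time_to_seconds is a hypothetical helper name.

slurm_time_to_seconds() {
  local t=$1 days=0 h=0 m=0 s=0
  local -a parts
  if [[ $t == *-* ]]; then        # split off the "D-" prefix, if present
    days=${t%%-*}
    t=${t#*-}
  fi
  IFS=: read -r -a parts <<< "$t"
  case ${#parts[@]} in
    3) h=${parts[0]}; m=${parts[1]}; s=${parts[2]} ;;
    2) m=${parts[0]}; s=${parts[1]} ;;
    *) s=${parts[0]} ;;
  esac
  # 10# forces base 10 so values such as "09" are not read as octal.
  echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

if command -v squeue >/dev/null 2>&1; then
  squeue -u "$USER"               # only your jobs
  squeue -u "$USER" -t PENDING    # only your jobs that are still waiting
fi
```

For example, the 2-19:09:35 shown for job 140290 converts to 241775 seconds.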

How can I view the resources requested for an active job?

scontrol show job [jobid] will generate a report with information about how a job was scheduled.

Note that scontrol only reports on jobs that are pending, running, or have just finished; shortly after a job completes, this report can no longer be generated via scontrol. See How do I view the resources used by my job? for accessing similar information after a job completes.

Viewing a running job's allocation

```
$ scontrol show job ######

JobId=###### JobName=g.slurm
   UserId=[netid](52639) GroupId=faculty(972) MCS_label=N/A
   Priority=1 Nice=0 Account=rit QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=14-00:00:00 TimeMin=N/A
   SubmitTime=2019-02-13T07:48:25 EligibleTime=2019-02-13T07:48:25
   StartTime=2019-02-14T11:53:10 EndTime=2019-02-28T11:53:10 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2019-02-13T08:42:58
   Partition=batch AllocNode:Sid=headnode7:86819
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=(null)
   NumNodes=1 NumCPUs=32 NumTasks=1 CPUs/Task=32 ReqB:S:C:T=0:0:*:*
   TRES=cpu=32,mem=87.50G,node=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=32 MinMemoryCPU=2800M MinTmpDiskNode=0
   Features=avx2 DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/path/to/command/
   WorkDir=/path/to/workdir/
   StdErr=/path/to/stderr/
   StdIn=/dev/null
   StdOut=/path/to/stdout/
   Power=
```

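Here, the job requested 32 CPUs on one node with 87.5 GB of memory (2800 MB per CPU × 32 CPUs = 89,600 MB, which scontrol displays as 87.50G) and a constraint of Features=avx2. It was submitted at 2019-02-13T07:48:25 and is pending (Reason=Resources), with an estimated start time of 2019-02-14T11:53:10. The relevant lines are: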

```
NumNodes=1 NumCPUs=32 NumTasks=1 CPUs/Task=32 ReqB:S:C:T=0:0:*:*
TRES=cpu=32,mem=87.50G,node=1
Features=avx2
```
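When you only need a few fields, the report can be filtered rather than read in full. This is a minimal sketch assuming the whitespace-separated Key=Value layout shown above; job_field and per_cpu_total_gb are hypothetical helper names, and the script expects the job ID as its first argument.

```bash
#!/usr/bin/env bash
# Minimal sketch: pull single fields out of "scontrol show job" output.
# job_field and per_cpu_total_gb are hypothetical helper names.

# Print the value of the first Key=Value token whose key matches $1.
job_field() {
  local key=$1
  tr ' ' '\n' | grep -m1 "^${key}=" | cut -d= -f2-
}

# Total memory implied by a per-CPU request, e.g. "2800M" x 32 CPUs,
# printed in the G units scontrol uses (1G = 1024M).
per_cpu_total_gb() {
  local per_cpu_mb=${1%M} ncpus=$2
  awk -v mb="$(( per_cpu_mb * ncpus ))" 'BEGIN { printf "%.2f\n", mb / 1024 }'
}

# Usage on a headnode: bash this_script.sh JOBID
jobid=${1:-}
if [[ -n "$jobid" ]] && command -v scontrol >/dev/null 2>&1; then
  report=$(scontrol show job "$jobid")
  for key in NumCPUs TRES MinMemoryCPU Features; do
    echo "$key=$(job_field "$key" <<< "$report")"
  done
fi
```

With the values above, per_cpu_total_gb 2800M 32 prints 87.50, matching the mem=87.50G in TRES.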

How can I view what restrictions are imposed on jobs?

```
$ scontrol show partition batch

PartitionName=batch
   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
   AllocNodes=ALL Default=YES QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=3 MaxTime=14-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=rhea-[01-10],uagc19-[01-06],uagc12-[01-05]
   PriorityJobFactor=1 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=FORCE:1
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=544 TotalNodes=21 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
```

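The batch partition has some important restrictions: a job can request at most 3 nodes (MaxNodes=3) and can run for at most 14 days (MaxTime=14-00:00:00) before it is automatically terminated. Because DefaultTime=NONE, a job that does not request a time limit is given the 14-day maximum. If you need an exception to these limits, please contact arcc@albany.edu.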
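A request can be checked against these limits before it is submitted. The sketch below hard-codes MaxNodes=3, MinNodes=1 and MaxTime=14-00:00:00 from the partition output (treat them as assumptions, since they may change), and fits_batch is a hypothetical helper that prints matching --nodes and --time options when the request fits.

```bash
#!/usr/bin/env bash
# Minimal sketch: check a planned request against the batch partition limits.
# Limits copied from "scontrol show partition batch"; they may change.
MAX_NODES=3
MAX_MINUTES=$(( 14 * 24 * 60 ))   # MaxTime=14-00:00:00

# Usage: fits_batch NODES DAYS HOURS   (fits_batch is a hypothetical helper)
# Example: opts=$(fits_batch 2 2 0) && sbatch $opts run.sh
fits_batch() {
  local nodes=$1 days=$2 hours=$3 minutes t
  minutes=$(( (days * 24 + hours) * 60 ))
  t=$(printf '%d-%02d:00:00' "$days" "$hours")   # days-hours:minutes:seconds
  if (( nodes < 1 || nodes > MAX_NODES )); then
    echo "invalid node count: $nodes (allowed 1-$MAX_NODES)" >&2
    return 1
  fi
  if (( minutes > MAX_MINUTES )); then
    echo "time limit $t exceeds 14-00:00:00" >&2
    return 1
  fi
  echo "--nodes=$nodes --time=$t"
}
```

Options given to sbatch on the command line override the matching #SBATCH directives in the script, so the printed options can be combined with an existing job script.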

How do I schedule a non-interactive job?

There are many ways to schedule jobs with SLURM. For non-interactive jobs, we recommend using sbatch with a shell script that runs your program. #SBATCH directives at the top of the script tell SLURM which resources the job requires. Below is an example workflow for submitting a Python script to the batch partition with sbatch.

First, ssh into head.arcc.albany.edu. On Windows, you can use an SSH client such as PuTTY; on macOS, use the Terminal app. Replace [netid] below with your username and enter your password at the prompt. The password is not displayed as you type it.

```
$ ssh [netid]@head.arcc.albany.edu

Warning: Permanently added the ECDSA host key for IP address '169.226.65.82' to the list of known hosts.
[netid]@head.arcc.albany.edu's password:

Warning: No xauth data; using fake authentication data for X11 forwarding.
Last login: Wed Jan 30 13:49:20 2019 from lmm.rit.albany.edu
================================================================================
 This University at Albany computer system is reserved for authorized use only.
              http://www.albany.edu/its/authorizeduse.htm
Headnodes:
 head.arcc.albany.edu
 headnode7.rit.albany.edu
 headnode.rit.albany.edu - LEGACY SUPPORT
General Purpose Computing:
 lmm.rit.albany.edu - Large memory
x2go headnode:
 eagle.arcc.albany.edu
   Questions / Assistance - arcc@albany.edu
================================================================================
```

Next, change to the /network/rit/misc/software/examples/slurm/ directory:
```
$ cd /network/rit/misc/software/examples/slurm/
```

The script /network/rit/misc/software/examples/slurm/run.sh contains #SBATCH directives that request the appropriate resources for our Python code and then run the code:

```
$ more run.sh

#!/bin/bash
#SBATCH -p batch
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=100
#SBATCH --mail-type=ALL
#SBATCH --mail-user=%u@albany.edu
#SBATCH -o /network/rit/home/%u/example-slurm-%j.out

# Now, run the python script
/network/rit/misc/software/examples/slurm/simple_multiprocessing.py
```

--cpus-per-task=4 tells SLURM how many CPU cores to allocate to the task; all of them are on one node.

--mem-per-cpu=100 tells SLURM how much memory, in megabytes, to allocate per core (see also --mem, which sets memory per node).

In total, we are requesting 4 cores and 400 MB of memory (4 × 100 MB) for this simple Python code.
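The same 400 MB could instead be requested per node with --mem (use either --mem or --mem-per-cpu, not both). The variant of run.sh below is a sketch under a few assumptions: the 10-minute --time value is hypothetical, and exporting OMP_NUM_THREADS only matters if your program uses OpenMP-threaded code. It reads the granted core count from SLURM_CPUS_PER_TASK, which SLURM sets when --cpus-per-task is given; ncores is a hypothetical helper name.

```bash
#!/bin/bash
#SBATCH -p batch
#SBATCH --cpus-per-task=4
#SBATCH --mem=400                # 400 MB for the job on its node (same total as 4 x 100 MB)
#SBATCH --time=00:10:00          # hypothetical limit; without --time the 14-day maximum applies
#SBATCH -o /network/rit/home/%u/example-slurm-%j.out

# Number of cores SLURM granted; falls back to 1 outside a SLURM job.
ncores() {
  echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Threaded libraries that honor OMP_NUM_THREADS will use only the granted cores.
export OMP_NUM_THREADS="$(ncores)"
echo "Running with ${OMP_NUM_THREADS} cores on ${HOSTNAME:-unknown}"

# Run the example only inside a SLURM job, where the path is expected to exist.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  /network/rit/misc/software/examples/slurm/simple_multiprocessing.py
fi
```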

To submit the job, run sbatch run.sh. Make a note of the job ID printed to the terminal; it will be different from the one shown below.
```
$ sbatch run.sh
Submitted batch job 140584
```
You can use squeue to check the job's status, for example squeue -j 140584.
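If you would rather wait for the job from a script, the job ID can be captured from sbatch's message and squeue polled until the job leaves the queue. This is a minimal sketch: job_id_from_sbatch is a hypothetical helper, the 30-second polling interval is an arbitrary choice, and sbatch's --parsable option (which prints only the job ID) is an alternative to parsing the message.

```bash
#!/usr/bin/env bash
# Minimal sketch: submit run.sh, wait for it to finish, then show its output.

# Extract the job ID from a line such as "Submitted batch job 140584".
job_id_from_sbatch() {
  awk '/^Submitted batch job/ { print $4 }'
}

if command -v sbatch >/dev/null 2>&1 && [[ -f run.sh ]]; then
  jobid=$(sbatch run.sh | job_id_from_sbatch)
  echo "submitted job $jobid"
  # With -h (no header), squeue prints nothing once the job has left the queue.
  while [[ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]]; do
    sleep 30
  done
  cat ~/example-slurm-"$jobid".out
fi
```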

The job writes its output to a file in your home directory called ~/example-slurm-[jobid].out. View it with the more command; you should see output similar to the following.

```
$ more ~/example-slurm-140584.out
USER [netid] was granted 4 cores and 100 MB per node on [hostname].
The job is current running with job # [jobid]
 Process D waiting 3 seconds
 Process D Finished.
 Process C waiting 1 seconds
 Process C Finished.
 Process E waiting 4 seconds
 Process E Finished.
 Process A waiting 5 seconds
 Process A Finished.
 Process B waiting 2 seconds
 Process B Finished.
 Process F waiting 5 seconds
 Process F Finished.
```

Congratulations, you just ran your first job on the cluster!

How do I schedule an interactive job?

An "interactive" job gives you a terminal on a compute node so that you can run commands interactively on the cluster. We will use srun to start one. Interactive sessions are useful for debugging code or checking that certain software compiles correctly.

First, ssh into head.arcc.albany.edu. On Windows, you can use an SSH client such as PuTTY; on macOS, use the Terminal app. Replace [netid] below with your username and enter your password at the prompt. The password is not displayed as you type it.

```
$ ssh [netid]@head.arcc.albany.edu
```

Next, allocate resources on the cluster for your interactive session. Here we request a 1-hour session with 4 CPUs and 400 MB of memory. Your job number will be different.
```
$ srun --partition=batch --nodes=1 --time=01:00:00 --cpus-per-task=4 --mem=400 --pty $SHELL -i

$ hostname
uagc12-01.arcc.albany.edu
```

Now we are running a terminal session on a compute node of the cluster. Notice that the hostname command reports uagc12-01.arcc.albany.edu rather than head.arcc.albany.edu.

```
$ cd /network/rit/misc/software/examples/slurm/
$ ./simple_multiprocessing.py

USER [netid] was granted 4 cores and None MB per node on uagc12-01.
The job is current running with job # 140590
 Process D waiting 3 seconds
 Process D Finished.
 Process A waiting 5 seconds
 Process A Finished.
 Process C waiting 1 seconds
 Process C Finished.
 Process E waiting 4 seconds
 Process E Finished.
 Process B waiting 2 seconds
 Process B Finished.
 Process F waiting 5 seconds
 Process F Finished.
```
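In this session the memory shows as None, while the sbatch run, which used --mem-per-cpu, reported 100 MB. That pattern suggests the example script reads only SLURM_MEM_PER_CPU, which SLURM sets for --mem-per-cpu but not for --mem; this is an assumption about the script, not something its output confirms. The sketch below reports the grant either way; granted_mem_mb is a hypothetical helper and assumes a single task on one node.

```bash
#!/usr/bin/env bash
# Minimal sketch: report the memory SLURM granted, whichever option was used.
# SLURM sets SLURM_MEM_PER_CPU for --mem-per-cpu and SLURM_MEM_PER_NODE for --mem
# (both in MB). granted_mem_mb is a hypothetical helper; it assumes one task.

granted_mem_mb() {
  local cpus=${SLURM_CPUS_PER_TASK:-1}
  if [[ -n "${SLURM_MEM_PER_CPU:-}" ]]; then
    echo $(( SLURM_MEM_PER_CPU * cpus ))
  elif [[ -n "${SLURM_MEM_PER_NODE:-}" ]]; then
    echo "$SLURM_MEM_PER_NODE"
  else
    echo "unknown"
  fi
}

echo "job ${SLURM_JOB_ID:-none}: ${SLURM_CPUS_PER_TASK:-1} cores, $(granted_mem_mb) MB"
```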

When you are finished, type exit to end the session. If the job still appears in squeue, use scancel with your job ID to release the allocation:
```
$ exit 
$ scancel 140590
salloc: Job allocation 140590 has been revoked.
```

How do I view the resources used by my job?

sacct shows accounting information for completed jobs. See the sacct documentation, or run sacct --helpformat, for the full list of output fields.

```
$ sacct -u [netid] -j 139907 -o "Nodelist, JobID, AllocNodes, AllocTRES%30, MaxVMSize, MaxVMSizeTask, AveVMSize, TotalCPU, Elapsed"

       NodeList        JobID AllocNodes                      AllocTRES  MaxVMSize  MaxVMSizeTask  AveVMSize   TotalCPU    Elapsed
--------------- ------------ ---------- ------------------------------ ---------- -------------- ---------- ---------- ----------
        rhea-09 139907                1 cpu=24,mem=60000M,energy=1844+                                      13-00:45:+   14:50:14
        rhea-09 139907.batch          1       cpu=24,mem=60000M,node=1  54764616K              0  54506520K 13-00:45:+   14:50:1
```

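This job ran on rhea-09, and its maximum virtual memory size (MaxVMSize) was about 52 GB (54764616K). Note that I requested 60000 MB, so I could refine this job to request slightly less memory. The job ran for 14:50:14 of wall-clock time; with 24 cores allocated, that is about 356 core-hours reserved, of which TotalCPU (13-00:45, truncated in the output) shows roughly 313 CPU-hours were actually used.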
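The arithmetic above can be scripted. In this minimal sketch, kib_to_gib and core_hours are hypothetical helpers, and core_hours assumes times in the [D-]HH:MM:SS form shown in the Elapsed column. The live query adds MaxRSS (peak resident memory), which is often a closer guide than MaxVMSize (virtual memory) when choosing a memory request, and widens TotalCPU with %15 so it is not truncated with a +.

```bash
#!/usr/bin/env bash
# Minimal sketch: turn sacct values into GiB and core-hours.
# kib_to_gib and core_hours are hypothetical helper names.

# Convert a size such as "54764616K" (KiB) to GiB with one decimal.
kib_to_gib() {
  awk -v k="${1%K}" 'BEGIN { printf "%.1f\n", k / 1024 / 1024 }'
}

# Core-hours = CPUs x wall time; time is [D-]HH:MM:SS.
core_hours() {
  local cpus=$1 t=$2 days=0 h m s
  if [[ $t == *-* ]]; then
    days=${t%%-*}
    t=${t#*-}
  fi
  IFS=: read -r h m s <<< "$t"
  local secs=$(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
  awk -v c="$cpus" -v secs="$secs" 'BEGIN { printf "%.1f\n", c * secs / 3600 }'
}

# Usage on a headnode: bash this_script.sh JOBID
jobid=${1:-}
if [[ -n "$jobid" ]] && command -v sacct >/dev/null 2>&1; then
  sacct -j "$jobid" -o "JobID,AllocCPUS,Elapsed,TotalCPU%15,MaxRSS,MaxVMSize"
fi
```

For job 139907, kib_to_gib 54764616K gives 52.2 and core_hours 24 14:50:14 gives 356.1.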

Can I restrict my job to a certain CPU architecture?

Yes! Use the --constraint option, for example #SBATCH --constraint=avx2 in your job script. To view the features (such as CPU architecture) available on an individual node, use scontrol show node [nodename]:

```
$ scontrol show node uagc19-06
NodeName=uagc19-06 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.00
   AvailableFeatures=intel,skylake,sse4_2,avx,avx2,avx512
   ActiveFeatures=intel,skylake,sse4_2,avx,avx2,avx512
   Gres=(null)
   NodeAddr=uagc19-06.arcc.albany.edu NodeHostName=uagc19-06 Version=17.11
   OS=Linux 4.14.35-1844.0.7.el7uek.x86_64 #2 SMP Wed Dec 12 19:48:02 PST 2018
   RealMemory=94956 AllocMem=0 FreeMem=93582 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=4086 Weight=256 Owner=N/A MCS_label=N/A
   Partitions=batch
   BootTime=2019-02-11T10:15:23 SlurmdStartTime=2019-02-11T10:15:48
   CfgTRES=cpu=20,mem=94956M,billing=20
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
```
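In this example uagc19-06 lists avx2 and avx512 among its features, so a job script containing #SBATCH --constraint=avx512 could be scheduled there. To find every batch node with a given feature, the sketch below filters the feature list printed by sinfo's %f field. nodes_with_feature is a hypothetical helper and matches exact feature names only.

```bash
#!/usr/bin/env bash
# Minimal sketch: list nodes whose feature list contains an exact feature name.
# Input lines look like "<hostname> <feature1,feature2,...>".
# nodes_with_feature is a hypothetical helper name.

nodes_with_feature() {
  local want=$1 host features
  while read -r host features; do
    [[ -z "$host" ]] && continue
    # Pad with commas so that "avx" does not match "avx2".
    if [[ ",${features}," == *",${want},"* ]]; then
      echo "$host"
    fi
  done
}

if command -v sinfo >/dev/null 2>&1; then
  sinfo -p batch -o "%n %f" --noheader | nodes_with_feature avx2
fi
```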