...
All jobs on the general purpose cluster request resources via SLURM. SLURM is open-source software that allocates resources to users for their computations, provides a framework for starting, executing, and monitoring compute jobs, and arbitrates contention for resources by managing a queue of pending work. SLURM is widely used in the high-performance computing (HPC) landscape, so you are likely to encounter it outside of our systems. For more information, please see https://slurm.schedmd.com/
General Purpose Computing
...
Info: batch has some important restrictions. A job can request at most 3 nodes and will run for at most 14 days before being automatically terminated. If you need an exception to this rule, please contact arcc@albany.edu
How can I request access to more nodes, or a longer time limit?
On a case by case basis, ARCC will grant users temporary access to more than the default job limitations. Please contact arcc@albany.edu if you would like to request access to more nodes, or a longer time limit.
...
How do I schedule an interactive job?
An "interactive" job means that you will have access to a terminal so that you can work interactively on the cluster. To achieve this, we will use srun. Interactive sessions are useful for debugging code or for making sure certain software compiles correctly.
First, ssh into head.arcc.albany.edu. On Windows, you can use an ssh client such as PuTTY; on Mac, simply use the terminal. Replace [netid] below with your username and type in your password at the prompt. You will not see your password, but it is being typed.
```bash
$ ssh [netid]@head.arcc.albany.edu
Warning: Permanently added the ECDSA host key for IP address '169.226.65.82' to the list of known hosts.
[netid]@head.arcc.albany.edu's password:
Warning: No xauth data; using fake authentication data for X11 forwarding.
Last login: Wed Jan 30 13:49:20 2019 from lmm.rit.albany.edu
================================================================================
This University at Albany computer system is reserved for authorized use only.
    http://www.albany.edu/its/authorizeduse.htm

Headnodes:
    head.arcc.albany.edu
    headnode7.rit.albany.edu
    headnode.rit.albany.edu - LEGACY SUPPORT

General Purpose Computing:
    lmm.rit.albany.edu - Large memory

x2go headnode:
    eagle.arcc.albany.edu

Questions / Assistance - arcc@albany.edu
================================================================================
```
Next, allocate resources on the cluster for your interactive session. We will request a session that lasts for 1 hour, with 4 CPUs and 400 MB of memory. Note that your job number will be different.
```bash
$ srun --partition=batch --nodes=1 --time=01:00:00 --cpus-per-task=4 --mem=400 --pty $SHELL -i
$ hostname
uagc12-01.arcc.albany.edu
```
Now we are running a terminal session on a specific node of the cluster. Notice in step 2 that the hostname command output a host other than head.arcc.albany.edu.
```bash
$ cd /network/rit/misc/software/examples/slurm/
$ ./simple_multiprocessing.py
USER ns742711 was granted 4 cores and None MB per node on uagc12-01. The job is current running with job # 140590
Process D waiting 3 seconds
Process D Finished.
Process A waiting 5 seconds
Process A Finished.
Process C waiting 1 seconds
Process C Finished.
Process E waiting 4 seconds
Process E Finished.
Process B waiting 2 seconds
Process B Finished.
Process F waiting 5 seconds
Process F Finished.
```
When you are finished, type exit and then use scancel to relinquish the allocation:
```bash
$ exit
$ scancel 140590
salloc: Job allocation 140590 has been revoked.
```
How do I view the resources used by a completed job?
...
How do I view the resources used by a completed job?
sacct is useful for viewing accounting information on completed jobs. Read the documentation for a description of all output fields.
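For example, to summarize a completed job you might run something like the following (the field list here is just a suggestion; any fields from the sacct documentation may be used, and the job ID is taken from the interactive example above):

```shell
# Show elapsed time, peak memory use, and total CPU time for a completed job.
# Replace 140590 with your own job ID.
sacct -j 140590 --format=JobID,JobName,Elapsed,MaxRSS,TotalCPU,State
```

Comparing MaxRSS against the memory you requested is a good way to right-size future submissions.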
...
Info: this job ran on rhea-09, and its max memory size was ~52 GB. Note that I requested 60000 MB, so I could refine this job to request slightly less memory. It ran for 14:50:14 and used about 350 CPU hours.
Can I restrict my job to a certain CPU architecture?
Yes! Use the --constraint flag in your #SBATCH directives. To view the available architecture features on individual nodes, use scontrol show node:
```bash
$ scontrol show node uagc19-06
NodeName=uagc19-06 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.00
   AvailableFeatures=intel,skylake,sse4_2,avx,avx2,avx512
   ActiveFeatures=intel,skylake,sse4_2,avx,avx2,avx512
   Gres=(null)
   NodeAddr=uagc19-06.arcc.albany.edu NodeHostName=uagc19-06 Version=17.11
   OS=Linux 4.14.35-1844.0.7.el7uek.x86_64 #2 SMP Wed Dec 12 19:48:02 PST 2018
   RealMemory=94956 AllocMem=0 FreeMem=93582 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=4086 Weight=256 Owner=N/A MCS_label=N/A
   Partitions=batch
   BootTime=2019-02-11T10:15:23 SlurmdStartTime=2019-02-11T10:15:48
   CfgTRES=cpu=20,mem=94956M,billing=20
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
```
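For example, to restrict a job to nodes advertising the avx512 feature shown above, a submission script might look like this (a minimal sketch; the resource requests and the program name my_program are placeholders for your own job):

```shell
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4000
#SBATCH --time=01:00:00
#SBATCH --constraint=avx512   # only schedule on nodes with the avx512 feature

./my_program                  # placeholder for your own executable
```

Multiple features can be combined in a single --constraint expression; see the sbatch documentation for the supported syntax.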
How can I allocate GPU resources?
You can request access to the GPUs on --partition=batch or --partition=ceashpc by adding one of the following flags:
```bash
--gres=gpu:1   # for one GPU
--gres=gpu:2   # for two GPUs; each NVIDIA K80 consists of two K40-class GPUs with 24 GB of shared memory
```
How can I run jupyter notebook on the cluster?
...
- https://jupyterlab.arcc.albany.edu; please see How-to: Using Jupyterhub for more information
If you need more resources, or longer than the eight-hour time limit, you can run jupyter notebook interactively.
First, ssh into head.arcc.albany.edu and run the command below; then enter a password at the prompt (note that you will not see your password, but it is being registered):
```bash
/network/rit/misc/software/jupyterhub/miniconda3/bin/jupyter notebook password
```
Next, you can either run jupyter notebook interactively with srun, or you can submit the process via the sbatch script located at /network/rit/misc/software/examples/slurm/spawn_jhub.sh (see below).
Spawning jupyter notebook interactively using ARCC's anaconda (you may change the path to your own conda distribution):
```bash
srun --partition=batch --nodes=1 --time=01:00:00 --cpus-per-task=4 --mem=400 --pty $SHELL -i
unset XDG_RUNTIME_DIR
/network/rit/misc/software/jupyterhub/miniconda3/bin/jupyter notebook --no-browser --ip=0.0.0.0
```
You should see jupyter output related to launching the server. Once it is complete, the output should look like:
```bash
[I 08:31:49.694 NotebookApp] http://(uagc19-02.rit.albany.edu or 127.0.0.1):8889/
[I 08:31:49.694 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
```
Open up a web browser and navigate to the suggested location (in the example above, uagc19-02.rit.albany.edu:8889), enter the configured password at the prompt, and you are all set!
- Spawning jupyter notebook via sbatch using ARCC's anaconda (you may change the path to your own conda distribution):
ssh into head.arcc.albany.edu, copy the file below to your home directory, and submit the script with sbatch.
```bash
# Copy the file
cp /network/rit/misc/software/examples/slurm/spawn_jupyter.sh ~/spawn_jupyter.sh
# Change to the home directory
cd ~/
# Submit the script
sbatch spawn_jupyter.sh
```
Info: note that you will want to edit the script to request the amount of resources that you need.
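The resource requests live in the #SBATCH header of the script. A script along these lines (a sketch consistent with the interactive example above, not the exact contents of spawn_jupyter.sh) is what you would be editing:

```shell
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1         # adjust to the resources you need
#SBATCH --mem=4000
#SBATCH --time=08:00:00
#SBATCH --output=jupyter.%j.log   # %j expands to the job ID

# Start the notebook server on the allocated node.
unset XDG_RUNTIME_DIR
/network/rit/misc/software/jupyterhub/miniconda3/bin/jupyter notebook --no-browser --ip=0.0.0.0
```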
This script will create an output file called jupyter.[jobid].log. Open up this file, replacing [jobid] with the allocation number you were given (you can find this with squeue), and you will see output that looks like:
```bash
USER [netid] was granted 1 cores and  MB per node on uagc12-02. The job is current running with job #144168.
[I 10:06:31.758 NotebookApp] JupyterLab extension loaded from /network/rit/misc/software/jupyterhub/miniconda3/lib/python3.6/site-packages/jupyterlab
[I 10:06:31.758 NotebookApp] JupyterLab application directory is /network/rit/misc/software/jupyterhub/miniconda3/share/jupyter/lab
[I 10:06:31.779 NotebookApp] Serving notebooks from local directory: /network/rit/home/[netid]
[I 10:06:31.779 NotebookApp] The Jupyter Notebook is running at:
[I 10:06:31.780 NotebookApp] http://(uagc12-02.arcc.albany.edu or 127.0.0.1):8888/
[I 10:06:31.780 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
```
Open up a web browser and point it to the location noted in the second-to-last line (in the example above, http://uagc12-02.arcc.albany.edu:8888), enter your password, and you are all set!