Table of Contents
...
All jobs on the general purpose cluster request resources via SLURM. SLURM is open-source software that allocates resources to users for their computations, provides a framework for starting, executing, and monitoring compute jobs, and arbitrates contention for resources by managing a queue of pending work. SLURM is widely used in the high-performance computing (HPC) landscape, and it is likely you will encounter it outside of our systems. For more information, please see https://slurm.schedmd.com/
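As a minimal sketch of what a SLURM job looks like (the job name, resource values, and echo command are placeholders for illustration; the batch partition is described below), a batch script bundles its resource requests into #SBATCH directives and is submitted with sbatch:

Code Block language bash
#!/bin/bash
#SBATCH --job-name=example       # name shown in the queue
#SBATCH --partition=batch        # the general purpose partition
#SBATCH --nodes=1                # number of nodes
#SBATCH --cpus-per-task=1        # CPU cores per task
#SBATCH --mem=1000               # memory in MB
#SBATCH --time=01:00:00          # wall-clock limit (HH:MM:SS)

echo "Hello from $(hostname)"

Submit it with sbatch myscript.sh and check its place in the queue with squeue -u $USER.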
General Purpose Computing
...
Info
batch has some important restrictions. A job can request at most 3 nodes and will be automatically terminated after 14 days. If you need an exception to this rule, please contact arcc@albany.edu
How can I request access to more nodes, or a longer time limit?
On a case-by-case basis, ARCC will grant users temporary access beyond the default job limits. Please contact arcc@albany.edu if you would like to request access to more nodes or a longer time limit.
...
Info
This job ran on rhea-09, and its maximum memory usage was ~52 GB. Note that I requested 60000 MB, so I could refine this job to request slightly less memory. It ran for 14:50:14 and used about 350 CPU hours.
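If you want to pull similar numbers for your own jobs, one option (a sketch only; substitute your actual job ID for the <jobid> placeholder) is SLURM's accounting command sacct:

Code Block language bash
# show where the job ran, its peak memory, elapsed time, total CPU time, and state
sacct -j <jobid> --format=JobID,NodeList,MaxRSS,Elapsed,TotalCPU,State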
Can I restrict my job to a certain CPU architecture?
Yes! Use the --constraint flag in your #SBATCH directives. To view the available architectures (features) on individual nodes, use scontrol show node; an example of constraining a job follows the node listing below.
Code Block language bash
$ scontrol show node uagc19-06
NodeName=uagc19-06 Arch=x86_64 CoresPerSocket=10
   CPUAlloc=0 CPUErr=0 CPUTot=20 CPULoad=0.00
   AvailableFeatures=intel,skylake,sse4_2,avx,avx2,avx512
   ActiveFeatures=intel,skylake,sse4_2,avx,avx2,avx512
   Gres=(null)
   NodeAddr=uagc19-06.arcc.albany.edu NodeHostName=uagc19-06 Version=17.11
   OS=Linux 4.14.35-1844.0.7.el7uek.x86_64 #2 SMP Wed Dec 12 19:48:02 PST 2018
   RealMemory=94956 AllocMem=0 FreeMem=93582 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=4086 Weight=256 Owner=N/A MCS_label=N/A
   Partitions=batch
   BootTime=2019-02-11T10:15:23 SlurmdStartTime=2019-02-11T10:15:48
   CfgTRES=cpu=20,mem=94956M,billing=20
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
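As an illustration only (the feature names come from the AvailableFeatures list above, and the executable is a hypothetical placeholder), a job could be restricted to AVX-512-capable nodes like this:

Code Block language bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --constraint=avx512        # only schedule on nodes advertising the avx512 feature
# multiple features can be combined, e.g. --constraint="intel&avx512"

srun ./my_program                  # hypothetical executable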
How can I run jupyter notebook on the cluster?
There are two ways to spawn jupyter notebooks on the cluster:
- https://jupyterlab.arcc.albany.edu ; please see Jupyterhub for more information
- If you need more resources, or longer than the eight-hour time limit, you can run jupyter notebook interactively (see below)
First, ssh into head.arcc.albany.edu and run the following command; then enter a password at the prompt (note that you will not see your password as you type, but it is being registered)
Code Block language bash
/network/rit/misc/software/jupyterhub/miniconda3/bin/jupyter notebook password
Next, you can either run jupyter notebook interactively with srun, or you can submit the process via the sbatch script located at /network/rit/misc/software/examples/slurm/spawn_jhub.sh (see below)
Spawning jupyter notebook interactively using ARCC's anaconda (you may change the path to your own conda distribution):
Code Block language bash
srun --partition=batch --nodes=1 --time=01:00:00 --cpus-per-task=4 --mem=400 --pty $SHELL -i
unset XDG_RUNTIME_DIR
/network/rit/misc/software/jupyterhub/miniconda3/bin/jupyter notebook --no-browser --ip=0.0.0.0
You should see a stream of output as the server launches, and at the bottom something that looks like:
Code Block language bash
[I 08:31:49.694 NotebookApp] http://(uagc19-02.rit.albany.edu or 127.0.0.1):8889/
[I 08:31:49.694 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
Open a web browser and point it to the address shown; in this example we would navigate to uagc19-02.rit.albany.edu:8889. Enter the configured password at the prompt, and that's it!
- Spawning jupyter notebook via sbatch using ARCC's anaconda (you may change the path to your own conda distribution):
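The canonical script lives at /network/rit/misc/software/examples/slurm/spawn_jhub.sh; as a rough sketch of what such a submission looks like (assuming the same miniconda path and the same modest resources as the interactive example above; adjust to your needs), it is essentially the interactive recipe wrapped in #SBATCH directives:

Code Block language bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=400
#SBATCH --time=08:00:00
#SBATCH --output=jupyter-%j.log    # the notebook address and port will appear in this log

unset XDG_RUNTIME_DIR
/network/rit/misc/software/jupyterhub/miniconda3/bin/jupyter notebook --no-browser --ip=0.0.0.0

After sbatch returns a job ID, check the jupyter-<jobid>.log file for the address and port to open in your browser, then log in with the password you configured earlier.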