All jobs on the general purpose cluster request resources via SLURM. SLURM is open-source software that allocates resources to users for their computations, provides a framework for starting, executing, and monitoring compute jobs, and arbitrates contention for resources by managing a queue of pending work. SLURM is widely used in the high-performance computing (HPC) landscape, and you are likely to encounter it outside of our systems.
Table of Contents
General Purpose Computing
...
First, ssh into head.arcc.albany.edu. On Windows, you can use an SSH client such as PuTTY; on Mac, simply use the Terminal. Replace [netid] below with your username and type in your password at the prompt. You will not see your password as you type, but it is being entered.
$ ssh [netid]@head.arcc.albany.edu
Warning: Permanently added the ECDSA host key for IP address '169.226.65.82' to the list of known hosts.
[netid]@head.arcc.albany.edu's password:
Warning: No xauth data; using fake authentication data for X11 forwarding.
Last login: Wed Jan 30 13:49:20 2019 from lmm.rit.albany.edu
================================================================================
This University at Albany computer system is reserved for authorized use only.
http://www.albany.edu/its/authorizeduse.htm

Headnodes:
head.arcc.albany.edu
headnode7.rit.albany.edu
headnode.rit.albany.edu - LEGACY SUPPORT

General Purpose Computing:
lmm.rit.albany.edu - Large memory

x2go headnode:
eagle.arcc.albany.edu

Questions / Assistance - arcc@albany.edu
================================================================================
Next, allocate resources on the cluster for your interactive session. We will request a session that lasts for 1 hour, with 4 CPUs and 400 MB of memory. Note that your job number will be different.
$ srun --partition=batch --nodes=1 --time=01:00:00 --cpus-per-task=4 --mem=400 --pty $SHELL -i
$ hostname
uagc12-01.arcc.albany.edu
Now we are running a terminal session on a specific node of the cluster. Notice that in step 2 the hostname command output a host other than head.arcc.albany.edu.
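You can also confirm what SLURM granted you from inside the session by inspecting the environment variables SLURM sets; a quick check (the variable names are standard SLURM, and the values reflect the request above):

$ echo $SLURM_JOB_ID          # the job number assigned to this allocation
$ echo $SLURM_CPUS_PER_TASK   # 4, matching --cpus-per-task
$ echo $SLURM_MEM_PER_NODE    # 400, the memory granted per node in MB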
Next, try running one of the example scripts:

$ cd /network/rit/misc/software/examples/slurm/
$ ./simple_multiprocessing.py
USER ns742711 was granted 4 cores and None MB per node on uagc12-01. The job is current running with job # 140590
Process D waiting 3 seconds
Process D Finished.
Process A waiting 5 seconds
Process A Finished.
Process C waiting 1 seconds
Process C Finished.
Process E waiting 4 seconds
Process E Finished.
Process B waiting 2 seconds
Process B Finished.
Process F waiting 5 seconds
Process F Finished.
When you are finished, type exit and then use scancel to relinquish the allocation:
$ exit
$ scancel 140590
salloc: Job allocation 140590 has been revoked.
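If you have lost track of your job number, you can look it up before cancelling; squeue can filter to your own jobs (the <jobid> placeholder below stands in for whatever number squeue reports):

$ squeue -u $USER   # list your running and pending jobs with their job IDs
$ scancel <jobid>   # cancel the job and release its allocation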
...
Info: This job ran on rhea-09, and its maximum memory usage was ~52 GB. Note that I requested 60000 MB, so I could refine this job to request slightly less memory. It ran for 14:50:14 and used about 350 CPU hours.
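Figures like these come from SLURM's job accounting; a quick way to pull them for one of your own completed jobs is the sacct command, assuming accounting is enabled on the cluster (replace <jobid> with your job number):

$ sacct -j <jobid> --format=JobID,JobName,Elapsed,TotalCPU,MaxRSS,State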
Can I restrict my job to a certain CPU architecture?
Yes! Use the --constraint flag in your #SBATCH directives. To view the available architectures/features on individual nodes, use scontrol show node.
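As a sketch, first check what features the nodes advertise, then request one in your batch script. The feature name intel below is hypothetical and must match one of the AvailableFeatures values that scontrol show node reports on your cluster:

$ scontrol show node | grep -i features   # list the features each node advertises

#!/bin/bash
#SBATCH --partition=batch
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=400
#SBATCH --constraint=intel   # hypothetical feature name; use one reported by scontrol

hostname

Submit the script with sbatch; it will print the job number it assigns.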
...