
SLURM

All jobs on the general purpose cluster request resources via SLURM. SLURM is open-source software that allocates resources to users for their computations, provides a framework for starting, executing, and monitoring compute jobs, and arbitrates contention for resources by managing a queue of pending work. SLURM is widely used in the high performance computing (HPC) landscape, so it is likely you will encounter it outside of our systems. For more information, please see https://slurm.schedmd.com/
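To give a feel for how a job requests resources, here is a minimal sketch of a SLURM batch script. It assumes the batch partition used later on this page; the job name, time limit, CPU count, and memory value are placeholders you would adjust to your own work.

Code Block
languagebash
#!/bin/bash
#SBATCH --job-name=example       # placeholder job name
#SBATCH --partition=batch        # partition/queue to submit to
#SBATCH --nodes=1                # run on a single node
#SBATCH --cpus-per-task=1        # number of CPU cores requested
#SBATCH --mem=400                # memory in MB
#SBATCH --time=00:10:00          # wall-clock limit (HH:MM:SS)

hostname                         # the actual work: report which node ran the job

Submitting such a script with sbatch queues it until the requested resources become available.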

General Purpose Computing

...

  1. First, ssh into head.arcc.albany.edu. On Windows, you can use an ssh client such as PuTTY; on Mac, simply use the Terminal. Replace [netid] below with your username and type your password at the prompt. You will not see your password as you type, but it is being entered. 

    Code Block
    languagebash
    $ ssh [netid]@head.arcc.albany.edu
    
    Warning: Permanently added the ECDSA host key for IP address '169.226.65.82' to the list of known hosts.
    [netid]@head.arcc.albany.edu's password:
     
    Warning: No xauth data; using fake authentication data for X11 forwarding.
    Last login: Wed Jan 30 13:49:20 2019 from lmm.rit.albany.edu
    ================================================================================
     This University at Albany computer system is reserved for authorized use only.
                  http://www.albany.edu/its/authorizeduse.htm
    Headnodes:
     head.arcc.albany.edu
     headnode7.rit.albany.edu
     headnode.rit.albany.edu - LEGACY SUPPORT
    General Purpose Computing:
     lmm.rit.albany.edu - Large memory
    x2go headnode:
     eagle.arcc.albany.edu
       Questions / Assistance - arcc@albany.edu
    ================================================================================
  2. Next, allocate resources on the cluster for your interactive session. We will request a session that lasts for 1 hour, with 4 CPUs and 400 MB of memory. Note that your job number will be different.

    Code Block
    languagebash
    $ srun --partition=batch --nodes=1 --time=01:00:00 --cpus-per-task=4 --mem=400 --pty $SHELL -i
    
    $ hostname
    uagc12-01.arcc.albany.edu
  3. Now we are running a terminal session on a specific node of the cluster. Notice in step 2 that the hostname command output a host other than head.arcc.albany.edu.

    Code Block
    languagebash
    $ cd /network/rit/misc/software/examples/slurm/
    $ ./simple_multiprocessing.py
     
    USER ns742711 was granted 4 cores and None MB per node on uagc12-01.
    The job is current running with job # 140590
     Process D waiting 3 seconds
     Process D Finished.
     Process A waiting 5 seconds
     Process A Finished.
     Process C waiting 1 seconds
     Process C Finished.
     Process E waiting 4 seconds
     Process E Finished.
     Process B waiting 2 seconds
     Process B Finished.
     Process F waiting 5 seconds
     Process F Finished.
  4. When you are finished, type exit and then use scancel to relinquish the allocation. If you have lost track of the job number, see the squeue note after this list.

    Code Block
    $ exit 
    $ scancel 140590
    salloc: Job allocation 140590 has been revoked.
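
If you no longer have the job number handy, squeue (a standard SLURM command) will list your jobs; the JOBID column is the value scancel expects. Replace [netid] and [jobid] with your own values.

Code Block
languagebash
$ squeue -u [netid]     # show your pending and running jobs, including their job IDs
$ scancel [jobid]       # cancel/relinquish the job with that ID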

How do I view the resources used by a completed job?

sacct is useful for viewing accounting information on completed jobs. Read the sacct documentation for a description of all output fields.
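
For example, a command along these lines asks sacct for a handful of commonly useful fields for a single job; the job ID is a placeholder and the field list is only a suggestion, not an exhaustive one.

Code Block
languagebash
$ sacct -j [jobid] --format=JobID,JobName,Partition,AllocCPUS,Elapsed,MaxRSS,CPUTime,State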

...

Info

This job ran on rhea-09, and its maximum memory usage was ~52 GB. Note that I requested 60000 MB, so I could refine this job to request slightly less memory. It ran for 14:50:14 and used about 350 CPU hours.

Can I restrict my job to a certain CPU architecture?

Yes! Use the --constraint flag in your #SBATCH directives. To view the available architectures (features) on individual nodes, use scontrol show node.
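
As a rough sketch, you could first inspect a node to see what features it advertises and then request one of those features in your batch script. The node name below is taken from the walkthrough above, and the feature name is only a placeholder; use whatever scontrol actually reports.

Code Block
languagebash
$ scontrol show node uagc12-01   # the features listed in the output are valid --constraint values

# then, in your batch script:
#SBATCH --constraint=intel       # placeholder feature name; substitute one reported by scontrol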

...