...

  1. Introduction

  2. Overview of Available Systems

  3. HPC Cluster

  4. DGX Cloud Cluster (H100)

  5. DGX On-Prem Cluster (A100)

  6. Storage Resources and Management

  7. Requesting Access to UAlbany Supercomputing Resources

  8. Connection and Access Guide

  9. Working with SLURM

  10. Using Container Images with DGX Environments

  11. Troubleshooting and FAQ

  12. Additional Resources

1. Introduction

What is a Supercomputer?

...

Connect to the head node at head.arcc.albany.edu using SSH. Need just a JupyterLab session? Then say no more! The HPC cluster offers a convenient way to start one through the JupyterHub server, so you don't have to SSH at all: just open https://jupyterlab.its.albany.edu/ in your browser and be happy. Have any questions? We've got your back. Take a look at this Wiki page on how to use the tool: JupyterHub Service Offering.

...

On macOS or Linux:
You're in luck! These systems come with SSH built in. Just open a terminal window and type:

Code Block
languagenone
ssh your_netid@hostname
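
As a concrete illustration of the command above, here is a minimal sketch that fills in the hostname with the HPC head node named earlier in this guide. The `netid` value is a placeholder for your own NetID, and the variable names are just for illustration; other clusters use different hostnames.

```shell
# Placeholder: replace with your own NetID.
netid="your_netid"

# HPC head node mentioned earlier in this guide; other clusters differ.
host="head.arcc.albany.edu"

# Assemble the user@host target that ssh expects.
ssh_target="${netid}@${host}"

# Print the command to run in your terminal.
echo "ssh ${ssh_target}"
```

Printing the command first is a handy way to double-check the target before you actually connect.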

On Windows:
You'll need to download an SSH client first. We recommend PuTTY (it's free!), but VS Code's Remote - SSH extension is also a great option if you're already using VS Code (also free).

...

The most common way to run jobs is by creating a script and submitting it with sbatch. A basic script looks something like this:

Code Block
languagenone
#!/bin/bash
#SBATCH --job-name=my_awesome_analysis
#SBATCH --output=results_%j.out
#SBATCH --error=results_%j.err
#SBATCH --time=01:00:00
#SBATCH --gres=gpu:1

# Run your actual program
python my_analysis.py
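
Once the script is saved, you submit it with sbatch, which normally replies with a line of the form "Submitted batch job <id>". The sketch below shows how that job ID maps to the `%j` placeholder in the output file names. The script name `my_job.sh`, the helper function, and the sample confirmation line are assumptions for illustration, not real output from our clusters.

```shell
# Hypothetical helper: pull the job ID (the last field) out of
# sbatch's confirmation line.
job_id_from_sbatch() {
    printf '%s\n' "$1" | awk '{print $NF}'
}

# Sample confirmation line, made up for this example.
sample_line="Submitted batch job 12345"
job_id=$(job_id_from_sbatch "$sample_line")

# %j in --output/--error is replaced by the job ID.
echo "stdout goes to: results_${job_id}.out"
echo "stderr goes to: results_${job_id}.err"

# On the cluster you would typically run (my_job.sh is a placeholder name):
#   sbatch my_job.sh
#   squeue -u "$USER"
```

Running `squeue -u "$USER"` afterwards lists your pending and running jobs, so you can confirm the submission went through.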

...

  1. Browse the NGC catalog to find the container you need

  2. In your SLURM job script, specify the container directly:

    Code Block
    languagenone
    #SBATCH --container-image='docker://nvcr.io/nvidia/pytorch:25.01-py3'

...

Since containers have their own isolated filesystem, you'll need to explicitly mount your storage directories:

Code Block
languagenone
#SBATCH --container-mounts=/network/rit/dgx/dgx_[your_lab_here]:/mnt/dgx_[your_lab_here]
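
Each mount is written as `source:destination`, and several mounts can be joined with commas. Here is a small sketch that builds the directive for a lab. The lab name `example_lab` is hypothetical, and the second path follows the pattern used in the JupyterLab script in this guide; check which directories your lab actually has.

```shell
# Hypothetical lab name: replace with your own lab's directory name.
lab="example_lab"

# Each mount is source:destination; multiple mounts are comma-separated.
dgx_mount="/network/rit/dgx/dgx_${lab}:/mnt/dgx_lab"
lab_mount="/network/rit/lab/${lab}:/mnt/lab"
mounts="${dgx_mount},${lab_mount}"

# Print the directive so it can be pasted into a job script.
echo "#SBATCH --container-mounts=${mounts}"
```

The directive is printed rather than used directly because `#SBATCH` lines are read as plain text when the job is submitted, so shell variables inside them are not expanded.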

...

Let's walk through a practical example that you'll likely use all the time - setting up a Jupyter notebook session on the DGX On-Prem cluster. This script creates an interactive JupyterLab environment where you can develop and test your code with all the perks of our powerful GPUs. It automatically generates a secure password and gives you a URL to access your notebook from your browser.

Code Block
languagenone
#!/bin/bash

#SBATCH --job-name=jupyter
#SBATCH --output=jupyter-%j.out
#SBATCH --error=jupyter-%j.err
#SBATCH --time=8:00:00
#SBATCH --gres=gpu:1
#SBATCH --container-image='docker://nvcr.io/nvidia/pytorch:24.09-py3'
#SBATCH --container-mounts=/network/rit/dgx/dgx_vieirasobrinho_lab:/mnt/dgx_lab,/network/rit/lab/vieirasobrinho_lab:/mnt/lab

# Get the DGX node name
node_name="$SLURMD_NODENAME"

echo -e "JupyterLab is being loaded..."

# Generate a random port number between 8000 and 8999
port=$((RANDOM % 1000 + 8000))

# Generate a random password (alphanumeric, 6 characters)
password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 6)

# Build the Jupyter URL
jupyter_url="http://${node_name}.its.albany.edu:${port}"

# Print session details
echo -e "\nYour JupyterLab session is available at: ${jupyter_url}\n"
echo -e "Your password is: ${password}\n"
echo -e "Please copy and paste the link into your browser and use the password to log in.\n"
echo -e "================================================================================\n"

# Start JupyterLab session
jupyter lab --allow-root --no-browser --NotebookApp.token="${password}" --NotebookApp.allow_origin='*' --NotebookApp.log_level='CRITICAL' --notebook-dir=/mnt --port=$port
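
The script writes the session URL and password to its output file (`jupyter-<jobid>.out`). Here is a minimal sketch of how you might pull those two values back out once the job is running. The sample file contents, node name, port, and password below are made up so the example can run anywhere; on the cluster you would point `out_file` at your real output file.

```shell
# Create a sample output file that mimics what the job script prints.
# (Node name, port, and password here are made up.)
out_file=$(mktemp)
cat > "$out_file" <<'EOF'
JupyterLab is being loaded...

Your JupyterLab session is available at: http://examplenode.its.albany.edu:8123

Your password is: Ab3dE9
EOF

# Keep only the text that follows each known prefix.
session_url=$(sed -n 's/^Your JupyterLab session is available at: //p' "$out_file")
session_password=$(sed -n 's/^Your password is: //p' "$out_file")

echo "URL: ${session_url}"
echo "Password: ${session_password}"

# Clean up the temporary sample file.
rm -f "$out_file"
```

If the file is still empty, the job probably hasn't started yet, so give it a moment and try again.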

...

Need some visual assistance? Not a problem! Take a look at the video below and see how to start a JupyterLab session on the DGX On-Prem.

Video: https://www.youtube.com/embed/nreHpxo8i0U?si=f8sVI5xOCeCdZAzm

12. Additional Resources

Once again, don't worry - you're not alone on this supercomputing journey! We've created a wealth of resources to help you make the most of UAlbany's computational power, and our documentation is constantly being updated.

Our AI Tutorials page is your one-stop shop for getting started. Think of it as a comprehensive table of contents for everything related to our supercomputing resources. You'll find guides for each cluster, step-by-step instructions, and best practices developed by folks who've already blazed these trails. It's like a cheat sheet, where you can quickly navigate to the topic you most need help with.

Looking for ready-to-run examples? Check out the Code Tutorials section. We've prepared sample Python scripts and Jupyter notebooks that are specifically designed for our DGX environments. It's like having a cookbook full of recipes that are guaranteed to work in our kitchen!

...