...

The DGX H100 cloud is terminal-based rather than GUI-based. To access this resource, you will need to create an SSH key pair on LMM; this guide walks you through how to do so. Please have your PI submit the DGX Cloud request form and include the netIDs from your lab that should be provisioned.

Make sure you are connected to the UAlbany VPN or are on UAlbany Wi-Fi. First, log into LMM and navigate to your home directory via ‘cd’.

...

Next, generate your SSH key pair. Make sure to replace NETID with your own netID in lowercase, and replace YOUR_EMAIL@albany.edu with your own UAlbany email. It is also a good idea to copy the resulting key files to your personal machine, especially if it runs Windows; you will need them there when forwarding a Jupyter notebook instance to your localhost:8888.

Code Block
ssh-keygen -t ed25519 -f ~/.ssh/NETID-ed25519-dgxc -C "YOUR_EMAIL@albany.edu"

After entering this, you should see ‘Creating ECDSA key for ssh’ on your terminal. When ssh-keygen asks for a passphrase, choose one you will remember, since you will need it each time you log in. Your SSH keys will now be under your home directory in a hidden folder called .ssh. There are two key files: a public key (ending in .pub) and a private key. To log into the cloud, you must supply the private key. Copy both files into your local machine’s .ssh folder using PowerShell (Windows) or Terminal (Mac/Linux); you will need them later to open a Jupyter notebook on localhost.

Code Block
ssh -i ~/.ssh/netID-ed25519-dgxc netID@207.211.163.76

Here you will be asked whether to remember this host and then prompted for your passphrase, which is your password equivalent for logging into the cloud. Do not forget this passphrase! The IP address of the login node is 207.211.163.76, and invoking ‘hostname’ should return ‘slogin001’. Remember to load slurm as a module here; otherwise commands such as squeue and sbatch will not work.
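On Mac/Linux, OpenSSH refuses to use a private key whose file permissions let other users read it, which can happen after key files are copied between machines. The sketch below checks that both key files are present and tightens their permissions. Note that check_keypair is a hypothetical helper name used only for illustration and is not part of any DGX Cloud tooling; the key path in the usage comment assumes the naming used in this guide.

```shell
# Hypothetical helper (illustrative only): verify that a key pair exists
# and restrict its permissions so OpenSSH will accept the private key.
check_keypair() {
    key="$1"   # path to the private key; the public key is "$key.pub"

    if [ ! -f "$key" ] || [ ! -f "$key.pub" ]; then
        echo "missing: expected $key and $key.pub" >&2
        return 1
    fi

    chmod 700 "$(dirname "$key")"   # only you may enter the .ssh folder
    chmod 600 "$key"                # private key: owner read/write only
    chmod 644 "$key.pub"            # public key may be world-readable
    echo "ok: $key"
}

# Usage (replace netID with your own netID):
#   check_keypair ~/.ssh/netID-ed25519-dgxc
```

This applies to Terminal on Mac/Linux; file permissions on Windows are handled differently, so the chmod calls are not meant for PowerShell.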

...

You should then see a link to your Jupyter notebook. Opening it right away will not work, because you first need to forward a port from your local computer to reach the notebook. To do this, open a terminal or PowerShell window on your local machine and invoke the following, replacing netID with your own netID, adjusting the path to your own private key, and replacing GPU_ID with the GPU node name (GPU ID) of your job (e.g., gpu001, gpu002).

Code Block
ssh -L 8888:GPU_ID:8888 netID@207.211.163.76 -i C:\Users\netID\.ssh\netID-ed25519-dgxc

This logs you in to the DGX Cloud and forwards port 8888 on your local machine to port 8888 on the GPU node where your job is running. You can then open your notebook address on your local machine.
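The argument to ssh -L has the general form local_port:target_host:target_port, so the local port does not have to match the remote one (useful if 8888 is already taken on your machine). As a rough sketch for Mac/Linux, the hypothetical helper tunnel_cmd below only prints the command so you can inspect it before running it. It is illustrative and not part of any DGX Cloud tooling; it assumes the key naming and login node IP used in this guide, and that Jupyter listens on port 8888 on the GPU node.

```shell
# Hypothetical helper (illustrative only): assemble the port-forwarding
# command from its parts and print it, without running it.
tunnel_cmd() {
    netid="$1"                  # your netID
    gpu_id="$2"                 # GPU node of your job, e.g. gpu001
    local_port="${3:-8888}"     # port on your own machine (default 8888)
    remote_port=8888            # assumed Jupyter port on the GPU node
    login_node="207.211.163.76" # login node IP from this guide
    key="$HOME/.ssh/${netid}-ed25519-dgxc"

    printf 'ssh -i %s -L %s:%s:%s %s@%s\n' \
        "$key" "$local_port" "$gpu_id" "$remote_port" "$netid" "$login_node"
}

# Print the command for a job on gpu001, using local port 9999.
tunnel_cmd netID gpu001 9999
```

With a different local port such as 9999, you would point your browser at localhost:9999 in place of localhost:8888 when opening the notebook link.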

Uploading Data

https://docs.nvidia.com/dgx-cloud/slurm/latest/cluster-user-guide.html#moving-data-into-your-dgx-cloud-cluster

...