
In this article you will learn how to access the Nvidia DGX Cloud resources available from the University.


Compute Request Form

Getting Started:

To begin, log in to NVIDIA AI Enterprise Base Command using your University email (Web SSO).

Once you successfully log in, select University at Albany (SUNY) as your organization, and then select your team. If you do not have a team, please reach out to ITS to be added to your lab's team. You may still access resources, but your lab's data and workspace will be unavailable until you are placed in that team.

Generating an API Key

In order to upload your data, or even to use the NVIDIA command line, you will need to generate an API key. This key allows you to log in to your workspace as yourself, so it is important not to lose it or give it out to anyone.

Once you have selected a team and logged in, you will be on the Base Command homepage. On the left-hand side, select 'BASE COMMAND' and, from the dropdown, select 'Dashboard' to bring yourself to the overview of the system.

Scroll to the bottom right and click 'Setup' in the section labeled 'Download CLI and Generate API Key'.

On the next page, click 'Generate API Key'. Save this key to a text file for easy access on your machine. If you lose your key, you can generate a new one on this page. Next, you will install the Nvidia CLI.

Installing the Nvidia CLI (Command Line Interface)

Next, download the CLI according to your system specifications. To find your system specifications, press the Windows key and type 'System Information'.

Your system information on a Windows machine will look like this. In this case, this is a 64-bit installation of Windows since the system type is x64-based.

Next select the appropriate download of the CLI and install it.

Once the install is complete, you will configure the terminal and check if it has installed correctly.

Configuring your Terminal

Next, you will need to configure your terminal so that you can upload your data in a neat and usable format. Press the Windows key and type 'Windows PowerShell' to open a PowerShell terminal. PowerShell has many similarities to the Linux terminals you may be familiar with from accessing existing University resources, but it looks different due to its syntax. You can still use commands such as 'cd' and 'ls' just like you can in Linux.

In your terminal, run the following to check whether the Nvidia CLI installed correctly:

ngc --version

This should return the version of the CLI that you installed. Your username should appear here as well. You can change directories by invoking:

cd C:\Users\your_username\

To list the contents of your directory, invoke the command 'ls'.
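For example, to move into your user folder and then list what it contains (replace your_username with your own Windows username):

cd C:\Users\your_username\
ls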

Next, you will configure the NGC CLI so that you can upload your data to your workspace and access it within a job. To begin, invoke:

ngc config set

You will be prompted for the API key that you generated earlier. You can copy it (Ctrl+C) and paste it (Ctrl+V) into your PowerShell terminal to submit your API key.

Next, it will prompt you for your CLI output type. Select ascii by typing in:

ascii

If you entered a different option or accidentally skipped this entry, you can invoke 'ngc config set' again to make your choices again. Hitting Enter without any input will skip a prompt, so you do not need to re-enter your API key unless you want to change it.

Next, you will be asked to enter your organization. Enter the following:

University at Albany (SUNY)

Next, it will ask you to enter your team name. This should be the lab team in which you will be working. Lastly, the terminal will prompt you for an 'ace' (the accelerated computing environment your jobs run on). Enter the following:

univ-of-albany-iad2-ace

If done successfully, you will see something similar to the following:

You should see your username and lab name in the appropriate spaces. Congrats on setting up the terminal! You are now ready to upload your data.
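If you ever want to double-check these settings later, the NGC CLI can also print the configuration it is currently using (shown here as a quick sanity check; the exact fields displayed may vary with your CLI version):

ngc config current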

Uploading Data

From your computer:

In your PowerShell terminal, navigate to where your data is stored. To do this, you can change directories via:

cd \Users\your_username\path\to\your\data\

You can also obtain this path by opening a File Explorer window and copy-pasting the address at the top into your terminal.
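If the path contains spaces, wrap it in quotes so PowerShell treats it as a single argument. For example, with a hypothetical folder name:

cd "C:\Users\your_username\My Research Data"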


Make sure your data is not zipped or archived. If your data is zipped or in .tar format, it will upload as such and will not be as accessible on the cloud, so unzip/untar your data before uploading. The --source option should point to the file or folder that contains your files. The --desc option takes a short description, and the name that follows it is the dataset name. Lastly, the --share option designates which team the data is shared with if you are a part of multiple teams. Add a second --share option if you want to share with another team at the same time.

ngc dataset upload --source <dir> --desc "my data" <dataset_name> --share <team_name>

An example command to upload a series of CSV files from a folder labeled 'world_happiness' to the team awan_lab would look like the following:

EXAMPLE
ngc dataset upload --source world_happiness --desc "csvs of world happiness data ranging from 2015 to 2019" world_happiness --share awan_lab
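You can also confirm the upload from the terminal. The NGC CLI includes a command that lists the datasets visible to you (a quick check; the columns shown may differ between CLI versions):

ngc dataset list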

Now, if you return to the Base Command dashboard and look under 'Datasets', you should see the file/folder you just uploaded. You can upload more files in the same fashion and they will appear in the same manner. The data you upload in this fashion is immutable, so you do not need to worry about accidentally editing the data during your work. In practice, your code should work with a copy of the data, for example by reading it into memory and operating on that copy. This would be accomplished in a manner such as:

python
# Reading the CSV loads a copy of the data into memory as a pandas DataFrame;
# the mounted dataset itself stays untouched.
import pandas as pd
df = pd.read_csv("/mount/data/folder/name_of_your_data.csv")
print(df)

In this fashion, your original data is never truly changed, ensuring reproducibility of your work. If you are doing data cleaning, you can do so in a workspace and save the result as a new CSV, or clean the data locally if that is easier.


Starting a Job

The default time limit of a job is 30 days. There are two ways to start a job on DGX Cloud: via the web interface or via the CLI. The web interface is graphical and easy to use, and it also generates a CLI command for you to use if you wish. In this example, we will submit a job that launches a Jupyter notebook instance where we can access our data from inside the notebook.

From the Web Interface:

In the create job section, there are templates created by people in your team, by Nvidia, or by UAlbany ITS. In this example, we'll be using a template to launch a Jupyter notebook session. Here we see a Templates tab and one template available for us to use. This template is named tf_3.1_jupyter_notebook and uses 1 GPU, 1 node, and a TensorFlow container image provided by Nvidia as part of NVIDIA AI Enterprise 3.1.

Once you load the template, you can edit the options in the web interface to suit your needs, for example by swapping the number of GPUs/nodes or the container type to better match your computing needs. Scroll down to 'Inputs' to see which datasets and workspaces you can load. You can load multiple of each.

From the CLI:

ngc batch run --name "Job-univ-of-albany-iad2-ace-622835" --priority NORMAL --order 50 --preempt RUNONCE --min-timeslice 2592000s --total-runtime 2592000s --ace univ-of-albany-iad2-ace --instance dgxa100.80g.1.norm --commandline "jupyter lab --allow-root -port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/" --result /results --image "nvaie/tensorflow-3-1:23.03-tf1-nvaie-3.1-py3" --org tt6xxv6at61b --port 8888
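After submitting a job (from either the web interface or the CLI), you can monitor it from the terminal. A minimal sketch, assuming the standard NGC batch subcommands are available in your CLI version; replace <job_id> with the ID reported when the job was created:

ngc batch list
ngc batch info <job_id>
ngc batch kill <job_id>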


Templates
