...
Before getting started, please ensure you are familiar with the following:
The University's SSH Protocols
The University's Research Storage
Please note: Research Groups are allowed 1-2 GPUs and 20 TB of storage. Users who abuse this system will be notified and may be suspended.
...
To begin, log into NVIDIA AI Enterprise Base Command using your University email (Web SSO).
...
Once you successfully log in, select University at Albany (SUNY) as your organization, and then select your team. If you do not have a team, please reach out to ITS to be added to your lab's team. You may still access resources, but your lab's data and workspace will be unavailable until you are placed on that team.
...
Generating an API Key
To upload your data, or even to use the NVIDIA command line, you will need to generate an API key. This key lets anyone who holds it log into your workspace as you, so it is important not to lose it or give it out.
Once you have selected a team and logged in, you will be on the Base Command homepage. On the left-hand side, select 'BASE COMMAND' and, from the dropdown, select 'Dashboard' to bring up an overview of the system.
...
Scroll to the bottom right and click 'Setup' in the section labeled 'Download CLI and Generate API Key'.
...
On the next page, click 'Generate API Key'. Save this key to a text file for easy access on your machine. If you lose your key, you can generate a new one from this page. Next, you will install the NVIDIA NGC CLI.
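The 'Setup' page provides OS-specific install commands; copy those exactly. As a rough sketch, a Linux install looks like the following (the download URL is a placeholder here, since the real one comes from the Setup page):

```shell
# The exact download command and URL come from the Setup page -- this is a sketch.
wget --content-disposition "<NGC_CLI_DOWNLOAD_URL>" -O ngccli_linux.zip
unzip ngccli_linux.zip
# Recent releases unzip to an ngc-cli/ directory; adjust if your layout differs.
chmod u+x ngc-cli/ngc
export PATH="$PWD/ngc-cli:$PATH"
ngc --version   # confirm the CLI is installed and on your PATH
```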
...
You should see the following output in your terminal.
...
Configuring your Terminal on LMM
...
If done successfully, you will see something similar to the following:
...
You should see your username and lab name in the appropriate spaces. If you mis-entered a value, you can invoke 'ngc config set' to go through each step again. Pressing Enter without any input will not overwrite previously entered information, so you can hit Enter to skip portions you entered correctly. To clear the entire config, invoke 'ngc config clear'.
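For reference, a fresh run of 'ngc config set' walks through prompts roughly like the following (the exact wording varies by CLI version, and the values shown are placeholders):

```shell
$ ngc config set
Enter API key [no-apikey]. Choices: [<VALID_APIKEY>, 'no-apikey']: <paste your API key>
Enter CLI output format type [ascii]. Choices: [ascii, csv, json]: ascii
Enter org [no-org]. Choices: ['<your org>']: <your org>
Enter team [no-team]. Choices: ['<your team>', 'no-team']: <your lab's team>
Enter ace [no-ace]. Choices: ['univ-of-albany-iad2-ace', 'no-ace']: univ-of-albany-iad2-ace
```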
If you do not see 'univ-of-albany-iad2-ace', ITS has not yet added you to a team, so you cannot access this part. Please wait for ITS to add you to a team, then regenerate your API key.
Congrats on setting up the terminal! You are now ready to upload your data.
...
An example command to upload a folder of CSV files named 'world_happiness', shared with the team awan_lab, would look like the following:
```shell
ngc dataset upload --source world_happiness --desc "csvs of world happiness data ranging from 2015 to 2019" world_happiness --share awan_lab
```
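You can also confirm the upload from the terminal by listing the datasets visible to you; the newly uploaded dataset should appear in the output:

```shell
ngc dataset list
```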
Now if you return to the Base Command dashboard and look under 'Datasets', you should see the file/folder you just uploaded. You can upload more files in the same fashion, and they will appear in the same manner. Data uploaded this way is immutable, so you do not need to worry about accidentally editing it during your work. In practice, you should make a copy of the data when executing code and have the code interface with the copy. This would be accomplished in a manner such as:
...
```python
import shutil
import pandas as pd

# /mount/data is read-only, so copy the file into the writable workspace first.
shutil.copy("/mount/data/folder/name_of_your_data.csv", "/mount/workspace/")
df = pd.read_csv("/mount/workspace/name_of_your_data.csv")
print(df)
```
...
In the Create Job section, there are templates created by people on your team, NVIDIA, or UAlbany ITS. In this example, we'll be using a template to launch a Jupyter Notebook session. Here we see a Templates tab and one template available for us to use. This template, named tf_3.1_jupyter_notebook, uses 1 GPU, 1 node, and a TensorFlow container image made by NVIDIA for AI Enterprise 3.1.
...
Once you load the template, you can edit the options in the web interface to suit your needs, swapping the number of GPUs/nodes or the container type to better match your computing needs. Scroll down to 'Inputs' to see which datasets and workspaces you can load; you can mount multiple datasets and workspaces at once. Both datasets and workspaces can contain data you will use, but there are some key differences to know about them.
...
This directory exists in each job and can be used to direct outputs to. Doing so ensures that your outputs remain available after the job completes.
**Do not save notebooks or files in directories that are not under /mount or under /results.**
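For example, from inside a notebook or script you can write anything you want to keep directly under /results. A minimal sketch (the output file name here is hypothetical):

```python
import pandas as pd

# Anything written under /results persists after the job completes.
results = pd.DataFrame({"epoch": [1, 2, 3], "loss": [0.9, 0.5, 0.3]})
results.to_csv("/results/training_log.csv", index=False)
```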
You can also download datasets, workspaces, and results, and re-upload them into other spaces. For example, if you are finished working on a script in a workspace, you might download it and then re-upload it as a dataset, so that you have an immutable copy of the script to reference. Once you have selected a dataset or workspace to include, a text box will appear under the 'Mount Point' column. Here you can enter /mount/data, or any other custom path to your data or workspace; in your Jupyter Notebook, you can access the data using this path. Scrolling down even further will show a /results path for any output you may generate.
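Downloading can also be done from the terminal. The subcommands below are a sketch of the usual NGC CLI pattern rather than verified syntax; run 'ngc dataset --help' and 'ngc workspace --help' to confirm the exact commands for your CLI version:

```shell
# The IDs are placeholders -- use the dataset/workspace IDs shown on the dashboard.
ngc dataset download dataset_ID_here
ngc workspace download workspace_ID_here
```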
...
The container is similar to a conda environment: packages relevant to your work come pre-loaded and ready to use. In this job, we are using nvaie/tensorflow-3-1 with the specified tag. We will open the notebook on port 8888.
...
```shell
jupyter lab --allow-root --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/
```
or
```shell
jupyter lab --ip=0.0.0.0 --allow-root --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/
```
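Note that --NotebookApp.token='' disables Jupyter's token authentication. This is convenient, but it is also why, as noted below, anyone who has the job's URL can open your notebook.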
...
Finally, to start the job, scroll down to the Launch Job section. The default runtime for a job is 30 days (2592000 seconds).
...
Job Priority should always be set to Normal. Changing this priority can disrupt jobs for other users, and if everyone sets the priority to High, then no one is prioritized. Please respect your colleagues by not using higher priority values in this field. ITS may terminate jobs that disrupt the usability of computing resources.
...
```shell
ngc batch run --name "Job-univ-of-albany-iad2-ace-622835" --priority NORMAL --order 50 --preempt RUNONCERESUMABLE --min-timeslice 2592000s --total-runtime 2592000s --ace univ-of-albany-iad2-ace --instance dgxa100.80g.1.norm --commandline "jupyter lab --allow-root --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/" --result /results --image "nvaie/tensorflow-3-1:23.03-tf1-nvaie-3.1-py3" --org tt6xxv6at61b --datasetid dataset_ID_here:/mount/data --workspace workspace_ID_here:/mount/workspace:RW --port 8888
```
...
Click the newly created job to see the Overview page. Here you can see the command that spawned the job, telemetry on the job's performance, and open ports for any related services, among many other features. To access the actual Jupyter Notebook, click the URL/Hostname under Service Mapped Ports. Please note that anyone with the URL can access your work and data. ITS reminds you not to share sensitive information such as generated URLs or API keys.
...
Once you open the link, you will be greeted by the standard Jupyter Notebook launch page. From here you can open your uploaded code or start a new .ipynb. Your data will be found at the same paths you specified when mounting; in our case, these are the /mount/data and /mount/workspace folders. If you are creating a new notebook, save it within /mount/workspace so that you can edit and access it later. You will not be able to save your notebook in the /mount/data folder, as that mount is read-only.
...
Lastly, to confirm that the notebook can see the GPU(s) you requested, you can query TensorFlow for its visible devices.
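A minimal sketch, assuming a TensorFlow container such as the one used in this example:

```python
from tensorflow.python.client import device_lib

# List every device TensorFlow can see, along with its type (CPU or GPU).
devices = device_lib.list_local_devices()
for d in devices:
    print(d.device_type, d.name)
print("Total devices:", len(devices))
```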
...
This will print the number of devices along with the kind of device.
...
Here we can see that 1 GPU is available. You will see more if you selected multiple GPUs upon job creation.
...
This same page can be observed during the job itself to see live updates to the system.
...
Happy coding!
...