...

Next, it will ask you to enter your team name. This should be the lab team you will be working in. Lastly, the terminal will prompt you for 'ace'. Enter the following to set your Accelerated Computing Environment (ACE):

Code Block
univ-of-albany-iad2-ace

...

Once you load the template, you can edit the options in the web interface to suit your needs. For example, you can change the number of GPUs/nodes or the container type. Scroll down to 'Inputs' to see which datasets and workspaces are available; you can load more than one of each. Both datasets and workspaces can hold data for your job, but there are some key differences between them.

Datasets

Datasets are read-only and stay the same between sessions. This makes them useful for reference data that you don't want to change. Files such as CSVs are good candidates: your code can reference them and load them into memory. You can load more than one dataset into a job.

Workspaces

Files in a workspace are readable and writable. This space is useful for living files that are still being edited. Files such as Jupyter notebooks (.ipynb) and Python scripts are good candidates, since you work on them across sessions.
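
To check from inside a notebook whether a mounted path behaves like a dataset (read-only) or a workspace (writable), a minimal sketch along these lines may help. The helper name is_writable and the example path in the usage note are hypothetical; substitute the path you mounted your dataset or workspace at.

```python
import tempfile
from pathlib import Path


def is_writable(mount_point):
    """Return True if a temporary file can be created under mount_point."""
    path = Path(mount_point)
    if not path.is_dir():
        return False
    try:
        # The probe file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        # Covers read-only file systems and permission errors.
        return False
```

For example, is_writable("/mount/workspace") would be expected to return True for a workspace mounted at that (hypothetical) path, while a dataset mount should return False.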


You can also download datasets and workspaces, and convert them to results for re-upload into other spaces. For example, if you are finished working on a script in a workspace, you might download it and then re-upload it as a dataset so that you have an immutable copy of the script to reference.

Once you have selected a dataset or workspace to include, a text box will appear under the 'Mount Point' column. Here you can enter /mount/data, or any other custom path, for your dataset or workspace. In your Jupyter notebook, you can access the data using this path. Scrolling down further will show a /results path for any output you generate.
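
As a rough illustration of how the mount point and the /results path fit together in a notebook, the sketch below reads a CSV from the dataset mount and writes a small summary to the results directory. It assumes the dataset was mounted at /mount/data; the file name measurements.csv and the function name summarize_csv are hypothetical placeholders.

```python
import csv
from pathlib import Path


def summarize_csv(data_dir="/mount/data", results_dir="/results",
                  filename="measurements.csv"):
    """Count the data rows in a dataset CSV and save the count under results_dir.

    `filename` is a hypothetical example; use a file from your own dataset.
    """
    source = Path(data_dir) / filename          # read from the mounted dataset
    with source.open(newline="") as handle:
        rows = list(csv.DictReader(handle))     # header row becomes the dict keys

    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # no-op if the directory exists
    summary = out_dir / "row_count.txt"
    summary.write_text(f"{filename}: {len(rows)} rows\n")
    return len(rows)
```

Keeping the paths as parameters means the same function can be tried against a local folder before it is pointed at the real mount points.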


Containers

The container is similar to a conda environment: packages that are relevant to your work are pre-loaded and ready to use. In this job we are using nvaie/tensorflow-3-1 with the specified tag. We will open the notebook on port 8888.

Code Block
jupyter lab --allow-root --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/

More information can be found here.


Starting the Job

Finally, to start the job, scroll down to the Launch Job section.

Job Priority should always be set to Normal. Changing this priority can disrupt jobs for other users, and if everyone sets priority to High, then no one is prioritized. Please respect your colleagues by not using higher priority values in this field. ITS may terminate jobs that disrupt the usability of compute resources.

Job Order determines the order in which your jobs run, with values ranging from 1 to 99; lower values run first. If you submit two jobs, one with order 2 and another with order 1, the job with order 1 will execute first. This ordering only applies among your own jobs and does not affect other users. If left blank, the default value is 50. In the CLI, you can set the job order using the --order flag. Here is an example job submitted with order 66:

Code Block
ngc batch run --name test-order (job details...) --order=66

You can view order values by invoking the following:

Code Block
ngc batch list --column order

You can also copy and paste the generated CLI command from the web interface directly into your terminal.

From the CLI:

Code Block
ngc batch run --name "Job-univ-of-albany-iad2-ace-622835" --priority NORMAL --order 50 --preempt RUNONCE --min-timeslice 2592000s --total-runtime 2592000s --ace univ-of-albany-iad2-ace --instance dgxa100.80g.1.norm --commandline "jupyter lab --allow-root --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/" --result /results --image "nvaie/tensorflow-3-1:23.03-tf1-nvaie-3.1-py3" --org tt6xxv6at61b --port 8888
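
If you submit similar jobs often, it may be convenient to assemble this command in a small script rather than editing the long line by hand. The sketch below only builds and prints the command; it does not run it. It uses only the flags that appear in the generated command above, with the same values as defaults. The function name build_ngc_command is hypothetical, and the 1 to 99 check mirrors the Job Order range described earlier.

```python
import shlex

JUPYTER_CMD = (
    "jupyter lab --allow-root --port=8888 --no-browser "
    "--NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/"
)


def build_ngc_command(name, image, instance, ace, org, commandline,
                      order=50, port=8888, result="/results",
                      preempt="RUNONCE", min_timeslice="2592000s",
                      total_runtime="2592000s"):
    """Return the argument list for an `ngc batch run` call (does not run it)."""
    if not 1 <= order <= 99:
        raise ValueError("order must be between 1 and 99")
    return [
        "ngc", "batch", "run",
        "--name", name,
        "--priority", "NORMAL",   # this guide asks for Normal priority
        "--order", str(order),
        "--preempt", preempt,
        "--min-timeslice", min_timeslice,
        "--total-runtime", total_runtime,
        "--ace", ace,
        "--instance", instance,
        "--commandline", commandline,
        "--result", result,
        "--image", image,
        "--org", org,
        "--port", str(port),
    ]


args = build_ngc_command(
    name="test-order",
    image="nvaie/tensorflow-3-1:23.03-tf1-nvaie-3.1-py3",
    instance="dgxa100.80g.1.norm",
    ace="univ-of-albany-iad2-ace",
    org="tt6xxv6at61b",
    commandline=JUPYTER_CMD,
    order=66,
)
print(shlex.join(args))  # review the command before pasting it into a terminal
```

Building the command as a list keeps each flag and its value together, and shlex.join handles the quoting of the nested Jupyter command. The same list could be handed to subprocess.run if you prefer to launch the job from Python.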

...