...

Finally, to start the job, scroll down to the Launch Job section. The default runtime for a job is 30 days (2592000 seconds).
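If you need a different runtime, the value is simply the desired number of days converted to seconds. As a purely illustrative sketch of the arithmetic behind the 30-day default:

Code Block
# Convert a runtime in days to the seconds value used by the job settings
# and the CLI's --total-runtime flag; 30 days gives the 2592000-second default.
days = 30
runtime_seconds = days * 24 * 60 * 60
print(runtime_seconds)  # 2592000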

Job Priority should always be set to Normal. Changing this priority can disrupt jobs for other users, and if everyone sets their priority to High, then no one is actually prioritized. Please respect your colleagues by not using higher priority values in this field. ITS may terminate jobs that disrupt the usability of compute resources.

...

You can also copy and paste the generated CLI command from the web interface directly into your terminal, though you must specify your own dataset and workspace IDs and mount paths.

From the CLI:

Code Block
ngc batch run --name "Job-univ-of-albany-iad2-ace-622835" --priority NORMAL --order 50 --preempt RUNONCE --min-timeslice 2592000s --total-runtime 2592000s --ace univ-of-albany-iad2-ace --instance dgxa100.80g.1.norm --commandline "jupyter lab --allow-root --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/" --result /results --image "nvaie/tensorflow-3-1:23.03-tf1-nvaie-3.1-py3" --org tt6xxv6at61b --datasetid dataset_ID_here:/mount/data --workspace workspace_ID_here:/mount/workspace:RW --port 8888

...

You can specify your datasets and workspace with the respective --datasetid and --workspace flags, followed by the ID for each. The RW option on --workspace denotes read and write permissions.

Once you are satisfied with your job options, click 'Launch Job' to start the job. Your job will then appear at the top of the list of running jobs.

Click the newly created job to see the Overview page. Here you can see the command that spawned the job, telemetry on the job's performance, and open ports for any related services, among other details. To access the Jupyter notebook itself, click the URL/Hostname under Service Mapped Ports. Please note that anyone with the URL can access your work and data; ITS reminds you not to share sensitive information such as generated URLs or API keys.

Once you open the link, you will be greeted by the standard Jupyter launch page. From here you can open your uploaded code or start a new .ipynb notebook. Your data will be found at the paths you specified for mounting; in our case these are the /mount/data and /mount/workspace folders.
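To confirm that the dataset and workspace were mounted where you expect, you can list the mount points from a notebook cell. This is a minimal sketch assuming the /mount/data and /mount/workspace paths used in this example; adjust it if you chose different mount paths.

Code Block
import os

# List the contents of the mount points used in this guide to confirm
# the dataset and workspace are attached where expected.
for mount in ('/mount/data', '/mount/workspace'):
    print(mount, os.listdir(mount))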

Lastly, to access your data in a notebook, you can simply run:

Code Block
import pandas as pd

# Read a CSV from the mounted dataset path (replace your_data.csv with your file)
data = pd.read_csv('/mount/data/your_data.csv')
data
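Because the workspace in this example is mounted with read and write permissions, it is a convenient place to save intermediate output. The snippet below is only a sketch; the dropna() step and the processed_data.csv filename are illustrative, not part of the generated job.

Code Block
import pandas as pd

# Continuing the example above: save a cleaned copy of the data to the
# writable workspace mount so it persists with the workspace.
# The input and output filenames are placeholders.
data = pd.read_csv('/mount/data/your_data.csv')
data.dropna().to_csv('/mount/workspace/processed_data.csv', index=False)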

From here, you can work on your code as you wish. To get a list of the GPU devices available to you, run:

Code Block
import tensorflow as tf

# List the GPUs visible to TensorFlow and count them
gpus = tf.config.experimental.list_physical_devices('GPU')
num_gpus = len(gpus)
print(num_gpus)
print(gpus)

This will print the number of available GPUs along with each device's name and type.

Here we can see that one GPU is available. More will be listed if you selected multiple GPUs when creating the job.
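If you want to confirm that TensorFlow actually executes work on one of those GPUs, you can run a small operation with an explicit device placement. This is a generic TensorFlow 2-style sketch, not something specific to this container image:

Code Block
import tensorflow as tf

# Run a small matrix multiplication pinned to the first GPU and report
# the device the result was placed on.
with tf.device('/GPU:0'):
    a = tf.random.uniform((1000, 1000))
    b = tf.random.uniform((1000, 1000))
    c = tf.matmul(a, b)
print(c.device)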

Closing the Notebook

Once you are done working on your code, use File→Save to save your work, then File→Shut Down to close the notebook and end the session. It will take a few minutes for the compute resources to become available again while the system saves your work and clears memory.


Results & Logs

You can now look at the job and see that it has the 'Finished Success' status. From here you can download any results that were generated and also obtain a log file in which output and errors are documented.

The same page can also be viewed while the job is running to see live updates.
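Anything your code writes under the job's result mount (/results in the CLI command shown earlier) should appear among the downloadable results. As a hypothetical example from a notebook cell, with an illustrative filename:

Code Block
# Write a small summary file to the result mount (/results in this example)
# so it appears with the job's downloadable results. The filename is illustrative.
with open('/results/summary.txt', 'w') as f:
    f.write('Finished processing your_data.csv\n')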


Happy coding!