...
Finally, to start the job, scroll down to the Launch Job section. The default runtime for a job is 30 days (2592000 seconds).
Job Priority should always be set to Normal. Raising the priority can disrupt jobs for other users, and if everyone sets their priority to High, then no one is actually prioritized. Please respect your colleagues by leaving this field at Normal. ITS may terminate jobs that disrupt the usability of compute resources.
...
You can also copy and paste the generated CLI command from the web interface directly into your terminal, though you must specify your own dataset and workspace IDs and mount paths.
From the CLI:
Code Block |
---|
ngc batch run --name "Job-univ-of-albany-iad2-ace-622835" --priority NORMAL --order 50 --preempt RUNONCE --min-timeslice 2592000s --total-runtime 2592000s --ace univ-of-albany-iad2-ace --instance dgxa100.80g.1.norm --commandline "jupyter lab --allow-root --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/" --result /results --image "nvaie/tensorflow-3-1:23.03-tf1-nvaie-3.1-py3" --org tt6xxv6at61b --datasetid dataset_ID_here:/mount/data --workspace workspace_ID_here:/mount/workspace:RW --port 8888 |
...
You can specify your dataset and workspace with the respective --datasetid and --workspace flags, followed by the ID for each. The RW option on --workspace denotes read and write permissions.
Once you are satisfied with your job options, you can click 'Launch Job' to start the job. Your job will then appear at the top of the list of running jobs.
Click the newly created job to see its Overview page. Here you can see the command that spawned the job, telemetry about the job's performance, and open ports for any related services, among other features. To access the Jupyter notebook itself, click the URL/Hostname under Service Mapped Ports. Please note that anyone with the URL can access your work and data; ITS reminds you not to share sensitive information such as generated URLs or API keys.
Once you open the link, you will be greeted by the standard Jupyter launch page. From here you can open your uploaded code or start a new notebook (.ipynb). Your data will be found at the mount paths you specified; in our case these are the /mount/data and /mount/workspace folders.
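To confirm the mounts are visible from a notebook cell, you can list their contents. The helper below is an illustrative sketch (list_mount is not part of any library; the /mount/data and /mount/workspace paths match the mount points used in the launch command above, so adjust them if you chose different paths):

```python
from pathlib import Path

def list_mount(mount_path):
    """Return the sorted names of files and folders under a mount point."""
    p = Path(mount_path)
    if not p.is_dir():
        return []  # path not mounted (or misspelled)
    return sorted(entry.name for entry in p.iterdir())

# On the running job, these would show your uploaded files:
# list_mount('/mount/data')
# list_mount('/mount/workspace')
```

An empty list usually means the dataset or workspace ID was wrong or the mount path was mistyped at launch.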
Lastly, to access your data in a notebook, you can simply invoke:
Code Block |
---|
import pandas as pd
# Load a file from the mounted dataset directory
data = pd.read_csv('/mount/data/your_data.csv')
data  # display the DataFrame |
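If your dataset contains several CSV files, a small helper can combine them into one DataFrame. This is a generic sketch (load_all_csvs is a hypothetical helper; the '*.csv' pattern and /mount/data path are assumptions about your dataset layout):

```python
import pandas as pd
from pathlib import Path

def load_all_csvs(data_dir):
    """Read every CSV under data_dir and concatenate them into one DataFrame."""
    files = sorted(Path(data_dir).glob('*.csv'))
    frames = [pd.read_csv(f) for f in files]
    return pd.concat(frames, ignore_index=True)

# e.g. combined = load_all_csvs('/mount/data')
```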
From here, you can work on your code as you like. To get a list of the devices available to you, invoke:
Code Block |
---|
import tensorflow as tf

# List the GPUs visible to TensorFlow
gpus = tf.config.experimental.list_physical_devices('GPU')
num_gpus = len(gpus)
print(num_gpus)
print(gpus) |
This prints the number of available devices along with their kind. Here we can see that 1 GPU is available; more will be listed if you selected multiple GPUs when creating the job.
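If you requested multiple GPUs, work has to be divided between them somehow. In practice, TensorFlow's distribution strategies (e.g. tf.distribute.MirroredStrategy) handle this for you; purely as an illustration of the idea behind manual sharding, a round-robin assignment of batches to devices looks like this (assign_batches is a hypothetical helper, not part of TensorFlow):

```python
def assign_batches(num_batches, num_gpus):
    """Map each batch index to a GPU index, round-robin."""
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    return {batch: batch % num_gpus for batch in range(num_batches)}

# With 4 batches on 2 GPUs: batches 0 and 2 go to GPU 0, batches 1 and 3 to GPU 1.
```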
Closing the Notebook
Once you are done working on your code, use File→Save to save your work, then File→Shut Down to close the notebook and end the session. It will take a few minutes for the compute resources to become available again while the system saves your work and clears memory.
Results & Logs
You can now look at the job and see that it has the 'Finished Success' status. From here you can download any results that were generated, and also obtain a log file in which output and errors are documented.
This same page can also be viewed while the job is running to see live updates.
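Anything your code writes under the /results mount (the path given by --result in the launch command above) is captured as a downloadable result when the job finishes. A minimal sketch, with the filename and contents as placeholders (save_text_result is a hypothetical helper, not part of any library):

```python
from pathlib import Path

def save_text_result(text, filename, results_dir='/results'):
    """Write a text artifact into the job's results directory."""
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    out_path.write_text(text)
    return out_path

# e.g. save_text_result('accuracy: 0.93', 'metrics.txt')
```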
Happy coding!