Jupyter Notebooks are excellent for building a strong foundation for your project. However, they can be resource-intensive and less practical for running multiple models with different hyperparameters simultaneously. In such cases, non-interactive approaches are more efficient. While the steps may vary slightly between DGX Cloud and On-Prem environments, you will need to convert your Notebook into a standard Python script for these methods.
Export a Python Script from Jupyter Notebook Using the Command Line
You can export your notebook to a .py file using the command line with the nbconvert tool. This tool comes pre-installed with Jupyter Notebook, so you don’t need to install anything extra; it should work right out of the box.
...
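If you prefer to drive the export from Python, the sketch below wraps the standard jupyter nbconvert --to script command. The notebook name multiclass_classification.ipynb and the helper names are hypothetical placeholders, and the options you need may differ from the ones shown here.

```python
import shutil
import subprocess
from pathlib import Path


def nbconvert_command(notebook):
    """Build the nbconvert command that exports a notebook as a script."""
    return ["jupyter", "nbconvert", "--to", "script", str(notebook)]


def export_notebook(notebook):
    """Run nbconvert and return the expected path of the exported script."""
    if shutil.which("jupyter") is None:
        raise RuntimeError("jupyter was not found on PATH")
    subprocess.run(nbconvert_command(notebook), check=True)
    # For a notebook with a Python kernel, nbconvert writes <name>.py
    # into the notebook's own directory by default.
    return Path(notebook).with_suffix(".py")


if __name__ == "__main__":
    # Hypothetical notebook name, used only for illustration.
    print(" ".join(nbconvert_command("multiclass_classification.ipynb")))
```

Calling export_notebook("multiclass_classification.ipynb") would run the export and return the path of the resulting .py file.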
Depending on your code, you may need to make adjustments after you export the script. For example, with the Multiclass Classification model we implemented previously, it’s advisable to either comment out charting and output display or redirect these outputs to appropriate files. These and other changes will be addressed next.
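As a minimal sketch of redirecting output to a file, the example below sends everything a function prints into a log file using the standard library. The function names, the log path, and the metric values are made up for illustration and are not taken from the Multiclass Classification notebook.

```python
import contextlib
from pathlib import Path


def report_metrics(epoch, loss, accuracy):
    """Stand-in for notebook cells that printed results inline."""
    print(f"epoch={epoch} loss={loss:.4f} accuracy={accuracy:.4f}")


def run_with_log(log_path):
    """Run the reporting code with stdout redirected to a file."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "w", encoding="utf-8") as log_file:
        with contextlib.redirect_stdout(log_file):
            # Made-up values, for illustration only.
            report_metrics(epoch=1, loss=0.9, accuracy=0.61)
            report_metrics(epoch=2, loss=0.7, accuracy=0.72)
    return log_path
```

Charts can be handled in a similar spirit: if your notebook uses Matplotlib, replacing plt.show() with plt.savefig("some_file.png") writes the figure to disk instead of trying to display it.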
Refactoring Your Script for a Non-Interactive Approach
When transitioning to a non-interactive approach, it’s important to ensure that results (and any other relevant outputs) are properly stored. With that in mind, we will continue from where we left off with our Multiclass Classification model and refactor it for a non-interactive setup. The goal is to make it flexible enough to run for different hyperparameters in a single execution. To achieve this, the first step is to identify the parameters we want to tune:
...
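One possible shape for such a refactor is sketched below, assuming the tuned parameters are passed on the command line and every combination is run in a loop. The parameter names (learning rate, batch size, epochs), the default values, and the train_and_evaluate placeholder are hypothetical; substitute the parameters and training code from your own project.

```python
import argparse
import itertools
import json
from pathlib import Path


def parse_args(argv=None):
    """Read the hyperparameter values to try from the command line."""
    parser = argparse.ArgumentParser(
        description="Run one training per hyperparameter combination."
    )
    parser.add_argument("--learning-rates", type=float, nargs="+", default=[0.01, 0.001])
    parser.add_argument("--batch-sizes", type=int, nargs="+", default=[32, 64])
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--output-dir", type=Path, default=Path("results"))
    return parser.parse_args(argv)


def train_and_evaluate(learning_rate, batch_size, epochs):
    """Placeholder: replace with the real training loop and return its metrics."""
    return {"accuracy": None}


def main(argv=None):
    args = parse_args(argv)
    args.output_dir.mkdir(parents=True, exist_ok=True)
    runs = []
    # Try every (learning rate, batch size) pair in a single execution.
    for lr, bs in itertools.product(args.learning_rates, args.batch_sizes):
        metrics = train_and_evaluate(lr, bs, args.epochs)
        runs.append({"learning_rate": lr, "batch_size": bs, "epochs": args.epochs, **metrics})
    # Store the results on disk, since nothing is displayed interactively.
    summary_path = args.output_dir / "summary.json"
    summary_path.write_text(json.dumps(runs, indent=2), encoding="utf-8")
    return runs
```

In a real script you would end the file with a call to main(), and then launch it with something like python train.py --learning-rates 0.01 0.001 --batch-sizes 32 64, where train.py is a hypothetical file name.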
Please note that this guide is not a one-size-fits-all approach. The way to refactor your project will depend on your specific code, the libraries you use, and other factors. Nevertheless, this example should give you a good idea of the possibilities and methods for adapting your project.
DGX On-Prem - Submitting a Job on SLURM
Before moving forward, please ensure you have all the necessary access in place and have reviewed the steps in the DGX On-Prem How-To.
If you have any questions on how to connect, please refer to How-to: Connect via SSH.
If you are not familiar with SLURM, please refer to How-to: Scheduling via SLURM.
Step 1 - Connect to the Head Node
First, connect to the head node via SSH at dgx-head01.its.albany.edu. Once connected, make sure SLURM is loaded by running the following command.
...
This step is especially important since we’ll be using a container to run the script, so be sure not to skip it.
Step 2 - Create an SBATCH Script
Next, you'll create an SBATCH script that pulls a PyTorch container from the NVIDIA Container Registry and runs your custom script. You can use any text editor, but we'll use VIM for this example.
...
Ctrl + S to save
Ctrl + Q to quit
Step 3 - Submit and Monitor the Job
To submit the job, use the following command.
...
To cancel a submitted job, use scancel <job_id>. When your job completes, you should see a /results folder with the outputs of our script.
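If you want to script submission and monitoring, the sketch below shows one way to do it from Python. It relies on sbatch printing "Submitted batch job <id>" on success and on squeue listing only pending or running jobs. The helper names are hypothetical, and the script path you pass in is your own SBATCH file.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the job ID from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_output!r}")
    return match.group(1)


def submit(script_path):
    """Submit an SBATCH script and return the job ID as a string."""
    result = subprocess.run(
        ["sbatch", script_path], check=True, capture_output=True, text=True
    )
    return parse_job_id(result.stdout)


def is_in_queue(job_id):
    """Return True while squeue still lists the job (pending or running)."""
    result = subprocess.run(
        ["squeue", "--noheader", f"--jobs={job_id}"],
        capture_output=True,
        text=True,
    )
    # squeue exits with an error once the job ID is no longer known.
    return result.returncode == 0 and result.stdout.strip() != ""
```

The job ID returned by submit() is the same value you would pass to scancel.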
DGX Cloud - Submitting a Job on NGC
Before moving forward, please ensure you have all the necessary access in place and have reviewed the steps in the How-to: NVIDIA DGX Cloud. There are two ways to submit a job.
...
To remain consistent with the DGX On-Prem instructions, we will submit this job using the CLI. You can run the NGC CLI from lmm.its.albany.edu, dgx-head01.its.albany.edu, or your own machine (installation required). In this example, we will connect to lmm.its.albany.edu and use the CLI from there. Once again, detailed setup instructions for the NGC CLI are available in How-to: NVIDIA DGX Cloud.
Step 1 - Connect to LMM
First, connect to LMM via SSH at lmm.its.albany.edu. If you have any questions on how to connect, please refer to How-to: Connect via SSH. It's recommended to work from your lab directory, so make sure to navigate there first.
...
This step is especially important since we’ll be mounting the NGC workspace on this directory, so be sure not to skip it.
Step 2 - Upload the Script to NGC Workspace
Now, from the terminal, first create a directory to mount your workspace to. Then use the ngc workspace mount command to mount the workspace to this new directory. Pass --mode RW so that the mount is both readable and writable, which allows data to be copied to it.
...
If you have any questions on how to use ngc workspace mount or ngc workspace unmount, please refer to the official NVIDIA documentation on this topic.
Step 3 - Submit and Monitor the Job
The next and final step is to submit your job through the ngc batch run command.
...
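As a rough illustration of how the pieces of an ngc batch run call fit together, the sketch below assembles the command from its parts. It assumes the commonly documented options --name, --image, --ace, --instance, --workspace, --result, and --commandline. Every value passed in the usage example is a hypothetical placeholder, so check the job details in the NGC web interface or the NGC CLI help for the values that apply to your account.

```python
import shlex


def ngc_batch_run_command(name, image, ace, instance, workspace, mount_point, commandline):
    """Assemble an 'ngc batch run' invocation as an argument list."""
    return [
        "ngc", "batch", "run",
        "--name", name,
        "--image", image,
        "--ace", ace,
        "--instance", instance,
        # Mount the workspace read-write inside the job container.
        "--workspace", f"{workspace}:{mount_point}:RW",
        # Directory inside the container where job results are collected.
        "--result", "/results",
        "--commandline", commandline,
    ]


def as_shell_string(command):
    """Quote the argument list so it can be pasted into a terminal."""
    return shlex.join(command)
```

For example, as_shell_string(ngc_batch_run_command("my-job", "<image>", "<ace>", "<instance>", "<workspace>", "/mount/workspace", "python /mount/workspace/train.py")) produces a single quoted command line that you can review before running it.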