Jupyter notebooks are well suited to prototyping and exploring a project interactively. However, they can be resource-intensive and are impractical for training several models with different hyperparameters at once. In those cases, a non-interactive approach is more efficient. The steps differ slightly between the DGX Cloud and DGX On-Prem environments, but both require you to convert your notebook into a standard Python script first.

Export a Python Script from Jupyter Notebook Using the Command Line

You can export your notebook to a .py file from the command line with the nbconvert tool. nbconvert is installed together with Jupyter Notebook, so no extra installation is needed.

...
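To make the export less of a black box, the sketch below shows the core idea using only the Python standard library: a notebook is a JSON file, and exporting it largely amounts to concatenating the source of its code cells. This is an illustration under simplifying assumptions, not a replacement for nbconvert, which handles more details. The function name notebook_to_script and any file names passed to it are hypothetical.

```python
import json
from pathlib import Path


def notebook_to_script(notebook_path: str, script_path: str) -> int:
    """Write the code cells of a notebook to a .py file and return the cell count."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores a cell's source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip() + "\n")
    # Separate cells with blank lines so the script stays readable.
    Path(script_path).write_text("\n\n".join(chunks), encoding="utf-8")
    return len(chunks)
```

Lines that only make sense inside a notebook, such as IPython magics, are copied verbatim by this sketch and would still need to be removed by hand.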

Depending on your code, you may need to make adjustments after you export the script. For example, with the Multiclass Classification model we implemented previously, it’s advisable to either comment out charting and output display or redirect these outputs to appropriate files. These and other changes will be addressed next.
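As one possible way to redirect output to files, the sketch below replaces print-style progress messages with a logger that writes to a log file in an output directory. The helper name setup_run_logger, the logger name, and the log file name are assumptions made for illustration and are not taken from the Multiclass Classification example.

```python
import logging
from pathlib import Path


def setup_run_logger(output_dir: str, name: str = "train") -> logging.Logger:
    """Return a logger that writes to <output_dir>/<name>.log instead of the screen."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Remove handlers left over from a previous call so lines are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    handler = logging.FileHandler(out / f"{name}.log", mode="w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep messages out of the root logger / console
    return logger


# Usage (hypothetical values):
#   logger = setup_run_logger("results")
#   logger.info("epoch=%d loss=%.4f", 1, 0.1234)
# For charts, if the script uses matplotlib, saving the figure with savefig()
# instead of calling show() serves the same purpose.
```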

Refactoring Your Script for a Non-Interactive Approach

When transitioning to a non-interactive approach, it’s important to ensure that results (and any other relevant outputs) are properly stored. With that in mind, we will continue from where we left off with our Multiclass Classification model and refactor it for a non-interactive setup. The goal is to make it flexible enough to run for different hyperparameters in a single execution. To achieve this, the first step is to identify the parameters we want to tune:

...

Please note that this guide is not a one-size-fits-all approach. The way to refactor your project will depend on your specific code, the libraries you use, and other factors. Nevertheless, this example should give you a good idea of the possibilities and methods for adapting your project.
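A minimal sketch of the general pattern, assuming a small grid of hyperparameters and one result file per combination, might look like the following. The parameter names in GRID, the train_and_evaluate placeholder, and the run_NNN.json file naming are all hypothetical; they stand in for whatever your own model and training loop use.

```python
import itertools
import json
from pathlib import Path

# Hypothetical hyperparameter grid; replace with the parameters you want to tune.
GRID = {
    "learning_rate": [0.01, 0.001],
    "batch_size": [32, 64],
}


def iter_configs(grid):
    """Yield one dict per combination of the values in the grid."""
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


def train_and_evaluate(config):
    """Placeholder for your training code; it should return a dict of metrics."""
    return {"accuracy": None}  # no real training happens in this sketch


def run_grid(grid, results_dir):
    """Run every configuration and store its config and metrics as JSON."""
    out = Path(results_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, config in enumerate(iter_configs(grid)):
        metrics = train_and_evaluate(config)
        path = out / f"run_{index:03d}.json"
        path.write_text(json.dumps({"config": config, "metrics": metrics}, indent=2))
        written.append(path)
    return written
```

Writing each run to its own file means a failure partway through the grid does not lose the results already produced.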

DGX On-Prem - Submitting a Job on SLURM

Before moving forward, please ensure you have all the necessary access in place and have reviewed the steps in the DGX On-Prem How-To.

Step 1 - Connect to the Head Node

First, connect to the head node via SSH at dgx-head01.its.albany.edu. Once connected, make sure SLURM is loaded by running the following command.

...

This step is especially important since we’ll be using a container to run the script, so be sure not to skip it.

Step 2 - Create an SBATCH Script

Next, you'll create an SBATCH script that pulls a PyTorch container from the NVIDIA Container Registry and runs your custom script. You can use any text editor, but we'll use VIM for this example.

...

  • Press Esc, then type :w and press Enter to save

  • Type :q and press Enter to quit

Step 3 - Submit and Monitor the Job

To submit the job, use the following command.

...

To cancel a submitted job, use scancel <job_id>. When your job completes, you should see a /results folder with the outputs of our script.
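If you submit jobs from a Python helper rather than by hand, the job ID can be read from the confirmation line that sbatch prints ("Submitted batch job <job_id>") and reused for monitoring or cancelling. The sketch below only parses text and builds command lists, so it runs anywhere; the function names are hypothetical, and the script name in the usage comment is a placeholder.

```python
import re
from typing import List


def parse_job_id(sbatch_output: str) -> str:
    """Extract the job ID from sbatch's 'Submitted batch job <id>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"no job ID found in: {sbatch_output!r}")
    return match.group(1)


def status_command(job_id: str) -> List[str]:
    """Command that shows the queue entry for a single job."""
    return ["squeue", "-j", job_id]


def cancel_command(job_id: str) -> List[str]:
    """Command that cancels a submitted job."""
    return ["scancel", job_id]


# Usage on the head node (job.sh is a placeholder file name):
#   import subprocess
#   out = subprocess.run(["sbatch", "job.sh"], capture_output=True,
#                        text=True, check=True).stdout
#   job_id = parse_job_id(out)
#   subprocess.run(status_command(job_id), check=True)
```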

DGX Cloud - Submitting a Job on NGC

Before moving forward, please ensure you have all the necessary access in place and have reviewed the steps in the How-to: NVIDIA DGX Cloud. There are two ways to submit a job.

...

To remain consistent with the DGX On-Prem instructions, we will submit this job using the CLI. You can run the NGC CLI from lmm.its.albany.edu, dgx-head01.its.albany.edu, or your own machine (installation required). In this example, we will connect to lmm.its.albany.edu and use the CLI from there. Once again, detailed setup instructions for the NGC CLI are available in How-to: NVIDIA DGX Cloud.

Step 1 - Connect to LMM

First, connect to lmm.its.albany.edu via SSH. If you have any questions about how to connect, please refer to How-to: Connect via SSH. We recommend working from your lab directory, so navigate there first.

...

This step is especially important since we’ll be mounting the NGC workspace on this directory, so be sure not to skip it.

Step 2 - Upload the Script to NGC Workspace

From the terminal, first create a directory to serve as the mount point for your workspace. Then mount the workspace on that directory with the ngc workspace mount command, passing --mode RW so that the mount is readable and writable and data can be copied to it.

...

If you have any questions about how to use ngc workspace mount or ngc workspace unmount, please refer to the official NVIDIA documentation on this topic.
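The mount, copy, and unmount sequence can also be sketched in Python. The helpers below only build the command lists and copy a file, so nothing is sent to NGC unless you pass the lists to subprocess yourself. The workspace name, directory names, and function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def mount_command(workspace: str, mount_dir: str) -> List[str]:
    """Command that mounts an NGC workspace read-write on a local directory."""
    return ["ngc", "workspace", "mount", workspace, mount_dir, "--mode", "RW"]


def unmount_command(mount_dir: str) -> List[str]:
    """Command that unmounts the workspace from the local directory."""
    return ["ngc", "workspace", "unmount", mount_dir]


def stage_script(script_path: str, mount_dir: str) -> Path:
    """Copy a script into the mounted workspace directory and return its new path."""
    target_dir = Path(mount_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the mount point if missing
    return Path(shutil.copy2(script_path, target_dir))


# Usage (all names are placeholders):
#   import subprocess
#   subprocess.run(mount_command("my-workspace", "ws_mount"), check=True)
#   stage_script("train.py", "ws_mount")
#   subprocess.run(unmount_command("ws_mount"), check=True)
```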

Step 3 - Submit and Monitor the Job

The next and final step is to submit your job through the ngc batch run command.

...
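For scripted submissions, the same command can be assembled in Python before it is run. The sketch below builds an ngc batch run invocation from its parts; every value in the usage example (job name, image tag, instance type, workspace name, mount point, script name) is a placeholder, and your organization may require additional options that are not shown here.

```python
import shlex
from typing import List


def batch_run_command(
    name: str,
    image: str,
    instance: str,
    commandline: str,
    workspace: str,
    mount_point: str = "/mount/workspace",  # assumed mount path inside the job
    result_dir: str = "/results",
) -> List[str]:
    """Build an 'ngc batch run' command as a list of arguments."""
    return [
        "ngc", "batch", "run",
        "--name", name,
        "--image", image,
        "--instance", instance,
        "--commandline", commandline,
        "--result", result_dir,
        # Workspace spec is <workspace>:<mount point>:<mode>
        "--workspace", f"{workspace}:{mount_point}:RW",
    ]


# Usage (placeholder values only):
#   cmd = batch_run_command(
#       name="multiclass-grid",
#       image="nvcr.io/nvidia/pytorch:24.01-py3",
#       instance="dgxa100.80g.1.norm",
#       commandline="python /mount/workspace/train.py",
#       workspace="my-workspace",
#   )
#   print(shlex.join(cmd))  # inspect the command before running it
```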