Building a Multiclass Classification Model with PyTorch and Running on NVIDIA DGX
In this tutorial, you will discover how to use PyTorch to develop neural network models for multi-class classification problems and run them on NVIDIA DGX hardware. This guide will walk you through the fundamentals and provide you with the tools to build machine learning models.
Fundamentals
If you're interested in understanding the fundamentals behind this application, feel free to explore this section. Otherwise, you can jump straight into the code.
Code
Firstly, we need to import the required libraries for this project. In this setup:
We import PyTorch libraries for building and training our neural network.
We include data manipulation libraries like NumPy and Pandas.
We import scikit-learn for data preprocessing and evaluation metrics.
We import Matplotlib for data visualization.
Make sure to select a PyTorch container (e.g., nvidia/pytorch:24.07-py3) in order to have these dependencies automatically resolved.
DGX Cloud - Select from the container dropdown when creating a job.
DGX On-Prem - Specify the container image attribute in your SLURM job script (#SBATCH --container-image='docker://nvcr.io/nvidia/pytorch:24.07-py3').
import copy
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import time
import torch
import torch.nn as nn
import torch.optim as optim
import tqdm
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
Before we proceed with our model development, it's crucial to understand the hardware resources available to us. This will allow us to optimize our code accordingly, making the most out of the available resources. There are three possible scenarios we need to account for:
CPU
Single GPU
Multiple GPUs
Understanding our hardware setup allows us to make informed decisions about aspects such as whether to use DataParallel or DistributedDataParallel for multi-GPU training.
For more information on these two classes, please refer to DataParallel and DistributedDataParallel.
When running in a container on either DGX Cloud or On-Prem, we will always have at least one GPU available.
DGX Cloud - Select the number of GPUs from the compute resource selection when creating a job.
DGX On-Prem - Specify the number of GPUs in your SLURM job script (#SBATCH --gres=gpu:1).
From now on, we'll use the device variable to ensure our data and model are on the correct hardware for optimal performance.
# Check GPU Availability
if torch.cuda.is_available():
    device = torch.device('cuda')
    print('Using Device:', device)
    print('Available GPUs:', torch.cuda.device_count())
else:
    device = torch.device('cpu')
    print('Using Device:', device)
In this tutorial, we will be leveraging the datasets available on the UCI Machine Learning Repository. The UCI ML Repository provides a convenient library to download datasets, making it easy to access a wide range of machine learning problems. To load the dataset, we'll use the ucimlrepo library.
For this example, we will use the Iris dataset. The goal of this dataset is to classify iris flowers into three species (setosa, versicolor, and virginica) based on the length and width of their sepals and petals.
# Install UCI ML Repo Lib
!pip install ucimlrepo
from ucimlrepo import fetch_ucirepo
# Import Iris from UCI ML Repo
ucirepo = fetch_ucirepo(id=53)
# Data (as Pandas Dataframes)
X = ucirepo.data.features
y = ucirepo.data.targets
# Target Variable (Class)
target = 'class'
This code will fetch the dataset and load it into pandas DataFrames. X contains the feature data, and y contains the target labels.
The following code gathers and prints some basic information about the dataset, displays the first few rows, and creates a bar plot of the class distribution. This initial exploration helps us understand the structure and balance of our dataset.
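The exploration code itself is not reproduced in this section. The following is a minimal sketch of what it might look like. It assumes X and y are the pandas DataFrames returned by fetch_ucirepo and that target is the label column name. The function name explore_dataset is hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def explore_dataset(X: pd.DataFrame, y: pd.DataFrame, target: str = "class"):
    """Print basic dataset statistics and plot the class distribution."""
    n_instances, n_features = X.shape
    # Number of rows per class, ordered by class name
    class_counts = y[target].value_counts().sort_index()
    n_classes = len(class_counts)

    print("Number of Instances:", n_instances)
    print("Number of Features:", n_features)
    print("Number of Classes:", n_classes)
    print(X.head())  # first five rows of the feature table

    # Bar plot of how many instances belong to each class
    fig, ax = plt.subplots(figsize=(6, 4))
    class_counts.plot(kind="bar", ax=ax)
    ax.set_xlabel(target)
    ax.set_ylabel("Count")
    ax.set_title("Class distribution")
    fig.tight_layout()
    return n_instances, n_features, n_classes, class_counts
```

Calling explore_dataset(X, y, target) on the Iris data should report the numbers in the table below.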
For the Iris dataset, these are the numbers you should expect.
Number of Instances | Number of Features | Number of Classes |
---|---|---|
150 | 4 | 3 |
Furthermore, we can analyze how instances are distributed among the classes.
Instances are evenly distributed across the three classes, with 50 samples in each.
Now let's reshape and encode the input (X) and output (y) data to prepare it for the neural network.
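The encoding code is not shown here. One plausible approach is sketched below, assuming X and y are the DataFrames loaded earlier. It converts the features to a float32 array, flattens the label column to one dimension, and maps class names to integers with scikit-learn's LabelEncoder, which is already imported at the top. The helper name encode_data is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_data(X: pd.DataFrame, y: pd.DataFrame, target: str = "class"):
    """Return float32 features, integer class labels, and the fitted encoder."""
    # Features: a 2-D float32 array of shape (n_samples, n_features)
    X_np = X.to_numpy(dtype=np.float32)
    # Labels: take the single target column as a 1-D array of shape (n_samples,)
    # and map each class name to an integer in [0, n_classes)
    encoder = LabelEncoder()
    y_np = encoder.fit_transform(y[target].to_numpy().ravel())
    return X_np, y_np, encoder
```

LabelEncoder assigns integers in sorted order of the class names. For the UCI labels, Iris-setosa, Iris-versicolor, and Iris-virginica therefore map to 0, 1, and 2, and encoder.inverse_transform recovers the names from predicted indices.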
Next, we convert the data from NumPy arrays into PyTorch tensors, the primary data structure PyTorch uses for efficient computation. Converting X and y into tensors lets the neural network perform operations such as matrix multiplication and backpropagation on them.
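Here is a minimal sketch of the conversion, assuming the encoded NumPy arrays from the previous step. The dtypes matter. nn.Linear weights default to float32, so the features should be float32. nn.CrossEntropyLoss expects integer class indices (torch.long) as targets. The helper name to_tensors is hypothetical.

```python
import numpy as np
import torch


def to_tensors(X_np: np.ndarray, y_np: np.ndarray):
    """Convert encoded NumPy arrays to tensors with the dtypes the model expects."""
    # nn.Linear layers default to float32 weights, so features should be float32
    X_t = torch.tensor(X_np, dtype=torch.float32)
    # CrossEntropyLoss expects class indices as int64 (torch.long)
    y_t = torch.tensor(y_np, dtype=torch.long)
    return X_t, y_t
```

The resulting tensors stay on the CPU until they are moved with .to(device).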
Now we need to allocate portions of our dataset for training, testing, and validation of the neural network. The training set will be used to teach the model, the validation set will help tune the model during development, and the test set will evaluate its performance on unseen data. By splitting the dataset:
60% of the data is assigned to training (X_train, y_train), allowing the model to learn patterns and relationships within the data.
20% is assigned to validation (X_val, y_val), enabling us to fine-tune the model and adjust hyperparameters.
20% is set aside for testing (X_test, y_test), allowing us to assess the model's generalization ability on new, unseen data.
This approach first assigns 60% of the data to training, then divides the remaining 40% equally between validation and testing (20% each). Feel free to adjust the train_size parameter to see how it impacts the results. As a rule of thumb, a larger training set generally helps the model learn better, but it's important to keep enough data for testing to accurately evaluate performance.
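The two-step split described above can be sketched with scikit-learn's train_test_split, here operating on NumPy arrays. Two arguments are assumptions added for reproducibility and may not match the original code: stratify, which keeps the class proportions equal in every split, and a fixed random_state. The function name split_60_20_20 is hypothetical.

```python
from sklearn.model_selection import train_test_split


def split_60_20_20(X, y, train_size=0.6, seed=42):
    """Split into train / validation / test sets (60/20/20 by default)."""
    # First split: train_size of the data for training, the rest held out
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, train_size=train_size, stratify=y, random_state=seed
    )
    # Second split: divide the held-out data equally into validation and test
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, train_size=0.5, stratify=y_rest, random_state=seed
    )
    return X_train, X_val, X_test, y_train, y_val, y_test
```

With the 150 Iris samples, this yields 90 training, 30 validation, and 30 test samples.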
The moment to define our model has finally arrived! We will set up our neural network with an input layer, one hidden layer, and an output layer to handle the classification task.
Activation Function (ReLU): We use the ReLU function to introduce non-linearity, enabling the network to learn complex patterns.
Hidden Layer: This layer takes the input features and reduces them to half the number of features (n_features/2). The hidden layer is where the network begins to learn abstract representations of the input data.
Output Layer: Finally, the output layer reduces the hidden layer's outputs to the number of classes (n_classes). This layer produces the final predictions for each input.
In the forward pass, the input data is passed through the hidden layer, transformed by the activation function, and then processed by the output layer to produce the model's predictions.
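Here is a minimal sketch of the network described above. The class name Multiclass comes from the text. The attribute names (hidden, act, output) and the use of integer division are assumptions. Integer division (n_features // 2) is used because n_features/2 produces a float in Python 3, and nn.Linear requires an integer size.

```python
import torch
import torch.nn as nn


class Multiclass(nn.Module):
    """Input -> hidden (n_features // 2 units, ReLU) -> output (n_classes logits)."""

    def __init__(self, n_features: int, n_classes: int):
        super().__init__()
        n_hidden = max(1, n_features // 2)  # integer division: nn.Linear needs an int
        self.hidden = nn.Linear(n_features, n_hidden)
        self.act = nn.ReLU()
        self.output = nn.Linear(n_hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.hidden(x))
        # Raw logits: CrossEntropyLoss applies log-softmax internally
        return self.output(x)
```

For Iris, Multiclass(n_features=4, n_classes=3) gives a network with 4 inputs, 2 hidden units, and 3 outputs. The output layer returns raw logits with no softmax, because nn.CrossEntropyLoss applies log-softmax itself.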
Next, we need to configure the parameters for training our neural network.
Learning Rate (lr = 0.01): This controls how much the network's weights are adjusted at each step. A smaller learning rate means more gradual updates, which can lead to more stable training.
Momentum (momentum = 0.9): Momentum helps accelerate gradient vectors in the right directions, leading to faster convergence.
Number of Epochs (n_epochs = 500): This defines how many times the entire training set passes through the network during training. More epochs generally lead to better learning, up to the point where the model starts to overfit.
Batch Size (batch_size = 10): This determines how many samples the network processes before updating its weights. Smaller batch sizes use less memory per step and can help the model generalize better, but they require more weight updates per epoch.
Feel free to adjust these parameters to see how they affect the training process and the model's performance. Fine-tuning these hyperparameters is often necessary to achieve the best results.
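To make the effect of batch_size concrete, the following sketch computes how many weight updates training performs. It assumes the 150-sample Iris dataset and the 60% training split described earlier.

```python
import math

# Hyperparameters from the text
lr = 0.01
momentum = 0.9
n_epochs = 500
batch_size = 10

# Assumption: the 150-sample Iris dataset with a 60% training split
n_train = round(150 * 0.6)                         # 90 training samples
steps_per_epoch = math.ceil(n_train / batch_size)  # weight updates per epoch
total_steps = steps_per_epoch * n_epochs           # weight updates over all epochs

print(n_train, steps_per_epoch, total_steps)  # 90 9 4500
```

Halving batch_size to 5 would double the updates per epoch to 18.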
Now we will instantiate our neural network and configure it based on the available hardware to ensure optimal performance.
First, we create an instance of our Multiclass neural network. If a GPU is available (torch.cuda.is_available()), we transfer the model to the GPU (model.to(device)) for faster computation. If multiple GPUs are available, we enable parallel processing with nn.DataParallel(model) to further accelerate training.
We use CrossEntropyLoss as our loss function, which is well-suited for multi-class classification tasks because it measures the difference between the predicted class probabilities and the actual labels. Then we initialize the optimizer as Stochastic Gradient Descent (SGD) with the previously defined learning rate (lr) and momentum. This optimizer will update the model's parameters during training based on the gradients of the loss function.
These steps ensure that the model is ready to begin training efficiently, leveraging the best available hardware and an appropriate optimization strategy.
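The setup steps above can be sketched as a single function. The name build_training_setup is hypothetical. The function accepts any nn.Module, so in practice you would pass it a Multiclass instance. The optimizer is created after the model is moved and wrapped, so it holds the on-device parameters. PyTorch's documentation recommends DistributedDataParallel over DataParallel even on a single machine, so treat the DataParallel branch as the simplest option rather than the fastest.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def build_training_setup(model: nn.Module, lr: float = 0.01, momentum: float = 0.9):
    """Move the model to the best device, wrap it for multi-GPU, create loss and optimizer."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    if torch.cuda.device_count() > 1:
        # Replicates the model on every visible GPU and splits each batch across them
        model = nn.DataParallel(model)
    loss_fn = nn.CrossEntropyLoss()
    # Create the optimizer after moving/wrapping so it tracks the on-device parameters
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    return model, loss_fn, optimizer, device
```

For example, model, loss_fn, optimizer, device = build_training_setup(Multiclass(4, 3)).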
It's time to begin the training process, where the model will learn from the training data over multiple epochs. The training is conducted over a set number of epochs (n_epochs). Each epoch represents one complete pass through the training data. The data is processed in smaller batches (batch_size), which helps manage memory usage and can lead to better generalization. After each epoch, the model is evaluated on the test set to assess its performance.
This process iteratively improves the model's ability to generalize to unseen data by refining its parameters based on the training data while monitoring its performance on the test data. After training is completed, we restore the best model weights identified during the training process. This ensures that we are using the most effective version of the model for any further tasks.
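The training loop described above can be sketched as follows. The function name train and the history dictionary layout are hypothetical. Batches are drawn from a fresh random permutation each epoch. After each epoch, both sets are scored, and a deep copy of the best state_dict is kept and restored at the end.

```python
import copy

import torch
import torch.nn as nn


def train(model, loss_fn, optimizer, X_train, y_train, X_eval, y_eval,
          n_epochs=500, batch_size=10, device="cpu"):
    """Mini-batch training; keeps the weights with the best held-out accuracy."""
    X_train, y_train = X_train.to(device), y_train.to(device)
    X_eval, y_eval = X_eval.to(device), y_eval.to(device)
    best_acc, best_weights = -1.0, None
    history = {"train_loss": [], "eval_loss": [], "train_acc": [], "eval_acc": []}

    for epoch in range(n_epochs):
        model.train()
        perm = torch.randperm(len(X_train), device=device)  # reshuffle every epoch
        for start in range(0, len(X_train), batch_size):
            idx = perm[start:start + batch_size]
            optimizer.zero_grad()
            loss = loss_fn(model(X_train[idx]), y_train[idx])
            loss.backward()
            optimizer.step()

        # Score the full training and held-out sets without tracking gradients
        model.eval()
        with torch.no_grad():
            for split, X, y in (("train", X_train, y_train), ("eval", X_eval, y_eval)):
                logits = model(X)
                history[f"{split}_loss"].append(loss_fn(logits, y).item())
                acc = (logits.argmax(dim=1) == y).float().mean().item()
                history[f"{split}_acc"].append(acc)

        # Keep a copy of the weights from the best epoch so far
        if history["eval_acc"][-1] > best_acc:
            best_acc = history["eval_acc"][-1]
            best_weights = copy.deepcopy(model.state_dict())

    model.load_state_dict(best_weights)  # restore the best epoch's weights
    return history, best_acc
```

The text evaluates on the test set after each epoch. Passing the validation split as X_eval and y_eval instead keeps the test set untouched for a final, unbiased check.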
Finally, we visualize the results and print the overall training time.
Adding Data Labels: The add_data_labels function adds numerical labels to specific data points on the plots to make it easier to interpret the results visually.
Plotting Loss: We plot the cross-entropy loss over the epochs for both the training and test sets. This helps us visualize how the model's error decreased during training and how well it generalized to unseen data. Adding data labels highlights the loss at regular intervals.
Plotting Accuracy: Similarly, we plot the accuracy for both the training and test sets over the epochs. This shows how the model's ability to correctly classify inputs improved over time. Data labels are added to display accuracy values at regular intervals.
These visualizations and metrics provide a comprehensive overview of the model's performance, making it easier to assess whether the training process was successful and where further improvements might be needed.
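The plotting code is not shown, and the signature of add_data_labels is not given. The sketch below makes two hypothetical assumptions. First, add_data_labels(ax, values, every, fmt) annotates every every-th point of a curve. Second, the training history is a dict with train_loss, eval_loss, train_acc, and eval_acc lists.

```python
import matplotlib.pyplot as plt


def add_data_labels(ax, values, every=50, fmt="{:.2f}"):
    """Hypothetical helper: annotate every `every`-th point of a curve with its value."""
    for i in range(0, len(values), every):
        ax.annotate(fmt.format(values[i]), (i, values[i]),
                    textcoords="offset points", xytext=(0, 6),
                    ha="center", fontsize=8)


def plot_history(history, every=50):
    """Plot loss and accuracy curves for the training and held-out sets side by side."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))
    for ax, metric, title in ((ax_loss, "loss", "Cross-entropy loss"),
                              (ax_acc, "acc", "Accuracy")):
        for split in ("train", "eval"):
            values = history[f"{split}_{metric}"]
            ax.plot(values, label=split)
            add_data_labels(ax, values, every)
        ax.set_xlabel("Epoch")
        ax.set_title(title)
        ax.legend()
    fig.tight_layout()
    return fig
```

With 500 epochs, every=50 places ten labels on each curve.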
If the code runs successfully, these are the types of results you should expect.
To achieve the best results in the shortest time, ensure your code is properly utilizing the GPU. For sufficiently large models and datasets, GPU workloads can run an order of magnitude or more faster than CPU-based processing.
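One way to check that the GPU is actually being used is to time the same operation on both devices. The sketch below times a square matrix multiplication. The function name, matrix size, and repeat count are arbitrary choices for illustration, not part of the original notebook. torch.cuda.synchronize() is needed because CUDA kernels run asynchronously, so without it the timer would stop before the work finishes.

```python
import time

import torch


def time_matmul(device: torch.device, n: int = 1024, repeats: int = 5) -> float:
    """Average seconds per n x n matrix multiplication on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    torch.matmul(a, b)  # warm-up run (CUDA context creation, kernel selection)
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for pending GPU work before starting the timer
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the timed kernels to finish
    return (time.perf_counter() - start) / repeats


cpu_time = time_matmul(torch.device("cpu"))
print(f"CPU: {cpu_time * 1e3:.2f} ms per matmul")
if torch.cuda.is_available():
    gpu_time = time_matmul(torch.device("cuda"))
    print(f"GPU: {gpu_time * 1e3:.2f} ms per matmul ({cpu_time / gpu_time:.1f}x faster)")
```

For a model as small as the Iris network, kernel-launch and host-to-device transfer overhead can hide most of this advantage. The speedup shows up mainly with larger models and batches.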
If you have any questions regarding how to run a Jupyter notebook on NVIDIA DGX, please refer to the following resources.