Running a Pre-trained Image Classification Model on IBM-AIU

IBM-AIU is optimized for inference and does not currently support model training. At this time, each user is limited to one AIU. This guide provides instructions for running pre-trained models, such as ResNet50, on IBM-AIU hardware.

Background of ResNet50

ResNet50 is a deep convolutional neural network (CNN) architecture introduced by Microsoft Research in 2015. It is a variant of the ResNet (“Residual Network”) family; the “50” in the name refers to the network's depth of 50 layers.

How it works

ResNet50 uses skip connections to add the input of a group of stacked layers to that group's output. This makes it easy for the stacked layers to learn an identity mapping, so adding more layers should not, in principle, make the network perform worse than a shallower one.
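
The skip connection can be sketched in a few lines of PyTorch. This is a simplified illustration, not torchvision's implementation: ResNet50 itself uses three-layer "bottleneck" blocks, and the class name `ResidualBlock` and the tensor sizes below are assumptions chosen for the example.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Simplified residual block computing relu(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        # F(x): the stacked layers
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        # Skip connection: add the block input to the stacked-layer output
        return self.relu(residual + x)


block = ResidualBlock(channels=8).eval()
x = torch.randn(1, 8, 16, 16)
with torch.no_grad():
    y = block(x)
print(tuple(y.shape))  # same shape as the input
```

If the stacked layers output zero, the block simply passes its (rectified) input through, which is the identity behaviour described above.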

How it is used

ResNet50 is often used in transfer learning, where a model pre-trained on a large dataset is fine-tuned on a smaller dataset for a specific task. It can also be used as-is: a ResNet50 model pre-trained on ImageNet classifies images into 1,000 object categories.

Benefits

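ResNet50 offers high classification accuracy at a moderate computational cost. Compared with plain CNNs of similar depth (ResNet50 is itself a CNN), its residual connections improve convergence during training and allow deeper networks to be trained effectively.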

Code

First, import the libraries required for the project.

import pandas as pd
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image, UnidentifiedImageError
import urllib.request  # urlopen lives in the urllib.request submodule
import os
from torch_sendnn import torch_sendnn # Import the IBM AIU Compiler Backend

The libraries listed above are pre-installed in the IBM-AIU container. If you need a library that is not included, you can install it with pip after logging into the pod, or from within your Python code. The following example checks whether the library 'openpyxl' is installed and installs it if necessary.

import subprocess
import sys

# Install the library using subprocess
def install_package(package):
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

def check_and_install(package_name):
    try:
        __import__(package_name)
        print(f"{package_name} is already installed.")
    except ImportError:
        print(f"{package_name} is not installed. Installing now...")
        install_package(package_name)
        print(f"{package_name} installed successfully.")

# Check and install 'openpyxl'
check_and_install('openpyxl')
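
The helper above passes the same string to both `__import__` and pip. That works for 'openpyxl', but some packages are imported under a different name than the one pip uses (Pillow, for instance, is imported as `PIL`). A minimal sketch of one way to handle this follows; the names `PIP_NAMES`, `is_importable`, and `pip_name_for` are hypothetical helpers, not part of any library.

```python
import importlib.util

# Import name -> pip distribution name, for packages where the two differ
PIP_NAMES = {"PIL": "Pillow"}


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def pip_name_for(module_name: str) -> str:
    """Return the name to pass to pip for a given import name."""
    return PIP_NAMES.get(module_name, module_name)


for name in ("json", "PIL", "openpyxl"):
    status = "available" if is_importable(name) else f"missing (pip install {pip_name_for(name)})"
    print(f"{name}: {status}")
```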

Next, we will load and preprocess the test images for inference. In this example, we will use images in JPG format, and we recommend using pictures of animals.

# Image preprocessing transformation
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

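# Function to load and preprocess multiple images
def load_images(image_paths):
    images = []
    for image_path in image_paths:
        try:
            # Convert to RGB so grayscale or RGBA files match the 3-channel normalization
            image = Image.open(image_path).convert("RGB")
            images.append(preprocess(image))
        except (IOError, UnidentifiedImageError):
            print(f"Skipping non-image file: {image_path}")  # Skip non-image files
    if not images:
        # torch.stack fails on an empty list, so report the problem clearly
        raise ValueError("No readable images were found.")
    return torch.stack(images)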

# Paths of the images to load
image_folder = "Path to the folder of all images"
# Collect all images' paths (sorted so the prediction order is reproducible)
image_files = [os.path.join(image_folder, img) for img in sorted(os.listdir(image_folder))]
# Load images and preprocess
images = load_images(image_files)
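
Because `os.listdir` returns every entry in the folder, it can help to filter the list before loading. The sketch below is one possible approach; the function name `list_image_files` and the extension set are assumptions based on the JPG inputs used in this example.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg"}  # assumption: JPG inputs, as in this example


def list_image_files(folder):
    """Return sorted paths of regular files in `folder` with a JPG extension."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

The result could then be passed to `load_images` in place of `image_files`.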

We also need the ImageNet class labels to interpret the ResNet50 output. The file at https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt lists the class labels for ImageNet, a widely used benchmark dataset in computer vision, with one label per line. A model pre-trained on ImageNet (like ResNet50) outputs one score per class for each image; the index of the highest score is the predicted class index, and you can use it to look up the class name in this label file.

# Download ImageNet class labels
url = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
with urllib.request.urlopen(url) as response:
    # Each line arrives as bytes; decode it so labels print as plain strings
    class_idx = [line.decode("utf-8").strip() for line in response]
print("class_idx size: ", len(class_idx))
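
The lookup from class index to label can be tried offline. In this sketch, `fake_response` is a stand-in for the HTTP response that holds only three labels, and `predicted_index` is a hypothetical model output.

```python
import io

# Stand-in for the HTTP response: three labels, one per line, as bytes
fake_response = io.BytesIO(b"tench\ngoldfish\ngreat white shark\n")

# Line number (starting at 0) is the class index; decode bytes to text
labels = [line.decode("utf-8").strip() for line in fake_response]

predicted_index = 1  # hypothetical index returned by the model
print(labels[predicted_index])  # goldfish
```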

Now we can load the model and deploy it to the IBM-AIU for inference. This guide uses two values for the backend parameter of torch.compile: 'sendnn', which runs the model on the IBM-AIU, and 'inductor', which here keeps the model on the CPU.

# Load the pre-trained ResNet-50 model using the 'weights' argument
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
# Set model to evaluation mode
model.eval()

# Compile the model for the AIU
aiu_model = torch.compile(model, backend="sendnn")
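
When the same script should also run on a machine without the AIU software stack, the backend name can be chosen at run time. The following is only a sketch: `pick_backend` is a hypothetical helper, and a successful import does not by itself guarantee that an AIU device is available.

```python
def pick_backend() -> str:
    """Return 'sendnn' when torch_sendnn can be imported, otherwise 'inductor'."""
    try:
        # Same import as at the top of this guide
        from torch_sendnn import torch_sendnn  # noqa: F401
    except ImportError:
        return "inductor"
    return "sendnn"


backend = pick_backend()
print(f"Using backend: {backend}")
# The compiled model would then be created with:
# aiu_model = torch.compile(model, backend=backend)
```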

Currently, the AIU requires initial data to start processing, and the amount of this startup data dictates the batch size for prediction. For instance, if one piece of startup data is used, predictions will process one at a time. If two pieces of startup data are used, predictions will process in pairs. In this example, we begin with a single data point.

# Warm-up
image_tensor = images[0].unsqueeze(0)  # Change shape to (1, C, H, W)

with torch.no_grad():
    aiu_output = aiu_model(image_tensor)

# Print predicted labels
_, aiu_predicted_indice = aiu_output.max(dim=1)
aiu_predicted_label = class_idx[aiu_predicted_indice.item()]
print(f"Predicted Label: {aiu_predicted_label}")
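
Taking only the maximum discards how confident the model is. A variation is to convert the scores to probabilities and list the top few classes. The sketch below uses made-up scores and labels (`logits` and `labels` are illustrative stand-ins for `aiu_output` and `class_idx`).

```python
import torch

# Illustrative stand-ins for class_idx and the model output
labels = ["cat", "dog", "fox", "owl", "bee", "ant"]
logits = torch.tensor([[0.1, 2.0, 0.3, 1.5, -1.0, 0.0]])

# Softmax turns raw scores into probabilities that sum to 1 per image
probabilities = torch.softmax(logits, dim=1)

# Keep the three most likely classes, highest first
top_probs, top_indices = probabilities.topk(k=3, dim=1)
top_labels = [labels[i] for i in top_indices[0].tolist()]

for label, prob in zip(top_labels, top_probs[0].tolist()):
    print(f"{label}: {prob:.3f}")
```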

Since we are using only a single data point to start, we can only process one data point at a time during the actual prediction process.

# Actual prediction
aiu_predict_result = []

for i in range(len(images)):
    # Add a batch dimension
    image_tensor = images[i].unsqueeze(0)  # Change shape to (1, C, H, W)

    with torch.no_grad():
        aiu_output = aiu_model(image_tensor)
   
    # Collect predicted labels for each image
    _, aiu_predicted_indice = aiu_output.max(dim=1)
    aiu_predicted_label = class_idx[aiu_predicted_indice.item()]
    aiu_predict_result.append(aiu_predicted_label)
    
print("Predicted Labels: ", aiu_predict_result)
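
If the warm-up used a larger batch, every later call would need the same batch size. One way to arrange that is to split the images into fixed-size batches and pad the last one. This is a sketch under the assumption that padding with a repeated image is acceptable; `iter_fixed_batches` and `demo_images` are hypothetical names.

```python
import torch


def iter_fixed_batches(images: torch.Tensor, batch_size: int):
    """Yield (batch, valid_count) pairs; every batch has exactly batch_size items."""
    for start in range(0, len(images), batch_size):
        chunk = images[start:start + batch_size]
        valid_count = chunk.shape[0]
        if valid_count < batch_size:
            # Pad by repeating the last image so the batch shape stays constant
            padding = chunk[-1:].expand(batch_size - valid_count, *chunk.shape[1:])
            chunk = torch.cat([chunk, padding], dim=0)
        yield chunk, valid_count


demo_images = torch.zeros(5, 3, 8, 8)  # small stand-in for the preprocessed images
for batch, valid_count in iter_fixed_batches(demo_images, batch_size=2):
    # With a real model: outputs = aiu_model(batch)[:valid_count]
    print(tuple(batch.shape), valid_count)
```

Only the first `valid_count` outputs of each batch correspond to real images; the rest come from padding and should be discarded.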

We have successfully run the model. Adjust the code to suit your specific needs. Please note that uploading test data to the AIU is not yet permitted at the current stage.

More about ResNet50

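With the original torchvision weights (IMAGENET1K_V1), the pre-trained ResNet50 model reaches about 76% top-1 accuracy and about 93% top-5 accuracy on the ImageNet dataset. The IMAGENET1K_V2 weights loaded in this guide are reported by torchvision at roughly 80.9% top-1 and 95.4% top-5.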

Top-1 Accuracy: This measures the percentage of images for which the model's top prediction (the single most likely class) matches the true label.
Top-5 Accuracy: This measures the percentage of images for which the true label is among the model's top five predictions.
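
Both metrics can be computed with the same function by varying k. The sketch below uses a tiny made-up example with four classes, so it reports top-1 and top-2 rather than top-5; `topk_accuracy` is a hypothetical helper name.

```python
import torch


def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, k: int) -> float:
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    topk_indices = logits.topk(k, dim=1).indices  # shape (N, k)
    hits = (topk_indices == targets.unsqueeze(1)).any(dim=1)
    return hits.float().mean().item()


# Three images, four classes; targets are the true class indices
logits = torch.tensor([[0.1, 0.9, 0.0, 0.0],
                       [0.8, 0.1, 0.05, 0.05],
                       [0.2, 0.3, 0.4, 0.1]])
targets = torch.tensor([1, 1, 3])

top1 = topk_accuracy(logits, targets, k=1)
top2 = topk_accuracy(logits, targets, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

Here only the first image is correct at top-1, while the second becomes correct once the two highest-scoring classes are considered.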

These figures can vary slightly depending on the specific implementation and any fine-tuning or additional preprocessing steps that might be applied. Additionally, if the model is evaluated on different datasets or in different contexts, the accuracy may differ.