...


Things to Know Before You Start

  • The AIU is optimized specifically for the inference process of AI workloads. Any training performed in this environment will rely solely on its CPUs, which may result in suboptimal performance.

  • As per the latest official documentation, the IBM AIU SDK supports only PyTorch models. Models relying on other libraries, such as TensorFlow, will default to the CPU, which may result in suboptimal performance.

The University at Albany has a cluster of IBM AIU prototype chips available for its students, faculty, and researchers to work on various AI technologies. Please note that these AIU chips are currently prototypes and the environment in which they operate is experimental; hardware and software configurations are subject to change as development continues.

Connecting to the AIU Cluster

First, connect to the AIU cluster head node (aiu-headnode.its.albany.edu) via SSH using your preferred method. Then log in to OpenShift using the oc login command.

...

Code Block
cd /network/rit/lab/<your_lab>/IBM-AIU-Files

Deploying a Pod

To test deploying a single AIU pod, start by creating a 1aiu.yaml file using the following example YAML. Replace <your pod name> with the desired name for your pod.

...

Code Block
Defaulted container "c1" out of: c1, aiu-monitor
[56551@jun-pod ~]$

Inference Using PyTorch Models

As part of the software stack within the container, PyTorch 2 is available for use.
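A quick way to confirm which PyTorch build the container ships (run this inside the pod) might look like the following sketch:

```python
import torch

# Print the PyTorch version bundled with the container's software stack.
print(torch.__version__)

# torch.compile (used later for the AIU backend) requires a 2.x release.
major = int(torch.__version__.split(".")[0])
print(f"torch.compile available: {major >= 2}")
```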

Custom Models From HuggingFace

This example shows how to point to a pre-trained model on HuggingFace and run it on an IBM AIU assigned to a container within an OpenShift pod. Within the /opt/ibm/aiu/examples directory, there is a torch_roberta.py script that will download, compile, and run a RoBERTa model from HuggingFace. The script provides context to the model along with a question, as shown below.

...

If you want to use a different model from HuggingFace, several deepset models are available. You'll need an account to select a different model; then update lines 5 and 6 of the script (the tokenizer and the model, respectively) as needed. You may need to adjust your code slightly, but the overall approach remains the same. Feel free to copy torch_roberta.py and use it as a boilerplate for your own tests.
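As a hypothetical illustration of that change — the model identifier below (`deepset/roberta-base-squad2`) is an assumed example, and the exact variable names in torch_roberta.py may differ — the two lines you would edit typically look like this:

```python
# Hypothetical sketch of swapping in another deepset question-answering
# model. The identifier below is an assumed example; replace it with the
# HuggingFace model you selected.
MODEL_ID = "deepset/roberta-base-squad2"

def load_qa_model(model_id: str = MODEL_ID):
    # Imported lazily so this sketch can be read without downloading
    # any model weights up front.
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)              # line 5: tokenizer
    model = AutoModelForQuestionAnswering.from_pretrained(model_id)  # line 6: model
    return tokenizer, model
```

The first call to load_qa_model will download and cache the weights, so run it once before timing any inference.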

Your Own Models

To leverage the AIU Cluster for the inference part of your AI workloads, you need to disable gradient calculation in the PyTorch model and compile it with the appropriate backend, as outlined in the IBM documentation. The following code snippet provides a simple way to adjust your model code and run it on this hardware.

...

As previously mentioned, the AIU is optimized specifically for the inference process of AI workloads. While it is technically possible to train models in this environment, it is not recommended, as it relies solely on CPUs. This can lead to suboptimal performance, particularly for larger models and datasets.

Wrapping Up

When you finish, make sure to release the allocated resources. Please note that the environment's storage is not persistent: whenever the pod is stopped or restarted, any files you modified inside it are lost, so be sure to save your work elsewhere first. Exit the pod by running exit, then run the following commands.

...