...
Things to Know Before You Start
The University at Albany has a cluster of IBM AIU prototype chips available for its students, faculty, and researchers to work on various AI technologies. Please note that these AIU chips are currently prototypes and the environment in which they operate is experimental. As such, hardware and software configurations are subject to change as development continues.
Connecting to the AIU Cluster
First, connect to the head node of the AIU cluster (aiu-headnode.its.albany.edu) via SSH using your preferred method. Next, log in to OpenShift using the oc login command.
...
```
cd /network/rit/lab/<your_lab>/IBM-AIU-Files
```
Deploying a Pod
To test deploying a single AIU pod, start by creating a 1aiu.yaml file using the following example YAML. Replace <your pod name> with the desired name for your pod.
...
```
Defaulted container "c1" out of: c1, aiu-monitor
[56551@jun-pod ~]$
```
Inference Using PyTorch Models
As part of the software stack within the container, PyTorch 2 is available for use.
Custom Models From HuggingFace
This example shows how to point to a pre-trained model on HuggingFace and run it on an IBM AIU assigned to a container within an OpenShift pod. Within the /opt/ibm/aiu/examples directory, there is a torch_roberta.py script that downloads, compiles, and runs a RoBERTa model from HuggingFace. The script provides the model with a context passage and a question, as shown below.
...
If you want to use a different model from HuggingFace, there are several deepset models available. You'll need a HuggingFace account to select a different model, and you'll need to update lines 5 and 6 of the script (the tokenizer and model, respectively) as needed. You may need to adjust your code slightly, but the overall approach remains the same. Feel free to copy torch_roberta.py and use it as a boilerplate for your own tests.
Your Own Models
To leverage the AIU Cluster for the inference part of your AI workloads, you need to disable gradient calculation in the PyTorch model and compile it with the appropriate backend, as outlined in the IBM documentation. The following code snippet provides a simple way to adjust your model code and run it on this hardware.
...
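As a toy illustration of the same pattern (not the cluster's actual snippet), the sketch below puts a small stand-in model into evaluation mode, disables gradient calculation, and passes it through torch.compile. The "eager" backend here is only a placeholder; substitute the AIU backend named in the IBM documentation.

```python
import torch
from torch import nn

# A tiny stand-in model; substitute your own trained nn.Module.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
model.eval()  # switch layers such as dropout/batchnorm to inference behavior

# Disable gradient calculation for inference-only execution.
with torch.no_grad():
    # Compile for the target backend. "eager" is a placeholder;
    # the IBM documentation names the backend appropriate for the AIU.
    compiled = torch.compile(model, backend="eager")
    out = compiled(torch.randn(1, 8))

print(out.shape)
```

Because the forward pass runs inside torch.no_grad(), the output tensor carries no autograd history, which is what the AIU inference path expects.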
As previously mentioned, the AIU is optimized specifically for the inference portion of AI workloads. While it is technically possible to train models in this environment, it is not recommended: training would run on the CPUs alone, which leads to poor performance, particularly for larger models and datasets.
Wrapping Up
When you finish, make sure to release the allocated resources. Please note that storage inside the pod is not persistent: each time the pod is stopped or restarted, any files you modified inside it are lost, so be sure to save your work elsewhere first. Exit the pod by running exit, then run the following commands.
...