Ollama + Open WebUI on DGX On-Prem: Your Personal Local LLM Experience
1. Introduction
Welcome to your guide to running powerful large language models (LLMs) right here on UAlbany's DGX On-Prem system! This tutorial will walk you through setting up Ollama (a local LLM server) along with Open WebUI (a user-friendly interface) on our high-performance computing infrastructure.
What will you get out of this?
Your own private LLM environment running on UAlbany's powerful GPUs.
A clean, intuitive web interface to interact with state-of-the-art models.
Complete privacy - all your prompts and data stay within the university network.
The ability to run Llama 3.2 (and other models) without sending sensitive data to external services.
Think of this as running your own personal ChatGPT-like service, but with the processing happening right here on campus instead of on someone else's servers. It's perfect for research projects, exploring AI capabilities, or just learning about how these systems work!
2. Prerequisites
Before we dive in, let's make sure you have everything you need.
Required Access
Access to UAlbany's DGX On-Prem cluster.
If you don't have access yet, you'll need to request it through your faculty advisor or principal investigator by completing the DGX On-Prem Computation Request Form.
Knowledge Requirements
Basic familiarity with terminal/command line (don't worry, we'll guide you through each step).
Basic understanding of SLURM job submission (if you've never used SLURM before, check out the How to Schedule via SLURM page).
Files Needed
Just the script provided in this tutorial (ollama.slurm) - the script will take care of all the required dependencies.
You'll be able to copy-paste it directly or download it.
3. Understanding What We're Building
Before we jump into the technical details, let's understand what we're setting up.
Ollama: Ollama is like a personal assistant that lives on our servers instead of in the cloud. It's an inference server that can run advanced AI models (like Llama 3.2) locally, meaning all your interactions stay right here on campus. Think of Ollama as the "brain" of our setup - it's what actually processes your prompts and generates responses. To learn more about Ollama, check out its GitHub page.
Open WebUI: While Ollama is powerful, typing commands in a terminal isn't everyone's cup of tea. That's where Open WebUI comes in - it's a sleek, web-based interface that makes interacting with Ollama as simple as using a website. It gives you a ChatGPT-like experience but powered by your chosen models running on our infrastructure. To learn more about Open WebUI, check out its documentation.
How They Work Together: When you run our script, it sets up both Ollama and Open WebUI on the same DGX node. Ollama runs in the background handling all the AI processing, while Open WebUI provides a user-friendly way to chat with the AI. The script also configures JupyterLab as a maintenance tool (but we won't focus much on that part).
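To make the split concrete, here is a quick sanity check you can run once everything is up and running (section 5 covers how to get the actual URLs): Ollama answers plain HTTP requests on its own port, while Open WebUI is simply a web frontend that forwards your chats to that API. The host and port below are placeholders taken from the example output later in this guide, so substitute your own.
# Ask the Ollama backend which version it is running - no UI involved
curl http://dgx06.its.albany.edu:8123/api/version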
4. The Script Explained
The script is provided in full at the end of this wiki (section 10), so go ahead and copy it to your computer and open it. It might look scary at first, but it isn't: it's essentially just automating all the setup steps that would be tedious to do manually. Let's demystify what it's doing!
#!/bin/bash
#SBATCH --job-name=ollama
#SBATCH --output=ollama-%j.out
#SBATCH --error=ollama-%j.err
#SBATCH --time=8:00:00
#SBATCH --gpus=1
#SBATCH --container-image='docker://nvcr.io#nvidia/pytorch:24.11-py3'
#SBATCH --container-mounts=/network/rit/dgx/dgx_vieirasobrinho_lab:/mnt/dgx_lab,/network/rit/lab/vieirasobrinho_lab:/mnt/lab
This top section is telling SLURM (our job scheduler) how to set up the environment.
We're naming our job "ollama".
We're requesting one GPU for 8 hours.
We're using NVIDIA's PyTorch container (which has all the ML libraries pre-installed).
We're connecting our lab's storage so files persist after the job ends.
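If you are curious whether GPUs are free before you submit, standard SLURM commands give a quick overview; the exact node and partition names depend on how the cluster is configured, so treat this as a sketch.
# Show each node and its current state (idle nodes have free resources)
sinfo -N -l
# Show everything currently queued or running on the cluster
squeue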
The middle section includes several clever functions that check if components already exist.
# Function to check if demo folder already exists
check_demo_folder() {
    if [ -d "${demo_location}" ]; then
        return 0
    fi
    return 1
}
# Function to check if Ollama is already installed
check_ollama() {
    if [ -f "${demo_location}/ollama/bin/ollama" ]; then
        return 0
    fi
    return 1
}
These functions save time by not reinstalling software that's already there. It's like checking if you already have ingredients before going shopping. The main part of the script consists of the following steps.
Sets up random ports for the services.
Creates a demo folder if needed.
Downloads and sets up Ollama if it's not already installed.
Downloads the Llama 3.2 model (only if it's not already downloaded).
Creates a Python virtual environment for Open WebUI.
Installs and starts Open WebUI.
Configures JupyterLab as an additional tool.
When the script finishes running, it displays URLs and login information for accessing your services.
5. Step-by-Step Tutorial
Now let's put this into action! If you need some visual assistance, take a look at the video below to see this tutorial in action.
Note that the first time you run the script may take a while, as there are a lot of dependencies to download.
When you run the script a second time, it will be much faster, since the dependencies will already be in place.
a. Access the DGX System
First, connect to the DGX On-Prem cluster.
ssh your_netid@dgx-head01.its.albany.edu
You'll need to enter your NetID password. If you're off-campus, make sure you're connected to the VPN first.
b. Create and Save the Script
Once logged in, navigate to your lab folder and create a new file for our script.
vim ollama.slurm
This opens the vim text editor. Press i to enter insert mode, then copy-paste the entire script content (provided at the end of this wiki) into the editor. Press Esc to exit insert mode, then type :wq and press Enter to save and exit.
c. Customize the Script
Before running the script, you need to modify it for your specific lab directories.
vim ollama.slurm
Look for the following line (near the top).
#SBATCH --container-mounts=/network/rit/dgx/dgx_YOUR_LAB_NAME:/mnt/dgx_lab,/network/rit/lab/YOUR_LAB_NAME:/mnt/lab
Replace YOUR_LAB_NAME with your actual lab name so that both mount paths point to your lab's storage.
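If you would rather not edit the line by hand, a quick sed substitution works too. This is just a convenience sketch that assumes your lab name is my_lab; double-check the result afterwards.
# Replace the YOUR_LAB_NAME placeholder everywhere it appears in the script
sed -i 's/YOUR_LAB_NAME/my_lab/g' ollama.slurm
# Confirm the mount paths now point to your lab
grep container-mounts ollama.slurm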
d. Submit the Job
Now, submit your job to SLURM using the following command.
sbatch ollama.slurm
SLURM will assign your job a number and start it when resources are available. You can check its status with the following command.
squeue -u your_netid
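If you want a bit more detail than the default columns, squeue also accepts a custom format string. The specifiers below are standard SLURM (job ID, name, state, elapsed time, and the node or pending reason); adjust them to taste.
squeue -u your_netid -o "%.10i %.20j %.8T %.10M %R"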
e. Check the Output
Once your job starts running, you can monitor its progress in several ways, such as the following.
tail -f ollama-*.out
This shows the output in real-time. You'll see messages as Ollama and Open WebUI are set up.
f. Get Connection Information
When setup is complete (usually takes 5-10 minutes depending on download speeds), the output file will contain URLs and login information.
cat ollama-*.out
Look for a section that looks like the following.
================================================================================
Ollama API is available at: http://dgx06.its.albany.edu:8123
Open WebUI is available at: http://dgx06.its.albany.edu:8456
Make sure to update the Ollama port in the Open WebUI settings to 8123.
For more information please visit: https://docs.openwebui.com/getting-started/quick-start/starting-with-ollama/
================================================================================
Note: Your URLs will have different port numbers since they're randomly generated.
6. Accessing and Using Your Services
Now that everything is set up, it's time to start using your local LLM!
Using Open WebUI
Copy the Open WebUI URL from your output file and paste it into your browser.
You'll see a login screen. For first-time setup, do the following.
Create an account (this is just stored locally).
Go to Admin Panel → Settings → Connections → Manage Ollama API Connections.
Change the port to the Ollama API port value provided in your output file (e.g., from http://localhost:11434 to http://localhost:8123).
Click "Save" and that’s it.
Now you can start chatting! Select "llama3.2" from the model dropdown and start asking questions.
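If the model dropdown ever comes up empty, a quick way to confirm that Ollama is actually serving your models is to query its model-listing endpoint directly; as before, substitute the host and port from your own output file.
curl http://dgx06.its.albany.edu:8123/api/tags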
Making API Calls Directly to Ollama
If you prefer programmatic access, you can interact with Ollama directly - remember to use your own URLs and port numbers.
Using curl
curl -X POST http://dgx06.its.albany.edu:8123/api/generate -d '{
"model": "llama3.2",
"prompt": "Explain quantum computing in simple terms",
"stream": false
}'
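The example above asks for the whole answer in one response. If you set stream to true instead, Ollama returns newline-delimited JSON objects, each carrying a small piece of the response, with a final object whose done field is true - handy if you want to display tokens as they are generated. A minimal sketch:
curl -N -X POST http://dgx06.its.albany.edu:8123/api/generate -d '{
"model": "llama3.2",
"prompt": "Explain quantum computing in simple terms",
"stream": true
}'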
Using Python
import requests

def ask_ollama(prompt, model="llama3.2"):
    url = "http://dgx06.its.albany.edu:8123/api/generate"  # Replace with your actual URL
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=payload)
    return response.json()["response"]

# Example usage
answer = ask_ollama("What are the main challenges in artificial intelligence research?")
print(answer)
Using Postman
Create a new POST request to your Ollama URL.
Set the Content-Type header to application/json.
In the body, add the following.
{ "model": "llama3.2", "prompt": "Your question here", "stream": false }
Send the request and view the response.
JupyterLab Access (For Maintenance)
While we won't focus much on this, you can access JupyterLab using the URL and password provided in the output file. This is primarily useful if you need to troubleshoot or modify files directly.
7. Advanced Customization
Using Different Models
Pulling Other Models from Ollama's Library:
The script currently pulls Llama 3.2, but you can easily modify it to use other models. Edit the script and find the following line.
${demo_location}/ollama/bin/ollama pull llama3.2
Replace llama3.2 with any model from Ollama's library.
${demo_location}/ollama/bin/ollama pull mistral
# or
${demo_location}/ollama/bin/ollama pull gemma
# or
${demo_location}/ollama/bin/ollama pull phi
You can see the full list of available models at Ollama's model library. You can also pull new models straight from the Open WebUI interface or from the terminal using JupyterLab.
By the way, Ollama supports LLaVA, a large multimodal model with image-to-text capabilities. This means you can upload images via Open WebUI and ask the model to describe them, which is definitely worth a try!
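If you want to try that outside the web interface as well, here is a rough sketch: pull the llava model, then pass a base64-encoded image in the images field of a generate request. The image path is just a placeholder, and as always the host and port should come from your own output file.
# Pull the multimodal model (one-time download)
${demo_location}/ollama/bin/ollama pull llava
# Encode an image and ask the model about it
IMG=$(base64 -w 0 /mnt/lab/example.jpg)
curl -X POST http://dgx06.its.albany.edu:8123/api/generate -d "{
  \"model\": \"llava\",
  \"prompt\": \"What is in this picture?\",
  \"images\": [\"${IMG}\"],
  \"stream\": false
}"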
Using Custom GGUF Models
If you have your own fine-tuned models in GGUF format, you can use them with Ollama by creating a custom Modelfile.
Create a Modelfile in your lab directory.
FROM /path/to/your/model.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
Import it into Ollama.
${demo_location}/ollama/bin/ollama create mymodel -f /path/to/Modelfile
Then use it just like any other model through the WebUI or API.
If you don’t want to modify the script, you can also create your own model files straight from the Open WebUI interface or from the terminal using JupyterLab.
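To confirm that an import worked, you can ask Ollama to list the models it currently knows about and inspect your new one; both are standard ollama subcommands.
${demo_location}/ollama/bin/ollama list
${demo_location}/ollama/bin/ollama show mymodel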
Adjusting Resource Requests
Requesting Multiple GPUs
For larger models or better performance, modify the following line in the script.
#SBATCH --gpus=1
Change it as needed.
#SBATCH --gpus=2 # or 4, 8, etc.
Note that requesting more GPUs means you might wait longer for your job to start.
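Once the job is running, you can also confirm how many GPUs the container actually sees, for example from a JupyterLab terminal (nvidia-smi ships with the NVIDIA container image).
# One line per visible GPU
nvidia-smi -L
# Full utilization and memory overview
nvidia-smi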
Extending Runtime
Need more than 8 hours? Modify the following line.
#SBATCH --time=8:00:00
Change it as needed.
#SBATCH --time=24:00:00 # for 24 hours
Persistent Storage Strategies
Managing Model Files
The script saves downloaded models to the following path.
${demo_location}/ollama/models
This folder persists between sessions, so models are only downloaded once. You can manage storage in different ways.
To check model sizes, run the following command.
du -h ${demo_location}/ollama/models
To remove unused models, run the following command.
${demo_location}/ollama/bin/ollama rm model_name
If you're working with multiple large models, be mindful of your lab storage quota. You can always delete and re-download models as needed.
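For a quick overview of how much space each model takes and how full the lab share is, the usual disk tools work from inside the job; the paths below are the in-container mounts used by the script.
# Size of each item under the models directory
du -sh ${demo_location}/ollama/models/*
# Free space on the lab share backing the demo folder
df -h /mnt/dgx_lab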
8. Cleanup and Next Steps
Ending Your Session
Your SLURM job will automatically terminate after the time specified (default: 8 hours). If you want to end it early, follow these steps.
Find your job ID.
squeue -u your_netid
Cancel the job.
scancel job_id
Saving Your Work
For Open WebUI, the SQLite database serves as the backbone for user management, chat history, file storage, and various other core functionalities. This means your conversations should persist between sessions and jobs as long as you don't delete the demo folder.
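Since the models, the virtual environment, and (per the note above) the chat database all live under the demo folder, a simple archive of that folder is an easy way to keep a backup before making major changes. This is just a sketch using the in-container mount paths from the script; expect a large archive if you have several models downloaded.
tar -czf /mnt/lab/ollama-demo-backup.tgz -C /mnt/dgx_lab ollama-demo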
Next Steps for Exploration
Once you're comfortable with the basic setup, consider the following steps.
Experimenting with different models to compare performance.
Integrating Ollama into your research workflow.
Exploring the API capabilities for automation.
9. External Resources
10. Full Script Reference
Here's the complete script for easy copying.
#!/bin/bash
#SBATCH --job-name=ollama
#SBATCH --output=ollama-%j.out
#SBATCH --error=ollama-%j.err
#SBATCH --time=8:00:00
#SBATCH --gpus=1
#SBATCH --container-image='docker://nvcr.io#nvidia/pytorch:24.11-py3'
#SBATCH --container-mounts=/network/rit/dgx/dgx_YOUR_LAB_NAME:/mnt/dgx_lab,/network/rit/lab/YOUR_LAB_NAME:/mnt/lab
# Function to check if demo folder already exists
check_demo_folder() {
    if [ -d "${demo_location}" ]; then
        return 0
    fi
    return 1
}
# Function to check if Ollama is already installed
check_ollama() {
    if [ -f "${demo_location}/ollama/bin/ollama" ]; then
        return 0
    fi
    return 1
}
# Function to check if virtual environment exists
check_venv() {
    if [ -d "${demo_location}/open-webui" ] && [ -f "${demo_location}/open-webui/bin/activate" ]; then
        return 0
    fi
    return 1
}
# Function to check if Open WebUI is installed
check_openwebui() {
    if pip list | grep -q "open-webui"; then
        return 0
    fi
    return 1
}
# Get the DGX node name
node_name="$SLURMD_NODENAME"
echo -e "\nThe Ollama + Open WebUI Demo is starting..."
# Generate random port numbers between 8000 and 8999
ollama_port=$((RANDOM % 1000 + 8000))
openwebui_port=$((RANDOM % 1000 + 8000))
jupyter_port=$((RANDOM % 1000 + 8000))
# Build the URLs
ollama_url="http://${node_name}.its.albany.edu:${ollama_port}"
openwebui_url="http://${node_name}.its.albany.edu:${openwebui_port}"
jupyter_url="http://${node_name}.its.albany.edu:${jupyter_port}"
# Setup demo location
demo_location="/mnt/dgx_lab/ollama-demo"
# Check and create demo folder if needed
echo -e "\nChecking for demo folder..."
if ! check_demo_folder; then
    echo -e "\nNot found: creating demo folder..."
    mkdir -p ${demo_location}
    echo -e "\n🔵 Demo folder created at ${demo_location}."
else
    echo -e "\n⚪ Demo folder already exists at ${demo_location}."
fi
# Setup Ollama models directory
mkdir -p ${demo_location}/ollama/models
# Download Ollama
echo -e "\nChecking for Ollama installation..."
if ! check_ollama; then
    echo -e "\nNot found: downloading Ollama..."
    curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ${demo_location}/ollama/ollama-linux-amd64.tgz
    tar -xzf ${demo_location}/ollama/ollama-linux-amd64.tgz -C ${demo_location}/ollama
    echo -e "\n🔵 Ollama downloaded and extracted to ${demo_location}/ollama."
else
    echo -e "\n⚪ Ollama already installed at ${demo_location}/ollama."
fi
# Fix Ollama environment variables
export OLLAMA_HOST=0.0.0.0:${ollama_port}
export OLLAMA_MODELS=${demo_location}/ollama/models
# Start Ollama
echo -e "\nStarting Ollama server..."
nohup ${demo_location}/ollama/bin/ollama serve > /dev/null 2>&1 &
echo -e "\n🟢 Ollama server started on ${ollama_url}."
# Download Llama 3.2
echo -e "\nChecking for Llama 3.2 model..."
sleep 10
if ! ${demo_location}/ollama/bin/ollama show llama3.2 &>/dev/null; then
    echo -e "\nNot found: downloading Llama 3.2 model..."
    ${demo_location}/ollama/bin/ollama pull llama3.2
    echo -e "\n🔵 Llama 3.2 model downloaded."
else
    echo -e "\n⚪ Llama 3.2 model already available."
fi
# Create virtual environment
echo -e "\nChecking for virtual environment..."
if ! check_venv; then
    echo -e "\nNot found: creating virtual environment for Open WebUI..."
    python3 -m venv ${demo_location}/open-webui
    echo -e "\n🔵 Virtual environment created."
else
    echo -e "\n⚪ Virtual environment already exists at ${demo_location}/open-webui."
fi
# Activate virtual environment
echo -e "\nActivating virtual environment..."
source ${demo_location}/open-webui/bin/activate
echo -e "\n🟢 Virtual environment activated."
# Install Open WebUI
echo -e "\nChecking for Open WebUI installation..."
if ! check_openwebui; then
    echo -e "\nNot found: installing Open WebUI..."
    pip install open-webui==0.6.2
    echo -e "\n🔵 Open WebUI installed."
else
    echo -e "\n⚪ Open WebUI is already installed."
fi
# Start Open WebUI
echo -e "\nStarting Open WebUI server..."
export WEBUI_SECRET_KEY="t0p-s3cr3t"
nohup open-webui serve --port=${openwebui_port} > /dev/null 2>&1 &
echo -e "\n🟢 Open WebUI server started on ${openwebui_url}."
# Generate a random password for JupyterLab (alphanumeric, 6 characters)
jupyter_password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 6)
# Print session details
echo -e "\n================================================================================\n"
echo -e "Ollama API is available at: ${ollama_url}\n"
echo -e "Open WebUI is available at: ${openwebui_url}\n"
echo -e "Make sure to update the Ollama port in the Open WebUI settings to ${ollama_port}.\n"
echo -e "For more information please visit: https://docs.openwebui.com/getting-started/quick-start/starting-with-ollama/"
echo -e "\n================================================================================\n"
echo -e "JupyterLab is available at: ${jupyter_url}\n"
echo -e "Your password is: ${jupyter_password}\n"
echo -e "Please copy and paste the link into your browser and use the password to log in."
echo -e "\n================================================================================\n"
# Start JupyterLab session
jupyter lab --allow-root --no-browser --NotebookApp.token="${jupyter_password}" --NotebookApp.allow_origin='*' --NotebookApp.log_level='CRITICAL' --notebook-dir=/mnt --port=$jupyter_port
Remember to replace YOUR_LAB_NAME with your actual lab name before using the script.
Happy LLM-ing!