Ollama + Open WebUI on DGX On-Prem: Your Personal Local LLM Experience

Table of Contents

  1. Introduction

  2. Prerequisites

  3. Understanding What We’re Building

  4. The Script Explained

  5. Step-by-Step Tutorial

    1. Access the DGX System

    2. Create and Save the Script

    3. Customize the Script

    4. Submit the Job

    5. Check the Output

    6. Get Connection Information

  6. Accessing and Using Your Services

  7. Advanced Customization

  8. Cleanup and Next Steps

  9. External Resources

  10. Full Script Reference

1. Introduction

Welcome to your guide to running powerful large language models (LLMs) right here on UAlbany's DGX On-Prem system! This tutorial will walk you through setting up Ollama (a local LLM server) along with Open WebUI (a user-friendly interface) on our high-performance computing infrastructure.

What will you get out of this?

  • Your own private LLM environment running on UAlbany's powerful GPUs.

  • A clean, intuitive web interface to interact with state-of-the-art models.

  • Complete privacy - all your prompts and data stay within the university network.

  • The ability to run Llama 3.2 (and other models) without sending sensitive data to external services.

Think of this as running your own personal ChatGPT-like service, but with the processing happening right here on campus instead of on someone else's servers. It's perfect for research projects, exploring AI capabilities, or just learning about how these systems work!

2. Prerequisites

Before we dive in, let's make sure you have everything you need.

Required Access

  • Access to UAlbany's DGX On-Prem cluster.

  • If you don't have access yet, you'll need to request it through your faculty advisor or principal investigator by completing the DGX On-Prem Computation Request Form.

Knowledge Requirements

  • Basic familiarity with terminal/command line (don't worry, we'll guide you through each step).

  • Basic understanding of SLURM job submission (if you've never used SLURM before, check out the How to Schedule via SLURM page).

Files Needed

  • Just the script provided in this tutorial (ollama.slurm) - the script will take care of all the required dependencies.

  • You'll be able to copy-paste it directly or download it.

3. Understanding What We're Building

Before we jump into the technical details, let's understand what we're setting up.

[Diagram: how Ollama and Open WebUI fit together (ollama-open-webui-diagram.png)]

Ollama: Ollama is like a personal assistant that lives on our servers instead of in the cloud. It's an inference server that can run advanced AI models (like Llama 3.2) locally, meaning all your interactions stay right here on campus. Think of Ollama as the "brain" of our setup - it's what actually processes your prompts and generates responses. To learn more about Ollama, check out its GitHub page.

Open WebUI: While Ollama is powerful, typing commands in a terminal isn't everyone's cup of tea. That's where Open WebUI comes in - it's a sleek, web-based interface that makes interacting with Ollama as simple as using a website. It gives you a ChatGPT-like experience but powered by your chosen models running on our infrastructure. To learn more about Open WebUI, check out its documentation.

How They Work Together: When you run our script, it sets up both Ollama and Open WebUI on the same DGX node. Ollama runs in the background handling all the AI processing, while Open WebUI provides a user-friendly way to chat with the AI. The script also configures JupyterLab as a maintenance tool (but we won't focus much on that part).
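As a quick, optional sanity check once your job is running, you can talk to Ollama directly from a terminal. This is just a sketch - the hostname and port below are placeholders, so substitute the values printed in your job's output file (covered later in this tutorial).

# List the models your Ollama server currently knows about (replace host/port with your own).
curl http://dgx06.its.albany.edu:8123/api/tags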

4. The Script Explained

The full script is provided at the end of this page (see Section 10, Full Script Reference); go ahead and download or copy it to your computer and open it. It might look scary at first, but I can assure you it's not. Let's demystify what the script is doing! While it might look complex at first glance, it's essentially just automating all the setup steps that would be tedious to do manually.

#!/bin/bash
#SBATCH --job-name=ollama
#SBATCH --output=ollama-%j.out
#SBATCH --error=ollama-%j.err
#SBATCH --time=8:00:00
#SBATCH --gpus=1
#SBATCH --container-image='docker://nvcr.io#nvidia/pytorch:24.11-py3'
#SBATCH --container-mounts=/network/rit/dgx/dgx_vieirasobrinho_lab:/mnt/dgx_lab,/network/rit/lab/vieirasobrinho_lab:/mnt/lab

This top section is telling SLURM (our job scheduler) how to set up the environment.

  • We're naming our job "ollama".

  • We're requesting one GPU for 8 hours.

  • We're using NVIDIA's PyTorch container (which has all the ML libraries pre-installed).

  • We're connecting our lab's storage so files persist after the job ends.
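If you'd like to confirm at runtime that the GPU request was honored, one optional addition (not part of the original script) is a short sanity check near the top of the script body:

# Optional sanity check (not in the original script): print the node name and the visible GPU(s).
echo "Running on node: ${SLURMD_NODENAME}"
nvidia-smi --query-gpu=name,memory.total --format=csv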

The middle section includes several clever functions that check if components already exist.

# Function to check if demo folder already exists
check_demo_folder() {
    if [ -d "${demo_location}" ]; then
        return 0
    fi
    return 1
}

# Function to check if Ollama is already installed
check_ollama() {
    if [ -f "${demo_location}/ollama/bin/ollama" ]; then
        return 0
    fi
    return 1
}

These functions save time by not reinstalling software that's already there. It's like checking if you already have ingredients before going shopping. The main part of the script consists of the following steps.

  1. Sets up random ports for the services.

  2. Creates a demo folder if needed.

  3. Downloads and sets up Ollama if it's not already installed.

  4. Downloads the Llama 3.2 model (only if it's not already downloaded).

  5. Creates a Python virtual environment for Open WebUI.

  6. Installs and starts Open WebUI.

  7. Configures JupyterLab as an additional tool.

When the script finishes running, it displays URLs and login information for accessing your services.

5. Step-by-Step Tutorial

Now let's put this into action! If you need some visual assistance, take a look at the video below to see the whole process in action.

  • The first time you run the script may take a while, as there are a lot of dependencies to be downloaded.

  • When you run the script again, it will run much faster because the dependencies will already be in place.

a. Access the DGX System

First, connect to the DGX On-Prem cluster.

ssh your_netid@dgx-head01.its.albany.edu

You'll need to enter your NetID password. If you're off-campus, make sure you're connected to the VPN first.
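If you plan to connect often, you can optionally add a host alias to your local ~/.ssh/config so you don't have to type the full hostname every time. This is just a convenience sketch - the alias name is arbitrary.

# Optional: append an alias to your local SSH config so "ssh dgx" works.
cat >> ~/.ssh/config <<'EOF'
Host dgx
    HostName dgx-head01.its.albany.edu
    User your_netid
EOF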

b. Create and Save the Script

Once logged in, navigate to your lab folder and create a new file for our script.

vim ollama.slurm

This opens the vim text editor. Press i to enter insert mode, then copy-paste the entire script content (provided at the end of this wiki) into the editor.

Press Esc to exit insert mode, then type :wq and press Enter to save and exit.

c. Customize the Script

Before running the script, you need to modify it for your specific lab directories.

vim ollama.slurm

Look for the following line (near the top).

#SBATCH --container-mounts=/network/rit/dgx/dgx_vieirasobrinho_lab:/mnt/dgx_lab,/network/rit/lab/vieirasobrinho_lab:/mnt/lab

Change it to your lab's paths as follows.

#SBATCH --container-mounts=/network/rit/dgx/dgx_YOUR_LAB_NAME:/mnt/dgx_lab,/network/rit/lab/YOUR_LAB_NAME:/mnt/lab

Replace YOUR_LAB_NAME with your actual lab name.
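If you'd rather not edit the line by hand, a one-line sed substitution should do the same thing (YOUR_LAB_NAME is, of course, a placeholder):

# Swap the example lab name for your own, in place.
sed -i 's/vieirasobrinho_lab/YOUR_LAB_NAME/g' ollama.slurm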

d. Submit the Job

Now, submit your job to SLURM using the following command.

sbatch ollama.slurm

SLURM will assign your job a number and start it when resources are available. You can check its status with the following command.

squeue -u your_netid
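If you'd rather not re-run that command by hand, something like watch will refresh the view for you (the interval is up to you):

# Refresh the queue view every 30 seconds; press Ctrl+C to stop.
watch -n 30 squeue -u your_netid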

e. Check the Output

Once your job starts running, you can monitor its progress in several ways, for example with the following command.

tail -f ollama-*.out

This shows the output in real-time. You'll see messages as Ollama and Open WebUI are set up.
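Errors go to a separate file (see the #SBATCH --error line in the script), so if something looks stuck it can help to follow both files at once:

# Follow standard output and standard error together.
tail -f ollama-*.out ollama-*.err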

f. Get Connection Information

When setup is complete (usually takes 5-10 minutes depending on download speeds), the output file will contain URLs and login information.

cat ollama-*.out

Look for a section that looks like the following.

================================================================================
Ollama API is available at: http://dgx06.its.albany.edu:8123
Open WebUI is available at: http://dgx06.its.albany.edu:8456
Make sure to update the Ollama port in the Open WebUI settings to 8123.
For more information please visit: https://docs.openwebui.com/getting-started/quick-start/starting-with-ollama/
================================================================================

Note: Your URLs will have different port numbers since they're randomly generated.
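A quick way to pull just the connection details out of the output file is to grep for the lines the script prints (the pattern below simply matches its echo statements):

# Show the URLs, the port reminder, and the JupyterLab password.
grep -E "available at|password is|settings to" ollama-*.out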

6. Accessing and Using Your Services

Now that everything is set up, it's time to start using your local LLM!

Using Open WebUI

  1. Copy the Open WebUI URL from your output file and paste it into your browser.

  2. You'll see a login screen. For first-time setup:

    • Create an account (this is just stored locally).

    • Go to Admin Panel → Settings → Connections → Manage Ollama API Connections.

    • Change the port to the Ollama API port value provided in your output file (e.g., from http://localhost:11434 to http://localhost:8123).

    • Click "Save" and that’s it.

  3. Now you can start chatting! Select "llama3.2" from the model dropdown and start asking questions.

Making API Calls Directly to Ollama

If you prefer programmatic access, you can interact with Ollama directly - remember to use your own URLs and port numbers.

Using curl

curl -X POST http://dgx06.its.albany.edu:8123/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in simple terms",
  "stream": false
}'

Using Python

import requests

def ask_ollama(prompt, model="llama3.2"):
    url = "http://dgx06.its.albany.edu:8123/api/generate"  # Replace with your actual URL
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=payload)
    return response.json()["response"]

# Example usage
answer = ask_ollama("What are the main challenges in artificial intelligence research?")
print(answer)

Using Postman

  1. Create a new POST request to your Ollama URL.

  2. Set the Content-Type header to application/json.

  3. In the body, add the following.

    { "model": "llama3.2", "prompt": "Your question here", "stream": false }
  4. Send the request and view the response.

JupyterLab Access (For Maintenance)

While we won't focus much on this, you can access JupyterLab using the URL and password provided in the output file. This is primarily useful if you need to troubleshoot or modify files directly.
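For example, from a terminal inside JupyterLab you can drive the Ollama command-line client directly. Since the server listens on a random port, point OLLAMA_HOST at it first; the path below matches the demo_location used by the script, and the port is a placeholder.

# Point the CLI at your running server (use your own Ollama port), then list installed models.
export OLLAMA_HOST=localhost:8123
/mnt/dgx_lab/ollama-demo/ollama/bin/ollama list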

7. Advanced Customization

Using Different Models

Pulling Other Models from Ollama's Library:

The script currently pulls Llama 3.2, but you can easily modify it to use other models. Edit the script and find the following line.

${demo_location}/ollama/bin/ollama pull llama3.2

Replace llama3.2 with any model from Ollama's library.

${demo_location}/ollama/bin/ollama pull mistral
# or
${demo_location}/ollama/bin/ollama pull gemma
# or
${demo_location}/ollama/bin/ollama pull phi

You can see the full list of available models at Ollama's model library. You can also pull new models straight from the Open WebUI interface or from the terminal using JupyterLab.

By the way, Ollama supports LLaVA, a large multimodal model with image-to-text capabilities. This means you can upload images via Open WebUI and ask the model to describe them - definitely worth a try!
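As a rough sketch of what that could look like from the API side (host, port, and file name are placeholders): pull LLaVA once, then pass a base64-encoded image in the "images" field of the generate endpoint.

# From a JupyterLab terminal: point the CLI at your server and pull LLaVA (one-time download).
export OLLAMA_HOST=localhost:8123
/mnt/dgx_lab/ollama-demo/ollama/bin/ollama pull llava

# Ask LLaVA to describe a local image; the API expects base64-encoded image data.
IMG_B64=$(base64 -w 0 photo.jpg)
curl -X POST http://dgx06.its.albany.edu:8123/api/generate -d "{
  \"model\": \"llava\",
  \"prompt\": \"Describe this image.\",
  \"images\": [\"${IMG_B64}\"],
  \"stream\": false
}"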

Using Custom GGUF Models

If you have your own fine-tuned models in GGUF format, you can use them with Ollama by creating a custom Modelfile.

  1. Create a Modelfile in your lab directory.

    FROM /path/to/your/model.gguf
    PARAMETER temperature 0.7
    PARAMETER top_p 0.9
  2. Import it into Ollama.

    ${demo_location}/ollama/bin/ollama create mymodel -f /path/to/Modelfile
  3. Then use it just like any other model through the WebUI or API.

If you don’t want to modify the script, you can also create your own model files straight from the Open WebUI interface or from the terminal using JupyterLab.

Adjusting Resource Requests

Requesting Multiple GPUs

For larger models or better performance, modify the following line in the script.

#SBATCH --gpus=1

Change it as needed.

#SBATCH --gpus=2 # or 4, 8, etc.

Note that requesting more GPUs means you might wait longer for your job to start.

Extending Runtime

Need more than 8 hours? Modify the following line.

#SBATCH --time=8:00:00

Change it as needed.

#SBATCH --time=24:00:00 # for 24 hours

Persistent Storage Strategies

Managing Model Files

The script saves downloaded models to the following path.

${demo_location}/ollama/models

This folder persists between sessions, so models are only downloaded once. You can manage storage in different ways.

  1. To check model sizes, run the following command.

    du -h ${demo_location}/ollama/models
  2. To remove unused models, run the following command.

    ${demo_location}/ollama/bin/ollama rm model_name

If you're working with multiple large models, be mindful of your lab storage quota. You can always delete and re-download models as needed.
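For a quick picture of how much space the demo is using and how much is left on the mount, du and df from a JupyterLab terminal inside the running job should suffice (paths follow the script's container mounts):

# Total size of the demo folder and free space on the lab mount.
du -sh /mnt/dgx_lab/ollama-demo
df -h /mnt/dgx_lab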

8. Cleanup and Next Steps

Ending Your Session

Your SLURM job will automatically terminate after the time specified (default: 8 hours). If you want to end it early, follow these steps.

  1. Find your job ID.

    squeue -u your_netid
  2. Cancel the job.

    scancel job_id

Saving Your Work

For Open WebUI, the SQLite database serves as the backbone for user management, chat history, file storage, and various other core functionalities. This means your conversations should persist between sessions and jobs as long as you don't delete the demo folder.
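The exact location of that SQLite file depends on how Open WebUI stores its data; if you ever want to back it up, searching the virtual environment folder should locate it (this assumes the default data directory of the pip-installed package):

# Locate the Open WebUI database inside the demo folder (exact path is an assumption).
find /mnt/dgx_lab/ollama-demo/open-webui -name "webui.db" 2>/dev/null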

Next Steps for Exploration

Once you're comfortable with the basic setup, consider the following steps.

  • Experimenting with different models to compare performance.

  • Integrating Ollama into your research workflow.

  • Exploring the API capabilities for automation.

9. External Resources

10. Full Script Reference

Here's the complete script for easy copying.

#!/bin/bash
#SBATCH --job-name=ollama
#SBATCH --output=ollama-%j.out
#SBATCH --error=ollama-%j.err
#SBATCH --time=8:00:00
#SBATCH --gpus=1
#SBATCH --container-image='docker://nvcr.io#nvidia/pytorch:24.11-py3'
#SBATCH --container-mounts=/network/rit/dgx/dgx_YOUR_LAB_NAME:/mnt/dgx_lab,/network/rit/lab/YOUR_LAB_NAME:/mnt/lab

# Function to check if demo folder already exists
check_demo_folder() {
    if [ -d "${demo_location}" ]; then
        return 0
    fi
    return 1
}

# Function to check if Ollama is already installed
check_ollama() {
    if [ -f "${demo_location}/ollama/bin/ollama" ]; then
        return 0
    fi
    return 1
}

# Function to check if virtual environment exists
check_venv() {
    if [ -d "${demo_location}/open-webui" ] && [ -f "${demo_location}/open-webui/bin/activate" ]; then
        return 0
    fi
    return 1
}

# Function to check if Open WebUI is installed
check_openwebui() {
    if pip list | grep -q "open-webui"; then
        return 0
    fi
    return 1
}

# Get the DGX node name
node_name="$SLURMD_NODENAME"

echo -e "\nThe Ollama + Open WebUI Demo is starting..."

# Generate random port numbers between 8000 and 8999
ollama_port=$((RANDOM % 1000 + 8000))
openwebui_port=$((RANDOM % 1000 + 8000))
jupyter_port=$((RANDOM % 1000 + 8000))

# Build the URLs
ollama_url="http://${node_name}.its.albany.edu:${ollama_port}"
openwebui_url="http://${node_name}.its.albany.edu:${openwebui_port}"
jupyter_url="http://${node_name}.its.albany.edu:${jupyter_port}"

# Setup demo location
demo_location="/mnt/dgx_lab/ollama-demo"

# Check and create demo folder if needed
echo -e "\nChecking for demo folder..."
if ! check_demo_folder; then
    echo -e "\nNot found: creating demo folder..."
    mkdir -p ${demo_location}
    echo -e "\n🔵 Demo folder created at ${demo_location}."
else
    echo -e "\n⚪ Demo folder already exists at ${demo_location}."
fi

# Setup Ollama models directory
mkdir -p ${demo_location}/ollama/models

# Download Ollama
echo -e "\nChecking for Ollama installation..."
if ! check_ollama; then
    echo -e "\nNot found: downloading Ollama..."
    curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ${demo_location}/ollama/ollama-linux-amd64.tgz
    tar -xzf ${demo_location}/ollama/ollama-linux-amd64.tgz -C ${demo_location}/ollama
    echo -e "\n🔵 Ollama downloaded and extracted to ${demo_location}/ollama."
else
    echo -e "\n⚪ Ollama already installed at ${demo_location}/ollama."
fi

# Fix Ollama environment variables
export OLLAMA_HOST=0.0.0.0:${ollama_port}
export OLLAMA_MODELS=${demo_location}/ollama/models

# Start Ollama
echo -e "\nStarting Ollama server..."
nohup ${demo_location}/ollama/bin/ollama serve > /dev/null 2>&1 &
echo -e "\n🟢 Ollama server started on ${ollama_url}."

# Download Llama 3.2
echo -e "\nChecking for Llama 3.2 model..."
sleep 10
if ! ${demo_location}/ollama/bin/ollama show llama3.2 &>/dev/null; then
    echo -e "\nNot found: downloading Llama 3.2 model..."
    ${demo_location}/ollama/bin/ollama pull llama3.2
    echo -e "\n🔵 Llama 3.2 model downloaded."
else
    echo -e "\n⚪ Llama 3.2 model already available."
fi

# Create virtual environment
echo -e "\nChecking for virtual environment..."
if ! check_venv; then
    echo -e "\nNot found: creating virtual environment for Open WebUI..."
    python3 -m venv ${demo_location}/open-webui
    echo -e "\n🔵 Virtual environment created."
else
    echo -e "\n⚪ Virtual environment already exists at ${demo_location}/open-webui."
fi

# Activate virtual environment
echo -e "\nActivating virtual environment..."
source ${demo_location}/open-webui/bin/activate
echo -e "\n🟢 Virtual environment activated."

# Install Open WebUI
echo -e "\nChecking for Open WebUI installation..."
if ! check_openwebui; then
    echo -e "\nNot found: installing Open WebUI..."
    pip install open-webui===0.6.2
    echo -e "\n🔵 Open WebUI installed."
else
    echo -e "\n⚪ Open WebUI is already installed."
fi

# Start Open WebUI
echo -e "\nStarting Open WebUI server..."
export WEBUI_SECRET_KEY="t0p-s3cr3t"
nohup open-webui serve --port=${openwebui_port} > /dev/null 2>&1 &
echo -e "\n🟢 Open WebUI server started on ${openwebui_url}."

# Generate a random password for JupyterLab (alphanumeric, 6 characters)
jupyter_password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 6)

# Print session details
echo -e "\n================================================================================\n"
echo -e "Ollama API is available at: ${ollama_url}\n"
echo -e "Open WebUI is available at: ${openwebui_url}\n"
echo -e "Make sure to update the Ollama port in the Open WebUI settings to ${ollama_port}.\n"
echo -e "For more information please visit: https://docs.openwebui.com/getting-started/quick-start/starting-with-ollama/"
echo -e "\n================================================================================\n"
echo -e "JupyterLab is available at: ${jupyter_url}\n"
echo -e "Your password is: ${jupyter_password}\n"
echo -e "Please copy and paste the link into your browser and use the password to log in."
echo -e "\n================================================================================\n"

# Start JupyterLab session
jupyter lab --allow-root --no-browser --NotebookApp.token="${jupyter_password}" --NotebookApp.allow_origin='*' --NotebookApp.log_level='CRITICAL' --notebook-dir=/mnt --port=$jupyter_port

Remember to replace YOUR_LAB_NAME with your actual lab name before using the script.

Happy LLM-ing!