Ollama + Open WebUI on DGX On-Prem: Your Personal Local LLM Experience

Table of Contents

  1. Introduction

  2. Prerequisites

  3. Understanding What We’re Building

  4. The Script Explained

  5. Step-by-Step Tutorial

    1. Access the DGX System

    2. Create and Save the Script

    3. Customize the Script

    4. Submit the Job

    5. Check the Output

    6. Get Connection Information

  6. Accessing and Using Your Services

  7. Advanced Customization

  8. Cleanup and Next Steps

  9. External Resources

  10. Full Script Reference

1. Introduction

Welcome to your guide to running powerful large language models (LLMs) right here on UAlbany's DGX On-Prem system! This tutorial will walk you through setting up Ollama (a local LLM server) along with Open WebUI (a user-friendly interface) on our high-performance computing infrastructure.

What will you get out of this?

  • Your own private LLM environment running on UAlbany's powerful GPUs.

  • A clean, intuitive web interface to interact with state-of-the-art models.

  • Complete privacy - all your prompts and data stay within the university network.

  • The ability to run Llama 3.2 (and other models) without sending sensitive data to external services.

Think of this as running your own personal ChatGPT-like service, but with the processing happening right here on campus instead of on someone else's servers. It's perfect for research projects, exploring AI capabilities, or just learning about how these systems work!

2. Prerequisites

Before we dive in, let's make sure you have everything you need.

Required Access

  • Access to UAlbany's DGX On-Prem cluster.

  • If you don't have access yet, you'll need to request it through your faculty advisor or principal investigator by completing the DGX On-Prem Computation Request Form.

Knowledge Requirements

  • Basic familiarity with terminal/command line (don't worry, we'll guide you through each step).

  • Basic understanding of SLURM job submission (if you've never used SLURM before, check out the How to Schedule via SLURM page).

Files Needed

  • Just the script provided in this tutorial (ollama.slurm) - the script will take care of all the required dependencies.

  • You'll be able to copy-paste it directly or download it.

3. Understanding What We're Building

Before we jump into the technical details, let's understand what we're setting up.

[Diagram: how Ollama and Open WebUI fit together (ollama-open-webui-diagram.png)]

Ollama: Ollama is like a personal assistant that lives on our servers instead of in the cloud. It's an inference server that can run advanced AI models (like Llama 3.2) locally, meaning all your interactions stay right here on campus. Think of Ollama as the "brain" of our setup - it's what actually processes your prompts and generates responses. To learn more about Ollama, check out its GitHub page.

Open WebUI: While Ollama is powerful, typing commands in a terminal isn't everyone's cup of tea. That's where Open WebUI comes in - it's a sleek, web-based interface that makes interacting with Ollama as simple as using a website. It gives you a ChatGPT-like experience but powered by your chosen models running on our infrastructure. To learn more about Open WebUI, check out its documentation.

How They Work Together: When you run our script, it sets up both Ollama and Open WebUI on the same DGX node. Ollama runs in the background handling all the AI processing, while Open WebUI provides a user-friendly way to chat with the AI. The script also configures JupyterLab as a maintenance tool (but we won't focus much on that part).
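As a quick, optional sanity check once your job is running, you can talk to Ollama directly from a terminal. This is just a sketch - the hostname and port below are placeholders, so substitute the values printed in your job's output file (covered later in this tutorial).

# List the models your Ollama server currently knows about (replace host/port with your own).
curl http://dgx06.its.albany.edu:8123/api/tags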

4. The Script Explained

The full script is provided at the end of this page (see Section 10, Full Script Reference); go ahead and download or copy it to your computer and open it. It might look scary at first, but I can assure you it's not. Let's demystify what the script is doing! While it might look complex at first glance, it's essentially just automating all the setup steps that would be tedious to do manually.

#!/bin/bash
#SBATCH --job-name=ollama
#SBATCH --output=ollama-%j.out
#SBATCH --error=ollama-%j.err
#SBATCH --time=8:00:00
#SBATCH --gpus=1
#SBATCH --container-image='docker://nvcr.io#nvidia/pytorch:24.11-py3'
#SBATCH --container-mounts=/network/rit/dgx/dgx_vieirasobrinho_lab:/mnt/dgx_lab,/network/rit/lab/vieirasobrinho_lab:/mnt/lab

This top section is telling SLURM (our job scheduler) how to set up the environment.

  • We're naming our job "ollama".

  • We're requesting one GPU for 8 hours.

  • We're using NVIDIA's PyTorch container (which has all the ML libraries pre-installed).

  • We're connecting our lab's storage so files persist after the job ends.
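If you'd like to confirm at runtime that the GPU request was honored, one optional addition (not part of the original script) is a short sanity check near the top of the script body:

# Optional sanity check (not in the original script): print the node name and the visible GPU(s).
echo "Running on node: ${SLURMD_NODENAME}"
nvidia-smi --query-gpu=name,memory.total --format=csv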

The middle section includes several clever functions that check if components already exist.

# Function to check if demo folder already exists
check_demo_folder() {
    if [ -d "${demo_location}" ]; then
        return 0
    fi
    return 1
}

# Function to check if Ollama is already installed
check_ollama() {
    if [ -f "${demo_location}/ollama/bin/ollama" ]; then
        return 0
    fi
    return 1
}

These functions save time by not reinstalling software that's already there. It's like checking if you already have ingredients before going shopping. The main part of the script consists of the following steps.

  1. Sets up random ports for the services.

  2. Creates a demo folder if needed.

  3. Downloads and sets up Ollama if it's not already installed.

  4. Downloads the Llama 3.2 model (only if it's not already downloaded).

  5. Creates a Python virtual environment for Open WebUI.

  6. Installs and starts Open WebUI.

  7. Configures JupyterLab as an additional tool.

When the script finishes running, it displays URLs and login information for accessing your services.

5. Step-by-Step Tutorial

Now let's put this into action! If you need some visual assistance, take a look at the video below to see the whole process in action.

  • The first time you run the script may take a while, as there are a lot of dependencies to be downloaded.

  • When you run the script again, it will run much faster because the dependencies will already be in place.

a. Access the DGX System

First, connect to the DGX On-Prem cluster.

ssh your_netid@dgx-head01.its.albany.edu

You'll need to enter your NetID password. If you're off-campus, make sure you're connected to the VPN first.
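If you plan to connect often, you can optionally add a host alias to your local ~/.ssh/config so you don't have to type the full hostname every time. This is just a convenience sketch - the alias name is arbitrary.

# Optional: append an alias to your local SSH config so "ssh dgx" works.
cat >> ~/.ssh/config <<'EOF'
Host dgx
    HostName dgx-head01.its.albany.edu
    User your_netid
EOF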

b. Create and Save the Script

Once logged in, navigate to your lab folder and create a new file for our script.

vim ollama.slurm

This opens the vim text editor. Press i to enter insert mode, then copy-paste the entire script content (provided at the end of this wiki) into the editor.

Press Esc to exit insert mode, then type :wq and press Enter to save and exit.

c. Customize the Script

Before running the script, you need to modify it for your specific lab directories.

vim ollama.slurm

Look for the following line (near the top).

#SBATCH --container-mounts=/network/rit/dgx/dgx_vieirasobrinho_lab:/mnt/dgx_lab,/network/rit/lab/vieirasobrinho_lab:/mnt/lab

Change it to your lab's paths as follows.

#SBATCH --container-mounts=/network/rit/dgx/dgx_YOUR_LAB_NAME:/mnt/dgx_lab,/network/rit/lab/YOUR_LAB_NAME:/mnt/lab

Replace YOUR_LAB_NAME with your actual lab name.
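If you'd rather not edit the line by hand, a one-line sed substitution should do the same thing (YOUR_LAB_NAME is, of course, a placeholder):

# Swap the example lab name for your own, in place.
sed -i 's/vieirasobrinho_lab/YOUR_LAB_NAME/g' ollama.slurm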

d. Submit the Job

Now, submit your job to SLURM using the following command.

sbatch ollama.slurm

SLURM will assign your job a number and start it when resources are available. You can check its status with the following command.

squeue -u your_netid
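If you'd rather not re-run that command by hand, something like watch will refresh the view for you (the interval is up to you):

# Refresh the queue view every 30 seconds; press Ctrl+C to stop.
watch -n 30 squeue -u your_netid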

e. Check the Output

Once your job starts running, you can monitor its progress in several ways, for example with the following command.

tail -f ollama-*.out

This shows the output in real-time. You'll see messages as Ollama and Open WebUI are set up.
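Errors go to a separate file (see the #SBATCH --error line in the script), so if something looks stuck it can help to follow both files at once:

# Follow standard output and standard error together.
tail -f ollama-*.out ollama-*.err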

f. Get Connection Information

When setup is complete (usually takes 5-10 minutes depending on download speeds), the output file will contain URLs and login information.

cat ollama-*.out

Look for a section that looks like the following.

================================================================================
Ollama API is available at: http://dgx06.its.albany.edu:8123
Open WebUI is available at: http://dgx06.its.albany.edu:8456
Make sure to update the Ollama port in the Open WebUI settings to 8123.
For more information please visit: https://docs.openwebui.com/getting-started/quick-start/starting-with-ollama/
================================================================================

Note: Your URLs will have different port numbers since they're randomly generated.
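A quick way to pull just the connection details out of the output file is to grep for the lines the script prints (the pattern below simply matches its echo statements):

# Show the URLs, the port reminder, and the JupyterLab password.
grep -E "available at|password is|settings to" ollama-*.out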

6. Accessing and Using Your Services

Now that everything is set up, it's time to start using your local LLM!

Using Open WebUI

  1. Copy the Open WebUI URL from your output file and paste it into your browser.

  2. You'll see a login screen. For first-time setup:

    • Create an account (this is just stored locally).

    • Go to Admin Panel → Settings → Connections → Manage Ollama API Connections.

    • Change the port to the Ollama API port value provided in your output file (e.g., from http://localhost:11434 to http://localhost:8123).

    • Click "Save" and that’s it.

  3. Now you can start chatting! Select "llama3.2" from the model dropdown and start asking questions.

Making API Calls Directly to Ollama

If you prefer programmatic access, you can interact with Ollama directly - remember to use your own URLs and port numbers.

Using curl

curl -X POST http://dgx06.its.albany.edu:8123/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in simple terms",
  "stream": false
}'

Using Python

import requests

def ask_ollama(prompt, model="llama3.2"):
    url = "http://dgx06.its.albany.edu:8123/api/generate"  # Replace with your actual URL
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False
    }
    response = requests.post(url, json=payload)
    return response.json()["response"]

# Example usage
answer = ask_ollama("What are the main challenges in artificial intelligence research?")
print(answer)

Using Postman

  1. Create a new POST request to your Ollama URL.

  2. Set the Content-Type header to application/json.

  3. In the body, add the following.

    { "model": "llama3.2", "prompt": "Your question here", "stream": false }
  4. Send the request and view the response.

JupyterLab Access (For Maintenance)

While we won't focus much on this, you can access JupyterLab using the URL and password provided in the output file. This is primarily useful if you need to troubleshoot or modify files directly.
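For example, from a terminal inside JupyterLab you can drive the Ollama command-line client directly. Since the server listens on a random port, point OLLAMA_HOST at it first; the path below matches the demo_location used by the script, and the port is a placeholder.

# Point the CLI at your running server (use your own Ollama port), then list installed models.
export OLLAMA_HOST=localhost:8123
/mnt/dgx_lab/ollama-demo/ollama/bin/ollama list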

7. Advanced Customization

Using Different Models

Pulling Other Models from Ollama's Library:

The script currently pulls Llama 3.2, but you can easily modify it to use other models. Edit the script and find the following line.

${demo_location}/ollama/bin/ollama pull llama3.2

Replace llama3.2 with any model from Ollama's library.

${demo_location}/ollama/bin/ollama pull mistral
# or
${demo_location}/ollama/bin/ollama pull gemma
# or
${demo_location}/ollama/bin/ollama pull phi

You can see the full list of available models at Ollama's model library. You can also pull new models straight from the Open WebUI interface or from the terminal using JupyterLab.

By the way, Ollama supports LLaVA, a large multimodal model with image-to-text capabilities. This means you can upload images via Open WebUI and ask the model to describe them - definitely worth a try!
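As a rough sketch of what that could look like from the API side (host, port, and file name are placeholders): pull LLaVA once, then pass a base64-encoded image in the "images" field of the generate endpoint.

# From a JupyterLab terminal: point the CLI at your server and pull LLaVA (one-time download).
export OLLAMA_HOST=localhost:8123
/mnt/dgx_lab/ollama-demo/ollama/bin/ollama pull llava

# Ask LLaVA to describe a local image; the API expects base64-encoded image data.
IMG_B64=$(base64 -w 0 photo.jpg)
curl -X POST http://dgx06.its.albany.edu:8123/api/generate -d "{
  \"model\": \"llava\",
  \"prompt\": \"Describe this image.\",
  \"images\": [\"${IMG_B64}\"],
  \"stream\": false
}"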

Using Custom GGUF Models

If you have your own fine-tuned models in GGUF format, you can use them with Ollama by creating a custom Modelfile.

  1. Create a Modelfile in your lab directory.

    FROM /path/to/your/model.gguf
    PARAMETER temperature 0.7
    PARAMETER top_p 0.9
  2. Import it into Ollama.

    ${demo_location}/ollama/bin/ollama create mymodel -f /path/to/Modelfile
  3. Then use it just like any other model through the WebUI or API.

If you don’t want to modify the script, you can also create your own model files straight from the Open WebUI interface or from the terminal using JupyterLab.

Adjusting Resource Requests

Requesting Multiple GPUs

For larger models or better performance, modify the following line in the script.

#SBATCH --gpus=1

Change it as needed.

#SBATCH --gpus=2 # or 4, 8, etc.

Note that requesting more GPUs means you might wait longer for your job to start.

Extending Runtime

Need more than 8 hours? Modify the following line.

#SBATCH --time=8:00:00

Change it as needed.

#SBATCH --time=24:00:00 # for 24 hours

Persistent Storage Strategies

Managing Model Files

The script saves downloaded models to the following path.

${demo_location}/ollama/models

This folder persists between sessions, so models are only downloaded once. You can manage storage in different ways.

  1. To check model sizes, run the following command.

    du -h ${demo_location}/ollama/models
  2. To remove unused models, run the following command.

    ${demo_location}/ollama/bin/ollama rm model_name

If you're working with multiple large models, be mindful of your lab storage quota. You can always delete and re-download models as needed.
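For a quick picture of how much space the demo is using and how much is left on the mount, du and df from a JupyterLab terminal inside the running job should suffice (paths follow the script's container mounts):

# Total size of the demo folder and free space on the lab mount.
du -sh /mnt/dgx_lab/ollama-demo
df -h /mnt/dgx_lab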

8. Cleanup and Next Steps

Ending Your Session

Your SLURM job will automatically terminate after the time specified (default: 8 hours). If you want to end it early, follow these steps.

  1. Find your job ID.

    squeue -u your_netid
  2. Cancel the job.

    scancel job_id

Saving Your Work

For Open WebUI, the SQLite database serves as the backbone for user management, chat history, file storage, and various other core functionalities. This means your conversations should persist between sessions and jobs as long as you don't delete the demo folder.
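The exact location of that SQLite file depends on how Open WebUI stores its data; if you ever want to back it up, searching the virtual environment folder should locate it (this assumes the default data directory of the pip-installed package):

# Locate the Open WebUI database inside the demo folder (exact path is an assumption).
find /mnt/dgx_lab/ollama-demo/open-webui -name "webui.db" 2>/dev/null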

Next Steps for Exploration

Once you're comfortable with the basic setup, consider the following steps.

  • Experimenting with different models to compare performance.

  • Integrating Ollama into your research workflow.

  • Exploring the API capabilities for automation.

9. External Resources

10. Full Script Reference

Here's the complete script for easy copying.

#!/bin/bash
#SBATCH --job-name=ollama
#SBATCH --output=ollama-%j.out
#SBATCH --error=ollama-%j.err
#SBATCH --time=8:00:00
#SBATCH --gpus=1
#SBATCH --container-image='docker://nvcr.io#nvidia/pytorch:24.11-py3'
#SBATCH --container-mounts=/network/rit/dgx/dgx_YOUR_LAB_NAME:/mnt/dgx_lab,/network/rit/lab/YOUR_LAB_NAME:/mnt/lab

# Function to check if demo folder already exists
check_demo_folder() {
    if [ -d "${demo_location}" ]; then
        return 0
    fi
    return 1
}

# Function to check if Ollama is already installed
check_ollama() {
    if [ -f "${demo_location}/ollama/bin/ollama" ]; then
        return 0
    fi
    return 1
}

# Function to check if virtual environment exists
check_venv() {
    if [ -d "${demo_location}/open-webui" ] && [ -f "${demo_location}/open-webui/bin/activate" ]; then
        return 0
    fi
    return 1
}

# Function to check if Open WebUI is installed
check_openwebui() {
    if pip list | grep -q "open-webui"; then
        return 0
    fi
    return 1
}

# Get the DGX node name
node_name="$SLURMD_NODENAME"

echo -e "\nThe Ollama + Open WebUI Demo is starting..."

# Generate random port numbers between 8000 and 8999
ollama_port=$((RANDOM % 1000 + 8000))
openwebui_port=$((RANDOM % 1000 + 8000))
jupyter_port=$((RANDOM % 1000 + 8000))

# Build the URLs
ollama_url="http://${node_name}.its.albany.edu:${ollama_port}"
openwebui_url="http://${node_name}.its.albany.edu:${openwebui_port}"
jupyter_url="http://${node_name}.its.albany.edu:${jupyter_port}"

# Setup demo location
demo_location="/mnt/dgx_lab/ollama-demo"

# Check and create demo folder if needed
echo -e "\nChecking for demo folder..."
if ! check_demo_folder; then
    echo -e "\nNot found: creating demo folder..."
    mkdir -p ${demo_location}
    echo -e "\n🔵 Demo folder created at ${demo_location}."
else
    echo -e "\n⚪ Demo folder already exists at ${demo_location}."
fi

# Setup Ollama models directory
mkdir -p ${demo_location}/ollama/models

# Download Ollama
echo -e "\nChecking for Ollama installation..."
if ! check_ollama; then
    echo -e "\nNot found: downloading Ollama..."
    curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ${demo_location}/ollama/ollama-linux-amd64.tgz
    tar -xzf ${demo_location}/ollama/ollama-linux-amd64.tgz -C ${demo_location}/ollama
    echo -e "\n🔵 Ollama downloaded and extracted to ${demo_location}/ollama."
else
    echo -e "\n⚪ Ollama already installed at ${demo_location}/ollama."
fi

# Fix Ollama environment variables
export OLLAMA_HOST=0.0.0.0:${ollama_port}
export OLLAMA_MODELS=${demo_location}/ollama/models

# Start Ollama
echo -e "\nStarting Ollama server..."
nohup ${demo_location}/ollama/bin/ollama serve > /dev/null 2>&1 &
echo -e "\n🟢 Ollama server started on ${ollama_url}."

# Download Llama 3.2
echo -e "\nChecking for Llama 3.2 model..."
sleep 10
if ! ${demo_location}/ollama/bin/ollama show llama3.2 &>/dev/null; then
    echo -e "\nNot found: downloading Llama 3.2 model..."
    ${demo_location}/ollama/bin/ollama pull llama3.2
    echo -e "\n🔵 Llama 3.2 model downloaded."
else
    echo -e "\n⚪ Llama 3.2 model already available."
fi

# Create virtual environment
echo -e "\nChecking for virtual environment..."
if ! check_venv; then
    echo -e "\nNot found: creating virtual environment for Open WebUI..."
    python3 -m venv ${demo_location}/open-webui
    echo -e "\n🔵 Virtual environment created."
else
    echo -e "\n⚪ Virtual environment already exists at ${demo_location}/open-webui."
fi

# Activate virtual environment
echo -e "\nActivating virtual environment..."
source ${demo_location}/open-webui/bin/activate
echo -e "\n🟢 Virtual environment activated."

# Install Open WebUI
echo -e "\nChecking for Open WebUI installation..."
if ! check_openwebui; then
    echo -e "\nNot found: installing Open WebUI..."
    pip install open-webui===0.6.2
    echo -e "\n🔵 Open WebUI installed."
else
    echo -e "\n⚪ Open WebUI is already installed."
fi

# Start Open WebUI
echo -e "\nStarting Open WebUI server..."
export WEBUI_SECRET_KEY="t0p-s3cr3t"
nohup open-webui serve --port=${openwebui_port} > /dev/null 2>&1 &
echo -e "\n🟢 Open WebUI server started on ${openwebui_url}."

# Generate a random password for JupyterLab (alphanumeric, 6 characters)
jupyter_password=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 6)

# Print session details
echo -e "\n================================================================================\n"
echo -e "Ollama API is available at: ${ollama_url}\n"
echo -e "Open WebUI is available at: ${openwebui_url}\n"
echo -e "Make sure to update the Ollama port in the Open WebUI settings to ${ollama_port}.\n"
echo -e "For more information please visit: https://docs.openwebui.com/getting-started/quick-start/starting-with-ollama/"
echo -e "\n================================================================================\n"
echo -e "JupyterLab is available at: ${jupyter_url}\n"
echo -e "Your password is: ${jupyter_password}\n"
echo -e "Please copy and paste the link into your browser and use the password to log in."
echo -e "\n================================================================================\n"

# Start JupyterLab session
jupyter lab --allow-root --no-browser --NotebookApp.token="${jupyter_password}" --NotebookApp.allow_origin='*' --NotebookApp.log_level='CRITICAL' --notebook-dir=/mnt --port=$jupyter_port

Remember to replace YOUR_LAB_NAME with your actual lab name before using the script.

Happy LLM-ing!