This was written by AI

The Complete Story of Setting Up a Self-Hosted AI Stack (Or: How I Learned to Stop Worrying and Love Dependency Hell)

This entire post was written by AI - specifically by Claude running in Claude Code, reading through the actual conversation logs, configuration files, and documentation from a 5-day odyssey (October 10-15, 2025) to set up a self-hosted AI inference stack. And let me tell you, there's something deeply meta about having an AI write about the experience of using an AI to set up different AIs.

But I'm getting ahead of myself.

Timeline: This wasn't 5 solid days of work - it was evening and weekend sessions spread across 5 days. The first session started October 10th evening and ran past midnight. Work resumed October 12th, continued October 13th, and the final session ran late into the night of October 14th, finishing in the early morning hours of the 15th. If you're planning to do this yourself, budget a week and expect to spend 15-20 hours debugging.

The Dream

The goal was simple: set up a completely self-hosted AI stack on AMD hardware that could replace Claude entirely:

  1. Run large language models locally (no subscription costs, complete privacy)
  2. Provide a nice web interface for chatting with those models
  3. Generate images using Stable Diffusion (spoiler: this didn't make the cut)
  4. Use open-source AI coding assistants locally (spoiler: neither did this)
  5. Use GPU acceleration (because waiting 5 minutes for a response is not fun)

At the time of this setup, I had a Claude Max subscription. The plan was to replace it entirely with Ollama. I might keep Claude Pro temporarily since we use it at work and maintaining familiarity is useful, but the ultimate goal is to drop Claude completely once I find a good CLI tool for Ollama. This whole debugging session was done with Claude Max, which is somewhat ironic given the goal.

The hardware: An AMD Radeon RX 6700 XT with 12GB of VRAM, running on Linux. The tools: Ollama for LLM inference, Open WebUI for the chat interface, and... well, we'll get to why the other parts didn't work out.

How hard could it be? Everything's Dockerized, there are pre-built images, and I had Claude Code to help me through any issues.

Spoiler: It was harder. And some things just weren't worth it.

Phase 1: The Rootless Docker Trap (Day 1 - October 10th, Evening)

I already had a docker-compose.yml file set up with Ollama and Open WebUI. Everything looked good. Ran docker compose up -d, checked the logs, and...

No GPU detected. Running in CPU mode.

Wait, what? The GPU is right there. I can see it in lspci. I can run rocm-smi and it shows up. What's going on?

After some investigation with Claude, we discovered the problem: I was running rootless Docker. And here's the thing nobody tells you about rootless Docker - it can't access GPU devices like /dev/kfd and /dev/dri. These devices require root privileges, which completely defeats the purpose of rootless containers.

The solution? Switch back to rootful Docker:

docker context use default
sudo usermod -aG docker $USER
sudo usermod -aG render $USER
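
A couple of quick checks confirm the switch took (note that the group changes only apply after logging out and back in):

docker context show                 # should print "default" rather than the rootless context
groups                              # should now include docker and render
ls -l /dev/kfd /dev/dri/renderD*    # the GPU device nodes should be group-owned by render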

But that's not all. AMD GPUs through ROCm need very specific configuration:

  1. Device Access: Both /dev/kfd (kernel fusion driver) and /dev/dri (direct rendering infrastructure) need to be mounted in the container
  2. Group Permissions: The container needs to be in both the render (GID 105 on my system) and video (GID 44) groups (the exact GIDs can differ between distros)
  3. The Magic Incantation: For the RX 6700 XT (RDNA2/Navi 22 architecture), you need this environment variable: HSA_OVERRIDE_GFX_VERSION=10.3.0

Without that last one, ROCm won't recognize the GPU at all. It's like a secret handshake that's barely documented anywhere.
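
Put together, the GPU plumbing boils down to something like this in docker run form (a sketch; the group IDs are the ones from my system, and the full docker-compose version appears at the end of this post):

docker run -d --name ollama \
  --device /dev/kfd --device /dev/dri \
  --group-add 44 --group-add 105 \
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama:rocm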

After fixing all of this, the Ollama logs finally showed:

GPU detected: AMD Radeon RX 6700 XT (gfx1031)
VRAM: 12.0 GiB total, 9.4 GiB available

Victory! Ollama was running, Open WebUI was accessible at http://localhost:3000, and I could pull down models and start chatting. The first part of the stack was actually working.
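
Once a model is pulled, a simple way to confirm the GPU is really doing the work is to run a prompt and watch rocm-smi on the host; utilization and VRAM usage should jump while the model is loaded:

docker exec -it ollama ollama run llama3.2:3b "Say hello"   # one-off, non-interactive prompt
watch -n 1 rocm-smi                                         # run on the host in another terminal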

Interlude: The Aider Experiment

At this point, I got curious. If I have Ollama running locally, could I use it with open-source alternatives to Claude Code?

Some web searching revealed several options:

  • Aider: A command-line AI pair programmer (the most popular)
  • Continue: A VS Code extension
  • Fabric: A prompt framework for AI workflows
  • ShellGPT: Simple CLI wrapper for LLM APIs

Let's try Aider, I thought. How hard could it be?

The Python 3.13 Nightmare

Turns out, installing Aider in late 2025 is... an experience.

First attempt:

pip install aider-chat

error: externally-managed-environment

This environment is externally managed
To install Python packages system-wide, try apt install
python3-xyz, where xyz is the package you are trying to
install.

Ah yes, PEP 668. Debian and Ubuntu now prevent you from installing packages globally with pip to avoid breaking system Python. Fair enough, let's use pipx:

pipx install aider-chat

New error! This time from numpy:

numpy 1.24.3 is not compatible with Python 3.13

My system Python was 3.13, and the numpy version being pulled in (1.24.3) predates Python 3.13 support. Then aiohttp failed to build. Then multidict. It was like dependency whack-a-mole.

After multiple attempts and Claude suggesting increasingly complex solutions involving virtual environments and poetry and conda, I finally interrupted and just said:

"Just use python -m pip install aider-install --break-system-packages"

It worked immediately.

Sometimes the simple, slightly reckless approach is the right one. Yes, --break-system-packages sounds scary. Yes, it might interfere with system packages. But I'm running Docker containers for everything important anyway, and this is a development machine. Live dangerously.

Aider 0.86.1 installed successfully, and I configured it to use Ollama:

# ~/.aider.conf.yml
model: ollama/qwen3-coder:30b
auto-commits: false
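
Pointing Aider at the local server is then mostly a matter of telling it where Ollama lives (a sketch based on Aider's documented Ollama setup; the project path and file below are placeholders):

export OLLAMA_API_BASE=http://127.0.0.1:11434   # where the Dockerized Ollama listens
cd ~/some-project                               # placeholder project directory
aider app.py                                    # placeholder file; the model comes from ~/.aider.conf.yml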

Now I had two AI coding assistants: Claude Code (proprietary, powerful, cloud-based) and Aider (open-source, local, private).

The Aider Experiment: A Failed Attempt

After actually using Aider for a few days, it became clear this wasn't going to work:

  • The UI was vastly inferior to Claude Code's interface
  • Aider would get stuck in loops, generating the same code over and over without actually updating files
  • The configuration was painful and finicky
  • The code it generated was fine when it actually worked, but the tooling itself was the problem

I ended up uninstalling Aider after a few days. The local models remain excellent for chatting and general use in Open WebUI, but Aider as a coding tool just wasn't worth the frustration compared to Claude Code.

Phase 2: Downloading Models (Day 3 - October 12th)

With Ollama running, I needed to curate a collection of models for different purposes:

  • qwen2.5-coder:7b: Python coding tasks (Aider's default)
  • llama3.2:3b: Fast, lightweight for system/terminal work
  • llama3.1:8b: General-purpose AI tasks
  • gemma3:12b: When I need something a bit more capable

Claude initially suggested Gemma 2, but I corrected it - Gemma 3 had just been released. It's a good reminder that AI knowledge has cutoffs, even when you're using AI to set up AI. The meta levels just keep stacking.

Downloading was straightforward:

docker exec -it ollama ollama pull llama3.2:3b
docker exec -it ollama ollama pull qwen2.5-coder:7b
# etc.

Each model took a few minutes to download, but then they were stored locally in a Docker volume. No more API costs. No more rate limits. Complete control.
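
Two standard Ollama subcommands made it easy to keep track of what was on disk and what was actually loaded:

docker exec -it ollama ollama list   # models stored in the Docker volume
docker exec -it ollama ollama ps     # currently loaded models and whether they're on GPU or CPU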

This was the high point. Everything was working smoothly. Ollama was running, models were downloaded, Open WebUI was responsive, and Aider was configured.

Then I decided to add image generation.

Phase 3: The Great Image Generation Saga (Or: How I Learned to Hate Docker Images) (Days 3-5 - October 12-14)

The goal seemed simple: add Stable Diffusion to the stack so I could generate images directly from Open WebUI. I had an AMD GPU with 12GB of VRAM - plenty for both LLMs and image generation, right?

There were two main options:

  1. AUTOMATIC1111: The most popular Stable Diffusion WebUI
  2. ComfyUI: A node-based workflow interface

I had tried ComfyUI before, but it "didn't work well" - it could deploy, but couldn't receive prompts from Open WebUI due to complex workflow node mapping requirements. So let's try AUTOMATIC1111.

Attempt #1: The Pre-built Image That Wasn't

I found universonic/stable-diffusion-webui:rocm - perfect! A pre-built image specifically for AMD GPUs.

First problem: The container crashed immediately because it expected Stable Diffusion models to already be present. Fine, let's download some models first:

mkdir -p sd-models
cd sd-models
wget https://huggingface.co/...realisticVisionV60B1_v51VAE.safetensors  # 2GB
wget https://huggingface.co/...dreamshaper_8.safetensors  # 2GB

I chose these two because:

  • Realistic Vision: Produces photorealistic images
  • DreamShaper: More versatile, good at line art and varied styles

Mounted the models directory, started the container again, and... it started! The WebUI loaded! I could access it at http://localhost:7860!

Then I checked the logs more carefully:

Using CUDA device: ...
PyTorch with CUDA support

Wait. CUDA? This is an AMD GPU. ROCm and CUDA are completely different. This image was labeled "rocm" but was actually running CUDA PyTorch. It would never use the GPU. It was all a lie.

Alright, fine. If I can't trust pre-built images, I'll build my own.

Attempt #2: The Custom Dockerfile Journey Into Madness

Started with a Dockerfile using rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_2.3.0 as the base:

FROM rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_2.3.0
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
...

Build failed immediately:

Error: refusing to run as root

AUTOMATIC1111 has a security check that prevents running as root. Okay, let's create a user:

RUN useradd -m -u 1000 sduser
USER sduser

New error:

useradd: UID 1000 already in use

The base image already had a user with UID 1000. Fine, let's pick a different UID. Or better yet, use a simpler base image:

FROM rocm/pytorch:latest

Except rocm/pytorch:latest doesn't exist. Tried several variations:

  • rocm6.0_ubuntu22.04_py3.10_pytorch_2.1.1 - doesn't exist
  • rocm6.2_ubuntu22.04_py3.10_pytorch_2.3.0 - doesn't exist
  • rocm/pytorch:rocm6.2_ubuntu22.04_py3.10 - doesn't exist

The tagging scheme for ROCm Docker images is a complete mess. Eventually found one that worked, but then got:

Warning: Python 3.12.3 detected. AUTOMATIC1111 was tested on Python 3.10.6.

Added --skip-python-version-check to the launch arguments. Next error:

fatal: detected dubious ownership in repository

Git refuses to work in directories not owned by the current user (a security feature added in 2022). Added:

RUN git config --global --add safe.directory /app/stable-diffusion-webui

Progress! The installation script started running. It began downloading dependencies. Things were installing. And then...

The Tokenizers Compilation Saga (The Darkest Timeline)

error: can't find Rust compiler

The tokenizers package (used by Hugging Face transformers) needs to compile native code. No problem, let's install Rust:

RUN apt-get install -y rustc cargo

New error:

error: feature `edition2024` is required

Ubuntu's packaged Rust (1.75) was too old; the build required a newer toolchain with edition2024 support. Okay, install Rust the proper way:

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

Progress again! Rust 1.90.0 installed. Compilation started. New error:

error: failed to run custom build command for openssl-sys
Package openssl-sys requires pkg-config and libssl-dev

Rust's OpenSSL bindings need system libraries:

RUN apt-get install -y pkg-config libssl-dev

Rebuild. Compilation started again. Twenty minutes later (Rust compilation is not fast), it succeeded! The image built! Started the container...

Installing requirements...
ERROR: tokenizers failed to compile

WHAT. Why is it trying to install tokenizers AGAIN? I just spent twenty minutes compiling it!

Turns out, AUTOMATIC1111's launch.py script has logic that reinstalls dependencies even if they're already installed. It was overriding the pre-built packages.

Tried adding --skip-install flag:

CMD ["python", "launch.py", "--skip-install", "--listen", "0.0.0.0"]

This worked for tokenizers... but now other packages were missing. diskcache wasn't installed. safetensors wasn't installed. The --skip-install flag was too aggressive.

I could try to manually install every single dependency in the Dockerfile, but at this point I had spent hours on this. The Dockerfile was 60+ lines long. The build took 30+ minutes. Each test cycle required rebuilding and restarting.
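
For the curious, that abandoned route would have meant hard-coding the missing packages into the Dockerfile, something like this (a sketch only; the list of packages --skip-install leaves out kept growing):

# pre-install everything launch.py would otherwise try to fetch, then skip its installer
RUN pip install --no-cache-dir diskcache safetensors  # ...plus whatever else turns up missing
CMD ["python", "launch.py", "--skip-install", "--listen", "0.0.0.0"]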

I finally said: "This seems ridiculous - there has to be a pre-built image for this."

Claude agreed wholeheartedly.

Attempt #3: Finding an Actually Working Pre-built Image

Through more searching, we found aidockorg/stable-diffusion-webui-rocm:latest. The aidockorg project maintains Docker images specifically for running AI tools on AMD GPUs.

docker pull aidockorg/stable-diffusion-webui-rocm:latest

The pull started. And continued. And continued. It was 15GB+. After 5 minutes, it timed out.

Tried again with increased timeout:

DOCKER_CLIENT_TIMEOUT=600 docker pull aidockorg/stable-diffusion-webui-rocm:latest

This time it worked. Started the container:

docker run -d \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --group-add render \
  -p 7860:7860 \
  -v $(pwd)/sd-models:/models \
  aidockorg/stable-diffusion-webui-rocm:latest

Container started! No errors! Went to http://localhost:7860 and...

Got redirected to http://localhost:1111/login

The Authentication Plot Twist

The aidockorg image has a built-in authentication system using Caddy as a reverse proxy. The actual WebUI runs on internal port 17860, proxied through external port 7860, with authentication on port 1111.

This wasn't documented anywhere obvious. I discovered it by reading the container logs and finding Caddy configuration messages.

The solution was to disable authentication:

environment:
  - WEB_ENABLE_AUTH=false

Restarted the container, went to http://localhost:7860, and THE WEBUI LOADED! I could select a model, enter a prompt, click generate, and...

It worked! Actual GPU-accelerated image generation was happening on my AMD GPU!

But there was one more problem: the API endpoint returned "Not Found". Open WebUI needs API access to integrate image generation into the chat interface, not just web access.
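
The failing check looked roughly like this (assuming AUTOMATIC1111's usual API layout, which is only exposed when the WebUI is launched with the --api flag; it's not clear whether this image enables it):

curl http://localhost:7860/sdapi/v1/sd-models   # the style of endpoint Open WebUI's integration calls
# -> 404 Not Found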

At this point in the conversation logs, it was late on October 14th. The conversation finally ended in the early morning hours of October 15th with the Stable Diffusion WebUI technically running but not fully integrated with Open WebUI.

The Final Reckoning: What Actually Survived

After all that work, I had to make some hard decisions.

Image Generation: Removed

Despite getting Stable Diffusion technically working, I ended up removing it from the stack entirely:

  • VRAM constraints: Running both Ollama with a decent-sized model AND Stable Diffusion simultaneously was pushing the 12GB VRAM limit hard
  • Poor performance: Image generation was slow even with GPU acceleration, and the quality didn't justify the resource usage
  • API integration never worked: The standalone WebUI ran fine, but integrating it with Open WebUI remained broken
  • Not actually needed: Honestly, I wasn't generating images that often, and when I did need them, cloud services worked fine

ComfyUI remains commented out in the docker-compose.yml as a memorial to what could have been.

Aider: Removed

As mentioned earlier, Aider got uninstalled after a few days of frustration. The tool just wasn't good enough to replace Claude Code, even with capable local models.

What Actually Works (Final Configuration):

  • ✅ Ollama with ROCm GPU acceleration
  • ✅ Open WebUI for chatting with local LLMs
  • ✅ Multiple models downloaded and functional (llama3.2, qwen2.5-coder, gemma3, etc.)
  • ✅ Complete privacy and no API costs for LLM inference
  • ✅ Fast GPU-accelerated inference on AMD hardware

That's it. The final stack is just Ollama + Open WebUI. No image generation. No Aider. Sometimes the minimalist solution is the right one.

What I Learned:

  1. Rootless Docker Can't Access GPUs: If you're doing GPU work in Docker, you need rootful Docker. This is a hard requirement.

  2. AMD GPU Setup Is Fiddly: The HSA_OVERRIDE_GFX_VERSION environment variable is critical for RDNA2 cards, but you won't find it in most documentation.

  3. Python Environment Management in 2025 Is a Minefield: PEP 668, pipx limitations, Python 3.13 compatibility issues, and the fact that many Python packages now require Rust compilation means every installation is an adventure.

  4. Docker Image Tags Are Chaos: Official images have inconsistent tagging schemes, and "latest" often doesn't exist. Figuring out which tags actually exist feels like arcane knowledge.

  5. Pre-built Images Have Hidden Complexity: Authentication systems, non-standard ports, CUDA vs ROCm confusion - "just use this image" is never that simple.

  6. Rust Compilation Is Slow: When Python packages need to compile Rust code, add 20+ minutes to your build times. And you will need to rebuild.

  7. AUTOMATIC1111 Is Opinionated: Won't run as root, has strict Python version preferences, reinstalls dependencies even when they're present. It knows what it wants.

  8. VRAM Management Matters: 12GB sounds like a lot, but running both Ollama with a 7B+ model and Stable Diffusion simultaneously pushes those limits.

  9. Know When to Cut Your Losses: Just because you can get something working doesn't mean you should keep it. Image generation technically worked but wasn't worth the complexity. Sometimes the right answer is to remove features.

The Meta Experience

There's something fascinating about having Claude Code help debug and set up competing AI systems. At the peak of the experimentation, I had:

  • Claude Code (Anthropic, cloud-based) helping me debug
  • Aider (open-source, local) being configured (later removed)
  • Ollama (open-source, local) running models (still in use)
  • Open WebUI (open-source, local) providing an interface (still in use)
  • Stable Diffusion (open-source, local) sort of working (later removed)

It's AI turtles all the way down... until you realize most of the turtles aren't worth keeping.

Claude was remarkably persistent through all of this. Every error was met with "Let me check the logs," followed by analysis and a new approach. When something didn't work, it would pivot to alternatives. When I got frustrated ("This seems ridiculous"), it agreed and suggested looking for better solutions.

But there were also limitations. Claude's knowledge cutoff meant it suggested Gemma 2 when Gemma 3 existed. It couldn't know that specific Docker image tags didn't exist without trying them. It suggested complex solutions when simple ones would work. It's a tool, not magic.

And crucially: Claude helped me debug and experiment, but I had to make the final calls about what was actually worth keeping. The AI can help you get things working, but it won't tell you "this isn't worth your time" - that's on you.

The Takeaway

Self-hosting AI in 2025 is absolutely possible, but it's not "easy" in the way that clicking a button and paying $20/month is easy. It requires:

  • Patience for troubleshooting errors that cascade like dominoes
  • Willingness to read logs carefully and understand what's actually failing
  • Understanding of Docker, GPU drivers, and Python environments
  • Acceptance that "simple pre-built solutions" usually aren't
  • Flexibility to pivot when an approach isn't working
  • A helpful AI assistant to work through issues systematically

But once it's set up? You have complete control. No API costs. No rate limits. Complete privacy. You can experiment with any model, any configuration, any workflow. You can run it offline. You can modify it however you want.

The open-source AI ecosystem in 2025 is incredibly powerful. The tools exist. The models are good. But the integration rough edges are still very much present. You're not just a user - you're a system administrator, debugger, and problem-solver.

And honestly? That's kind of fun. The challenge is part of the appeal.

What's Actually Running Now

This is the final state. No "next steps," no "work in progress." Here's what survived:

services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    ports:
      - "0.0.0.0:11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - "105"  # render group
      - "44"   # video group
    environment:
      - HSA_OVERRIDE_GFX_VERSION=10.3.0
      - OLLAMA_HOST=0.0.0.0
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "0.0.0.0:3000:8080"
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  open_webui_data:

That's it. Two containers. It works. It's fast. It's private. It's enough.
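
If you want to replicate it, bringing the stack up and sanity-checking it takes about a minute (Ollama's /api/tags endpoint lists the locally installed models):

docker compose up -d
docker compose ps                      # both containers should show as running
curl http://localhost:11434/api/tags   # Ollama API responds with installed models
# Open WebUI is at http://localhost:3000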

Acknowledgments

This post was written by Claude Code (Sonnet 4.5), analyzing conversation logs stored in ~/.claude/projects/, reading configuration files from ~/Projects/john/ai-services/, and synthesizing 5 days (and 15-20 hours) of trial-and-error into a coherent narrative.

If you're reading this and thinking "this sounds like exactly the kind of thing I want to try," I encourage you to go for it. Just:

  • Set aside a week (not a weekend - a full week)
  • Prepare for dependency hell and Docker image tag confusion
  • Be ready to cut features that don't work out
  • Don't feel bad about using proprietary tools when the open-source alternatives aren't there yet

The good news? Once you strip away the stuff that doesn't work, what's left is actually pretty great. Ollama + Open WebUI is a solid, simple, private AI inference stack. Sometimes less is more.


No credentials were exposed in the making of this blog post. All the suffering was real. Most of the attempted features were removed.