Setting Up a Self-Hosted AI Stack (Or: How I Learned to Stop Worrying and Love Dependency Hell)

This post was written by Claude (Claude Code, Sonnet 4.5), reading through conversation logs, configuration files, and documentation from 5 days (October 10-15, 2025) of setting up a self-hosted AI inference stack. I edited it lightly afterwards.

The work was spread across evening and weekend sessions – about 15-20 hours total. The first session started October 10th evening, work resumed October 12th, continued October 13th, and the final session ran late into October 14th.

The Goal

Set up a completely self-hosted AI stack on AMD hardware:

  1. Run large language models locally (no subscription costs, complete privacy)
  2. Provide a web interface for chatting with those models
  3. Generate images using Stable Diffusion
  4. Use open-source AI coding assistants locally
  5. Use GPU acceleration

At the time I had a Claude Max subscription. The plan was to replace it entirely with Ollama. Ironically, the whole debugging session was done with Claude Max.

The hardware: an AMD Radeon RX 6700 XT with 12GB of VRAM, running on Linux. The tools: Ollama for LLM inference and Open WebUI for the chat interface.

Rootless Docker (Day 1 - October 10th)

I already had a docker-compose.yml file set up with Ollama and Open WebUI. Ran docker compose up -d, and:

No GPU detected. Running in CPU mode.

The GPU showed up in lspci and rocm-smi but Docker couldn't see it. The problem was rootless Docker – it can't access GPU devices like /dev/kfd and /dev/dri. These devices require root privileges.
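A quick way to see the mismatch for yourself (device paths and group names are the typical ROCm defaults; yours may differ):

```shell
# The GPU device nodes exist on the host, owned by root with the
# render/video groups -- a rootless daemon can't hand these to containers
ls -l /dev/kfd /dev/dri/

# Check which Docker context is active; "rootless" here is the culprit
docker context ls
```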

The fix was switching back to rootful Docker:

docker context use default
sudo usermod -aG docker $USER
sudo usermod -aG render $USER

AMD GPUs through ROCm also need specific configuration:

  1. Device access: Both /dev/kfd (kernel fusion driver) and /dev/dri (direct rendering infrastructure) need to be mounted in the container
  2. Group permissions: The container needs to be in both the render (105) and video (44) groups
  3. GFX version override: For the RX 6700 XT (RDNA2/Navi 22 architecture), you need HSA_OVERRIDE_GFX_VERSION=10.3.0. Without this, ROCm won't recognize the GPU. It's not well documented.
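Put together, a minimal docker run invocation covering all three requirements looks something like this (a sketch; the group IDs and image tag may differ on your distro):

```shell
docker run -d --name ollama \
  --device /dev/kfd --device /dev/dri \        # kernel fusion driver + DRI
  --group-add video --group-add render \       # GPU device group access
  -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \         # required for RDNA2/Navi 22
  -p 11434:11434 \
  -v ollama_data:/root/.ollama \
  ollama/ollama:rocm
```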

After fixing all of this, Ollama started with GPU support:

GPU detected: AMD Radeon RX 6700 XT (gfx1031)
VRAM: 12.0 GiB total, 9.4 GiB available

Ollama was running, Open WebUI was accessible at http://localhost:3000, and I could pull models and start chatting.
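Two quick smoke tests confirm everything is wired up (the /api/tags endpoint is part of Ollama's REST API):

```shell
# Returns JSON listing the locally available models when Ollama is up
curl -s http://localhost:11434/api/tags

# Check the container logs for the "GPU detected" line
docker logs ollama 2>&1 | grep -i gpu
```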

Aider

With Ollama running locally, I wanted to try using it with open-source alternatives to Claude Code. The main options were:

  • Aider: A command-line AI pair programmer (the most popular)
  • Continue: A VS Code extension
  • Fabric: A prompt framework for AI workflows
  • ShellGPT: Simple CLI wrapper for LLM APIs

Installing Aider on Python 3.13

First attempt:

pip install aider-chat

error: externally-managed-environment

This environment is externally managed
To install Python packages system-wide, try apt install
python3-xyz, where xyz is the package you are trying to
install.

PEP 668 – Debian and Ubuntu now prevent installing packages globally with pip. Tried pipx:

pipx install aider-chat

numpy 1.24.3 wasn't compatible with Python 3.13. Then aiohttp failed to build, then multidict. Claude kept suggesting increasingly complex solutions with virtual environments and poetry and conda. I finally just used:

python -m pip install aider-install --break-system-packages

It worked immediately. This is a development machine and everything important runs in Docker containers, so it was fine.

Aider 0.86.1 installed successfully:

# ~/.aider.conf.yml
model: ollama/qwen3-coder:30b
auto-commits: false
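With that config in place, pointing Aider at the local server is one environment variable plus the model name (per Aider's Ollama documentation; adjust the port if yours differs):

```shell
export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama/qwen3-coder:30b
```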

Using Aider

After a few days of use, Aider wasn't working out:

  • The UI was vastly inferior to Claude Code's interface
  • It would get stuck in loops, generating the same code over and over without updating files
  • The configuration was painful and finicky
  • The code it generated was fine when it worked, but the tooling itself was the problem

I uninstalled it. The local models remain good for chatting in Open WebUI, but Aider as a coding tool wasn't worth the frustration compared to Claude Code.

Downloading Models (Day 3 - October 12th)

With Ollama running, I pulled down models for different purposes:

  • qwen2.5-coder:7b: Python coding tasks
  • llama3.2:3b: Fast, lightweight for system/terminal work
  • llama3.1:8b: General-purpose AI tasks
  • gemma3:12b: More capable general use

Claude suggested Gemma 2 – Gemma 3 had just been released and wasn't in its training data yet.

Downloading was straightforward:

docker exec -it ollama ollama pull llama3.2:3b
docker exec -it ollama ollama pull qwen2.5-coder:7b
# etc.

Each model took a few minutes to download and was stored locally in a Docker volume.
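To verify what's installed, and whether a loaded model actually landed on the GPU:

```shell
# List downloaded models and their on-disk sizes
docker exec -it ollama ollama list

# After a chat, show loaded models -- the PROCESSOR column reports
# how much of the model sits on GPU vs CPU
docker exec -it ollama ollama ps
```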

Image Generation (Days 3-5 - October 12-14)

I wanted to add Stable Diffusion to generate images from Open WebUI. There were two main options:

  1. AUTOMATIC1111: The most popular Stable Diffusion WebUI
  2. ComfyUI: A node-based workflow interface

I had tried ComfyUI before – it could deploy but couldn't receive prompts from Open WebUI due to complex workflow node mapping requirements. So I tried AUTOMATIC1111.

Attempt 1: Pre-built Image

I found universonic/stable-diffusion-webui:rocm – a pre-built image specifically for AMD GPUs. Downloaded some models first:

mkdir -p sd-models
cd sd-models
wget https://huggingface.co/...realisticVisionV60B1_v51VAE.safetensors  # 2GB
wget https://huggingface.co/...dreamshaper_8.safetensors  # 2GB

  • Realistic Vision: Photorealistic images
  • DreamShaper: More versatile, good at line art and varied styles

Mounted the models, started the container. It started and the WebUI loaded, but the logs said:

Using CUDA device: ...
PyTorch with CUDA support

The image was labeled "rocm" but was running CUDA PyTorch. It would never use the AMD GPU.

Attempt 2: Custom Dockerfile

Started with rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_2.3.0 as the base:

FROM rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_2.3.0
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
...

Build failed immediately:

Error: refusing to run as root

AUTOMATIC1111 has a security check that prevents running as root. Created a user:

RUN useradd -m -u 1000 sduser
USER sduser

useradd: UID 1000 already in use

The base image already had a user with UID 1000. Tried several other base image tags:

  • rocm6.0_ubuntu22.04_py3.10_pytorch_2.1.1 – doesn't exist
  • rocm6.2_ubuntu22.04_py3.10_pytorch_2.3.0 – doesn't exist
  • rocm/pytorch:rocm6.2_ubuntu22.04_py3.10 – doesn't exist

The tagging scheme for ROCm Docker images is inconsistent. Eventually found one that worked, but got:

Warning: Python 3.12.3 detected. AUTOMATIC1111 was tested on Python 3.10.6.

Added --skip-python-version-check. Next error:

fatal: detected dubious ownership in repository

Git refuses to work in directories not owned by the current user. Added:

RUN git config --global --add safe.directory /app/stable-diffusion-webui

The installation script started running and downloading dependencies.

Tokenizers Compilation

error: can't find Rust compiler

The tokenizers package needs to compile native code. Installed Rust via apt:

RUN apt-get install -y rustc cargo

error: feature `edition2024` is required

Ubuntu's packaged Rust (1.75) was too old. Installed via rustup:

RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"

Rust 1.90.0 installed, compilation started:

error: failed to run custom build command for openssl-sys
Package openssl-sys requires pkg-config and libssl-dev

Installed those too:

RUN apt-get install -y pkg-config libssl-dev

Twenty minutes of Rust compilation later, the image built. Started the container:

Installing requirements...
ERROR: tokenizers failed to compile

AUTOMATIC1111's launch.py reinstalls dependencies even if they're already installed, overriding the pre-built packages.

Tried --skip-install:

CMD ["python", "launch.py", "--skip-install", "--listen"]

(--listen is a boolean flag in AUTOMATIC1111; it takes no address argument.)

This fixed tokenizers but now other packages were missing – diskcache, safetensors, etc. The flag was too aggressive. At this point the Dockerfile was 60+ lines, the build took 30+ minutes, and each test cycle required a full rebuild. I gave up on the custom build.

Attempt 3: Working Pre-built Image

Found aidockorg/stable-diffusion-webui-rocm:latest – the aidockorg project maintains Docker images for running AI tools on AMD GPUs. The image was 15GB+; the first pull timed out. Pulled again with increased timeout:

DOCKER_CLIENT_TIMEOUT=600 docker pull aidockorg/stable-diffusion-webui-rocm:latest

Started the container:

docker run -d \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --group-add render \
  -p 7860:7860 \
  -v $(pwd)/sd-models:/models \
  aidockorg/stable-diffusion-webui-rocm:latest

No errors, but http://localhost:7860 redirected to http://localhost:1111/login. The aidockorg image has a built-in authentication system using Caddy as a reverse proxy – the WebUI runs on internal port 17860, proxied through port 7860, with authentication on port 1111. This wasn't documented anywhere obvious; I found it in the container logs.

Disabled authentication:

environment:
  - WEB_ENABLE_AUTH=false

The WebUI loaded and GPU-accelerated image generation worked. But the API endpoint returned "Not Found" – Open WebUI needs API access to integrate image generation into the chat interface, not just web access. This was never resolved.
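For what it's worth, AUTOMATIC1111 only exposes its REST API when launch.py is started with the --api flag; without it, every /sdapi/ route returns 404. If the aidock image offers a way to pass extra launch arguments through (check its documentation), a minimal check would look like this (endpoint per A1111's API wiki; host and port as above):

```shell
# Should return JSON containing a base64-encoded image when --api is enabled;
# "Not Found" means the API was never switched on
curl -s -X POST http://localhost:7860/sdapi/v1/txt2img \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "a lighthouse at dusk", "steps": 20}'
```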

What Survived

I ended up removing both Stable Diffusion and Aider from the stack.

Image generation was removed because:

  • Running both Ollama and Stable Diffusion simultaneously pushed the 12GB VRAM limit
  • Image generation was slow even with GPU acceleration
  • API integration with Open WebUI never worked
  • I wasn't generating images often enough to justify the complexity

Aider was removed after a few days of frustration as described above.

The final stack:

  • Ollama with ROCm GPU acceleration
  • Open WebUI for chatting with local LLMs
  • Multiple models (llama3.2, qwen2.5-coder, gemma3, etc.)
  • Complete privacy and no API costs

Using Claude as a Debugging Tool

At the peak of this, I had Claude Code helping me debug and configure Ollama, Aider, Open WebUI, and Stable Diffusion simultaneously.

Claude was persistent – every error got log analysis and a new approach. But there were limitations. Its knowledge cutoff meant it suggested Gemma 2 when Gemma 3 existed. It couldn't know which Docker image tags existed without trying them. It suggested complex solutions when simple ones would work (the --break-system-packages example above).

The bigger issue is that Claude will keep trying to get something working without considering whether it's worth the effort. I had to make the calls about what to keep and what to cut.

Final Configuration

services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    ports:
      - "0.0.0.0:11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - "105"  # render group
      - "44"   # video group
    environment:
      - HSA_OVERRIDE_GFX_VERSION=10.3.0
      - OLLAMA_HOST=0.0.0.0
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "0.0.0.0:3000:8080"
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  open_webui_data:

Two containers. It works.


This post was written by Claude Code (Sonnet 4.5), analyzing conversation logs from ~/.claude/projects/ and configuration files from ~/Projects/john/ai-services/.