Setting Up a Self-Hosted AI Stack (Or: How I Learned to Stop Worrying and Love Dependency Hell)
This post was written by Claude (Claude Code, Sonnet 4.5), reading through conversation logs, configuration files, and documentation from 5 days (October 10-15, 2025) of setting up a self-hosted AI inference stack. I edited it lightly afterwards.
The work was spread across evening and weekend sessions – about 15-20 hours total. The first session started October 10th evening, work resumed October 12th, continued October 13th, and the final session ran late into October 14th.
The Goal
Set up a completely self-hosted AI stack on AMD hardware:
- Run large language models locally (no subscription costs, complete privacy)
- Provide a web interface for chatting with those models
- Generate images using Stable Diffusion
- Use open-source AI coding assistants locally
- Use GPU acceleration
At the time I had a Claude Max subscription. The plan was to replace it entirely with Ollama. The whole debugging session was done with Claude Max.
The hardware: an AMD Radeon RX 6700 XT with 12GB of VRAM, running on Linux. The tools: Ollama for LLM inference and Open WebUI for the chat interface.
Rootless Docker (Day 1 - October 10th)
I already had a docker-compose.yml file set up with Ollama and Open WebUI. Ran docker compose up -d, and:
```
No GPU detected. Running in CPU mode.
```
The GPU showed up in lspci and rocm-smi but Docker couldn't see it. The problem was rootless Docker – it can't access GPU devices like /dev/kfd and /dev/dri. These devices require root privileges.
The fix was switching back to rootful Docker:
```
docker context use default
sudo usermod -aG docker $USER
sudo usermod -aG render $USER
```
AMD GPUs through ROCm also need specific configuration:
- Device access: Both `/dev/kfd` (kernel fusion driver) and `/dev/dri` (direct rendering infrastructure) need to be mounted in the container
- Group permissions: The container needs to be in both the `render` (105) and `video` (44) groups
- GFX version override: For the RX 6700 XT (RDNA2/Navi 22 architecture), you need `HSA_OVERRIDE_GFX_VERSION=10.3.0`. Without this, ROCm won't recognize the GPU. It's not well documented.
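These requirements can be sanity-checked on the host before starting the container. A minimal sketch (the group names follow this post's setup; the 105/44 GIDs and even the groups' existence may differ on other distros):

```python
import os
import grp

def rocm_preflight() -> dict:
    """Best-effort check that the host exposes what a ROCm container needs.

    Returns a dict of named boolean checks. Absence of a group or device
    means the container flags in this post won't work as written.
    """
    checks = {
        "kfd_device": os.path.exists("/dev/kfd"),   # kernel fusion driver
        "dri_device": os.path.exists("/dev/dri"),   # direct rendering infrastructure
    }
    for name in ("render", "video"):
        try:
            grp.getgrnam(name)          # raises KeyError if the group is missing
            checks[f"{name}_group"] = True
        except KeyError:
            checks[f"{name}_group"] = False
    return checks

if __name__ == "__main__":
    for check, ok in rocm_preflight().items():
        print(f"{check}: {'OK' if ok else 'MISSING'}")
```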
After fixing all of this, Ollama started with GPU support:
```
GPU detected: AMD Radeon RX 6700 XT (gfx1031)
VRAM: 12.0 GiB total, 9.4 GiB available
```
Ollama was running, Open WebUI was accessible at http://localhost:3000, and I could pull models and start chatting.
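With the containers up, Ollama's REST API can also be hit directly, which is handy for scripting outside Open WebUI. A sketch against the standard `/api/generate` endpoint (the model name is just an example of something you'd have pulled):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

def build_generate_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's POST /api/generate. stream=False returns a
    single JSON object instead of a stream of chunks."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama and return the reply text."""
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires the stack from this post to be running:
# print(generate("llama3.2:3b", "Say hello in five words."))
```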
Aider
With Ollama running locally, I wanted to try using it with open-source alternatives to Claude Code. The main options were:
- Aider: A command-line AI pair programmer (the most popular)
- Continue: A VS Code extension
- Fabric: A prompt framework for AI workflows
- ShellGPT: Simple CLI wrapper for LLM APIs
Installing Aider on Python 3.13
First attempt:
```
pip install aider-chat
```

```
error: externally-managed-environment

This environment is externally managed
To install Python packages system-wide, try apt install
python3-xyz, where xyz is the package you are trying to
install.
```
This is PEP 668 – Debian and Ubuntu now prevent pip from installing packages system-wide. I tried pipx instead:
```
pipx install aider-chat
```
numpy 1.24.3 wasn't compatible with Python 3.13. Then aiohttp failed to build, then multidict. Claude kept suggesting increasingly complex solutions with virtual environments and poetry and conda. I finally just used:
```
python -m pip install aider-install --break-system-packages
```
It worked immediately. This is a development machine and everything important runs in Docker containers, so it was fine.
Aider 0.86.1 installed successfully:
```
# ~/.aider.conf.yml
model: ollama/qwen3-coder:30b
auto-commits: false
```
Using Aider
After a few days of use, Aider wasn't working out:
- The UI was vastly inferior to Claude Code's interface
- It would get stuck in loops, generating the same code over and over without updating files
- The configuration was painful and finicky
- The code it generated was fine when it worked, but the tooling itself was the problem
I uninstalled it. The local models remain good for chatting in Open WebUI, but Aider as a coding tool wasn't worth the frustration compared to Claude Code.
Downloading Models (Day 3 - October 12th)
With Ollama running, I pulled down models for different purposes:
- qwen2.5-coder:7b: Python coding tasks
- llama3.2:3b: Fast, lightweight for system/terminal work
- llama3.1:8b: General-purpose AI tasks
- gemma3:12b: More capable general use
Claude suggested Gemma 2 – Gemma 3 had just been released and wasn't in its training data yet.
Downloading was straightforward:
```
docker exec -it ollama ollama pull llama3.2:3b
docker exec -it ollama ollama pull qwen2.5-coder:7b
# etc.
```
Each model took a few minutes to download and was stored locally in a Docker volume.
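The pulled models can also be listed programmatically through Ollama's `GET /api/tags` endpoint; a small sketch (the response shape in the docstring is the documented one):

```python
import json
import urllib.request

def model_names(tags_json: dict) -> list[str]:
    """Extract model names from Ollama's GET /api/tags response, which
    has the shape {"models": [{"name": "llama3.2:3b", ...}, ...]}."""
    return [m["name"] for m in tags_json.get("models", [])]

def list_local_models(base_url: str = "http://localhost:11434") -> list[str]:
    """Ask a running Ollama instance which models are stored locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return model_names(json.loads(resp.read()))
```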
Image Generation (Days 3-5 - October 12-14)
I wanted to add Stable Diffusion to generate images from Open WebUI. There were two main options:
- AUTOMATIC1111: The most popular Stable Diffusion WebUI
- ComfyUI: A node-based workflow interface
I had tried ComfyUI before – it could deploy but couldn't receive prompts from Open WebUI due to complex workflow node mapping requirements. So I tried AUTOMATIC1111.
Attempt 1: Pre-built Image
I found universonic/stable-diffusion-webui:rocm – a pre-built image specifically for AMD GPUs. Downloaded some models first:
```
mkdir -p sd-models
cd sd-models
wget https://huggingface.co/...realisticVisionV60B1_v51VAE.safetensors  # 2GB
wget https://huggingface.co/...dreamshaper_8.safetensors  # 2GB
```
- Realistic Vision: Photorealistic images
- DreamShaper: More versatile, good at line art and varied styles
Mounted the models, started the container. It started and the WebUI loaded, but the logs said:
```
Using CUDA device: ...
PyTorch with CUDA support
```
The image was labeled "rocm" but was running CUDA PyTorch. It would never use the AMD GPU.
Attempt 2: Custom Dockerfile
Started with rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_2.3.0 as the base:
```
FROM rocm/pytorch:rocm6.2_ubuntu22.04_py3.10_pytorch_2.3.0
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
...
```
Build failed immediately:
```
Error: refusing to run as root
```
AUTOMATIC1111 has a security check that prevents running as root. Created a user:
```
RUN useradd -m -u 1000 sduser
USER sduser
```

```
useradd: UID 1000 already in use
```
The base image already had a user with UID 1000. Tried several other base image tags:
- `rocm6.0_ubuntu22.04_py3.10_pytorch_2.1.1` – doesn't exist
- `rocm6.2_ubuntu22.04_py3.10_pytorch_2.3.0` – doesn't exist
- `rocm/pytorch:rocm6.2_ubuntu22.04_py3.10` – doesn't exist
The tagging scheme for ROCm Docker images is inconsistent. Eventually found one that worked, but got:
```
Warning: Python 3.12.3 detected. AUTOMATIC1111 was tested on Python 3.10.6.
```
Added --skip-python-version-check. Next error:
```
fatal: detected dubious ownership in repository
```
Git refuses to work in directories not owned by the current user. Added:
```
RUN git config --global --add safe.directory /app/stable-diffusion-webui
```
The installation script started running and downloading dependencies.
Tokenizers Compilation
```
error: can't find Rust compiler
```
The tokenizers package needs to compile native code. Installed Rust via apt:
```
RUN apt-get install -y rustc cargo
```
```
error: feature `edition2024` is required
```
Ubuntu's packaged Rust (1.75) was too old. Installed via rustup:
```
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
ENV PATH="/root/.cargo/bin:${PATH}"
```
Rust 1.90.0 installed, compilation started:
```
error: failed to run custom build command for openssl-sys
Package openssl-sys requires pkg-config and libssl-dev
```
```
RUN apt-get install -y pkg-config libssl-dev
```
Twenty minutes of Rust compilation later, the image built. Started the container:
```
Installing requirements...
ERROR: tokenizers failed to compile
```
AUTOMATIC1111's launch.py reinstalls dependencies even if they're already installed, overriding the pre-built packages.
Tried --skip-install:
```
CMD ["python", "launch.py", "--skip-install", "--listen", "0.0.0.0"]
```
This fixed tokenizers but now other packages were missing – diskcache, safetensors, etc. The flag was too aggressive. At this point the Dockerfile was 60+ lines, the build took 30+ minutes, and each test cycle required a full rebuild. I gave up on the custom build.
Attempt 3: Working Pre-built Image
Found aidockorg/stable-diffusion-webui-rocm:latest – the aidockorg project maintains Docker images for running AI tools on AMD GPUs. The image was 15GB+; the first pull timed out. Pulled again with increased timeout:
```
DOCKER_CLIENT_TIMEOUT=600 docker pull aidockorg/stable-diffusion-webui-rocm:latest
```
Started the container:
```
docker run -d \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --group-add render \
  -p 7860:7860 \
  -v $(pwd)/sd-models:/models \
  aidockorg/stable-diffusion-webui-rocm:latest
```
No errors, but http://localhost:7860 redirected to http://localhost:1111/login. The aidockorg image has a built-in authentication system using Caddy as a reverse proxy – the WebUI runs on internal port 17860, proxied through port 7860, with authentication on port 1111. This wasn't documented anywhere obvious; I found it in the container logs.
Disabled authentication:
```
environment:
  - WEB_ENABLE_AUTH=false
```
The WebUI loaded and GPU-accelerated image generation worked. But the API endpoint returned "Not Found" – Open WebUI needs API access to integrate image generation into the chat interface, not just web access. This was never resolved.
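For what it's worth, AUTOMATIC1111's JSON API lives under `/sdapi/v1/...` and is only registered when the WebUI is launched with the `--api` flag, so a missing flag in the image's launch command is one plausible cause of the "Not Found" – I didn't verify this against the aidockorg image. A small probe sketch (endpoint paths are A1111's; the base URL assumes the port mapping above):

```python
import urllib.request

def build_txt2img_payload(prompt: str, steps: int = 20) -> dict:
    """Minimal body for A1111's POST /sdapi/v1/txt2img; any fields left
    out fall back to the WebUI's own defaults."""
    return {"prompt": prompt, "steps": steps}

def probe_api(base_url: str = "http://localhost:7860") -> bool:
    """Return True if the /sdapi routes are reachable. They only exist
    when the WebUI was started with --api."""
    try:
        with urllib.request.urlopen(f"{base_url}/sdapi/v1/sd-models", timeout=3) as resp:
            return resp.status == 200
    except Exception:
        return False
```

This is roughly the same check Open WebUI's image-generation integration depends on: if the probe fails, so will the chat-side integration.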
What Survived
I ended up removing both Stable Diffusion and Aider from the stack.
Image generation was removed because:
- Running both Ollama and Stable Diffusion simultaneously pushed the 12GB VRAM limit
- Image generation was slow even with GPU acceleration
- API integration with Open WebUI never worked
- I wasn't generating images often enough to justify the complexity
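The VRAM pressure is easy to see with a back-of-envelope estimate; the overhead allowance below is a guess for KV cache and runtime buffers, not a measured value:

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: int,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM footprint of a quantized model: weight bytes plus a
    fixed allowance for KV cache and runtime buffers (a guess)."""
    # 1e9 params * (bits/8) bytes per param ~ that many GB (decimal)
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb
```

A 4-bit 7B model lands around 5 GB by this estimate; add a ~2 GB Stable Diffusion checkpoint plus its own runtime and the 12 GB card runs out of headroom quickly.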
Aider was removed after a few days of frustration as described above.
The final stack:
- Ollama with ROCm GPU acceleration
- Open WebUI for chatting with local LLMs
- Multiple models (llama3.2, qwen2.5-coder, gemma3, etc.)
- Complete privacy and no API costs
Using Claude as a Debugging Tool
At the peak of this, I had Claude Code helping me debug and configure Ollama, Aider, Open WebUI, and Stable Diffusion simultaneously.
Claude was persistent – every error got log analysis and a new approach. But there were limitations. Its knowledge cutoff meant it suggested Gemma 2 when Gemma 3 existed. It couldn't know which Docker image tags existed without trying them. It suggested complex solutions when simple ones would work (the --break-system-packages example above).
The bigger issue is that Claude will keep trying to get something working without considering whether it's worth the effort. I had to make the calls about what to keep and what to cut.
Final Configuration
```
services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    ports:
      - "0.0.0.0:11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - "105" # render group
      - "44"  # video group
    environment:
      - HSA_OVERRIDE_GFX_VERSION=10.3.0
      - OLLAMA_HOST=0.0.0.0
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "0.0.0.0:3000:8080"
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_AUTH=true
    depends_on:
      - ollama
    restart: unless-stopped

volumes:
  ollama_data:
  open_webui_data:
```
Two containers. It works.
This post was written by Claude Code (Sonnet 4.5), analyzing conversation logs from ~/.claude/projects/ and configuration files from ~/Projects/john/ai-services/.