Ollama's Technical Issues and Community Controversy

Ollama's Core Technology and Attribution Issues
Ollama's entire inference capability originally came from llama.cpp, the C++ inference engine created by Georgi Gerganov in March 2023. For over a year, Ollama's README did not mention llama.cpp, and its binary distributions omitted the MIT license notice required for the llama.cpp code they shipped.
The community opened GitHub issue #3185 in early 2024 to request license compliance; it went more than 400 days without a response from maintainers. After issue #3697 was opened in April 2024 specifically requesting acknowledgment of llama.cpp, Ollama's co-founder Michael Chiang eventually added a single line to the bottom of the README: "llama.cpp project founded by Georgi Gerganov."
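The MIT license requires its copyright and permission notice to be included in all copies or substantial portions of the software. As a rough illustration of what a compliance check could look like, the sketch below scans an unpacked distribution for a file containing the opening words of the MIT permission notice alongside a given copyright holder. The function names, the directory argument, and the holder string are all hypothetical; this is not how Ollama or llama.cpp verify anything, and a real audit would need more than a text search.

```python
from pathlib import Path

# Opening words of the standard MIT permission notice.
MIT_MARKER = "Permission is hereby granted, free of charge"


def has_mit_notice(text: str, copyright_holder: str) -> bool:
    """True if the text contains the MIT permission wording and the holder's name."""
    return MIT_MARKER in text and copyright_holder in text


def find_mit_notices(dist_dir: str, copyright_holder: str) -> list[Path]:
    """Return files under dist_dir (hypothetical layout) that carry the notice."""
    hits = []
    for path in sorted(Path(dist_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Undecodable bytes are dropped so binary files do not raise.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        if has_mit_notice(text, copyright_holder):
            hits.append(path)
    return hits
```

An empty result for the upstream copyright holder would suggest the notice is missing from that distribution, which is the situation the issue described.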
Technical Problems with Custom Backend
In mid-2025, Ollama moved away from using llama.cpp as its inference backend and built a custom implementation directly on top of ggml. This custom backend reintroduced bugs that llama.cpp had already fixed, including:
- Broken structured output support
- Vision model failures
- ggml assertion crashes across multiple versions
- Failures on models that worked fine in upstream llama.cpp
- Missing support for tensor types required by new releases like GPT-OSS 20B
Georgi Gerganov attributed these problems to Ollama having forked ggml and made faulty changes to it.
Performance Benchmarks
Multiple community tests report llama.cpp running roughly 1.8x faster than Ollama on the same hardware with the same model:
- 161 tokens per second for llama.cpp versus 89 tokens per second for Ollama
- On CPU, a performance gap of 30-50%
- In a recent comparison on Qwen-3 Coder 32B, ~70% higher throughput with llama.cpp
The overhead is attributed to Ollama's daemon layer, poor GPU offloading heuristics, and a vendored backend that trails upstream.
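The figures above mix two ways of stating the same comparison: a speedup ratio ("1.8x") and a percentage gap ("~70% higher"). A minimal sketch of the arithmetic, using only the 161 and 89 tokens-per-second numbers quoted above (the function names are illustrative, not from any benchmark tool):

```python
def speedup(fast_tps: float, slow_tps: float) -> float:
    """Return how many times higher the first throughput is than the second."""
    if fast_tps <= 0 or slow_tps <= 0:
        raise ValueError("throughputs must be positive")
    return fast_tps / slow_tps


def percent_higher(fast_tps: float, slow_tps: float) -> float:
    """Express the same gap as a percentage above the slower figure."""
    return (speedup(fast_tps, slow_tps) - 1.0) * 100.0


# Figures quoted above: 161 tok/s (llama.cpp) versus 89 tok/s (Ollama).
ratio = speedup(161, 89)
gap = percent_higher(161, 89)
print(f"{ratio:.2f}x faster, {gap:.0f}% higher throughput")
# prints "1.81x faster, 81% higher throughput"
```

By the same conversion, the 30-50% CPU gap corresponds to a 1.3x-1.5x ratio and the ~70% figure to about 1.7x, so the listed results are in the same range rather than identical.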
Model Naming Issues
When DeepSeek released its R1 model family in January 2025, Ollama listed the smaller distilled versions (for example, DeepSeek-R1-Distill-Qwen-32B) without clearly indicating that they were distillations built on other base models rather than the full R1 model.
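The upstream name itself carries the information that was lost in the listing. As a small illustration, the sketch below pulls the teacher, base model, and size out of a name shaped like DeepSeek-R1-Distill-Qwen-32B. The pattern is an assumption inferred from that one example, not an official naming specification, and `describe_model` is a hypothetical helper.

```python
import re

# Hypothetical pattern inferred from "DeepSeek-R1-Distill-Qwen-32B";
# not an official naming scheme.
DISTILL_PATTERN = re.compile(
    r"^(?P<teacher>.+?)-Distill-(?P<base>.+)-(?P<size>\d+(?:\.\d+)?B)$",
    re.IGNORECASE,
)


def describe_model(name: str) -> str:
    """Spell out what a distilled-model name says about its origin."""
    match = DISTILL_PATTERN.match(name)
    if match is None:
        return f"{name} (no distillation marker in name)"
    return (
        f"{name}: a {match['size']} {match['base']} model "
        f"distilled from {match['teacher']}"
    )


print(describe_model("DeepSeek-R1-Distill-Qwen-32B"))
print(describe_model("DeepSeek-R1"))
```

A listing that preserved this distinction would let users see at a glance that a 32B download is a Qwen-based distillation and not the full model.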
📖 Read the full source: HN LLM Tools