LLM Architecture Gallery: Visual Reference for Model Designs

Sebastian Raschka's LLM Architecture Gallery collects the architecture figures and fact sheets from two of his articles, "The Big LLM Architecture Comparison" and "A Dream of Spring for Open-Weight LLMs," and focuses specifically on the architecture panels. Each figure can be clicked to enlarge it, and each model title links to the corresponding article section.
Key Model Details
The gallery provides specific architectural specifications for numerous models:
- Llama 3 8B: 8B parameters, released 2024-04-18, dense decoder with GQA attention and RoPE positional embeddings, serves as pre-norm baseline
- OLMo 2 7B: 7B parameters, released 2024-11-25, dense decoder with MHA and QK-Norm, uses inside-residual post-norm instead of pre-norm
- DeepSeek V3: 671B total parameters (37B active), released 2024-12-26, sparse MoE decoder with MLA attention, uses dense prefix plus shared expert
- DeepSeek R1: 671B total parameters (37B active), released 2025-01-20, sparse MoE decoder with MLA attention, architecture matches DeepSeek V3 with reasoning-oriented training
- Gemma 3 27B: 27B parameters, released 2025-03-11, dense decoder with GQA and QK-Norm, uses 5:1 sliding-window/global attention ratio
- Mistral Small 3.1 24B: 24B parameters, released 2025-03-18, dense decoder with standard GQA, latency-focused design with smaller KV cache
- Llama 4 Maverick: 400B total parameters (17B active), released 2025-04-05, sparse MoE decoder with GQA attention, alternates dense and MoE blocks
- Qwen3 235B-A22B: 235B total parameters (22B active), released 2025-04-28, sparse MoE decoder with GQA and QK-Norm, optimized for serving efficiency without shared expert
- Qwen3 32B: 32B parameters, released 2025-04-28, dense decoder with GQA and QK-Norm, reference dense Qwen stack with 8 KV heads
- Qwen3 4B: 4B parameters, released 2025-04-28, dense decoder with GQA and QK-Norm, compact stack with 151k vocabulary
- Qwen3 8B: 8B parameters, released 2025-04-28, dense decoder with GQA and QK-Norm, reference Qwen3 dense stack with 8 KV heads
- SmolLM3 3B: 3B parameters, released 2025-06-19, dense decoder with GQA, experiments with periodic NoPE layers
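Two of the numbers in the list above lend themselves to a quick calculation: the share of parameters a sparse MoE model activates per token, and what a 5:1 sliding-window/global ratio looks like as a layer schedule. The sketch below is illustrative only. The `FactSheet` class and `attention_schedule` function are hypothetical names, not part of the gallery, and the schedule assumes the pattern repeats as five sliding-window layers followed by one global layer, which the summary does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FactSheet:
    """Hypothetical record holding a few fields from a gallery fact sheet."""
    name: str
    total_b: float                    # total parameters, in billions
    active_b: Optional[float] = None  # active parameters per token (MoE only)

    @property
    def active_fraction(self) -> float:
        """Share of parameters used per token; 1.0 for dense models."""
        if self.active_b is None:
            return 1.0
        return self.active_b / self.total_b


# Parameter counts copied from the list above.
SHEETS = [
    FactSheet("Llama 3 8B", 8),
    FactSheet("DeepSeek V3", 671, 37),
    FactSheet("Llama 4 Maverick", 400, 17),
    FactSheet("Qwen3 235B-A22B", 235, 22),
]


def attention_schedule(n_layers: int, local_per_global: int = 5) -> list[str]:
    """Label each layer 'sliding' or 'global' for an N:1 local/global ratio.

    Assumption: every block of (local_per_global + 1) layers ends with the
    global layer. The real placement may differ.
    """
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "sliding"
            for i in range(n_layers)]


for sheet in SHEETS:
    print(f"{sheet.name}: {sheet.active_fraction:.1%} of parameters active per token")

# A 12-layer toy stack, not the real depth of any model listed above.
print(attention_schedule(12))
```

By this measure the three MoE models listed each activate well under a tenth of their total parameters per token, which is the main reason total and active counts are reported separately.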
Practical Features
The gallery includes an issue tracker for reporting inaccurate fact sheets, mislabeled architectures, or broken links. A physical poster version is available via Zazzle with a high-resolution export at 14570 x 12490 pixels (56 MB PNG file, 182 megapixels).
For developers working with AI coding agents, the gallery offers concrete architectural details, such as attention type, normalization placement, and total versus active parameter counts, that can inform model selection, fine-tuning decisions, and performance optimization. Its side-by-side format makes the trade-offs between architectural choices easier to compare.
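One such trade-off is KV cache size, which grows with the number of KV heads. The sketch below gives a back-of-the-envelope estimate comparing GQA with full MHA. The shape used (32 layers, 32 query heads, 8 KV heads, head dimension 128) and the 16-bit cache values are assumptions chosen to resemble a Llama-3-8B-class model; they are not taken from the gallery summary. The formula also ignores sliding-window attention and MLA compression.

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """Estimate KV cache bytes stored per generated token.

    The factor 2 accounts for one key vector and one value vector
    per KV head per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Assumed shape (hypothetical, Llama-3-8B-like); not from the fact sheets above.
N_LAYERS, N_QUERY_HEADS, N_KV_HEADS, HEAD_DIM = 32, 32, 8, 128

gqa_bytes = kv_cache_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM)
# With MHA, every query head has its own key/value head.
mha_bytes = kv_cache_bytes_per_token(N_LAYERS, N_QUERY_HEADS, HEAD_DIM)

print(f"GQA: {gqa_bytes // 1024} KiB per token")   # 128 KiB
print(f"MHA: {mha_bytes // 1024} KiB per token")   # 512 KiB
print(f"GQA cache is {mha_bytes // gqa_bytes}x smaller")  # 4x
```

Under these assumptions, sharing each KV head across four query heads cuts the cache by the same factor of four.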
📖 Read the full source: HN LLM Tools