GuppyLM: A 9M Parameter LLM Built from Scratch for Educational Purposes

What GuppyLM Is
GuppyLM is a tiny language model (~9M parameters) that pretends to be a fish named Guppy. It is built from scratch to demonstrate how language models work without requiring a PhD or a massive GPU cluster. The project covers data generation, tokenizer creation, model architecture, the training loop, and inference, all in about 130 lines of PyTorch code.
Architecture Details
- Parameters: 8.7M
- Layers: 6
- Hidden dimension: 384
- Heads: 6
- FFN: 768 (ReLU)
- Vocab: 4,096 (BPE)
- Max sequence: 128 tokens
- Norm: LayerNorm
- Position: Learned embeddings
- LM head: Weight-tied with embeddings
This is a vanilla transformer with no GQA, RoPE, SwiGLU, or early exit. The design is kept as simple as possible.
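The 8.7M figure can be roughly reproduced from the hyperparameters listed above. The sketch below is a back-of-the-envelope count, not the project's own code: it assumes bias terms on every linear layer, two LayerNorms per block plus one final LayerNorm, and no separate LM head weights because the head is tied to the token embeddings. The function name `count_params` is hypothetical.

```python
def count_params(vocab=4096, d_model=384, n_layers=6, d_ffn=768, max_seq=128):
    """Estimate the parameter count of a vanilla transformer (assumed layout)."""
    tok_emb = vocab * d_model            # token embeddings, shared with the LM head
    pos_emb = max_seq * d_model          # learned position embeddings
    # Attention: Q, K, V, and output projections, each d_model x d_model plus bias.
    attn = 4 * (d_model * d_model + d_model)
    # Feed-forward: d_model -> d_ffn -> d_model, both with bias (assumption).
    ffn = (d_model * d_ffn + d_ffn) + (d_ffn * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a shift vector.
    norms = 2 * (2 * d_model)
    per_layer = attn + ffn + norms
    final_norm = 2 * d_model             # assumed final LayerNorm before the head
    return tok_emb + pos_emb + n_layers * per_layer + final_norm


total = count_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# prints "8,726,016 parameters (~8.7M)"
```

The number of heads does not appear in the count because splitting the hidden dimension across 6 heads changes how the projections are reshaped, not how many weights they hold. The real model may differ slightly if it omits biases or the final norm.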
Training and Data
The model trains on 60K synthetic conversations across 60 topics in about 5 minutes on a free Colab T4 GPU. The dataset (arman-bd/guppylm-60k-generic on HuggingFace) contains 57K training and 3K test samples, each in the format {"input": "...", "output": "...", "category": "..."}.
Topics include: greetings, feelings, temperature, food, light, water, tank, noise, night, loneliness, bubbles, glass, reflection, breathing, swimming, colors, taste, plants, filter, algae, snails, scared, excited, bored, curious, happy, tired, outside, cats, rain, seasons, music, visitors, children, meaning of life, time, memory, dreams, size, future, past, name, weather, sleep, friends, jokes, fear, love, age, intelligence, health, singing, TV, and more.
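To make the record format concrete, here is a minimal sketch of turning one dataset record into a single training string. The `You>` and `Guppy>` tags are borrowed from the chat samples in this article; whether the project uses them as its training template is an assumption, and `format_sample` is a hypothetical helper. The example record reuses a line from the sample conversation, with an illustrative category value.

```python
import json


def format_sample(line, user_tag="You>", bot_tag="Guppy>"):
    """Parse one JSON record and join input and output into one text sample."""
    record = json.loads(line)
    # The prompt/response template below is an assumption, not the project's own.
    return f'{user_tag} {record["input"]}\n{bot_tag} {record["output"]}'


# Illustrative record in the {"input", "output", "category"} format.
raw = json.dumps({
    "input": "are you hungry",
    "output": "yes. always yes. i will swim to the top right now.",
    "category": "food",
})

print(format_sample(raw))
```

Keeping the category out of the training text, as done here, is one possible choice; it could also be used only for balancing topics or for per-topic evaluation.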
Personality and Output
Guppy speaks in short, lowercase sentences about water, temperature, light, vibrations, and food. It doesn't understand human abstractions like money, phones, or politics. Sample conversation:
You> hi guppy
Guppy> hi there. i just found a nice spot near the rock. the temperature feels nice.
You> are you hungry
Guppy> yes. always yes. i will swim to the top right now. i promise to eat all of it.
You> what is the meaning of life
Guppy> food. the answer is always food.
Quick Start
Chat with Guppy (no training needed): Run all cells in the notebook. It downloads the pre-trained model from HuggingFace.
Train your own: Set the runtime to T4 GPU and run all cells. The notebook downloads the dataset, trains the tokenizer, trains the model, and tests it. You can then upload the model to HuggingFace or download it locally.
Chat locally:
pip install torch tokenizers
python -m guppylm chat
Example local interaction:
You> the cat is looking at you
Guppy> i hide behind the plant when the furry one comes.
You> it is raining outside
Guppy> i think rain is the best thing about outside.
Project Structure
guppylm/
├── config.py          # Hyperparameters (model + training)
├── model.py           # Vanilla transformer
├── dataset.py         # Data loading + batching
├── train.py           # Training loop (cosine LR, AMP)
├── generate_data.py   # Conversation data generator (60 topics)
├── eval_cases.py      # Held-out test cases
├── prepare_data.py    # Data prep + tokenizer training
└── inference.py       # Chat interface
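The training loop is described as using a cosine learning-rate schedule. The following is a minimal sketch of what such a schedule computes, not the code in train.py: the linear warmup phase, the function name `cosine_lr`, and all numeric defaults are assumptions chosen for illustration.

```python
import math


def cosine_lr(step, total_steps, max_lr=3e-4, min_lr=3e-5, warmup_steps=100):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Warmup (assumed): ramp linearly up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Cosine factor goes from 1 at the start of decay to 0 at the end.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


for step in (0, 100, 550, 1000):
    print(step, f"{cosine_lr(step, total_steps=1000):.2e}")
```

In a PyTorch loop, a function like this would typically be applied by setting the `lr` field of each optimizer parameter group before every step; PyTorch also ships built-in schedulers that serve the same purpose.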
This project is useful for developers who want to understand transformer architecture fundamentals without dealing with billion-parameter models. The complete implementation shows every piece from raw text to trained weights to generated output.
📖 Read the full source: HN LLM Tools