Lemonade by AMD: Open Source Local LLM Server for GPU and NPU

What Lemonade Is
Lemonade is a local AI server built by AMD and the local AI community that runs text, image, and speech models on GPUs and NPUs. It's open source, designed to be private, and claims to be ready in minutes on any PC.
Key Features and Specifications
- Native C++ Backend: Lightweight service that is only 2MB
- One Minute Install: Simple installer that sets up the stack automatically
- OpenAI API Compatible: Works with hundreds of apps out-of-box and integrates in minutes
- Auto-configures for your hardware: Configures dependencies for your GPU and NPU
- Multi-engine compatibility: Works with llama.cpp, Ryzen AI SW, FastFlowLM, and more
- Multiple Models at Once: Run more than one model at the same time
- Cross-platform: A consistent experience across Windows, Linux, and macOS (beta)
- Built-in app: A GUI that lets you download, try, and switch models quickly
- Unified API: One local service for every modality including chat, vision, image generation, transcription, and speech generation
Model Support and Performance
The server can load models like gpt-oss-120b or Qwen-Coder-Next for advanced tool use. For tuning, you can use --no-mmap to speed up load times and increase context size to 64 or more. The source mentions that with 128 GB of unified RAM, you can load larger models.
Ecosystem Integration
Lemonade is integrated in many apps and works out-of-box with hundreds more thanks to the OpenAI API standard. Mentioned integrations include Open WebUI, n8n, Gaia Infinity, Arcade, GitHub Copilot, OpenHands, Dify, Deep Tutor, and Iterate.ai.
Community and Development
The project has 2.1k stars on GitHub and an active Discord community with 117 online at the time of the source. It's described as being built by the local AI community for every PC, with the philosophy that local AI should be free, open, fast, and private.
📖 Read the full source: HN LLM Tools
👀 See Also

Exploring AI with Tiny Bots: Understanding AI Agents Through Nanobot Tutor
OpenClaw community member shares insights with the 'Nanobot Tutor', a miniature framework aimed at demystifying AI agent functionality. Discover how diving into this compact learning environment unveils the workings of intelligent agents.

mcp-india-stack: Open-source MCP server for Indian financial APIs
mcp-india-stack is an open-source MCP server that provides Claude with native access to seven Indian financial and government API tools, including GSTIN validation, IFSC lookup, and PAN validation. It requires zero authentication, is offline-first, and is available via pip install.

Alibaba's $10 monthly coding plan offers high-volume access to multiple AI models for OpenClaw users
For $10 per month, Alibaba's plan provides access to Qwen3.5-Plus, Kimi-K2.5, GLM-5, and MiniMax-M2.5 models with quotas of 1,200 requests per 5 hours, 9,000 per week, and 18,000 per month.

Layerkit: AI Image Editor with Editable Layers Built with Claude Code
A developer built Layerkit, a browser-based AI image editor that generates scenes with editable layers to avoid constant re-prompting. The tool uses a multi-stage AI pipeline where one LLM plans composition, an image model generates the scene, and another LLM analyzes the actual image to place readable text.