GoModel: A Lightweight Open-Source AI Gateway Written in Go

GoModel is an open-source AI gateway written in Go that sits between your application and model providers such as OpenAI, Anthropic, and Gemini. It exposes a single OpenAI-compatible API and handles provider-specific differences internally.
Key Features and Differences
The project was built to solve several practical problems: tracking AI usage and cost per client or team, switching models without changing application code, debugging request flows more easily, and reducing AI spending with exact and semantic caching.
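To make the idea of exact caching concrete, here is a minimal Python sketch. It is not GoModel's implementation: the names `exact_cache_key` and `ExactCache` are hypothetical, and the in-memory dictionary stands in for whatever store a real gateway would use. The sketch hashes a canonical form of the request, so two byte-for-byte equivalent requests reach the provider only once.

```python
import hashlib
import json


def exact_cache_key(request_body):
    """Hash a canonical JSON form so identical requests map to the same key."""
    # sort_keys makes the key independent of the order fields were written in.
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ExactCache:
    """Illustrative in-memory exact-match cache (hypothetical, not GoModel's)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, request_body, call_upstream):
        """Return a cached response, or call the provider once and store it."""
        key = exact_cache_key(request_body)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = call_upstream(request_body)
        self._store[key] = response
        return response
```

Semantic caching goes a step further by matching requests that are similar rather than identical, typically by comparing embeddings; that part is not shown here.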
Key differentiators from alternatives:
- ~17MB Docker image (LiteLLM's image is ~746MB on amd64, so GoModel's is roughly 44x smaller)
- Request workflow is visible and easy to inspect
- Configuration is environment-variable-first by default
Quick Start
Basic deployment with Docker:
docker run --rm -p 8080:8080 \
  -e OPENAI_API_KEY="your-openai-key" \
  enterpilot/gomodel
For production, avoid passing secrets on the command line and load them from an env file instead:
docker run --env-file .env enterpilot/gomodel
Make your first API call:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "gpt-5-chat-latest", "messages": [{"role": "user", "content": "Hello!"}] }'
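Because the endpoint is OpenAI-compatible, the same call can be made from any HTTP client. The following is a minimal Python sketch using only the standard library. It assumes the gateway from the Quick Start is listening on localhost:8080 and that the response follows the OpenAI chat completions shape; the helper names `build_chat_request` and `send` are illustrative and not part of GoModel.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8080"  # assumption: gateway started as in the Quick Start


def build_chat_request(prompt, model="gpt-5-chat-latest", base_url=GATEWAY_URL):
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (requires a running gateway):
#   reply = send(build_chat_request("Hello!"))
#   print(reply["choices"][0]["message"]["content"])  # assumes OpenAI response shape
```

Keeping the model name as a parameter means switching models, or providers, is a one-argument change on the client side.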
Supported Providers
GoModel supports multiple LLM providers with automatic detection based on supplied credentials:
- OpenAI (OPENAI_API_KEY)
- Anthropic (ANTHROPIC_API_KEY)
- Google Gemini (GEMINI_API_KEY)
- Groq (GROQ_API_KEY)
- OpenRouter (OPENROUTER_API_KEY)
- Z.ai (ZAI_API_KEY)
- xAI/Grok (XAI_API_KEY)
- Azure OpenAI (AZURE_API_KEY + AZURE_BASE_URL)
- Oracle (ORACLE_API_KEY + ORACLE_BASE_URL)
- Ollama (OLLAMA_BASE_URL)
The gateway supports chat completions, embeddings, file processing, batch operations, and passthrough capabilities across most providers. For Oracle, if the upstream /models endpoint is unavailable, you may need to list the models explicitly, for example ORACLE_MODELS=openai.gpt-oss-120b,xai.grok-3.
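Credential-based detection can be pictured with a short Python sketch. It is only an illustration of the idea, not GoModel's actual logic: the lowercase provider labels and the functions `detect_providers` and `parse_model_list` are hypothetical, while the environment variable names are the ones listed above. A provider counts as enabled only when every variable it needs is set and non-empty.

```python
import os

# Environment variables per provider, as listed above.
# The lowercase labels are illustrative, not GoModel identifiers.
PROVIDER_ENV_VARS = {
    "openai": ("OPENAI_API_KEY",),
    "anthropic": ("ANTHROPIC_API_KEY",),
    "gemini": ("GEMINI_API_KEY",),
    "groq": ("GROQ_API_KEY",),
    "openrouter": ("OPENROUTER_API_KEY",),
    "zai": ("ZAI_API_KEY",),
    "xai": ("XAI_API_KEY",),
    "azure": ("AZURE_API_KEY", "AZURE_BASE_URL"),
    "oracle": ("ORACLE_API_KEY", "ORACLE_BASE_URL"),
    "ollama": ("OLLAMA_BASE_URL",),
}


def detect_providers(env=None):
    """Return the providers whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, required in PROVIDER_ENV_VARS.items()
        if all(env.get(var, "").strip() for var in required)
    ]


def parse_model_list(value):
    """Split a comma-separated value such as ORACLE_MODELS into model names."""
    return [item.strip() for item in value.split(",") if item.strip()]
```

Running `detect_providers()` against the variables in your .env file is a quick way to check which providers a given configuration would be expected to enable before starting the container.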
Alternative Setup Methods
You can also run from source (Go 1.26.2+ required) or use Docker Compose to bring up the infrastructure components, including Redis, PostgreSQL, MongoDB, and Adminer.
This type of gateway is particularly useful for teams that manage multiple AI models across different providers, need cost tracking, or want the flexibility to switch providers without code changes. The lightweight Docker image also makes it suitable for resource-constrained environments.
📖 Read the full source: HN LLM Tools