MCP Context Bloat: Real Costs and a Practical Fix for Claude Code Users

A Reddit user running 9 MCP servers in Claude Code for four months detailed the hidden costs and performance degradation they encountered, along with a concrete fix. The post is a must-read for anyone using MCP in production.
The Math
With 9 servers (filesystem, GitHub, Stripe, Linear, Notion, Postgres, Sentry, AWS, and a custom one) exposing 142 tools in total, every turn carries about 38k tokens of system prompt and tool schemas. At 200 turns/day, that's 7.6M input tokens/day. At Sonnet pricing (~$3/M input, ~$15/M output), the input side alone comes to ~$23/day or ~$700/month just in MCP tool definitions, before any actual work. Prompt caching only helps on identical prefixes; rotating a single MCP server changes the prefix and invalidates the cache.
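The arithmetic above can be reproduced with a short sketch. The token count, turn count, and input price are the figures quoted in the post; the 30-day month and the function names are assumptions made here for illustration, and caching is ignored.

```python
# Figures quoted in the post; DAYS_PER_MONTH is an assumption.
TOKENS_PER_TURN = 38_000            # system prompt + tool schemas per turn
TURNS_PER_DAY = 200
INPUT_PRICE_PER_MILLION_USD = 3.00  # approximate Sonnet input price
DAYS_PER_MONTH = 30


def daily_tokens(tokens_per_turn: int, turns_per_day: int) -> int:
    """Input tokens spent per day on tool definitions alone."""
    return tokens_per_turn * turns_per_day


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Cost of a token volume at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


tokens_per_day = daily_tokens(TOKENS_PER_TURN, TURNS_PER_DAY)
daily_cost = cost_usd(tokens_per_day, INPUT_PRICE_PER_MILLION_USD)
monthly_cost = daily_cost * DAYS_PER_MONTH

print(f"{tokens_per_day:,} input tokens/day")
print(f"${daily_cost:.2f}/day, ${monthly_cost:.0f}/month")
```

This works out to $22.80/day and $684/month, which the post rounds to ~$23 and ~$700.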
What Breaks
- Tool selection degrades: With 142 tools in context, Claude started picking the wrong tool for obvious queries (e.g., using linear_search_issues when asked to read a file).
- Slow enumeration: Schema-heavy servers like AWS take 4–6 seconds to list tools.
- Silent error propagation: One poorly-described tool can taint the ranking for every related query.
The Fix: Gateway Pattern with BM25
The user switched to a gateway pattern using Ratel, an open-source, in-process Rust library with BM25 ranking. Claude now sees only three tools: search_tools, invoke_tool, and auth. Everything else is ranked on-demand. Results:
- Cold start dropped from 38k to ~4k tokens.
- Wrong-tool selection nearly eliminated because the model only ever sees the top 5 ranked by query.
- Setup took 10 minutes (one command does the Claude Code import).
The author notes that most "MCP optimizer" startups are just BM25 search dressed up. Tool descriptions are short, structured, and full of keyword matches — no vector DB or LLM-in-the-loop needed. BM25 over a flat projection of name + description gets 90% of the win deterministically in microseconds, offline.
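To make "BM25 over a flat projection of name + description" concrete, here is a minimal pure-Python sketch. It is not Ratel's implementation or API. The tool catalog, the tokenizer, and the `k1`/`b` values are illustrative assumptions, and the index is rebuilt on every call for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical catalog; names and descriptions are illustrative only.
TOOLS = [
    {"name": "filesystem_read_file", "description": "Read the contents of a file from disk"},
    {"name": "linear_search_issues", "description": "Search Linear issues by keyword"},
    {"name": "github_create_pull_request", "description": "Open a pull request on a GitHub repository"},
    {"name": "postgres_run_query", "description": "Run a SQL query against a Postgres database"},
]


def tokenize(text):
    # Lowercase and split on non-alphanumerics, so snake_case names become words.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_search(query, tools, top_k=5, k1=1.5, b=0.75):
    """Rank tools by BM25 score over the flat text 'name + description'."""
    docs = [tokenize(f"{t['name']} {t['description']}") for t in tools]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: in how many tool texts each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))

    scored = []
    for tool, doc in zip(tools, docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        if score > 0:
            scored.append((score, tool["name"]))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


results = bm25_search("read a file", TOOLS)
print(results[0])
```

With this toy catalog, the query "read a file" ranks the file-reading tool first and never surfaces the Linear search tool, because none of the query terms appear in its name or description.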
Key lesson: "replace" beats "suggest". If your gateway hands the model 5 tools instead of 142, the math works. If it suggests 5 alongside 142, the model still loads 142 and you saved nothing.
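The difference between the two modes can be sketched by counting which tool definitions sit in the model's context up front. The three gateway tool names come from the post; the placeholder catalog and the `tools_in_context` function are hypothetical and exist only to illustrate the point.

```python
# Gateway tool names are from the post; everything else is a placeholder.
GATEWAY_TOOLS = ["search_tools", "invoke_tool", "auth"]
CATALOG = [f"tool_{i}" for i in range(142)]  # stand-in for the 142 real tools


def tools_in_context(mode, catalog):
    """Return the tool definitions loaded into context before any query runs."""
    if mode == "replace":
        # The gateway replaces the catalog; ranked tools arrive only as search results.
        return list(GATEWAY_TOOLS)
    if mode == "suggest":
        # The full catalog stays loaded; suggestions are added on top of it.
        return list(catalog)
    raise ValueError(f"unknown mode: {mode}")


replace_count = len(tools_in_context("replace", CATALOG))
suggest_count = len(tools_in_context("suggest", CATALOG))
print(replace_count, suggest_count)
```

In "replace" mode only the three gateway definitions are paid for on every turn. In "suggest" mode all 142 are still there, so the per-turn token cost does not change.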
📖 Read the full source: r/ClaudeAI