LLM Cost Profiler: Open-source tool tracks API spending to make case for local models

LLM Cost Profiler is an open-source Python tool that tracks every API call your code makes to OpenAI and Anthropic, showing exactly what you're spending, where, and why. The tool exposes which tasks are overpriced relative to their complexity, providing concrete data to make the case for local inference.
Key Features and Findings
The tool stores everything in a local SQLite database and is MIT licensed. According to the source, it surfaced several specific examples of wasted API spending:
- A classifier using GPT-4o that outputs one of 5 labels, a task any decent 7B local model handles easily. Cost: ~$89/week in API calls.
- Thousands of duplicate calls with the same prompt and zero caching. Local inference with caching would make this effectively free.
- A summarizer where 34% of calls were retries caused by format errors. Per the author, a well-tuned local model with constrained generation eliminates this entire class of waste.
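The kind of bookkeeping behind findings like these can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the `calls` table, its columns, and the `open_db`, `log_call`, and `waste_report` helpers are all hypothetical names, and the per-call cost is assumed to be supplied by the caller.

```python
import hashlib
import sqlite3

# Hypothetical schema for illustration only; not the tool's real storage layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS calls (
    id INTEGER PRIMARY KEY,
    task TEXT NOT NULL,
    model TEXT NOT NULL,
    prompt_hash TEXT NOT NULL,
    cost_usd REAL NOT NULL,
    is_retry INTEGER NOT NULL DEFAULT 0
)
"""


def open_db(path=":memory:"):
    """Open (or create) the local SQLite database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_call(conn, task, model, prompt, cost_usd, is_retry=False):
    """Record one API call; the prompt is stored only as a SHA-256 hash."""
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT INTO calls (task, model, prompt_hash, cost_usd, is_retry) "
        "VALUES (?, ?, ?, ?, ?)",
        (task, model, prompt_hash, cost_usd, int(is_retry)),
    )
    conn.commit()


def waste_report(conn):
    """Per task: call count, total cost, duplicate calls, and retry calls.

    A duplicate is any call beyond the first for a given prompt hash
    within the same task.
    """
    rows = conn.execute(
        """
        SELECT task,
               COUNT(*) AS calls,
               SUM(cost_usd) AS total_cost,
               COUNT(*) - COUNT(DISTINCT prompt_hash) AS duplicate_calls,
               SUM(is_retry) AS retry_calls
        FROM calls
        GROUP BY task
        ORDER BY total_cost DESC
        """
    ).fetchall()
    return [
        {
            "task": task,
            "calls": calls,
            "total_cost": total_cost,
            "duplicate_calls": duplicate_calls,
            "retry_calls": retry_calls,
        }
        for task, calls, total_cost, duplicate_calls, retry_calls in rows
    ]
```

Calling `log_call` after each request and `waste_report` at the end of the week would give a per-task view of spend, repeated prompts, and retries, which is the sort of breakdown the three examples above rely on.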
The author notes this tool gives teams concrete ammunition for investing in local inference infrastructure: "Here's the exact dollar amount we'd save by moving X task to a local model."
The tool is available on GitHub at https://github.com/BuildWithAbid/llm-cost-profiler. The author plans to add support for tracking local model inference costs as well (compute-time-based costing) and asked the community whether this would be useful.
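Compute-time-based costing is not implemented yet, so the following is only a guess at what such a calculation could look like. The function names and the hourly-rate parameter are assumptions for illustration, not part of the tool.

```python
def local_cost_usd(compute_seconds, hourly_rate_usd):
    """Cost of local inference, assuming a flat hourly rate for the machine.

    hourly_rate_usd is a hypothetical figure covering hardware amortization
    and power; choosing it is up to the team doing the estimate.
    """
    return compute_seconds / 3600.0 * hourly_rate_usd


def savings_usd(api_cost_usd, compute_seconds, hourly_rate_usd):
    """Difference between what the API charged and the estimated local cost."""
    return api_cost_usd - local_cost_usd(compute_seconds, hourly_rate_usd)
```

For example, `savings_usd(89.0, 10 * 3600, 0.50)` compares the classifier's ~$89/week API bill against ten hours of local compute at a made-up rate of $0.50 per hour; both the hours and the rate are placeholder inputs, not measurements from the source.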
This type of cost profiling tool is particularly relevant for developers using AI coding agents, as it provides data-driven insights into where API spending might be inefficient compared to local alternatives.
📖 Read the full source: r/LocalLLaMA