LLMock: HTTP-based mocking server for deterministic LLM testing across processes

LLMock is a mocking server that intercepts LLM API calls by running as a real HTTP server on a specified port, allowing deterministic testing across multiple processes without hitting paid APIs.
Key Details
The tool was discovered after a developer spent $12 running Playwright tests against real OpenAI APIs. The problem occurred when using MSW (Mock Service Worker), which patches the HTTP module inside the Node.js process that calls server.listen(), but leaves separate processes (like a Python agent) completely blind to the mocking.
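The process-boundary problem can be reproduced without MSW. The sketch below is only an illustration under stated assumptions: it tags the global fetch of the current Node.js process as a hypothetical stand-in for an in-process interceptor, then spawns a second Node.js process and checks whether the tag is visible there. The names fakeFetch, isPatched, and childSeesPatch are made up for this example, and it assumes a Node.js version with a global fetch (18 or later).

```javascript
// Sketch: a patch applied inside one process is invisible to another process.
const { execFileSync } = require("node:child_process");

// Hypothetical stand-in for an in-process interceptor: replace the global
// fetch of THIS process and tag it so the patch can be detected.
const realFetch = globalThis.fetch;
const fakeFetch = async () => ({ ok: true, text: async () => "mocked" });
fakeFetch.isPatched = true;
globalThis.fetch = fakeFetch;

// Start a separate Node.js process and ask whether its fetch carries the tag.
function childSeesPatch() {
  const out = execFileSync(
    process.execPath,
    ["-e", "console.log(Boolean(globalThis.fetch && globalThis.fetch.isPatched))"],
    { encoding: "utf8" }
  );
  return out.trim() === "true";
}

const parentPatched = globalThis.fetch.isPatched === true;
const childPatched = childSeesPatch();

// Restore the original fetch so the demo leaves no side effects behind.
globalThis.fetch = realFetch;

console.log({ parentPatched, childPatched });
```

The child starts with its own fresh module state, so nothing patched in the parent carries over. A mock that listens on a real port avoids this because every process reaches it over the network.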
With LLMock, you point the OPENAI_BASE_URL environment variable at the mock server from every process, regardless of whether it's Node.js, Python, or any other language:

    const mock = new LLMock({ port: 5555 });
    await mock.start();
    process.env.OPENAI_BASE_URL = "http://localhost:5555/v1";

Fixtures are plain JSON files that match on user message substrings or regex patterns, eliminating handler boilerplate:

    {
      "fixtures": [
        {
          "match": { "userMessage": "stock price of AAPL" },
          "response": { "content": "The current stock price of Apple Inc. (AAPL) is $150.25." }
        }
      ]
    }

Key features from the source:
- Speaks actual OpenAI/Claude/Gemini SSE format correctly (getting event types wrong breaks streaming in subtle ways)
- Full tool call support - agent frameworks execute them normally
- Predicate routing for inspecting system prompt state or message history for multi-agent flows
- Request journal to assert on what was actually called, not just whether the test passed
- Zero dependencies
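To make the streaming point in the list concrete, here is a rough sketch of how a mock might frame a canned reply in the OpenAI chat-completions streaming format: a series of "data:" frames carrying chat.completion.chunk objects, closed by a "data: [DONE]" frame. This is not LLMock's code. The helpers toOpenAISSE and collectContent, and the placeholder id and model values, are assumptions made for illustration. Anthropic and Gemini use different event shapes, which this sketch does not cover.

```javascript
// Sketch: frame a fixed reply as OpenAI-style server-sent events (SSE).
function toOpenAISSE(content, { model = "mock-model", id = "chatcmpl-mock" } = {}) {
  // Build one streaming chunk in the chat.completion.chunk shape.
  const chunk = (delta, finishReason) => ({
    id,
    object: "chat.completion.chunk",
    created: 0, // placeholder timestamp, fixed for determinism
    model,
    choices: [{ index: 0, delta, finish_reason: finishReason }],
  });

  // Split after each whitespace character so rejoining the pieces
  // reproduces the original text exactly.
  const pieces = content.split(/(?<=\s)/);

  const chunks = [
    chunk({ role: "assistant", content: "" }, null), // opening chunk carries the role
    ...pieces.map((piece) => chunk({ content: piece }, null)),
    chunk({}, "stop"), // final chunk carries the finish reason
  ];

  // Each SSE frame is "data: <json>" followed by a blank line.
  return chunks.map((c) => `data: ${JSON.stringify(c)}\n\n`).join("") + "data: [DONE]\n\n";
}

// Client-side view: reassemble the text from the stream.
function collectContent(stream) {
  return stream
    .split("\n\n")
    .filter((frame) => frame.startsWith("data: ") && frame !== "data: [DONE]")
    .map((frame) => JSON.parse(frame.slice("data: ".length)))
    .map((c) => c.choices[0].delta.content ?? "")
    .join("");
}

const stream = toOpenAISSE("Hello streaming world");
console.log(collectContent(stream));
```

A client that expects this framing will typically stall or drop text if the terminating frame or the delta structure is off, which is the kind of subtle breakage the list refers to.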
The developer ended up with 9 LLM calls across 3 Playwright tests, costing $0 and producing deterministic results every run.
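A call count like the one above is the kind of thing a request journal lets a test assert directly. The following is a minimal sketch of the idea, not LLMock's implementation: it matches the last user message against fixtures by substring or regular expression and records every request. The match.userMessage and response.content fields come from the fixture example earlier. The names createMock, findFixture, lastUserMessage, and journal are hypothetical, as is passing a RegExp object rather than a JSON-encoded pattern.

```javascript
// Sketch: substring/regex fixture matching plus a request journal.

// Return the text of the most recent user message in a chat request.
function lastUserMessage(messages) {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") return String(messages[i].content);
  }
  return "";
}

// Find the first fixture whose pattern matches the last user message.
// Assumption: a pattern is either a plain substring or a RegExp object.
function findFixture(fixtures, request) {
  const text = lastUserMessage(request.messages);
  return fixtures.find(({ match }) => {
    const pattern = match.userMessage;
    return pattern instanceof RegExp ? pattern.test(text) : text.includes(pattern);
  });
}

// Hypothetical mock: answers from fixtures and journals every request.
function createMock(fixtures) {
  const journal = [];
  return {
    journal,
    handle(request) {
      const fixture = findFixture(fixtures, request);
      journal.push({ request, matched: Boolean(fixture) });
      return fixture ? fixture.response : null;
    },
  };
}

const mock = createMock([
  { match: { userMessage: "stock price of AAPL" }, response: { content: "AAPL is $150.25." } },
  { match: { userMessage: /weather in \w+/i }, response: { content: "Sunny." } },
]);

mock.handle({ messages: [{ role: "user", content: "What is the stock price of AAPL?" }] });
mock.handle({ messages: [{ role: "user", content: "Weather in Oslo today?" }] });
mock.handle({ messages: [{ role: "user", content: "Tell me a joke" }] });

console.log(mock.journal.length, mock.journal.map((entry) => entry.matched));
```

Asserting on the journal, rather than only on the final UI state, catches cases where a test passes even though the agent skipped a call or sent the wrong prompt.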
📖 Read the full source: r/LocalLLaMA