LLMock: HTTP Server for Deterministic LLM Testing

LLMock is a mocking server that intercepts LLM API calls by running as a real HTTP server on a specified port, allowing deterministic testing across multiple processes without hitting paid APIs.

Key Details

The motivation was a $12 bill: that is what a developer spent running Playwright tests against the real OpenAI API. MSW (Mock Service Worker) did not solve the problem, because it patches the HTTP module only inside the Node.js process that calls server.listen(). Separate processes, such as a Python agent, never see the mocks and keep calling the real API.

With LLMock, you point the OPENAI_BASE_URL environment variable at the mock server from every process, regardless of whether it's Node.js, Python, or any other language:

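// Start the mock as a real HTTP server on a fixed port.
const mock = new LLMock({ port: 5555 });
await mock.start();
// Point OpenAI clients at the mock instead of the paid API.
process.env.OPENAI_BASE_URL = "http://localhost:5555/v1";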
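
Setting process.env only changes the current process, so a separate agent process sees the mock URL only if the variable is in the environment it is launched with. The following is a minimal sketch of that hand-off using Node's built-in child_process module. The runAgent helper and the MOCK_BASE_URL constant are hypothetical names for illustration, and the sketch assumes the agent's client library reads OPENAI_BASE_URL. A child Node.js process stands in for a real agent here.

```javascript
const { spawnSync } = require("node:child_process");

// Assumed mock address, matching the port used in the snippet above.
const MOCK_BASE_URL = "http://localhost:5555/v1";

// Hypothetical helper: run any command with OPENAI_BASE_URL overridden.
function runAgent(command, args) {
  // Copy the current environment and add the override, so the child
  // (Node.js, Python, or anything else) sees the mock server's URL.
  const env = { ...process.env, OPENAI_BASE_URL: MOCK_BASE_URL };
  const result = spawnSync(command, args, { env, encoding: "utf8" });
  return result.stdout.trim();
}

// Stand-in for a real agent: a child process that prints what it sees.
const seenByChild = runAgent(process.execPath, [
  "-e",
  "console.log(process.env.OPENAI_BASE_URL)",
]);
console.log(seenByChild);
```

The same helper could launch a Python agent by passing a Python executable and script path instead of process.execPath.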

Fixtures are plain JSON files that match on user message substrings or regex patterns, eliminating handler boilerplate:

{
  "fixtures": [
    {
      "match": { "userMessage": "stock price of AAPL" },
      "response": { "content": "The current stock price of Apple Inc. (AAPL) is $150.25." }
    }
  ]
}
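
To make the matching rule concrete, here is a rough sketch of how a server could pick a fixture for an incoming chat request. It is not LLMock's actual implementation: the findFixture and lastUserMessage functions are hypothetical, and so is the userMessagePattern key used to carry a regex, since the fixture above only shows substring matching.

```javascript
// Fixture data in the same shape as the JSON file above.
// "userMessagePattern" is a made-up key for regex matching.
const fixtureFile = {
  fixtures: [
    {
      match: { userMessage: "stock price of AAPL" },
      response: { content: "The current stock price of Apple Inc. (AAPL) is $150.25." },
    },
    {
      match: { userMessagePattern: "^weather in \\w+" },
      response: { content: "Sunny." },
    },
  ],
};

// Return the content of the most recent user message, or "" if none exists.
function lastUserMessage(messages) {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") return String(messages[i].content);
  }
  return "";
}

// Return the first fixture whose substring or regex matches, else null.
function findFixture(fixtures, messages) {
  const text = lastUserMessage(messages);
  for (const fixture of fixtures) {
    const { userMessage, userMessagePattern } = fixture.match;
    if (userMessage !== undefined && text.includes(userMessage)) return fixture;
    if (userMessagePattern !== undefined && new RegExp(userMessagePattern).test(text)) {
      return fixture;
    }
  }
  return null;
}

const hit = findFixture(fixtureFile.fixtures, [
  { role: "system", content: "You are a finance assistant." },
  { role: "user", content: "What is the stock price of AAPL today?" },
]);
console.log(hit ? hit.response.content : "no fixture matched");
```

First match wins in this sketch, so more specific fixtures would need to be listed before broader ones.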

Key features from the source:

Speaks actual OpenAI/Claude/Gemini SSE format correctly (getting event types wrong breaks streaming in subtle ways)
Full tool call support: agent frameworks execute the calls normally
Predicate routing for inspecting system prompt state or message history for multi-agent flows
Request journal to assert on what was actually called, not just whether the test passed
Zero dependencies
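
As an illustration of the first feature, the sketch below builds the frames of an OpenAI-style streaming chat completion: each frame is a "data: " line carrying a chat.completion.chunk JSON object, followed by a blank line, and the stream ends with a literal [DONE] sentinel. The toOpenAISSE function, the placeholder id and model values, and the chunkSize option are hypothetical and are not taken from LLMock. Claude and Gemini use different event framing, which this sketch does not cover.

```javascript
// Split a fixture response into OpenAI-style SSE frames.
function toOpenAISSE(content, options = {}) {
  const { id = "chatcmpl-mock", model = "mock-model", chunkSize = 8 } = options;

  // One SSE frame: a "data: " line with a chunk object, then a blank line.
  const frame = (delta, finishReason) =>
    "data: " +
    JSON.stringify({
      id,
      object: "chat.completion.chunk",
      created: 0, // fixed timestamp keeps the output deterministic
      model,
      choices: [{ index: 0, delta, finish_reason: finishReason }],
    }) +
    "\n\n";

  // The first chunk announces the assistant role with empty content.
  const frames = [frame({ role: "assistant", content: "" }, null)];
  // Content chunks carry the text in pieces.
  for (let i = 0; i < content.length; i += chunkSize) {
    frames.push(frame({ content: content.slice(i, i + chunkSize) }, null));
  }
  // The final chunk has an empty delta and a finish reason.
  frames.push(frame({}, "stop"));
  // Sentinel that tells the client the stream is over.
  frames.push("data: [DONE]\n\n");
  return frames;
}

const demoFrames = toOpenAISSE("The current stock price of Apple Inc. (AAPL) is $150.25.");
process.stdout.write(demoFrames.join(""));
```

A server would write these frames to a response with Content-Type: text/event-stream. Getting the terminating blank line or the final sentinel wrong is the kind of subtle error that can stall a streaming client.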

The developer ended up with 9 LLM calls across 3 Playwright tests, costing $0 and producing deterministic results every run.

📖 Read the full source: r/LocalLLaMA
