UK Sovereign LLM Inference: Relax.ai Launches Public Docs

Relax.ai has published documentation for its UK sovereign LLM inference service. The docs entry point at relax.ai/docs redirects to /docs/getting-started/introduction, a getting-started guide. The service was discussed on Hacker News (thread 48146424), which had 104 points and 109 comments at the time of writing.
The term "UK sovereign" suggests that the inference infrastructure is hosted within the United Kingdom, possibly on domestic or government-approved cloud infrastructure. UK data residency is a common requirement for public sector bodies and regulated industries. The positioning is broadly in line with UK policy interest in domestic AI capability, as reflected in the UK National AI Strategy and bodies such as the UK's AI Security Institute (formerly the AI Safety Institute).
The documentation appears to be at an early stage: the main /docs URL redirects straight to a getting-started page, which suggests a structured onboarding path, but a full API reference and model details were not yet evident. The attention on HN (104 points) indicates community interest in localized inference for compliance and data residency.
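To check the redirect behaviour described above for yourself, a minimal Python sketch follows. It assumes the docs are served over HTTPS at relax.ai and that the redirect is an HTTP-level one (a 3xx response with a Location header); if the site redirects client-side in JavaScript, this check will simply return None. The function names are illustrative and are not part of any Relax.ai API.

```python
import http.client
from typing import Optional
from urllib.parse import urljoin, urlsplit

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def resolve_location(request_url: str, location: str) -> str:
    """Resolve a (possibly relative) Location header against the request URL."""
    return urljoin(request_url, location)


def fetch_redirect_target(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send one HEAD request without following redirects.

    Returns the absolute redirect target, or None if the server did not
    answer with an HTTP redirect. Assumes an https:// URL.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status in REDIRECT_STATUSES and location:
            return resolve_location(url, location)
        return None
    finally:
        conn.close()


# Example call (needs network access; the https:// scheme is an assumption):
# print(fetch_redirect_target("https://relax.ai/docs"))
```

Keeping the URL resolution in its own function makes the logic testable without network access, and using http.client directly avoids the automatic redirect following that urllib.request performs by default.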
If you need low-latency, UK-resident LLM inference for your applications, check the docs for supported models, endpoints, and authentication. The HN comments may contain additional benchmarks or integration tips from early users.
📖 Read the full source: HN LLM Tools