Toroidal Logit Bias: Simple Inference-Time Trick Reduces Hallucination by Up to 40%

Researchers have developed a simple logit bias method that reduces factual hallucination without fine-tuning or RAG. The technique can be applied to any local model at inference time.
How It Works
The method maps token IDs to a 12x12 torus (a grid whose edges wrap around, like the surface of a donut), then boosts the logits of tokens that are "near" recent tokens in that toroidal space. The bias is applied only to the first 1-3K token IDs in the vocabulary; applying it to the full vocabulary degrades performance.
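The description above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the row-by-row mapping from token ID to grid cell, the Euclidean wraparound distance, and the hard cutoff at `radius` are all assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

GRID = 12  # side length of the 12x12 torus described above


def torus_coords(token_id: int, grid: int = GRID) -> Tuple[int, int]:
    """Assumed mapping: wrap the token ID onto the grid, row by row."""
    cell = token_id % (grid * grid)
    return cell // grid, cell % grid


def torus_distance(a: int, b: int, grid: int = GRID) -> float:
    """Euclidean distance between two token IDs, wrapping on both axes."""
    (ra, ca), (rb, cb) = torus_coords(a, grid), torus_coords(b, grid)
    dr = abs(ra - rb)
    dc = abs(ca - cb)
    # On a torus the short way round may cross an edge.
    dr = min(dr, grid - dr)
    dc = min(dc, grid - dc)
    return math.hypot(dr, dc)


def toroidal_bias(
    logits: Sequence[float],
    recent_tokens: Sequence[int],
    alpha: float = 0.3,
    radius: float = 2.0,
    n_biased: int = 1440,
) -> List[float]:
    """Return a copy of `logits` with `alpha` added to every token ID
    below `n_biased` that lies within `radius` of any recent token."""
    biased = list(logits)
    for tok in range(min(n_biased, len(biased))):
        if any(torus_distance(tok, r) <= radius for r in recent_tokens):
            biased[tok] += alpha
    return biased
```

Token IDs at or above `n_biased` are left untouched, which mirrors the restriction to the first 1-3K token IDs. A smoother variant could scale the boost by distance instead of using a hard cutoff; the article does not say which the authors use.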
Results
- Qwen 2.5-7B: 40% fewer factual errors
- OLMo 1.7-7B: 15.4% fewer factual errors
- TruthfulQA (817 prompts): +6.8% improvement on Qwen
- Performance cost: ~5% slower generation
Implementation
The core logic is approximately 30 lines of Python. Each model requires its own hyperparameters — Qwen works best with alpha=0.3, radius=2.0, N=1440, while OLMo needs alpha=0.2, radius=3.0, N=3000.
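Because the settings differ per model, it can help to keep them in one place. The sketch below stores the two sets of values quoted above; the class name, field names, dictionary keys, and the reading of each parameter in the comments are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToroidalBiasConfig:
    alpha: float   # assumed: size of the logit boost
    radius: float  # assumed: how close on the torus counts as "near"
    n_biased: int  # N: assumed to be the number of leading token IDs biased


# Values quoted in the article; the keys are hypothetical labels.
PRESETS = {
    "qwen2.5-7b": ToroidalBiasConfig(alpha=0.3, radius=2.0, n_biased=1440),
    "olmo-1.7-7b": ToroidalBiasConfig(alpha=0.2, radius=3.0, n_biased=3000),
}


def config_for(model_name: str) -> ToroidalBiasConfig:
    """Look up a preset by case-insensitive label; fail loudly otherwise."""
    key = model_name.lower()
    if key not in PRESETS:
        raise KeyError(
            f"no tuned preset for {model_name!r}; tune alpha, radius and N first"
        )
    return PRESETS[key]
```

Raising an error for unknown models, rather than falling back to a default, reflects the point that each model needs its own tuned values.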
Demo: huggingface.co/spaces/paraxiom-research/topological-coherence
Why This Matters
Factual hallucination remains a major obstacle to deploying reliable AI agents. Because this method works at inference time and needs no retraining, it offers a low-cost way to make outputs more trustworthy in applications ranging from customer service to content generation.
Key Takeaways
- The method reduces factual errors by 40% on Qwen 2.5-7B and by 15.4% on OLMo 1.7-7B.
- It operates at inference time, so no fine-tuning is needed.
- The approach carries over to different models, but each one requires its own hyperparameters.
- The trade-off is speed: generation is roughly 5% slower.
Getting Started
To try the toroidal logit bias method, start with the provided code repository on GitHub. Look up the hyperparameters for your specific model (alpha, radius, and N), then add the bias to your existing inference pipeline at the point where each step's logits are produced. For a hands-on look, the demo link listed under Implementation shows the method in action.
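To show where such a bias sits in an inference pipeline, here is a minimal greedy decoding loop with a hook that adjusts the logits before each token is chosen. Everything in it is a stand-in: `toy_model` is not a real model, `favour_repeats` is a placeholder rather than the toroidal bias itself, and the `window` of recent tokens is a hypothetical parameter. With Hugging Face transformers, the equivalent hook would be a custom `LogitsProcessor` passed to `generate` through its `logits_processor` argument.

```python
from typing import Callable, List, Sequence

Logits = List[float]


def greedy_decode(
    next_logits: Callable[[Sequence[int]], Logits],
    prompt_ids: Sequence[int],
    bias_fn: Callable[[Logits, Sequence[int]], Logits],
    max_new_tokens: int = 8,
    window: int = 4,  # hypothetical: how many trailing tokens count as "recent"
) -> List[int]:
    """Greedy decoding that applies `bias_fn` to the logits at every step."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(ids)
        logits = bias_fn(logits, ids[-window:])  # the bias hooks in here
        ids.append(max(range(len(logits)), key=logits.__getitem__))
    return ids


def toy_model(ids: Sequence[int]) -> Logits:
    # Stand-in model: slightly prefers token 3, with token 1 close behind.
    return [0.0, 1.0, 0.0, 1.1]


def favour_repeats(logits: Logits, recent: Sequence[int]) -> Logits:
    # Placeholder bias: +0.3 for any token ID already in the recent window.
    return [x + (0.3 if i in recent else 0.0) for i, x in enumerate(logits)]


print(greedy_decode(toy_model, [1], favour_repeats, max_new_tokens=3))
# [1, 1, 1, 1]
```

Without the bias the toy model would pick token 3 at every step; the small boost is enough to flip the choice to token 1. This is the general mechanism by which an inference-time logit bias changes the output without touching model weights.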
📖 Read the full source: r/LocalLLaMA