Creation OS: A Local σ-Gated LLM Runtime That Lets Models Say ‘I Don’t Know’ Instead of Hallucinating

OpenClawRadar | Published: April 30, 2026

Creation OS is a local-first AI runtime that wraps local LLMs with a σ-gate: a measurement layer that scores each output across multiple uncertainty channels and returns one of three verdicts, ACCEPT, RETHINK, or ABSTAIN. The goal is to let local models decline to answer when they are uncertain instead of hallucinating.

Key Features and Setup

  • Supports BitNet b1.58 2B-4T, Qwen3-8B Q4_K_M, Gemma 3 4B, and any GGUF model.
  • Runs on a MacBook Air M4 with 8 GB of memory as the primary machine: no cloud, no API, nothing leaves the device.
  • Install: git clone https://github.com/spektre-labs/creation-os then cd creation-os && bash scripts/quickstart.sh
  • Full path with local weights: ./scripts/install.sh then ./cos chat

σ-Gate Measurements

The gate combines logprob, entropy, perplexity, consistency, semantic σ, conformal τ, session coherence, and meta-cognitive channels into a single verdict:

  • ACCEPT → show answer
  • RETHINK → regenerate
  • ABSTAIN → refuse
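
The three-way verdict above can be illustrated with a minimal sketch. The source does not say how Creation OS weights its channels or where it sets its cut-offs, so the weighted mean, the thresholds `accept_below` and `abstain_above`, and every function name here are assumptions chosen for illustration, not the project's actual logic.

```python
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    ACCEPT = "ACCEPT"
    RETHINK = "RETHINK"
    ABSTAIN = "ABSTAIN"


def combine_sigma(channels: dict[str, float],
                  weights: Optional[dict[str, float]] = None) -> float:
    """Weighted mean of per-channel uncertainty scores (assumed to lie in [0, 1])."""
    if not channels:
        raise ValueError("at least one channel score is required")
    weights = weights or {}
    total = sum(weights.get(name, 1.0) for name in channels)
    return sum(score * weights.get(name, 1.0)
               for name, score in channels.items()) / total


def gate(sigma: float, accept_below: float = 0.3,
         abstain_above: float = 0.7) -> Verdict:
    """Map a combined score to a verdict; both thresholds are hypothetical."""
    if sigma < accept_below:
        return Verdict.ACCEPT
    if sigma > abstain_above:
        return Verdict.ABSTAIN
    return Verdict.RETHINK


def answer_with_gate(generate: Callable[[], str],
                     score: Callable[[str], dict[str, float]],
                     max_rethinks: int = 2) -> tuple[Optional[str], Verdict]:
    """Regenerate on RETHINK; refuse on ABSTAIN or when retries run out."""
    for _ in range(max_rethinks + 1):
        text = generate()
        verdict = gate(combine_sigma(score(text)))
        if verdict is Verdict.ACCEPT:
            return text, verdict
        if verdict is Verdict.ABSTAIN:
            break
    return None, Verdict.ABSTAIN
```

In this sketch, a response that still lands in the RETHINK band after `max_rethinks` regenerations is treated as a refusal, which is one possible design choice rather than documented behavior.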

Benchmark Results

TruthfulQA (same prompts and seeds):

  |Mode         |Accuracy|Coverage|
  |-------------|--------|--------|
  |BitNet only  |0.261   |0.136   |
  |σ-pipeline   |0.336   |0.171   |

Selective regeneration on uncertain rows lifts accuracy from 0.261 to 0.336, a relative gain of 28.7%. LSD probe AUROC: 0.982 on TruthfulQA holdout, 0.960 on TriviaQA. ECE: 0.043. Wrong+confident: 0. Conformal bound: P(error | ACCEPT) ≤ α at α=0.80.
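
The 28.7% figure is a relative change between the two accuracy values in the table, and ECE (expected calibration error) is a standard binned calibration metric. The sketch below shows both calculations. The source does not describe how its ECE was computed, so the equal-width binning, the bin count, and the function names are assumptions, and the sample inputs are made up for illustration.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative change from baseline to improved, as a fraction."""
    return (improved - baseline) / baseline


def expected_calibration_error(confidences: list[float],
                               correct: list[bool],
                               n_bins: int = 10) -> float:
    """Binned ECE: sample-weighted mean of |avg confidence - accuracy| per bin."""
    n = len(confidences)
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Accuracy values taken from the TruthfulQA table.
print(round(relative_gain(0.261, 0.336) * 100, 1))  # 28.7
```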

Negative results documented: σ is not dominant on HellaSwag or MMLU. Full details in CLAIM_DISCIPLINE.md.

Formal Verification

Lean 4: 6/6 sorry-free, meaning none relies on Lean's `sorry` placeholder for an unfinished proof. Frama-C WP: 15/15 tier-1 discharged.

Example Command

./cos chat --once --prompt "What is 2+2?" --multi-sigma --verbose yields output like σ_peak=0.06 action=ACCEPT route=LOCAL σ_combined=0.184 conformal@α=0.80.
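
For scripting around the CLI, a status line like the one above can be split into fields. This is a minimal sketch that assumes the output is a single line of whitespace-separated `key=value` tokens, exactly as in the example. The real output format may differ, and `parse_sigma_line` is a hypothetical helper, not part of Creation OS.

```python
def parse_sigma_line(line: str) -> dict[str, str]:
    """Split whitespace-separated key=value tokens into a dict of strings."""
    fields: dict[str, str] = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that contain no "="
            fields[key] = value
    return fields


# Sample line copied from the example command's output.
sample = "σ_peak=0.06 action=ACCEPT route=LOCAL σ_combined=0.184 conformal@α=0.80"
fields = parse_sigma_line(sample)
print(fields["action"], float(fields["σ_combined"]))  # ACCEPT 0.184
```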

MCP Integration

Run python3 -m cos.mcp_sigma_server to expose σ on every response to any MCP-compatible client.

Limitations

σ is not a universal hallucination detector. It is strongest on factual QA, and long-form generation needs more evaluation. Output quality still depends on the base model.

📖 Read the full source: r/LocalLLaMA
