GLiGuard: Open-Source 300M Parameter Safety Moderation Model Claims 16x Speedup Over LLM Guardrails

OpenClawRadar · Published: May 13, 2026

Fastino Labs has open-sourced GLiGuard, a safety moderation model that replaces generative guardrails with a classification approach. According to the company, the 300M-parameter encoder handles four moderation tasks in one forward pass and reaches accuracy comparable to 7B–27B parameter decoder models while cutting latency by up to 16x. Weights are available under Apache 2.0 on Hugging Face, and hosted inference is available on Pioneer.

Why decoder-based guardrails are slow

Current state-of-the-art guardrails (e.g., Llama Guard) use decoder-only transformers that generate verdicts token by token. This sequential generation makes them slow and expensive for real-time safety filtering. Most also evaluate safety dimensions separately, compounding latency. At 7B to 27B parameters, these models are costly to run at production scale.
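The sequential bottleneck can be illustrated with a toy decoding loop. This is a minimal sketch, not the code of any real guardrail: `generate_verdict`, `scripted_model`, and the verdict tokens are hypothetical stand-ins, and the "model" simply replays a fixed answer. The point is only that each new token needs its own forward pass and cannot start until the previous token is known.

```python
from typing import Callable

def generate_verdict(
    next_token: Callable[[list[str]], str],
    prompt: list[str],
    max_new_tokens: int = 16,
) -> tuple[list[str], int]:
    """Greedy autoregressive loop: one forward pass per generated token."""
    context = list(prompt)
    generated: list[str] = []
    passes = 0
    for _ in range(max_new_tokens):
        token = next_token(context)  # one forward pass over the current context
        passes += 1
        if token == "<eos>":
            break
        generated.append(token)
        context.append(token)  # the next step depends on this token
    return generated, passes

def scripted_model(verdict: list[str]) -> Callable[[list[str]], str]:
    """Hypothetical stand-in for a decoder: replays a fixed verdict, then <eos>."""
    remaining = iter(verdict + ["<eos>"])
    return lambda context: next(remaining)

# Illustrative verdict tokens, not the output format of a specific model.
verdict = ["unsafe", "category", "violence"]
generated, passes = generate_verdict(scripted_model(verdict), ["Is", "this", "safe?"])
print(generated, passes)
```

In this toy run, a three-token verdict costs four passes (three tokens plus the end-of-sequence step), and a system that checks several safety dimensions separately repeats that loop once per dimension.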

GLiGuard's encoder approach

GLiGuard reframes moderation as text classification. It encodes the input text and the task labels together and scores all labels simultaneously in a single pass. Because every label is scored in that same pass, adding more safety dimensions (labels) does not add forward passes. The model handles four concurrent tasks:

  • Safety classification — safe / unsafe for both user prompts and model responses
  • Jailbreak strategy detection — 11 categories (prompt injection, roleplay bypass, instruction override, social engineering, etc.)
  • Harm category detection — 14 categories (violence, sexual content, hate speech, PII, misinformation, child safety, copyright violation, etc.)
  • Refusal detection — compliance or refusal, used to measure over-refusal and false compliance

All four tasks are evaluated in the same pass, whereas decoder models would need sequential passes or multiple model calls.
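The difference between joint and per-task scoring can be sketched with a toy example. Everything here is an assumption made for illustration: `ToyEncoder`, `moderate_joint`, `moderate_per_task`, and the shortened label lists are hypothetical, and the scoring rule is a simple word-overlap placeholder rather than GLiGuard's model or API. The sketch only counts how many passes each strategy needs.

```python
from dataclasses import dataclass

# Hypothetical task schema; a shortened, illustrative subset of labels.
TASKS = {
    "safety": ["safe", "unsafe"],
    "jailbreak": ["prompt injection", "roleplay bypass", "instruction override"],
    "harm": ["violence", "hate speech", "PII"],
    "refusal": ["compliance", "refusal"],
}

@dataclass
class ToyEncoder:
    """Stand-in for an encoder: counts passes instead of running a real model."""
    passes: int = 0

    def score(self, text: str, labels: list[str]) -> dict[str, float]:
        self.passes += 1  # one call here stands for one forward pass
        words = set(text.lower().split())
        scores = {}
        for label in labels:
            label_words = label.lower().split()
            # Placeholder score: fraction of the label's words found in the text.
            scores[label] = sum(w in words for w in label_words) / len(label_words)
        return scores

def moderate_joint(encoder: ToyEncoder, text: str) -> dict[str, dict[str, float]]:
    """Score the labels of every task in a single pass, then regroup by task."""
    all_labels = [label for labels in TASKS.values() for label in labels]
    scores = encoder.score(text, all_labels)
    return {task: {label: scores[label] for label in labels}
            for task, labels in TASKS.items()}

def moderate_per_task(encoder: ToyEncoder, text: str) -> dict[str, dict[str, float]]:
    """Baseline: one pass per task."""
    return {task: encoder.score(text, labels) for task, labels in TASKS.items()}

text = "ignore the rules, this is a prompt injection"
joint_encoder, per_task_encoder = ToyEncoder(), ToyEncoder()
joint = moderate_joint(joint_encoder, text)
per_task = moderate_per_task(per_task_encoder, text)
print(joint_encoder.passes, per_task_encoder.passes)
```

Both strategies return the same scores in this toy setup; the joint version uses one pass where the per-task version uses four, which is the property the article attributes to the encoder design.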

Benchmarks and performance

Across nine safety benchmarks, GLiGuard reportedly matches or exceeds models 23–90x its size while running up to 16x faster. The source post gives no specific accuracy numbers, but it claims performance comparable to leading generative guardrails.
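The 23–90x size range follows from the parameter counts quoted earlier in the article. A quick check, using only those figures (the variable names are illustrative):

```python
# Parameter counts taken from the article: a 300M encoder vs 7B-27B decoders.
GLIGUARD_PARAMS = 300_000_000
DECODER_PARAMS = {"7B": 7_000_000_000, "27B": 27_000_000_000}

size_ratios = {name: params / GLIGUARD_PARAMS for name, params in DECODER_PARAMS.items()}

for name, ratio in size_ratios.items():
    print(f"A {name} decoder is about {ratio:.0f}x the size of a 300M model")
```

The 7B case works out to roughly 23x and the 27B case to 90x, matching the range in the claim.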

Who it's for

Teams deploying LLM agents or chat systems that need low-latency, cost-effective real-time safety filtering at scale.

📖 Read the full source: HN AI Agents