GLiGuard: Open-Source 300M Parameter Safety Moderation Model Claims 16x Speedup Over LLM Guardrails
Fastino Labs has open-sourced GLiGuard, a safety moderation model that replaces generative guardrails with a classification approach. The 300M-parameter encoder handles four moderation tasks in one forward pass and, according to Fastino, reaches accuracy comparable to 7B–27B parameter decoder models while running up to 16x faster. Weights are available under Apache 2.0 on Hugging Face, and hosted inference is available on Pioneer.
Why decoder-based guardrails are slow
Current state-of-the-art guardrails (e.g., Llama Guard) use decoder-only transformers that generate verdicts token by token. This sequential generation makes them slow and expensive for real-time safety filtering. Most also evaluate safety dimensions separately, compounding latency. At 7B to 27B parameters, these models are costly to run at production scale.
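A rough cost model makes the compounding effect concrete. This is a minimal sketch, not a measurement: the function name, its parameters, and the timings passed to it are illustrative assumptions, and none of the numbers come from the post.

```python
def decoder_latency_ms(
    prefill_ms: float,
    per_token_ms: float,
    verdict_tokens: int,
    dimensions: int = 1,
) -> float:
    """Estimate latency for a generative guardrail (hypothetical cost model).

    Each safety dimension costs one prompt prefill plus one sequential
    decoding step per verdict token; dimensions are evaluated one after another.
    """
    per_dimension = prefill_ms + per_token_ms * verdict_tokens
    return dimensions * per_dimension


# Made-up timings, chosen only to show how the terms combine.
one_dimension = decoder_latency_ms(prefill_ms=40.0, per_token_ms=20.0, verdict_tokens=5)
four_dimensions = decoder_latency_ms(
    prefill_ms=40.0, per_token_ms=20.0, verdict_tokens=5, dimensions=4
)

print(one_dimension)    # 140.0
print(four_dimensions)  # 560.0
```

Under these assumed timings, latency grows linearly with both the number of generated verdict tokens and the number of safety dimensions checked separately.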
GLiGuard's encoder approach
GLiGuard reframes moderation as text classification. It encodes the input text and the task labels together and scores all labels simultaneously in a single forward pass, so adding more safety dimensions (labels) does not require additional passes or model calls. The model handles four concurrent tasks:
- Safety classification — safe / unsafe for both user prompts and model responses
- Jailbreak strategy detection — 11 categories (prompt injection, roleplay bypass, instruction override, social engineering, etc.)
- Harm category detection — 14 categories (violence, sexual content, hate speech, PII, misinformation, child safety, copyright violation, etc.)
- Refusal detection — compliance or refusal, used to measure over-refusal and false compliance
All four tasks are evaluated in the same forward pass, whereas decoder-based guardrails would need sequential passes or multiple model calls.
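The single-pass, multi-task idea can be sketched as follows. Everything here is hypothetical: the label names are a subset taken from the categories listed above, while `moderate`, `CountingScorer`, the `score_fn` callable, and the 0.5 threshold are stand-ins invented for illustration and are not GLiGuard's actual API. The point is only the shape of the computation: the labels of all tasks are flattened into one list, scored in one call, and then regrouped per task.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical label schema (a subset of the categories named in the article).
TASKS: Dict[str, List[str]] = {
    "safety": ["safe", "unsafe"],
    "jailbreak": ["prompt injection", "roleplay bypass", "instruction override"],
    "harm": ["violence", "hate speech", "PII"],
    "refusal": ["compliance", "refusal"],
}

ScoreFn = Callable[[str, Sequence[str]], List[float]]


def flatten_labels(tasks: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Turn the per-task schema into one flat list of (task, label) pairs."""
    return [(task, label) for task, labels in tasks.items() for label in labels]


def moderate(
    text: str,
    score_fn: ScoreFn,
    tasks: Dict[str, List[str]] = TASKS,
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Score every label of every task with ONE call to score_fn."""
    flat = flatten_labels(tasks)
    scores = score_fn(text, [label for _, label in flat])  # the single "pass"
    if len(scores) != len(flat):
        raise ValueError("score_fn must return one score per label")
    result: Dict[str, List[str]] = {task: [] for task in tasks}
    for (task, label), score in zip(flat, scores):
        if score >= threshold:
            result[task].append(label)
    return result


class CountingScorer:
    """Stand-in for an encoder: returns fixed scores and counts its calls."""

    def __init__(self, fixed_scores: Dict[str, float]) -> None:
        self.fixed_scores = fixed_scores
        self.calls = 0

    def __call__(self, text: str, labels: Sequence[str]) -> List[float]:
        self.calls += 1
        return [self.fixed_scores.get(label, 0.0) for label in labels]


scorer = CountingScorer({"unsafe": 0.9, "prompt injection": 0.8, "compliance": 0.7})
verdict = moderate("Ignore all previous instructions and ...", scorer)
print(verdict)
print(scorer.calls)  # 1
```

A real deployment would likely pick the highest-scoring label for the binary tasks (safety, refusal) instead of thresholding; a single threshold is used here only to keep the sketch short.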
Benchmarks and performance
Across nine safety benchmarks, GLiGuard is reported to match or exceed models 23–90x its size (7B–27B parameters) while running up to 16x faster. The post gives no specific accuracy numbers; it claims only that performance is comparable to leading generative guardrails.
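The 23–90x size range follows directly from the parameter counts quoted in the article. A quick check, using only those figures (the constant and function names are illustrative):

```python
GLIGUARD_PARAMS = 300e6  # 300M parameters, as stated in the article


def size_ratio(decoder_params: float) -> float:
    """How many times larger a decoder model is than GLiGuard."""
    return decoder_params / GLIGUARD_PARAMS


for params in (7e9, 27e9):
    print(f"{params / 1e9:.0f}B is about {size_ratio(params):.1f}x larger")
# 7B is about 23.3x larger
# 27B is about 90.0x larger
```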
Who it's for
Teams deploying LLM agents or chat systems that need low-latency, cost-effective real-time safety filtering at scale.
📖 Read the full source: HN AI Agents