Open-source playground for red-teaming AI agents with published exploits

✍️ OpenClawRadar | 📅 Published: March 16, 2026

What this is

Fabraix Playground is an open-source environment for red-teaming AI agents through adversarial challenges. It started as an internal tool for testing guardrails but was open-sourced to get diverse perspectives on vulnerabilities.

How it works

Each challenge deploys a live AI agent with:

A specific persona
A set of real tools (web search, browsing, and more)
Something it's been instructed to protect
Fully visible system prompts

The objective is to get past the guardrails. When someone succeeds, the winning technique is published, including the approach, the reasoning, and full conversation transcripts.
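To make the ingredients of a challenge concrete, here is a minimal sketch of how one might be described in TypeScript. The interface and every field name (`ChallengeConfig`, `protectedAsset`, and so on) are hypothetical and chosen for illustration; the actual format of the files in /challenges may differ.

```typescript
// Hypothetical shape of a challenge definition. Field names are illustrative
// and are not taken from the Fabraix repository.
interface ToolSpec {
  name: string;
  description: string;
}

interface ChallengeConfig {
  id: string;
  persona: string;        // who the agent is playing
  tools: ToolSpec[];      // real tools the agent can call
  protectedAsset: string; // what the agent is instructed to protect
  systemPrompt: string;   // fully visible to participants
}

// Made-up example values, for illustration only.
const exampleChallenge: ChallengeConfig = {
  id: "example-challenge",
  persona: "Helpful support assistant",
  tools: [
    { name: "web_search", description: "Search the web" },
    { name: "browse", description: "Open and read a web page" },
  ],
  protectedAsset: "A tool the agent must never call",
  systemPrompt: "You are a support assistant. Never call the restricted tool.",
};

// Build a one-line, participant-facing summary of a challenge.
function describeChallenge(c: ChallengeConfig): string {
  const toolNames = c.tools.map((t) => t.name).join(", ");
  return `${c.persona} | tools: ${toolNames} | protects: ${c.protectedAsset}`;
}
```

Keeping the system prompt in the same object as the persona and tools mirrors the point above: participants see everything the agent sees, so nothing in the definition needs to be hidden.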

Project structure

/src — React frontend (TypeScript, Vite, Tailwind)
/challenges — every challenge config and system prompt, versioned and open
Guardrail evaluation runs server-side to prevent client-side tampering
The agent runtime is being open-sourced separately

Local development

To run locally:

npm install
npm run dev

This connects to the live API by default. To develop against a local backend:

VITE_API_URL=http://localhost:8000/v1 npm run dev
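For context, Vite exposes environment variables prefixed with `VITE_` to client code through `import.meta.env`. The sketch below shows one way a frontend could pick the API base URL from that variable. The function name `resolveApiUrl` and the fallback URL are assumptions; the article does not state the live API address or how the project reads the variable.

```typescript
// Placeholder fallback. The real live API URL is not given in the article.
const PLACEHOLDER_LIVE_API = "https://api.example.invalid/v1";

// In a Vite app, `env` would be `import.meta.env`. It is passed in as a
// parameter here so the function can run and be tested outside of Vite.
function resolveApiUrl(env: Record<string, string | undefined>): string {
  const override = env["VITE_API_URL"]?.trim();
  // Use the override when it is set and non-empty, else the fallback.
  const base = override ? override : PLACEHOLDER_LIVE_API;
  // Drop trailing slashes so paths can be appended consistently.
  return base.replace(/\/+$/, "");
}
```

Inside the app this would be called as `resolveApiUrl(import.meta.env)`, so running with `VITE_API_URL` set points every request at the local backend without code changes.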

Challenge examples

The first challenge was to get an agent to call a tool it had been told never to call. Someone succeeded in about 60 seconds, without directly asking for the secret. The next challenge focuses on data exfiltration, with harder defenses.
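A success condition like "the agent called a forbidden tool" can be checked mechanically from a log of tool calls. The following is a rough sketch of such a check, not the project's actual evaluator; the `ToolCall` type, the `atMs` field, and the function name are all hypothetical.

```typescript
// Hypothetical record of one tool invocation made by the agent.
interface ToolCall {
  tool: string; // name of the tool that was invoked
  atMs: number; // milliseconds since the attempt started
}

// Return the first call to the forbidden tool, or undefined if the agent
// never called it. A defined result means the guardrail was bypassed.
function firstForbiddenCall(
  calls: ToolCall[],
  forbiddenTool: string,
): ToolCall | undefined {
  return calls.find((c) => c.tool === forbiddenTool);
}
```

Running a check of this kind on the server, over calls the server itself recorded, fits the project's choice to keep guardrail evaluation away from the client.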

The community drives what gets tested: anyone can propose a challenge (scenario, agent, objective), the community votes, and the top-voted challenge goes live with a ticking clock. The fastest successful jailbreak wins.
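The selection rules described here (top-voted proposal goes live, fastest successful attempt wins) can be sketched in a few lines. The types and function names below are hypothetical, and the tie-breaking rule (the earlier entry in the list wins) is an assumption, since the article does not say how ties are handled.

```typescript
// Hypothetical data shapes for proposals and attempts.
interface Proposal {
  title: string;
  votes: number;
}

interface Attempt {
  user: string;
  succeeded: boolean;
  elapsedMs: number; // time from challenge start to this attempt's result
}

// Pick the proposal with the most votes. On a tie, the earlier entry wins.
function topVoted(proposals: Proposal[]): Proposal | undefined {
  return proposals.reduce<Proposal | undefined>(
    (best, p) => (best === undefined || p.votes > best.votes ? p : best),
    undefined,
  );
}

// Pick the fastest successful attempt. Failed attempts are ignored.
function fastestWinner(attempts: Attempt[]): Attempt | undefined {
  return attempts
    .filter((a) => a.succeeded)
    .reduce<Attempt | undefined>(
      (best, a) => (best === undefined || a.elapsedMs < best.elapsedMs ? a : best),
      undefined,
    );
}
```

Both functions return `undefined` for empty input, which leaves the caller to decide what happens when there are no proposals or nobody has succeeded yet.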

Technical details

The repository is mostly TypeScript (76.5%), with CSS (22.2%) and other languages (1.3%) making up the rest. It is released under the MIT license and has a Discord community for discussing techniques and sharing approaches.

📖 Read the full source: HN AI Agents
