Open-source playground for red-teaming AI agents with published exploits

What this is
Fabraix Playground is an open-source environment for red-teaming AI agents through adversarial challenges. It started as an internal tool for testing guardrails but was open-sourced to get diverse perspectives on vulnerabilities.
How it works
Each challenge deploys a live AI agent with:
- A specific persona
- A set of real tools (web search, browsing, and more)
- Something it's been instructed to protect
- Fully visible system prompts
The objective is to find ways past the guardrails. When someone succeeds, the winning technique gets published — including approach, reasoning, and full conversation transcripts.
Project structure
/src— React frontend (TypeScript, Vite, Tailwind)/challenges— every challenge config and system prompt, versioned and open- Guardrail evaluation runs server-side to prevent client-side tampering
- The agent runtime is being open-sourced separately
Local development
To run locally:
npm install
npm run devThis connects to the live API by default. To develop against a local backend:
VITE_API_URL=http://localhost:8000/v1 npm run devChallenge examples
The first challenge was to get an agent to call a tool it's been told to never call. Someone succeeded in around 60 seconds without directly asking for the secret. The next challenge focuses on data exfiltration with harder defenses.
The community drives what gets tested: anyone can propose a challenge (scenario, agent, objective), the community votes, and the top-voted challenge goes live with a ticking clock. The fastest successful jailbreak wins.
Technical details
The project is built with TypeScript (76.5%), CSS (22.2%), and other languages (1.3%). It uses MIT license and has a Discord community for discussing techniques and sharing approaches.
📖 Read the full source: HN AI Agents
👀 See Also

IronClaw's Security-First Approach to AI Agent Safety
IronClaw addresses AI agent security concerns by implementing constrained execution, encrypted environments, and explicit permissions instead of relying on LLM intelligence for safe behavior.

Why Internal RAG and Doc-Chat Tools Fail Security Audits
Community discusses real-world security and compliance blockers that prevent RAG tools from reaching production.

KnightClaw: Local Security Extension for OpenClaw Agents
KnightClaw is a drop-in extension that intercepts messages before they reach OpenClaw agents, providing an 8-layer hybrid detection system and egress redaction. It runs entirely local with zero telemetry and is MIT licensed.

Axios 1.14.1 compromised with malware, targets AI-assisted development workflows
Axios version 1.14.1 has been compromised in a supply chain attack that silently pulls in [email protected], an obfuscated RAT dropper. Developers using AI coding assistants like Claude should immediately check their lockfiles and machines for infection.