Threat data from 91K AI agent interactions: Tool abuse up 6.4%, new multimodal attacks

Threat landscape from production AI agent data
Real-world threat data from 91,284 AI agent interactions across 47 deployments shows 35,711 threats detected in February 2026. The detection model uses a Gemma-based 5-head multilabel classifier.
Key threats for self-hosted deployments
- Tool/command abuse: Increased 6.4% to 14.5% of threats. The dominant pattern is tool chain escalation where a harmless read call is followed by a write or execute. Most local setups give agents tool access without sufficient safeguards.
- Agent goal hijacking: Doubled to 6.9% of threats. Targets the planning phase in autonomous agent loops, particularly relevant for local setups with less monitoring on agent state.
- RAG poisoning: Shifted to metadata attacks at 12.0% (up from 10.0%). New pattern targets document metadata (titles, authors, annotations) rather than content. Most people sanitize content but pass metadata through as-is.
- Multimodal injection: New threat at 2.3% where instructions are hidden in images and PDFs. Text-only safety scanning misses these attacks.
Threat breakdown percentages
- Data Exfiltration: 18.0% (-1.2 MoM change)
- Tool/Command Abuse: 14.5% (+6.4)
- RAG/Context Attack: 12.0% (+2.0)
- Jailbreak: 11.0% (-1.3)
- Prompt Injection: 8.1% (-0.7)
- Agent Goal Hijack: 6.9% (+3.3)
- Inter-Agent Attack: 5.0% (+1.6)
Detection approach
The detection pipeline uses two layers: L1 is pattern matching with 218 rules (sub-ms latency, runs entirely locally), and L2 is Gemma-based. The full community edition is open source at github.com/raxe-ai/raxe-ce.
📖 Read the full source: r/LocalLLaMA
👀 See Also

Security Alert: Malicious Code in LiteLLM May Steal API Keys
A critical security vulnerability has been identified in LiteLLM that could expose API keys. Users of OpenClaw or nanobot may be affected and should check the GitHub issues linked in the source.

AI Agent Security: Beyond Jailbreaks to Tool Misuse and Prompt Injection
AI agents that browse the web, execute commands, and trigger workflows face security risks from prompt injection and tool misuse, where untrusted content redirects legitimate tools like shell execution and HTTP requests.

Monitoring OpenClaw Commands with Python and Gemini Flash for Security
A user created a Python script that trails commands injected by OpenClaw, analyzes them with Gemini Flash, and sends notifications via Discord webhook for alarming or irregular activity, costing about $0.14 daily.

NPM Compromise via Axios Backdoor: Impact on AI Coding Agents
On March 31, 2026, a DPRK-linked threat actor compromised npm by publishing backdoored versions of Axios (1.14.1 and 0.30.4) during a 3-hour window. The malware injected a dependency that downloaded a platform-specific RAT, harvested credentials, and self-erased, with AI coding agents like Claude Code and Cursor being particularly vulnerable due to automated npm installs.