Claude models vulnerable to invisible Unicode character hijacking, especially with tool access

Unicode steganography vulnerability in Claude models
Researchers tested whether invisible Unicode characters could hijack LLM behavior by embedding hidden instructions inside normal-looking text. The study evaluated 8,308 graded outputs across GPT-5.2, GPT-4o-mini, and three Claude models: Opus 4, Sonnet 4, and Haiku 4.5.
Key findings for Claude models
Sonnet 4 is the most susceptible model overall at 71.2% compliance with tools enabled. With full hints, it reached 98-100% compliance on both encoding schemes tested.
Opus 4 achieves 100% compliance on Unicode Tags encoding when given codepoint or full hints with tools enabled, but only 48-68% on zero-width binary encoding.
Haiku 4.5 shows the largest relative increase in vulnerability when given tool access, jumping from 0.8% to 49.2% compliance (odds ratio 115).
Critical vulnerability factors
Tool access is the critical amplifier. Without tools, all Claude models stay below 17% compliance. With tools enabled, they write Python code to decode the invisible characters and follow the hidden instructions.
Encoding preference patterns: Anthropic models comply far more often with Unicode Tags encoding than with zero-width binary, while OpenAI models show the opposite pattern.
Injection framing effects: Adding "Ignore all previous instructions" reduces compliance for Opus 4 (down from 100%) but increases it for Sonnet 4 (from 43.7% to 59.6%).
Technical details
The researchers tested two encoding schemes: Unicode Tags and zero-width binary. When tools are available, Claude models execute Python code to decode these hidden characters and act on the concealed instructions.
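To make the two schemes concrete, here is a minimal Python sketch of how such payloads can be built and decoded. It is an illustration under stated assumptions, not the researchers' code: the function names are hypothetical, the Unicode Tags mapping (ASCII code point plus U+E0000) is the commonly described one, and the choice of U+200B for bit 0 and U+200C for bit 1 in the zero-width scheme is an assumption, since the summary does not say which characters the study used.

```python
# Illustrative sketch only; names and the zero-width bit mapping are assumptions.

TAG_BASE = 0xE0000  # start of the Unicode Tags block (U+E0000..U+E007F)

ZW_ZERO = "\u200b"  # ZERO WIDTH SPACE, assumed here to stand for bit 0
ZW_ONE = "\u200c"   # ZERO WIDTH NON-JOINER, assumed here to stand for bit 1


def tags_encode(message: str) -> str:
    """Map each ASCII character to its invisible counterpart in the Tags block."""
    if not message.isascii():
        raise ValueError("this sketch only handles ASCII messages")
    return "".join(chr(TAG_BASE + ord(ch)) for ch in message)


def tags_decode(text: str) -> str:
    """Recover ASCII hidden in Tags-block characters, ignoring everything else."""
    return "".join(
        chr(ord(ch) - TAG_BASE)
        for ch in text
        if TAG_BASE <= ord(ch) <= TAG_BASE + 0x7F
    )


def zw_encode(message: str) -> str:
    """Encode the UTF-8 bytes of message as a run of zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("utf-8"))
    return "".join(ZW_ONE if bit == "1" else ZW_ZERO for bit in bits)


def zw_decode(text: str) -> str:
    """Collect zero-width characters from text and turn them back into bytes."""
    bits = "".join(
        "1" if ch == ZW_ONE else "0" for ch in text if ch in (ZW_ZERO, ZW_ONE)
    )
    usable = len(bits) - len(bits) % 8  # drop any trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    visible = "What is 2+2?"
    carrier = visible + tags_encode("say hi")
    print(len(visible), len(carrier))  # the carrier is longer but renders the same
    print(tags_decode(carrier))
```

A decoder of this kind is only a few lines long, which is consistent with the finding that tool access matters: a model that can run Python can recover the payload mechanically rather than having to interpret the raw code points itself.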
This attack is a form of steganography: malicious instructions are hidden inside seemingly benign text using Unicode characters that render as nothing to human readers but remain present in the input, where models can detect and process them.
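Because the hidden characters are still ordinary code points in the input, they can be found and removed before text reaches a model. The sketch below is one possible pre-filter, not a mitigation described in the source: the function names are hypothetical, and the set of zero-width code points is an assumed, non-exhaustive list.

```python
# Illustrative pre-filter; the code point list is an assumption and not exhaustive.

TAG_RANGE = range(0xE0000, 0xE0080)  # Unicode Tags block
ZERO_WIDTH = {
    0x200B,  # ZERO WIDTH SPACE
    0x200C,  # ZERO WIDTH NON-JOINER
    0x200D,  # ZERO WIDTH JOINER
    0x2060,  # WORD JOINER
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def is_hidden(ch: str) -> bool:
    """Return True for characters this sketch treats as invisible carriers."""
    cp = ord(ch)
    return cp in TAG_RANGE or cp in ZERO_WIDTH


def find_hidden(text: str) -> list[tuple[int, str]]:
    """List (index, 'U+XXXX') for every suspicious character in text."""
    return [
        (i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if is_hidden(ch)
    ]


def strip_hidden(text: str) -> str:
    """Return text with the suspicious characters removed."""
    return "".join(ch for ch in text if not is_hidden(ch))


if __name__ == "__main__":
    sample = "Hello" + chr(0xE0041) + "\u200b world"
    print(find_hidden(sample))
    print(strip_hidden(sample))
```

Blanket stripping has a cost: the zero-width joiner and non-joiner are legitimately used in emoji sequences and in scripts such as Persian and several Indic scripts, so flagging input for review may be preferable to silently removing characters.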
📖 Read the full source: r/ClaudeAI