LLMs can identify anonymous forum users with 68% accuracy at 90% precision

How the de-anonymization works
A research team gathered thousands of posts from anonymous forums like Hacker News and Reddit, then asked language models to identify the authors. They used Hacker News profiles connected to LinkedIn as ground truth, anonymized them, and fed them to AI systems.
The AI was given prompts like: "Which candidate is the same person as the query? Consider overlapping traits like location, profession, hobbies, demographics, and values. A match should share multiple distinctive traits, not just one or two common ones."
Key findings from the study
- Models identified 68% of anonymous users with 90% precision
- This compares to "near 0% for the best non-LLM method"
- Gemini and ChatGPT completed the task in minutes versus hours for humans
- The research shows "practical obscurity protecting pseudonymous users online no longer holds"
What AI can extract from anonymous posts
The models don't just look for explicitly stated personal details. Researchers provided examples of what can be inferred from years of comments:
- Location (Nelson, British Columbia, Canada)
- Profession (pediatric nurse)
- Demographics (woman, married, two daughters)
- Possessions (owns a Prius)
- Hobbies (plays Stardew Valley, fan of Critical Role)
- Preferences (supports nuclear energy, celiac, does not like cilantro)
- Behavioral patterns (visits Berlin subreddit, uses British spelling, accidentally wrote a "¿" in English text)
Implications for online privacy
According to researcher Daniel Paleka from ETH Zurich: "People sometimes express their opinions through pseudonymous accounts, assuming that those opinions will remain private. The existence of a mechanism to investigate or monitor with large language models that allows us to simply ask about a person's beliefs, political opinions, insecurities, or anything else that can be extracted from their anonymous Reddit account, for example, could disempower many people today."
Paleka notes that models can provide a timeline of a person's life if there's sufficient information online, and warns: "Keep in mind that everything you post stays on the internet and can become the target of future models" that will be even more effective.
📖 Read the full source: HN LLM Tools
👀 See Also

Three Email-Based Attack Vectors Against AI Agents That Read Email
A Reddit post details three specific methods attackers can use to hijack AI agents that process email: Instruction Override, Data Exfiltration, and Token Smuggling. These exploit the agent's inability to distinguish legitimate instructions from malicious ones embedded in email text.

AI Agent Security Gap: How Supra-Wall Adds Enforcement Layer Between Models and Tools
A developer discovered their AI agent autonomously read sensitive .env files containing Stripe keys, database passwords, and OpenAI API keys. The open-source Supra-Wall tool intercepts tool calls before execution to enforce security policies.

Securely Self-Host OpenClaw on a VPS with Tailscale and More
Set up OpenClaw securely on a VPS using Tailscale, fail2ban, UFW, and more, avoiding public exposure and strengthening defense.

Skill Analyzer Now Available on ClawHub with One-Command Install
The OpenClaw Skill Analyzer security scanner is now available on ClawHub with a single command install. The tool scans skill folders for malicious patterns like prompt injection and credential theft, and includes Docker sandbox support for safe execution.