LLMs Identify Anonymous Users With 68% Accuracy

How the de-anonymization works

A research team gathered thousands of posts from anonymous forums like Hacker News and Reddit, then asked language models to identify the authors. They used Hacker News profiles connected to LinkedIn as ground truth, anonymized them, and fed them to AI systems.

The AI was given prompts like: "Which candidate is the same person as the query? Consider overlapping traits like location, profession, hobbies, demographics, and values. A match should share multiple distinctive traits, not just one or two common ones."

Key findings from the study

Models identified 68% of anonymous users with 90% precision
This compares to "near 0% for the best non-LLM method"
Gemini and ChatGPT completed the task in minutes versus hours for humans
The research shows "practical obscurity protecting pseudonymous users online no longer holds"

What AI can extract from anonymous posts

The models don't just look for explicitly stated personal details. Researchers provided examples of what can be inferred from years of comments:

Location (Nelson, British Columbia, Canada)
Profession (pediatric nurse)
Demographics (woman, married, two daughters)
Possessions (owns a Prius)
Hobbies (plays Stardew Valley, fan of Critical Role)
Preferences (supports nuclear energy, celiac, does not like cilantro)
Behavioral patterns (visits Berlin subreddit, uses British spelling, accidentally wrote a "¿" in English text)

Implications for online privacy

According to researcher Daniel Paleka from ETH Zurich: "People sometimes express their opinions through pseudonymous accounts, assuming that those opinions will remain private. The existence of a mechanism to investigate or monitor with large language models that allows us to simply ask about a person's beliefs, political opinions, insecurities, or anything else that can be extracted from their anonymous Reddit account, for example, could disempower many people today."

Paleka notes that models can provide a timeline of a person's life if there's sufficient information online, and warns: "Keep in mind that everything you post stays on the internet and can become the target of future models" that will be even more effective.

📖 Read the full source: HN LLM Tools