AI Chatbots Leaking Real Phone Numbers: The PII Exposure Problem

AI chatbots are exposing real people's phone numbers. A Redditor reported being inundated with calls from strangers looking for a lawyer or locksmith, all of them misdirected by Google's Gemini. In March, a software engineer in Israel was contacted on WhatsApp after Gemini gave out his personal number as the PayBox customer service line. In April, a PhD candidate got Gemini to output a colleague's cell number.
How It Happens
LLMs are trained on web-scraped data that contains PII. The article notes that the open-source DataComp CommonPool dataset includes résumés, driver's licenses, and credit cards. Even a phone number that was posted online only once (e.g., on a Q&A site in 2015) can be reproduced years later.
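To make the idea concrete, here is a minimal sketch of what scrubbing phone-like strings from scraped text could look like before that text is used for training. It is an illustration only: the article does not describe how any AI company filters its data, the function name `redact_phone_numbers` is hypothetical, and the pattern is a deliberately loose one that covers only North American style numbers. It would miss many real formats and flag some strings that are not phone numbers.

```python
import re
from typing import Tuple

# Assumption: a simple illustrative pattern, not a production PII detector.
# It matches an optional +1/1 country code followed by 3-3-4 digit groups
# separated by spaces, dots, or hyphens, with optional parentheses around
# the area code.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def redact_phone_numbers(text: str, placeholder: str = "[PHONE]") -> Tuple[str, int]:
    """Replace phone-like strings with a placeholder.

    Returns the redacted text and the number of replacements made.
    """
    return PHONE_RE.subn(placeholder, text)


if __name__ == "__main__":
    sample = "Call me at 555-867-5309 or (555) 123-4567 after 2015."
    redacted, count = redact_phone_numbers(sample)
    print(count, redacted)
```

A filter like this works only on text someone controls before training. It does nothing for numbers a model has already memorized, which is part of why the article says users have few options.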
Scale of the Problem
DeleteMe, a service that helps people remove personal information from the internet, reports a 400% increase in AI-related privacy queries over the last seven months, to a few thousand. By chatbot, 55% of the queries reference ChatGPT, 20% Gemini, 15% Claude, and 10% others. Two scenarios are common: a user asks about themselves and gets back accurate home and phone data, or the chatbot generates plausible but wrong contact information for someone else.
Rob Shavell (DeleteMe co-founder) says complaints typically involve the chatbot returning accurate home addresses, phone numbers, family names, or employer details when asked innocuous questions about the user.
What Can Be Done
Experts say the root cause is PII in training data, but the exact mechanism is unclear. There is little users can do to prevent exposure. The article suggests the problem will worsen as AI companies seek new data sources.
📖 Read the full source: HN AI Agents