llm-hasher: Local PII Detection & Tokenization for LLMs

llm-hasher addresses a specific security gap in hybrid LLM workflows: when you run local LLMs but still call external services like OpenAI, Claude, or Gemini for certain tasks, your PII still leaves your infrastructure in plaintext. This tool runs PII detection entirely locally using Ollama, so no data leaves your systems during the detection phase.

How It Works

The process follows three steps: detect PII locally, tokenize it before external LLM calls, then restore the original values after processing. This prevents sensitive data from being exposed to third-party services.

Detection Approach

The detection system uses a hybrid approach:

Regex patterns for structured data types: credit cards, IBAN numbers, email addresses, and IPv4 addresses
Ollama with llama3.2:3b (by default) for contextual detection of unstructured PII: names, addresses, national IDs, passports, and dates of birth

Technical Implementation

Mappings between original PII and tokens are stored in an AES-256-GCM encrypted SQLite vault. Deployment is simplified with Docker Compose, which spins up both Ollama and the llm-hasher service with a single command.

📖 Read the full source: r/LocalLLaMA

llm-hasher: Local PII Detection and Tokenization for Hybrid LLM Workflows

How It Works

Detection Approach

Technical Implementation

👀 See Also

ClawCare: Security Guard for AI Coding Agents After AWS Key Leak

NPM Compromise via Axios Backdoor: Impact on AI Coding Agents

AISI Evaluation Shows Claude Mythos Preview's Cyber Capabilities in CTF and Multi-Step Attacks

SCION: Switzerland's Secure Alternative to BGP Routing Protocol