Using AI to Untangle 10,000 Brazilian Property Titles: A Technical Case Study

Project Context and Problem
A Brazilian real estate company inherited approximately 10,000 property titles spread across more than 10 municipalities, after decades of poor record management. The portfolio includes hundreds of unregistered "drawer contracts" (informal sales that were never filed), duplicate sales of the same properties, fraudulent contracts, forged powers of attorney, and irregular occupations. There are also approximately 500 active lawsuits, including adverse possession claims, compulsory adjudication, evictions, duplicate sale disputes, and 2 class action suits. Part of the physical document archive is held by police as evidence in an old investigation.
Technical Approach
The team (6 lawyers and 3 operators) decided against building infrastructure upfront and opted for a discovery-first approach with AI assistance. The plan has five steps:
- Step 1 - Physical scanning: Documents are organized by municipality and scanned in batches on a document scanner with an automatic document feeder (ADF). Files follow the naming convention [municipality]_[document-type]_[sequence].
- Step 2 - OCR: The team is considering Google Document AI, Mistral OCR 3, AWS Textract, or other tools, and is asking for feedback on tools that have been tested specifically on degraded Latin American registry documents.
- Step 3 - Discovery: OCR output is fed directly into AI tools with large context windows for open-ended analysis before any database is set up. Gemini 3.1 Pro (in NotebookLM or another interface) handles broad batch analysis with prompts like "which lots appear linked to more than one buyer?", "flag contracts with incoherent dates", "identify clusters of suspicious names or activity", and "help us see problems and solutions for what we aren't seeing". Claude Projects runs in parallel for similar analysis.
- Step 4 - Data cleaning and standardization: Raw extracted data is normalized before database insertion. This means mapping municipality names written in multiple ways ("B. Vista", "Bela Vista de GO", "Bela V. Goiás") to a canonical form, standardizing CPFs (Brazilian personal ID numbers) that appear with and without punctuation, converting inconsistent lot status descriptions to enum categories, and fuzzy matching buyer names with spelling variations. Tools: Python + rapidfuzz for fuzzy matching, and the Claude API for normalizing free-text fields into categories. The team is asking whether fuzzy matching plus LLM normalization is sufficient for 10,000 records with decades of inconsistency, or whether they need more rigorous entity resolution (e.g., Dedupe.io).
- Step 5 - Database: The chosen stack is Supabase (PostgreSQL + pgvector) with NocoDB on top. Three options were evaluated: Airtable (easiest to start with but limited at scale), direct PostgreSQL (most control but slower iteration), and Supabase + NocoDB (chosen as the middle ground).
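The Step 4 cleaning rules can be sketched in a few lines of Python. This is a minimal illustration and not the team's pipeline. It uses the standard library's difflib as a stand-in for rapidfuzz so that it runs without dependencies. The canonical municipality list, the 0.5 threshold, and the function names are all assumptions made for the example. The check-digit validation is an optional extra beyond the punctuation stripping described in the post, included because OCR errors in ID numbers are otherwise hard to spot.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical canonical list; the real one would cover the project's 10+ municipalities.
CANONICAL_MUNICIPALITIES = ["Bela Vista de Goiás", "Goiânia", "Anápolis"]


def normalize_cpf(raw):
    """Strip punctuation and return (digits, is_valid) using the CPF check digits."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 11 or digits == digits[0] * 11:
        return digits, False
    for n in (9, 10):
        # Weights run from n + 1 down to 2 over the first n digits.
        total = sum(int(d) * w for d, w in zip(digits[:n], range(n + 1, 1, -1)))
        if (total * 10) % 11 % 10 != int(digits[n]):
            return digits, False
    return digits, True


def _key(text):
    """Lowercase and drop accents and punctuation so variants compare on letters only."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def canonical_municipality(raw, threshold=0.5):
    """Return (best_match, score); best_match is None when the score is below threshold."""
    scored = [
        (SequenceMatcher(None, _key(raw), _key(name)).ratio(), name)
        for name in CANONICAL_MUNICIPALITIES
    ]
    score, best = max(scored)
    return (best if score >= threshold else None), round(score, 2)
```

Values that come back as None, or CPFs flagged invalid, would go to a manual review queue or to the LLM normalization pass instead of being written to the database. With a list this short, heavily abbreviated inputs such as "B. Vista" score only slightly above the assumed threshold, so the cutoff would need tuning against real data.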
The goal is to get a real consolidated picture within 30-60 days and avoid repeating the previous failed attempts at organization.
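Once records are cleaned, some of the Step 3 discovery questions can also be cross-checked deterministically instead of relying only on LLM output. The sketch below answers "which lots appear linked to more than one buyer?" over a handful of made-up rows. The record layout, lot identifiers, and CPF values are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical cleaned records: (lot_id, buyer_cpf, contract_date in ISO format).
records = [
    ("BV-0012", "52998224725", "1998-03-14"),
    ("BV-0012", "11144477735", "2004-07-02"),
    ("BV-0040", "52998224725", "2001-11-30"),
    ("BV-0012", "52998224725", "1998-03-14"),  # repeated scan of the same contract
]


def lots_with_multiple_buyers(rows):
    """Map each lot that has more than one distinct buyer CPF to its sorted buyers."""
    buyers_by_lot = defaultdict(set)
    for lot_id, cpf, _date in rows:
        buyers_by_lot[lot_id].add(cpf)
    return {lot: sorted(cpfs) for lot, cpfs in buyers_by_lot.items() if len(cpfs) > 1}


flagged = lots_with_multiple_buyers(records)
```

A flagged lot is only a candidate for review. Several buyers over time can reflect a legitimate resale chain, so the contract dates and registry status would still need a lawyer's eye before a lot is treated as a duplicate sale.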
📖 Read the full source: r/ClaudeAI