How to Analyze 10,000 Property Titles with AI: A Case Study

Project Context and Problem

A Brazilian real estate company inherited approximately 10,000 property titles spread across more than 10 municipalities, after decades of poor management. The records include hundreds of unregistered "drawer contracts" (contratos de gaveta: informal sales that were never filed with the registry), duplicate sales of the same properties, fraudulent contracts, forged powers of attorney, irregular occupations, and approximately 500 active lawsuits, including adverse possession claims, compulsory adjudication suits, evictions, duplicate-sale disputes, and two class actions. Part of the physical document archive is held by the police as part of an old investigation.

Technical Approach

The team (six lawyers and three operators) decided against building infrastructure upfront and chose an AI-assisted, discovery-first approach instead. The plan has five steps:

Step 1 - Physical scanning: Documents are organized by municipality and scanned in batches on a document scanner with an ADF (automatic document feeder), using the naming convention [municipality]_[document-type]_[sequence].
Step 2 - OCR: Candidates include Google Document AI, Mistral OCR 3, and AWS Textract, among other tools. The team is looking for feedback on tools that have been tested specifically on degraded Latin American registry documents.
Step 3 - Discovery: Before any database is set up, the OCR output is fed directly into AI tools with large context windows for open-ended analysis. Gemini 3.1 Pro (in NotebookLM or another interface) handles broad batch analysis, with prompts such as "which lots appear linked to more than one buyer?", "flag contracts with incoherent dates", "identify clusters of suspicious names or activity", and "help us see problems and solutions for what we aren't seeing". Claude Projects runs in parallel for similar analysis.
Step 4 - Data cleaning and standardization: Raw extracted data is normalized before it is inserted into the database. This means mapping municipality names written several ways ("B. Vista", "Bela Vista de GO", "Bela V. Goiás") to one canonical form, standardizing CPFs (Brazilian individual taxpayer IDs) written with and without punctuation, mapping inconsistent lot status descriptions to enum categories, and fuzzy-matching buyer names with spelling variations. Tools: Python with rapidfuzz for fuzzy matching, and the Claude API for normalizing free-text fields into categories. The team is asking whether fuzzy matching plus LLM normalization is enough for 10,000 records with decades of inconsistency, or whether they need more rigorous entity resolution (e.g., Dedupe.io).
Step 5 - Database: The chosen stack is Supabase (PostgreSQL + pgvector) with NocoDB on top. Three options were evaluated: Airtable (easiest to start with, but limited at scale), plain PostgreSQL (most control, but slower iteration), and Supabase + NocoDB, chosen as the middle ground.
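
Step 1's naming convention only pays off if every file actually follows it. The following is a minimal sketch of a pre-OCR check. It assumes lowercase hyphenated slugs for each part, a numeric sequence starting at 1, and a `.pdf` extension. The regex, `ScanName`, and the helper functions are hypothetical illustrations, not the team's tooling.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for [municipality]_[document-type]_[sequence].pdf,
# assuming lowercase hyphenated slugs and a numeric sequence (zero padding allowed).
FILENAME_RE = re.compile(
    r"^(?P<municipality>[a-z0-9-]+)_(?P<doc_type>[a-z0-9-]+)_(?P<sequence>\d+)\.pdf$"
)


@dataclass(frozen=True)
class ScanName:
    municipality: str
    doc_type: str
    sequence: int


def parse_scan_name(filename: str) -> Optional[ScanName]:
    """Split a file name into its parts, or return None if it breaks the convention."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    return ScanName(match["municipality"], match["doc_type"], int(match["sequence"]))


def find_bad_names(filenames: list[str]) -> list[str]:
    """Files that should be renamed before they are sent to OCR."""
    return [name for name in filenames if parse_scan_name(name) is None]


def find_sequence_gaps(filenames: list[str]) -> dict[tuple[str, str], list[int]]:
    """Missing sequence numbers per (municipality, document type), assuming numbering starts at 1."""
    seen: dict[tuple[str, str], set[int]] = defaultdict(set)
    for name in filenames:
        parsed = parse_scan_name(name)
        if parsed is not None:
            seen[(parsed.municipality, parsed.doc_type)].add(parsed.sequence)
    gaps = {}
    for key, numbers in seen.items():
        missing = sorted(set(range(1, max(numbers) + 1)) - numbers)
        if missing:
            gaps[key] = missing
    return gaps
```

For example, given `bela-vista_contrato_001.pdf` and `bela-vista_contrato_003.pdf`, `find_sequence_gaps` reports `[2]` as missing for `("bela-vista", "contrato")`. A gap can point to a skipped page or a document that is not in the archive, which is worth knowing before the discovery step draws conclusions from an incomplete batch.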
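
The Step 3 prompts return model opinions, so a date flagged by Gemini or Claude still has to be checked against the source document. Once signing, registration, and power-of-attorney dates have been extracted, simple deterministic rules can confirm or challenge those findings. The sketch below assumes a hypothetical `Contract` record holding those three dates. Its rules are illustrative choices, not the team's criteria: future dates, a lower bound on plausible dates, registration before signing, and a power of attorney issued after the contract it was used for.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Contract:
    # Hypothetical fields extracted from OCR output; adapt to the real extraction schema.
    contract_id: str
    lot_id: str
    signed_on: Optional[date]
    registered_on: Optional[date] = None  # None for unregistered "drawer contracts"
    poa_issued_on: Optional[date] = None  # power of attorney used to sign, if any


def date_flags(c: Contract, today: date, earliest: date) -> list[str]:
    """Reasons why a contract's dates look incoherent (empty list if none)."""
    if c.signed_on is None:
        return ["missing signing date"]
    flags = []
    if c.signed_on > today:
        flags.append("signed in the future")
    if c.signed_on < earliest:
        flags.append("signed before the earliest plausible date")
    if c.registered_on is not None and c.registered_on < c.signed_on:
        flags.append("registered before it was signed")
    if c.poa_issued_on is not None and c.poa_issued_on > c.signed_on:
        flags.append("power of attorney issued after signing")
    return flags


def flag_contracts(contracts: list[Contract], today: date, earliest: date) -> dict[str, list[str]]:
    """Map contract IDs to their date problems, skipping contracts with none."""
    report = {}
    for c in contracts:
        flags = date_flags(c, today, earliest)
        if flags:
            report[c.contract_id] = flags
    return report
```

Rules like these do not replace the open-ended LLM pass. They give the lawyers a reproducible list to compare against it.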
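
For Step 4, CPFs and municipality names can be handled with exact rules before any fuzzy matching, which leaves rapidfuzz and the Claude API for the genuinely ambiguous cases. The following is a minimal stdlib-only sketch. `normalize_cpf` strips punctuation and validates the two mod-11 CPF check digits, rejecting repeated-digit numbers such as 111.111.111-11, which pass the arithmetic but are not valid CPFs. `canonical_municipality` compares an accent- and punctuation-free key against an alias table. The alias table, its canonical name, and all function names are hypothetical.

```python
import re
import unicodedata
from typing import Optional


def normalize_cpf(raw: str) -> Optional[str]:
    """Return the CPF as 11 bare digits if its check digits are valid, else None."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 11 or digits == digits[0] * 11:
        return None  # wrong length, or a repeated-digit number like 111.111.111-11
    nums = [int(d) for d in digits]
    for position in (9, 10):
        # Check digit at `position`: weights run from position + 1 down to 2.
        total = sum(n * w for n, w in zip(nums[:position], range(position + 1, 1, -1)))
        if (total * 10) % 11 % 10 != nums[position]:
            return None
    return digits


def format_cpf(digits: str) -> str:
    """Render 11 bare digits as 000.000.000-00 for display."""
    return f"{digits[:3]}.{digits[3:6]}.{digits[6:9]}-{digits[9:]}"


# Hypothetical alias table, keyed by municipality_key(); built and reviewed by the team.
MUNICIPALITY_ALIASES = {
    "b vista": "Bela Vista de Goiás",
    "bela vista de go": "Bela Vista de Goiás",
    "bela v goias": "Bela Vista de Goiás",
    "bela vista de goias": "Bela Vista de Goiás",
}


def municipality_key(raw: str) -> str:
    """Lowercase, drop accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def canonical_municipality(raw: str) -> Optional[str]:
    """Canonical name, or None so unknown spellings go to manual review."""
    return MUNICIPALITY_ALIASES.get(municipality_key(raw))
```

Returning `None` instead of guessing sends unknown spellings to a review queue. An abbreviation like "B. Vista" is only safe to map when the batch's region rules out other municipalities with similar names.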
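
For Step 5, the relational core matters more than the tooling on top. The sketch below uses Python's built-in `sqlite3` as a local stand-in for Supabase/PostgreSQL, so it runs without a project. In PostgreSQL, `signed_on` would be a `DATE` and `status` could be a native enum type. The tables, columns, and status values are hypothetical. The final query is one way to express the discovery question "which lots appear linked to more than one buyer?" in SQL.

```python
import sqlite3

# Hypothetical core schema. SQLite stands in for Supabase/PostgreSQL so the sketch runs locally;
# in PostgreSQL, signed_on would be DATE and status could be a native enum type.
SCHEMA = """
CREATE TABLE lot (
    lot_id       TEXT PRIMARY KEY,
    municipality TEXT NOT NULL,
    status       TEXT NOT NULL
                 CHECK (status IN ('regular', 'occupied', 'in_litigation', 'unknown'))
);
CREATE TABLE person (
    cpf  TEXT PRIMARY KEY CHECK (length(cpf) = 11),  -- normalized digits only
    name TEXT NOT NULL
);
CREATE TABLE sale (
    sale_id    TEXT PRIMARY KEY,
    lot_id     TEXT NOT NULL REFERENCES lot(lot_id),
    buyer_cpf  TEXT NOT NULL REFERENCES person(cpf),
    signed_on  TEXT,                        -- ISO 8601 date string
    registered INTEGER NOT NULL DEFAULT 0   -- 0 = unregistered "drawer contract"
);
"""

# Lots sold to more than one distinct buyer: candidate duplicate sales.
DUPLICATE_SALES = """
SELECT lot_id, COUNT(DISTINCT buyer_cpf) AS buyers
FROM sale
GROUP BY lot_id
HAVING COUNT(DISTINCT buyer_cpf) > 1
ORDER BY lot_id
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the schema above and foreign keys enforced."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def duplicate_sales(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Rows of (lot_id, number of distinct buyers) for suspected duplicate sales."""
    return conn.execute(DUPLICATE_SALES).fetchall()
```

Keeping the CPF as an 11-digit primary key only works after CPFs have been normalized. Sales with no readable CPF would need a separate person identifier.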

The goal is to reach a real, consolidated picture of the portfolio within 30 to 60 days and to avoid repeating earlier failed attempts to organize it.

📖 Read the full source: r/ClaudeAI
