LLM Workflow for Localizing 4,500 UI Keys in Large Codebases

A developer documented their process for localizing a large game project with approximately 4,500 UI keys stored in a 500KB en-US.json file. They used a multi-step LLM workflow to handle extraction, translation, and quality improvement.

Initial Extraction and Translation Attempts

First, they used Claude to scan their codebase, extract hardcoded UI strings, and migrate them to i18n standards, creating the locale file. For translation to Italian, they initially tried Claude and Gemini Pro (via Gemini CLI and Antigravity). Both cloud models produced unacceptable quality translations. Gemini Pro also encountered errors with the large file, requiring it to be split into 10 smaller chunks.

Shifting to Local Models and the Context Breakthrough

They then tried TranslateGemma locally via LM Studio, translating key-by-key. While slightly better, the quality was still not acceptable. The key insight was that UI words are often ambiguous, and translation requires disambiguation and usage context.

To solve this, they went back to Claude to generate a second file. For each of the 4,500 keys, Claude inspected the code usage to provide context: where the string appears, its function (button label, description, input hint), and its effect in gameplay.

The Final Translation Pipeline

They built an automated translation pipeline with the following steps:

Batch keys together with their generated context.
Use a prompt focused on functional (not literal) translation.
Enforce placeholder and tag preservation.
Send requests to a local model through LM Studio.

TranslateGemma couldn't handle the context-heavy prompt format, so they switched models. They tested on an M1 Mac Mini with 16GB unified memory.

Model Performance and Results

Qwen 3 4B performed well, but Qwen 3 8B was the sweet spot, producing fewer grammar mistakes and better phrasing while remaining manageable to run locally. The final pipeline can translate the 4,500+ keys into multiple languages, taking roughly 8 hours per locale on their machine. They use a quantized model so they can continue working while it runs in the background.

The developer notes this approach produced quality good enough to ship and felt better than many auto-translated projects they've seen.

📖 Read the full source: r/LocalLLaMA