Local Book Translation Pipeline Uses Qwen 32B and Mistral 24B with Contextual RAG

A developer has created a fully local, automated book translation pipeline that converts PDF files to ePub format using eight Python scripts. The system addresses common translation issues like context loss and formatting problems through a multi-step workflow.
Workflow Details
The eight scripts together cover the following stages, from extraction to final output:
- PDF Extraction: Uses Marker to extract content from PDFs while preserving formatting elements like bold text, chapters, and images
- Text Segmentation: Splits the extracted text into manageable chunks
- Context Creation: Before translation, sends excerpts from throughout the book to Qwen 32B to generate a "Super Bible" - a global glossary containing characters, tone, and atmosphere
- Translation: Qwen 32B translates each text segment while referencing the Super Bible to maintain consistency
- Style Editing: Mistral 24B acts as an editor, reviewing Qwen's translations and rewriting them for a more polished literary style
- Assembly: A final script reassembles all translated segments, reinserts images, and uses Pandoc to output a polished ePub file
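The segmentation, context-sampling, and translation-prompt stages listed above can be illustrated with a minimal Python sketch. This is not the developer's code: the function names, the 2,000-character chunk size, the evenly spaced sampling strategy, and the prompt wording are all assumptions chosen for illustration. The model call itself is left out, since the source does not say which local runtime is used.

```python
import re


def split_into_chunks(text: str, max_chars: int = 2000) -> list[str]:
    """Group paragraphs into chunks of roughly max_chars.

    A paragraph is never split, so an oversized paragraph
    becomes its own chunk and may exceed max_chars.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk before it overflows
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def sample_excerpts(chunks: list[str], count: int = 5) -> list[str]:
    """Pick evenly spaced chunks so the glossary pass sees the whole book."""
    if len(chunks) <= count:
        return list(chunks)
    step = len(chunks) / count
    return [chunks[int(i * step)] for i in range(count)]


def build_translation_prompt(glossary: str, chunk: str, target_language: str) -> str:
    """Combine the global glossary with one segment (hypothetical prompt wording)."""
    return (
        f"Translate the passage below into {target_language}.\n"
        "Keep names, tone, and terminology consistent with this glossary:\n"
        f"{glossary}\n\n"
        "Preserve Markdown formatting such as **bold** and image links.\n\n"
        f"Passage:\n{chunk}"
    )
```

In a pipeline of this shape, the excerpts from `sample_excerpts` would go to the model once to produce the glossary, and `build_translation_prompt` would then be called for every chunk, so each segment is translated with the same global context.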
Automation Features
The system includes a monitoring script that watches a designated folder. Users simply drop a PDF into this folder, and the pipeline automatically processes it. After several hours, the system outputs both the translated ePub and a receipt showing processing time.
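The drop-folder behaviour described above could be approximated with a simple polling loop. The sketch below uses only the Python standard library and is an assumption about one possible design, not the developer's monitoring script: the function names, the polling interval, and the receipt format are hypothetical, and `run_pipeline` stands in for the real chain of scripts.

```python
import time
from pathlib import Path
from typing import Callable


def find_new_pdfs(inbox: Path, seen: set[Path]) -> list[Path]:
    """Return PDFs in the inbox that have not been processed yet, sorted by name."""
    return sorted(p for p in inbox.glob("*.pdf") if p not in seen)


def process_with_receipt(pdf: Path, run_pipeline: Callable[[Path], Path]) -> str:
    """Run the pipeline on one PDF and return a one-line receipt with the elapsed time."""
    start = time.monotonic()
    epub = run_pipeline(pdf)  # placeholder for the real translation pipeline
    elapsed = time.monotonic() - start
    return f"{pdf.name} -> {epub.name} in {elapsed:.1f} s"


def watch(inbox: Path, run_pipeline: Callable[[Path], Path], poll_seconds: float = 10.0) -> None:
    """Poll the inbox forever, processing each newly dropped PDF exactly once."""
    seen: set[Path] = set()
    while True:
        for pdf in find_new_pdfs(inbox, seen):
            print(process_with_receipt(pdf, run_pipeline))
            seen.add(pdf)
        time.sleep(poll_seconds)
```

Polling keeps the example dependency-free. A real watcher would also need to confirm that a large PDF has finished copying before starting work on it.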
The developer notes that the results are surprisingly effective, though not perfect, and mentions having several ideas for improvement. The entire system runs locally on a personal computer without requiring external services.
📖 Read the full source: r/LocalLLaMA