How to Translate PDFs Without Losing Formatting

The Problem

Copy-pasting PDF text into translation tools destroys your formatting. Tables become jumbled text. Diagrams lose their labels. Headers and footers scatter across the page.

Then there are the language problems. German text typically runs about 30% longer than English. Japanese needs completely different fonts. Arabic reads right-to-left. Your carefully designed manual becomes unreadable.

The result? Hours of manual PDF reconstruction for every document, every language.

The Solution

  • Extract with context: PDFDancer reads text while preserving position, font, and paragraph structure
  • Translate with any API: Send to OpenAI, DeepL, Google Translate, or your own models
  • Rebuild intelligently: Text flows back with automatic font substitution, reflow for longer text, and RTL support
  • Preserve layout: Tables, diagrams, headers, and page structure stay intact

Full Code Example

Works with any translation API. Example uses DeepL.
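
As a minimal sketch of the translate step, assuming paragraphs have already been extracted: the `Paragraph` type and the `translate_paragraphs` helper below are hypothetical stand-ins, not PDFDancer's real API. The DeepL call targets the public free-tier REST endpoint and assumes you supply your own auth key.

```python
import json
import urllib.request
from dataclasses import dataclass, replace
from typing import Callable, List

# Free-tier endpoint; paid DeepL plans use https://api.deepl.com/v2/translate
DEEPL_URL = "https://api-free.deepl.com/v2/translate"


@dataclass(frozen=True)
class Paragraph:
    """Hypothetical extracted paragraph (illustrative, not PDFDancer's type)."""
    text: str
    page: int
    x: float
    y: float
    font: str


def deepl_translate(texts: List[str], target_lang: str, auth_key: str) -> List[str]:
    """Translate a batch of strings with the DeepL REST API."""
    body = json.dumps({"text": texts, "target_lang": target_lang}).encode("utf-8")
    request = urllib.request.Request(
        DEEPL_URL,
        data=body,
        headers={
            "Authorization": f"DeepL-Auth-Key {auth_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [item["text"] for item in payload["translations"]]


def translate_paragraphs(
    paragraphs: List[Paragraph],
    translate: Callable[[List[str]], List[str]],
) -> List[Paragraph]:
    """Swap each paragraph's text for its translation, keeping position and font."""
    translated = translate([p.text for p in paragraphs])
    if len(translated) != len(paragraphs):
        raise ValueError("translator returned a different number of segments")
    return [replace(p, text=t) for p, t in zip(paragraphs, translated)]
```

To use DeepL, pass `lambda texts: deepl_translate(texts, "DE", auth_key)` as the `translate` argument. Because the translator is just a callable, swapping in OpenAI, Google Translate, or your own model only changes that one argument.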

Why It Works

  • Structured document model: Work with paragraphs and sections, not character coordinates
  • Font intelligence: Automatic fallback to Noto Sans for non-Latin scripts, or specify your own fonts
  • Text reflow: Handles expansion (German) and contraction (Chinese) without breaking layout
  • RTL support: Arabic, Hebrew, and other RTL languages render correctly
  • Batch processing: Process thousands of documents with the same code

Technical Details for Developers

Font Fallback Configuration

PDFDancer automatically selects appropriate fonts for target languages. For Japanese, it uses Noto Sans JP. For Arabic, Noto Sans Arabic. You can override with your own font files.
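
The selection logic can be pictured roughly as follows. This is an illustrative sketch, not PDFDancer's configuration API: the `DEFAULT_FONTS` table and the `pick_font` function are hypothetical names, and only the font families named above are taken from the text.

```python
from typing import Dict, Optional

# Assumed defaults for illustration; the real built-in table may differ.
DEFAULT_FONTS: Dict[str, str] = {
    "ja": "Noto Sans JP",
    "ar": "Noto Sans Arabic",
}
GENERIC_FALLBACK = "Noto Sans"


def pick_font(target_lang: str, overrides: Optional[Dict[str, str]] = None) -> str:
    """Return a font name (or font file path) for a target language code."""
    # Normalise "ja-JP" or "ja_JP" down to the primary subtag "ja".
    lang = target_lang.lower().replace("_", "-").split("-")[0]
    # User-supplied fonts win over the built-in defaults.
    if overrides and lang in overrides:
        return overrides[lang]
    return DEFAULT_FONTS.get(lang, GENERIC_FALLBACK)
```

An override maps a language code to your own font, for example `pick_font("ja", {"ja": "fonts/MyMincho.ttf"})`.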

Handling Text Expansion

German text averages 30% longer than English. PDFDancer handles this by slightly reducing font size when needed, adjusting line spacing, and reflowing paragraphs within their bounding boxes.
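
To make the shrink-to-fit idea concrete, here is a rough sketch under stated assumptions. It estimates line count from an average character width instead of real font metrics, and the function names, the 0.85 minimum scale, and the 0.25 pt step are all illustrative choices, not PDFDancer's actual behaviour.

```python
import math


def estimated_height(text: str, box_width: float, font_size: float,
                     line_spacing: float = 1.2, avg_char_width: float = 0.5) -> float:
    """Crude height estimate: assumes every glyph is avg_char_width * font_size wide."""
    chars_per_line = max(1, int(box_width / (avg_char_width * font_size)))
    lines = math.ceil(len(text) / chars_per_line)
    return lines * font_size * line_spacing


def fit_font_size(text: str, box_width: float, box_height: float,
                  font_size: float, min_scale: float = 0.85, step: float = 0.25) -> float:
    """Shrink the font in small steps until the text fits or the floor is reached."""
    floor = font_size * min_scale
    size = font_size
    while size > floor and estimated_height(text, box_width, size) > box_height:
        size = max(floor, size - step)
    return size
```

The floor matters: if the text still overflows at the minimum size, the caller has to fall back to another strategy such as tighter line spacing or a larger box, because shrinking further would hurt readability.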

RTL Language Support

For Arabic and Hebrew, PDFDancer automatically reverses text direction while preserving left-to-right elements like numbers and URLs.
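
Deciding whether a paragraph is right-to-left can be sketched with the standard library alone. The snippet below follows the "first strong character" idea from the Unicode Bidirectional Algorithm; `base_direction` is a hypothetical helper for illustration and says nothing about how PDFDancer implements this internally.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):   # Hebrew letters are R, Arabic letters are AL
            return "rtl"
        if bidi_class == "L":           # Latin and other left-to-right letters
            return "ltr"
        # Digits, spaces, and punctuation are weak or neutral: keep scanning.
    return "ltr"                        # no strong character found, default to LTR
```

Digits are classified as weak, which is why a number embedded in Arabic text does not flip the paragraph direction and keeps its own left-to-right order.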

Batch Processing Pattern

Use Promise.all() or asyncio.gather() to process multiple documents concurrently. PDFDancer sessions are independent and thread-safe.
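
A minimal sketch of the pattern with asyncio.gather(), assuming you already have a coroutine that translates one document. `translate_document` and `translate_batch` are hypothetical names; the semaphore is an added safeguard so a large batch does not open every document (or hit the translation API) at once.

```python
import asyncio
from typing import Awaitable, Callable, List, Union


async def translate_batch(
    paths: List[str],
    translate_document: Callable[[str], Awaitable[str]],  # hypothetical per-file coroutine
    max_concurrent: int = 8,
) -> List[Union[str, BaseException]]:
    """Run translate_document over all paths, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path: str) -> str:
        async with semaphore:
            return await translate_document(path)

    # return_exceptions=True keeps one failed document from cancelling the rest;
    # results come back in the same order as paths.
    return await asyncio.gather(*(worker(p) for p in paths), return_exceptions=True)
```

Call it with `asyncio.run(translate_batch(paths, translate_document))` and check each result: failed documents appear as exception objects in place of an output path.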

Start Using PDFDancer Today

Get started in seconds with our free tier. No credit card required.