How to Redact PDFs by Line, Paragraph, or Pattern

The Problem

Most PDF libraries see a document as glyph soup: individual characters positioned at x,y coordinates. Finding "the line that starts with SSN:" means reconstructing text from scattered glyphs, guessing word boundaries, and hoping the layout algorithm got it right.

That's why redaction is so hard. You can't select what you can't find.
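
To make the glyph-soup problem concrete, here is a minimal sketch of the reconstruction a glyph-level library forces on you. It is plain Python, not any library's API: the `Glyph` type, the fixed `char_width`, and the tolerance values are all assumptions chosen for illustration, and real PDFs (varying fonts, rotated text, columns) break heuristics like these quickly.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One positioned character (hypothetical type for this sketch)."""
    char: str
    x: float
    y: float


def glyphs_to_lines(glyphs, y_tolerance=2.0, gap_threshold=4.0, char_width=6.0):
    """Group glyphs into lines by baseline, then guess spaces from x gaps."""
    rows = []
    # PDF y grows upward, so sort by descending y to read top to bottom.
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):
        if rows and abs(rows[-1][0].y - g.y) <= y_tolerance:
            rows[-1].append(g)  # close enough to the current baseline
        else:
            rows.append([g])    # start a new line
    lines = []
    for row in rows:
        row.sort(key=lambda g: g.x)
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            # A gap wider than the threshold is guessed to be a space.
            if cur.x - (prev.x + char_width) > gap_threshold:
                text += " "
            text += cur.char
        lines.append(text)
    return lines


def place(text, x0, y, char_width=6.0):
    """Lay out a string as evenly spaced glyphs (test helper)."""
    return [Glyph(c, x0 + i * char_width, y) for i, c in enumerate(text)]


# Glyphs arrive in arbitrary order, with slight baseline jitter.
soup = place("Name:", 0, 720) + place("SSN:", 0, 700) + place("482", 30, 700.4)
lines = glyphs_to_lines(list(reversed(soup)))
print(lines)  # ['Name:', 'SSN: 482']
```

Every threshold here is a guess, which is the point: a wrong guess means the SSN line is never found, so it is never redacted.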

Structured Document Model

  • PDFDancer parses PDFs into words, lines, and paragraphs - not character coordinates
  • Select content the way you think about it: "the line starting with SSN:" or "all lines matching this pattern"
  • True redaction - content is permanently removed from the PDF, not just covered with a black box
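
The selection ideas above can be sketched in a few lines of plain Python. This is not PDFDancer's API: `Line`, `Page`, `select_starting_with`, `select_matching`, and `redact` are hypothetical names invented for this illustration. It only shows what selecting by prefix or pattern looks like once a document is already modeled as lines, and what "removed, not covered" means for the underlying data.

```python
import re
from dataclasses import dataclass


@dataclass
class Line:
    """A line of text in the structured model (hypothetical type)."""
    text: str


@dataclass
class Page:
    lines: list


def select_starting_with(page, prefix):
    """Return the lines whose text starts with the given prefix."""
    return [line for line in page.lines if line.text.startswith(prefix)]


def select_matching(page, pattern):
    """Return the lines whose text contains a match for the regex."""
    rx = re.compile(pattern)
    return [line for line in page.lines if rx.search(line.text)]


def redact(selected, placeholder="[REDACTED]"):
    """Overwrite the selected lines so the original text no longer exists."""
    for line in selected:
        line.text = placeholder


page = Page([
    Line("Name: Sarah Johnson"),
    Line("SSN: 482-55-7891"),
    Line("Phone: (555) 867-5309"),
    Line("Blood Type: O+"),
])

redact(select_starting_with(page, "SSN:"))
redact(select_matching(page, r"\(\d{3}\) \d{3}-\d{4}"))

texts = [line.text for line in page.lines]
print(texts)
```

After the two calls, the SSN and phone values are absent from the model itself, not hidden behind an overlay that a copy-paste could defeat.
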
Source PDF

PATIENT INTAKE FORM

Westside Medical Center | Form MR-2025-INT

Patient Information

Name: Sarah Johnson

Date of Birth: 03/15/1985

SSN: 482-55-7891

Phone: (555) 867-5309

Email: sarah.johnson@email.com

Address: 1847 Oak Avenue, Portland, OR 97201

Emergency Contact

Name: Michael Johnson

Relationship: Spouse

Phone: (555) 234-5678

Medical Information

Primary Diagnosis: Type 2 Diabetes

Medications: Metformin 500mg (twice daily)

Allergies: Penicillin, Sulfa drugs

Blood Type: O+

Insurance Information

Provider: BlueCross BlueShield

Policy Number: BC-449281-PPO

Group Number: GRP-78452

Consent: I authorize Westside Medical Center to use and disclose my health information for treatment, payment, and healthcare operations. I understand that I may revoke this authorization at any time by submitting a written request.

Sarah Johnson

Patient Signature

01/15/2025

Date

CONFIDENTIAL - Protected Health Information (PHI) under HIPAA


Works instantly in guest mode - no API key required.

Why It Works

  • Findable content: Select by line, paragraph, or regex - work with the document's structure, not raw PDF internals
  • Permanent removal: Content is deleted from the PDF, not covered with a black box
  • Batch processing: Apply the same selection logic across thousands of documents
  • Compliance-ready: HIPAA, GDPR, PCI-DSS - audit-safe output
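
As a rough sketch of the batch idea, the same named patterns can be applied to many documents while counting what was removed from each. This is plain Python over lists of already-extracted lines, not PDFDancer's API. The `PATTERNS` table, `redact_lines`, and `redact_batch` are hypothetical names, and the regexes are simplified assumptions that would need tightening for real data.

```python
import re

# Simplified, illustrative patterns (assumptions, not production-grade).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact_lines(lines, patterns=PATTERNS):
    """Drop every line matching any pattern; return (kept_lines, counts)."""
    counts = {name: 0 for name in patterns}
    kept = []
    for line in lines:
        hits = [name for name, rx in patterns.items() if rx.search(line)]
        for name in hits:
            counts[name] += 1
        if not hits:
            kept.append(line)
    return kept, counts


def redact_batch(documents):
    """Apply the same selection logic to each document's lines."""
    return {name: redact_lines(lines) for name, lines in documents.items()}


documents = {
    "intake_a": [
        "Name: Sarah Johnson",
        "SSN: 482-55-7891",
        "Email: sarah.johnson@email.com",
    ],
    "intake_b": [
        "Phone: (555) 234-5678",
        "Relationship: Spouse",
    ],
}

results = redact_batch(documents)
for name, (kept, counts) in results.items():
    print(name, kept, counts)
```

The per-document counts give a simple record of what each run removed, which is the kind of trail an audit typically asks for.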