For Legal Tech & Litigation Support

Automated PII Redaction for Legal Documents

PDFDancer detects sensitive data across your documents automatically. Redact with confidence scores, produce audit trails for court, and process thousands of discovery documents via API. SDKs for Python, Java, and TypeScript.

Documents: Contracts, filings, discovery
piiDetect(): Find sensitive data
Review: Attorney sign-off
Redact: Permanent removal + audit log

Manual Redaction Doesn't Scale for Discovery

A single litigation matter can produce tens of thousands of documents. Each one needs to be reviewed for PII, privileged content, and confidential information before production. Manual review is slow, expensive, and error-prone — one missed SSN can mean sanctions.

The Limitations

  • Manual review doesn't scale: a paralegal can't reliably scan 10,000 pages for SSNs
  • Black-box redaction tools cover text with overlays — the data is still in the PDF
  • No audit trail proving what was found, what was redacted, and what was reviewed
  • Most tools require a GUI — you can't integrate them into a document pipeline
  • Pattern-matching (regex) misses PII that doesn't follow a fixed format
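To make the last limitation concrete, here is a minimal Python sketch of a fixed-format SSN pattern missing common variants. The pattern and the sample strings are illustrative assumptions, not PDFDancer's detection logic.

```python
import re

# A typical fixed-format pattern: 3 digits, 2 digits, 4 digits, hyphen-separated.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Made-up sample lines; each contains the same fake SSN in a different form.
samples = [
    "SSN: 123-45-6789",        # canonical format
    "SSN: 123 45 6789",        # spaces instead of hyphens
    "SSN 123456789",           # no separators
    "Social: 123-45-\n6789",   # split across a line break
]

# True where the pattern finds the number, False where it slips through.
matched = [bool(SSN_PATTERN.search(s)) for s in samples]

for text, hit in zip(samples, matched):
    print(f"{'found ' if hit else 'MISSED'}  {text!r}")
```

Only the canonical form is caught. Loosening the pattern to cover the other three tends to increase false positives on invoice numbers, phone numbers, and similar digit runs.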

What PDFDancer Changes

  • Automated PII detection — piiDetect() finds names, SSNs, dates, addresses, and more across every page
  • Confidence scores — auto-redact high confidence, flag low confidence for attorney review
  • True content removal — redacted text is permanently deleted, not covered with boxes
  • Full audit trail — every finding logged with entity type, confidence, and page location
  • SDKs for Python, Java, and TypeScript — integrate into any document pipeline

Entity Types Detected

Person names
Social Security numbers
Dates of birth
Addresses
Phone numbers
Email addresses
Account numbers
Driver's license numbers
Passport numbers
Financial data

Detect and Redact PII — in Any Language

Feed a discovery document in. Get redacted output with an audit trail. High-confidence findings are auto-redacted — edge cases are flagged for attorney review.

True content removal, not overlays. When PDFDancer redacts text, the original content is permanently deleted from the PDF. There's nothing to "uncover" with a PDF editor. This is the standard courts require — and the standard most tools fail to meet.
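One way to check this property in your own pipeline is to extract the text from the redacted output and confirm that none of the redacted values survive. The sketch below assumes the text has already been pulled out by whatever PDF text extractor you use. `find_leaks`, `normalize`, and the sample strings are hypothetical and are not part of the PDFDancer SDK.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop whitespace and hyphens so formatting can't hide a leak."""
    return re.sub(r"[\s\-]+", "", text.lower())


def find_leaks(extracted_text: str, redacted_values: list[str]) -> list[str]:
    """Return every redacted value that still appears in the extracted text."""
    haystack = normalize(extracted_text)
    return [v for v in redacted_values if normalize(v) in haystack]


# Fake values that were supposed to be redacted.
redacted_values = ["123-45-6789", "Jane Doe"]

# Overlay-style output: a box was drawn, but the text layer is unchanged.
overlay_output = "Plaintiff Jane Doe, SSN 123-45-6789, resides in Ohio."
# True removal: the text itself is no longer present.
removed_output = "Plaintiff , SSN , resides in Ohio."

overlay_leaks = find_leaks(overlay_output, redacted_values)
removed_leaks = find_leaks(removed_output, redacted_values)

print("overlay leaks:", overlay_leaks)
print("removal leaks:", removed_leaks)
```

A check like this covers only the extractable text layer. Metadata, annotations, and embedded images would need their own checks.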

From Discovery to Compliance — Automated

Contract Redaction

Sharing contracts with third parties, auditors, or during due diligence? Strip party names, financial terms, and confidential clauses programmatically. True content removal — the redacted text isn't recoverable.

FOIA & Regulatory Compliance

Government agencies and regulated entities need to release documents with PII removed. PDFDancer detects and redacts sensitive data, then produces an audit trail proving what was removed and why.

Detect. Review. Redact.

1. Detect

Feed your documents to piiDetect(). The engine scans every page for names, dates, SSNs, addresses, account numbers, and more. Each finding comes with a confidence score and page location.

2. Review

High-confidence findings get auto-redacted. Low-confidence findings are flagged for human review. Your attorneys stay in control — the machine handles the tedious scanning.
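The review split can be sketched as a simple threshold on the confidence score. The `Finding` record, the entity labels, and the 0.90 cutoff below are assumptions for illustration; the actual shape of piiDetect() results and the right threshold for your matter may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical stand-in for one detection result."""
    entity_type: str
    confidence: float  # assumed to be in the range 0.0 to 1.0
    page: int


# Assumed cutoff; tune it to the risk tolerance of the matter.
AUTO_REDACT_THRESHOLD = 0.90


def triage(findings, threshold=AUTO_REDACT_THRESHOLD):
    """Split findings into (auto-redact, needs attorney review)."""
    auto = [f for f in findings if f.confidence >= threshold]
    review = [f for f in findings if f.confidence < threshold]
    return auto, review


# Made-up findings for a single document.
findings = [
    Finding("SSN", 0.99, 3),
    Finding("PERSON_NAME", 0.72, 3),
    Finding("DATE_OF_BIRTH", 0.94, 7),
]

auto_redact, needs_review = triage(findings)

print("auto-redact:", [(f.entity_type, f.page) for f in auto_redact])
print("review:     ", [(f.entity_type, f.page) for f in needs_review])
```

Keeping the threshold as a single named parameter makes it easy to record in the audit trail alongside the findings it was applied to.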

3. Redact & Audit

Redacted content is permanently removed — not covered with black boxes. Every action is logged: entity type, confidence score, page number. The audit trail is ready for court.
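As a rough picture of what such a log could contain, the sketch below writes one JSON line per action with the three fields named above, plus a timestamp, document name, and action label. The field names, action labels, and file name are assumptions for illustration, not PDFDancer's actual audit format.

```python
import json
from datetime import datetime, timezone


def audit_entry(document, entity_type, confidence, page, action):
    """Build one audit record. The redacted value itself is deliberately not stored."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "document": document,
        "entity_type": entity_type,
        "confidence": confidence,
        "page": page,
        "action": action,
    }


# Made-up actions for a single hypothetical document.
entries = [
    audit_entry("exhibit_12.pdf", "SSN", 0.99, 3, "auto_redacted"),
    audit_entry("exhibit_12.pdf", "PERSON_NAME", 0.72, 3, "flagged_for_review"),
]

# JSON Lines: one self-contained record per line, easy to append to and to parse.
audit_log = "\n".join(json.dumps(e, sort_keys=True) for e in entries)
print(audit_log)
```

Leaving the sensitive value out of the log keeps the audit trail itself from becoming a second copy of the PII.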

Under the hood
See Benchmarks & Compliance →
Detection performance by entity type, certifications, pricing, and what we don't do yet.

Send Us a Document. We'll Redact It.

No pitch deck, no generic demo. Send a representative document from your workflow and we'll run it through piiDetect() — so you can see what automated redaction looks like on your actual documents.