For Clinical Trials & Life Sciences

Compliance-Grade Redaction for Clinical Documents

EMA Policy 0070 requires public disclosure of clinical data with all subject identifiers removed. PDFDancer detects PII across CSRs, protocols, and TMF documents — and permanently deletes it. Not overlays. True content removal, with an audit trail for every finding.

EMA Wants Your Data Public. Every Identifier Must Be Gone.

Under Policy 0070, EMA publishes the clinical data that sponsors submit once a marketing authorization procedure concludes, so all personal data must be removed first. A single CSR can be 1,000+ pages. Subject identifiers, investigator names, and site addresses are scattered across headers, footers, tables, and appendices. Miss one, and the submission is rejected.

Most redaction tools draw black boxes over text. The data is still in the PDF — any reader with a text selection tool can extract it. Regulators know this. Overlay redaction is not compliant.

Overlay Redaction (Most Tools)

  • Black rectangle drawn on top of text
  • Original text remains in the PDF binary
  • Extractable with copy-paste, Acrobat, or any PDF parser
  • Fails regulatory inspection if auditors check the raw file

True Redaction (PDFDancer)

  • Original text is permanently deleted from the PDF
  • Nothing to extract — the content no longer exists
  • Replacement text ("[REDACTED]") inherits the original formatting
  • Audit trail logs every finding: entity type, confidence, page, action taken
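
To make the overlay-versus-removal distinction concrete, here is a deliberately simplified Python model, assuming a page is nothing more than a list of text runs plus boxes drawn on top. It is not PDFDancer's implementation or API: `TextRun`, `Page`, `overlay_redact`, and `true_redact` are hypothetical names, and the subject ID is made up. Real PDFs are messier (one identifier can be split across several text operators, and copies can sit in annotations or metadata), so a production tool has more places to clean.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class TextRun:
    """One run of text plus the formatting it is drawn with (toy model, not a real PDF object)."""
    text: str
    font: str
    size: float


@dataclass
class Page:
    runs: List[TextRun]
    # Indices of runs that have a black box drawn over them. Boxes are drawing
    # instructions only; they never change `runs`.
    boxes: List[int] = field(default_factory=list)


def extract_text(page: Page) -> str:
    """What copy-paste or a PDF parser recovers: every run, boxes or not."""
    return " ".join(run.text for run in page.runs)


def overlay_redact(page: Page, target: str) -> Page:
    """Overlay redaction: box each run containing `target`; the text itself is untouched."""
    hits = [i for i, run in enumerate(page.runs) if target in run.text]
    return Page(runs=list(page.runs), boxes=page.boxes + hits)


def true_redact(page: Page, target: str, placeholder: str = "[REDACTED]") -> Page:
    """True redaction: rewrite the text itself, keeping each run's font and size."""
    runs = [
        replace(run, text=run.text.replace(target, placeholder)) if target in run.text else run
        for run in page.runs
    ]
    return Page(runs=runs, boxes=list(page.boxes))


page = Page(runs=[TextRun("Subject 1001-023 withdrew consent", "Helvetica", 9.0)])
print(extract_text(overlay_redact(page, "1001-023")))  # Subject 1001-023 withdrew consent
print(extract_text(true_redact(page, "1001-023")))     # Subject [REDACTED] withdrew consent
```

The overlay version still returns the identifier to any extractor, while the rewritten run keeps its original font and size, which is the sense in which the replacement "inherits the original formatting".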

What the Engine Detects in Clinical Documents

  • Subject names
  • Investigator names
  • Site addresses & identifiers
  • Dates of birth
  • Device identifiers
  • Randomization codes
  • Sponsor identifiers
  • Phone & fax numbers
  • Email addresses
  • Geographic data

Selective Redaction — Keep Investigators, Remove Subjects

The key differentiator for clinical trials: you choose which entity types to redact and which to preserve. Investigator names stay visible for regulators. Subject identifiers are permanently removed.

From CSRs to TMFs

Selective Entity Redaction

Keep investigator names visible for regulatory reviewers while stripping subject identifiers, dates, and addresses. Configure which entity types to redact per document type, per submission.
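
A minimal sketch of how such a per-document-type configuration could be expressed, assuming findings arrive already labelled with an entity type. The entity labels, `DEFAULT_POLICIES`, `resolve_policy`, and `decide` are hypothetical and are not PDFDancer's configuration format. Per-submission overrides are layered on top of the document-type defaults, and any entity type the policy does not mention is redacted, on the reasoning that for public disclosure an over-redaction is easier to correct than a leaked identifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    REDACT = "redact"
    KEEP = "keep"


# Hypothetical default policies per document type. The entity labels are
# illustrative and are not PDFDancer's actual taxonomy.
DEFAULT_POLICIES: Dict[str, Dict[str, Action]] = {
    "csr": {
        "SUBJECT_NAME": Action.REDACT,
        "SUBJECT_ID": Action.REDACT,
        "DATE_OF_BIRTH": Action.REDACT,
        "SITE_ADDRESS": Action.REDACT,
        "INVESTIGATOR_NAME": Action.KEEP,
    },
    "protocol": {
        "INVESTIGATOR_NAME": Action.KEEP,
    },
}


@dataclass(frozen=True)
class Finding:
    entity_type: str
    text: str
    page: int
    confidence: float


def resolve_policy(doc_type: str,
                   submission_overrides: Optional[Dict[str, Action]] = None) -> Dict[str, Action]:
    """Copy the document-type defaults, then apply per-submission overrides on top."""
    policy = dict(DEFAULT_POLICIES.get(doc_type, {}))
    policy.update(submission_overrides or {})
    return policy


def decide(finding: Finding, policy: Dict[str, Action]) -> Action:
    """Entity types the policy does not mention are redacted (fail closed)."""
    return policy.get(finding.entity_type, Action.REDACT)


pi = Finding("INVESTIGATOR_NAME", "Dr. A. Example", 12, 0.97)
print(decide(pi, resolve_policy("csr")))                                       # Action.KEEP
print(decide(pi, resolve_policy("csr", {"INVESTIGATOR_NAME": Action.REDACT})))  # Action.REDACT
```

Because `resolve_policy` copies the defaults before applying overrides, a stricter rule for one submission does not leak into the next.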

TMF Batch Processing

Redact an entire Trial Master File export in one run: informed consent forms, adverse event reports, monitoring visit logs. Every finding is logged with entity type, confidence, page number, and action taken.
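
The batching and audit-log side of such a run can be sketched as follows, assuming a detector that returns findings per file. `run_batch`, `Finding`, and the CSV columns are hypothetical; `detect` is a stand-in, not PDFDancer's `piiDetect()`, and the file names and findings are made up. The redaction step itself is omitted so the example only shows how each finding turns into one audit row.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, TextIO


@dataclass(frozen=True)
class Finding:
    entity_type: str
    page: int
    confidence: float


AUDIT_FIELDS = ["document", "page", "entity_type", "confidence", "action"]


def run_batch(documents: Iterable[str],
              detect: Callable[[str], List[Finding]],
              redact_types: Set[str],
              audit_out: TextIO) -> Dict[str, int]:
    """Process every document in one run, writing one audit row per finding.

    `detect` is a placeholder for the real detector (hypothetical).
    Returns the number of redacted findings per document.
    """
    writer = csv.DictWriter(audit_out, fieldnames=AUDIT_FIELDS)
    writer.writeheader()
    redacted_counts: Dict[str, int] = {}
    for doc in documents:
        redacted_counts[doc] = 0
        for finding in detect(doc):
            redact = finding.entity_type in redact_types
            if redact:
                redacted_counts[doc] += 1
            writer.writerow({
                "document": doc,
                "page": finding.page,
                "entity_type": finding.entity_type,
                "confidence": f"{finding.confidence:.2f}",
                "action": "redacted" if redact else "kept",
            })
    return redacted_counts


# Made-up findings for two files from a TMF export.
fake_findings = {
    "icf_site01.pdf": [Finding("SUBJECT_NAME", 1, 0.98), Finding("INVESTIGATOR_NAME", 2, 0.95)],
    "ae_report_17.pdf": [Finding("DATE_OF_BIRTH", 4, 0.91)],
}
audit = io.StringIO()
counts = run_batch(fake_findings, fake_findings.__getitem__, {"SUBJECT_NAME", "DATE_OF_BIRTH"}, audit)
print(counts)  # {'icf_site01.pdf': 1, 'ae_report_17.pdf': 1}
```

Kept findings are logged as well as redacted ones, so the audit file records every decision, not only the removals.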

Compliance & Security

  • EMA Policy 0070
  • FDA FOIA Standards
  • ICH E6(R2) GCP
  • GDPR
  • Self-host / VPC
  • Encryption in transit & at rest
  • Audit logs
  • GxP-ready deployment
Under the hood

Detection performance by entity type, certifications, pricing, and what we don't do yet. See Benchmarks & Compliance →

Send Us a CSR. We'll Redact It.

No pitch deck. Send a representative Clinical Study Report or protocol and we'll run it through piiDetect() — so you can see what compliance-grade automated redaction looks like on your actual documents.