Developer API
Reference
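Everything you need to integrate text redaction, document redaction, file conversion, and OCR into your pipeline. All endpoints are REST-based and authenticated via API key. Text endpoints accept JSON request bodies, file endpoints accept multipart/form-data uploads, and every response is JSON.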
Access & Billing
Authentication
Every request requires a Bearer token in the Authorization header. Keys are scoped to your account and can be rotated from the dashboard at any time.
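As a minimal sketch of the header described above, the helper below builds an authenticated JSON request without sending it. The base URL is the placeholder host used in this reference; the function name `build_request` and the environment variable `REDACT_API_KEY` are assumptions made for illustration, not part of the API.

```python
import json
import os
import urllib.request

API_BASE = "https://api.yourdomain.com/v1"  # placeholder host from this reference


def build_request(path, payload, api_key):
    """Build (but do not send) an authenticated JSON POST request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_BASE + path,
        data=body,
        headers={
            # Bearer token in the Authorization header, as required above.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # REDACT_API_KEY is a hypothetical variable name; keep real keys out of source code.
    key = os.environ.get("REDACT_API_KEY", "YOUR_API_KEY")
    req = build_request("/redact/text", {"text": "hello"}, key)
    print(req.get_method(), req.full_url)
```

Reading the key from the environment means a key rotated in the dashboard can be swapped in without a code change.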
Credits & Billing
The API runs on a credit system. Each call consumes credits based on the operation type and input size. Credits never expire and roll over month to month. Purchase additional credits anytime from the dashboard.
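To keep a local tally of spend, you could sum the credit fields that responses report. This sketch assumes single-call responses carry `credits_used` and batch text responses carry `total_credits_used`, as in the examples in this reference; responses with neither field count as zero.

```python
def total_credits(responses):
    """Sum the credits reported across a list of decoded JSON responses."""
    total = 0
    for resp in responses:
        # Prefer the batch-level total when present, else the per-call value.
        total += resp.get("total_credits_used", resp.get("credits_used", 0))
    return total
```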
Plain Text Redaction
Text (3 endpoints)
Feed raw text. Choose entity types, or let the engine auto-detect every sensitive token. Output modes: blackout, hashed, or synthetically replaced. Supports 50+ entity types covering PII, PHI, and financial identifiers, plus custom regex patterns.
Analyzes the input text for sensitive entities and returns the redacted version. Specify target entity types or let auto-detection run the full 50+ type sweep. Supports three output modes: blackout (███), hash (SHA-256 prefix), or synthetic replacement (realistic fake data).
curl -X POST https://api.yourdomain.com/v1/redact/text \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "text": "John Smith, SSN 123-45-6789, reachable at john@example.com",
    "entities": ["NAME", "SSN", "EMAIL"],
    "mode": "synthetic",
    "language": "en"
  }'
{
  "id": "req_01HXYZ123ABC",
  "status": "success",
  "redacted_text": "Alex Johnson, SSN 987-65-4321, reachable at alex@random.net",
  "entities_found": [
    { "type": "NAME", "original": "John Smith", "start": 0, "end": 10 },
    { "type": "SSN", "original": "123-45-6789", "start": 16, "end": 27 },
    { "type": "EMAIL", "original": "john@example.com", "start": 42, "end": 58 }
  ],
  "mode": "synthetic",
  "credits_used": 1,
  "processing_ms": 48
}
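A quick client-side sanity check is to confirm that each reported span actually covers the reported original value in the text you submitted. The sketch below assumes `start` and `end` are 0-based character offsets with an exclusive end, which is what the NAME entry (0 to 10 for "John Smith") suggests; the function name is hypothetical.

```python
def mismatched_spans(text, entities_found):
    """Return the entities whose [start, end) slice of `text` differs from `original`."""
    return [
        entity
        for entity in entities_found
        if text[entity["start"]:entity["end"]] != entity["original"]
    ]
```

An empty result means every span lines up with the submitted text; anything else points to an offset convention mismatch (for example byte offsets versus character offsets).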
Submit up to 1,000 text items in a single request. Each item is processed independently with optional per-item overrides. Responses are returned in the same order as submitted. Ideal for bulk record scrubbing pipelines.
curl -X POST https://api.yourdomain.com/v1/redact/text/batch \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "default_mode": "blackout",
    "default_entities": ["EMAIL", "PHONE", "NAME"],
    "items": [
      { "text": "Contact Alice at alice@corp.com" },
      { "text": "Bob mobile: +1-555-867-5309", "mode": "hash" },
      { "text": "Patient ID 9182736 — DOB 1984-03-15", "entities": ["PATIENT_ID", "DOB"] }
    ]
  }'
{
  "batch_id": "batch_02HABC456DEF",
  "status": "success",
  "total": 3,
  "results": [
    { "index": 0, "redacted_text": "Contact ███████ at ████████████", "entities_found": 2, "credits_used": 1 },
    { "index": 1, "redacted_text": "Bob mobile: a3f9c1d2...", "entities_found": 1, "credits_used": 1 },
    { "index": 2, "redacted_text": "Patient ID ███████ — DOB ██████████", "entities_found": 2, "credits_used": 1 }
  ],
  "total_credits_used": 3,
  "processing_ms": 112
}
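Because a single request accepts at most 1,000 items, larger jobs need to be split client-side. The following sketch slices a list of texts into request payloads shaped like the example above; the helper name and its defaults are assumptions for illustration.

```python
MAX_BATCH_ITEMS = 1000  # per-request limit stated in this section


def build_batches(texts, default_mode="blackout", default_entities=None):
    """Split `texts` into batch payloads of at most MAX_BATCH_ITEMS items each."""
    payloads = []
    for offset in range(0, len(texts), MAX_BATCH_ITEMS):
        chunk = texts[offset:offset + MAX_BATCH_ITEMS]
        payload = {
            "default_mode": default_mode,
            "items": [{"text": t} for t in chunk],
        }
        if default_entities is not None:
            payload["default_entities"] = list(default_entities)
        payloads.append(payload)
    return payloads
```

Since results come back in submission order, the result at `index` i of batch number k maps back to `texts[k * 1000 + i]`.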
Returns the full catalog of entity types the redaction engine can detect. Each entry includes a code, a display label, and a category (PII, PHI, or FINANCIAL); region-specific types carry a regional suffix on the category, such as PII_US.
curl -X GET https://api.yourdomain.com/v1/redact/entities \
  -H "Authorization: Bearer YOUR_API_KEY"
{
  "entities": [
    { "code": "NAME", "label": "Person Name", "category": "PII" },
    { "code": "EMAIL", "label": "Email Address", "category": "PII" },
    { "code": "PHONE", "label": "Phone Number", "category": "PII" },
    { "code": "SSN", "label": "Social Security No", "category": "PII_US" },
    { "code": "CREDIT_CARD", "label": "Credit Card Number", "category": "FINANCIAL" },
    { "code": "IBAN", "label": "Bank Account IBAN", "category": "FINANCIAL" },
    { "code": "PATIENT_ID", "label": "Patient Identifier", "category": "PHI" },
    { "code": "DOB", "label": "Date of Birth", "category": "PII" }
  ],
  "total": 54
}
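One practical use of the catalog is building entity lists per category, for example to request every FINANCIAL type in one call. A small sketch, assuming the response shape shown above (the function name is hypothetical):

```python
from collections import defaultdict


def codes_by_category(catalog):
    """Group entity codes from a catalog response by their category string."""
    groups = defaultdict(list)
    for entity in catalog["entities"]:
        groups[entity["category"]].append(entity["code"])
    return dict(groups)
```

The resulting lists can be passed directly as the `entities` value of a redaction request.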
Document Redaction
Document (2 endpoints)
Upload a PDF, DOCX, or image. Receive a layout-preserving, pixel-perfect redacted clone with classified data replaced by solid black bounding boxes. An immutable audit trail is attached to every job. Supports multi-page documents and mixed-format archives.
Upload a PDF, DOCX, PNG, or JPG via multipart/form-data. The engine parses the document, runs entity detection across all text layers (including embedded OCR for scanned pages), draws bounding-box redactions, and returns the redacted file plus a signed audit manifest.
curl -X POST https://api.yourdomain.com/v1/redact/document \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/contract.pdf" \
  -F "entities=[\"NAME\",\"SSN\",\"CREDIT_CARD\"]" \
  -F "audit_trail=true" \
  -F "output_format=pdf"
{
  "job_id": "job_03HDEF789GHI",
  "status": "complete",
  "input": { "filename": "contract.pdf", "pages": 8, "size_bytes": 412304 },
  "output": {
    "download_url": "https://cdn.yourdomain.com/jobs/job_03HDEF789GHI/redacted.pdf",
    "expires_at": "2025-01-15T18:00:00Z",
    "size_bytes": 398211
  },
  "entities_redacted": 23,
  "audit_trail_url": "https://cdn.yourdomain.com/jobs/job_03HDEF789GHI/audit.json",
  "credits_used": 8,
  "processing_ms": 1840
}
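In a multipart upload every non-file field travels as a string, so the entity list has to be JSON-encoded and booleans spelled out, as the curl example does. The sketch below prepares those string fields; the helper name and defaults are assumptions, and the file part itself is left to your HTTP client.

```python
import json


def document_form_fields(entities, audit_trail=True, output_format="pdf"):
    """Return the non-file form fields for a document redaction upload."""
    return {
        # Compact JSON array, matching the value passed with -F in the curl example.
        "entities": json.dumps(list(entities), separators=(",", ":")),
        "audit_trail": "true" if audit_trail else "false",
        "output_format": output_format,
    }
```

With a client that supports multipart uploads (for instance the third-party `requests` library), these fields would typically be passed alongside the opened file.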
Retrieve the current status and result of a previously submitted document redaction job. Large documents (50+ pages) are processed asynchronously — poll this endpoint or register a webhook_url on submission to be notified on completion.
curl -X GET https://api.yourdomain.com/v1/redact/document/job_03HDEF789GHI \
  -H "Authorization: Bearer YOUR_API_KEY"
{
  "job_id": "job_03HDEF789GHI",
  "status": "complete",
  "progress_pct": 100,
  "output": {
    "download_url": "https://cdn.yourdomain.com/jobs/job_03HDEF789GHI/redacted.pdf",
    "expires_at": "2025-01-15T18:00:00Z"
  },
  "credits_used": 8
}
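For asynchronous jobs, a bounded polling loop is one way to wait for completion. In this sketch the HTTP call is injected as `fetch_status`, a hypothetical callable that takes a job ID and returns the decoded status response. Only the "complete" status appears in this reference for document jobs, so any other value is treated as still in progress; that is an assumption, and failure statuses are not handled because they are not documented here.

```python
import time


def wait_for_job(fetch_status, job_id, interval_s=2.0, max_attempts=30, sleep=time.sleep):
    """Poll until the job reports status "complete" or attempts run out."""
    for _ in range(max_attempts):
        job = fetch_status(job_id)
        if job.get("status") == "complete":
            return job
        # Any other status is assumed to mean "still working".
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} not complete after {max_attempts} polls")
```

Capping the number of attempts keeps a stuck job from blocking the caller indefinitely; registering a webhook_url on submission avoids polling altogether.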
File Conversion
Convert (2 endpoints)
PDF ↔ DOCX ↔ HTML ↔ Markdown ↔ plain text. Every conversion preserves heading hierarchy, table structure, hyperlinks, footnotes, and embedded assets. Metadata is stripped before output. Image-to-document via OCR. Batch-convert ZIP archives in a single call.
Upload a file and specify the target format. The engine parses the source, maps it to the target schema, strips metadata, and returns a download URL. Supported routes: PDF↔DOCX, PDF↔HTML, PDF↔MD, PDF↔TXT, DOCX↔HTML, DOCX↔MD, HTML↔MD, IMG→PDF, IMG→DOCX.
curl -X POST https://api.yourdomain.com/v1/convert \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/report.docx" \
  -F "to=pdf" \
  -F "strip_metadata=true" \
  -F "pdf_compliance=pdf-a-1b"
{
  "job_id": "conv_04HJKL012MNO",
  "status": "complete",
  "from": "docx",
  "to": "pdf",
  "output": {
    "download_url": "https://cdn.yourdomain.com/conv/conv_04HJKL012MNO/report.pdf",
    "expires_at": "2025-01-15T18:00:00Z",
    "size_bytes": 187432,
    "pages": 4,
    "pdf_compliance": "pdf-a-1b"
  },
  "metadata_stripped": true,
  "credits_used": 1,
  "processing_ms": 312
}
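Checking a route locally before uploading can save a round trip. The sketch below encodes the supported routes listed above as (source, target) pairs. The format codes "pdf" and "docx" appear in the examples; "html", "md", "txt", and "img" are assumed spellings and may differ from what the API accepts.

```python
# Routes that work in both directions, per the list in this section.
_BIDIRECTIONAL = [
    ("pdf", "docx"), ("pdf", "html"), ("pdf", "md"), ("pdf", "txt"),
    ("docx", "html"), ("docx", "md"), ("html", "md"),
]
# Image inputs convert to documents only, never the reverse.
_ONE_WAY = [("img", "pdf"), ("img", "docx")]

SUPPORTED_ROUTES = set(_ONE_WAY)
for a, b in _BIDIRECTIONAL:
    SUPPORTED_ROUTES.add((a, b))
    SUPPORTED_ROUTES.add((b, a))


def is_supported(source_format, target_format):
    """Return True if the (source, target) pair is a listed conversion route."""
    return (source_format.lower(), target_format.lower()) in SUPPORTED_ROUTES
```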
Submit a ZIP archive containing multiple files and a target format. The engine processes all files concurrently and returns a consolidated ZIP of converted outputs. Stream per-file progress via SSE at the returned sse_url, or receive a webhook callback on completion.
curl -X POST https://api.yourdomain.com/v1/convert/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "archive=@/path/to/documents.zip" \
  -F "to=pdf" \
  -F "strip_metadata=true" \
  -F "webhook_url=https://yourserver.com/webhooks/conversion"
{
  "batch_id": "batch_05HPQR345STU",
  "status": "processing",
  "total_files": 47,
  "completed": 0,
  "failed": 0,
  "estimated_ms": 14200,
  "poll_url": "https://api.yourdomain.com/v1/convert/batch/batch_05HPQR345STU",
  "sse_url": "https://api.yourdomain.com/v1/convert/batch/batch_05HPQR345STU/stream"
}
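If you poll instead of streaming, progress can be derived from the counters in the batch response. This sketch assumes the poll_url returns the same `total_files`, `completed`, and `failed` fields as the submission response, and that a file counts as done once it has either completed or failed.

```python
def batch_progress(status):
    """Summarize a batch conversion status response."""
    total = status["total_files"]
    done = status["completed"] + status["failed"]
    pct = round(100.0 * done / total, 1) if total else 0.0
    return {
        "done": done,
        "remaining": max(total - done, 0),
        "pct": pct,
        "finished": total > 0 and done >= total,
    }
```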
OCR Readable PDFs
OCR (2 endpoints)
Transform scanned, image-heavy, or locked PDFs into fully searchable, indexable, machine-readable documents. Multi-language OCR across 40+ languages, handwriting recognition, and per-block confidence scoring. Outputs a searchable PDF with embedded text layer or a structured JSON AST.
Upload a scanned PDF or image. The engine renders each page, runs the OCR pass, reconstructs reading order, detects headings and tables, and returns a searchable PDF with an embedded text layer — or a structured JSON document if output_format is set to "json".
curl -X POST https://api.yourdomain.com/v1/ocr \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/scanned_report.pdf" \
  -F "language=en" \
  -F "output_format=pdf" \
  -F "handwriting=false" \
  -F "confidence_threshold=0.85"
{
  "job_id": "ocr_06HVWX678YZA",
  "status": "complete",
  "input": { "filename": "scanned_report.pdf", "pages": 12, "size_bytes": 8341200, "type": "scanned" },
  "output": {
    "download_url": "https://cdn.yourdomain.com/ocr/ocr_06HVWX678YZA/searchable.pdf",
    "expires_at": "2025-01-15T18:00:00Z",
    "size_bytes": 9102440
  },
  "ocr": {
    "language_detected": "en",
    "confidence_avg": 0.962,
    "confidence_min": 0.831,
    "low_confidence_pages": [7],
    "words_extracted": 4821,
    "handwriting_detected": false
  },
  "credits_used": 12,
  "processing_ms": 6840
}
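The `ocr` block in the response can drive a simple review queue. The sketch below flags a job when its minimum confidence falls under the threshold you submitted and lists the pages to re-check; it assumes `low_confidence_pages` holds page numbers, and the function name is hypothetical.

```python
def ocr_review_summary(ocr, threshold=0.85):
    """Report whether an OCR result dips below `threshold` and which pages to re-check."""
    return {
        "below_threshold": ocr["confidence_min"] < threshold,
        "pages_to_review": list(ocr.get("low_confidence_pages", [])),
    }
```

Applied to the example response above, this flags the job (0.831 is below 0.85) and points at page 7.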
Runs the full OCR pass and returns a structured JSON abstract syntax tree. Nodes are typed (heading, paragraph, table, list) with bounding-box coordinates, confidence scores, and reading-order indices. Ideal for feeding into CMS pipelines, vector stores, or LLM pre-processors.
curl -X POST https://api.yourdomain.com/v1/ocr/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/invoice.png" \
  -F "language=en" \
  -F "include_bbox=true"
{
  "job_id": "ocr_07HBCD901EFG",
  "status": "complete",
  "page_count": 1,
  "nodes": [
    {
      "type": "heading", "level": 1,
      "text": "Invoice #INV-2024-0042",
      "page": 1, "order": 0,
      "bbox": { "x": 72, "y": 88, "w": 480, "h": 32 },
      "confidence": 0.991
    },
    {
      "type": "table",
      "page": 1, "order": 4,
      "rows": 4, "cols": 3,
      "bbox": { "x": 72, "y": 200, "w": 468, "h": 120 },
      "confidence": 0.974
    }
  ],
  "credits_used": 2,
  "processing_ms": 920
}
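Before handing nodes to a downstream pipeline, you will usually want them in reading order, and often with corner coordinates instead of width and height. A small sketch under two assumptions: `order` is a per-page reading index, and `bbox` gives the top-left corner (`x`, `y`) plus width and height in the same units. Both helper names are hypothetical.

```python
def in_reading_order(nodes):
    """Sort AST nodes by page, then by reading-order index within the page."""
    return sorted(nodes, key=lambda node: (node["page"], node["order"]))


def bbox_corners(bbox):
    """Convert an x/y/w/h box into (x0, y0, x1, y1) corner coordinates."""
    return (bbox["x"], bbox["y"], bbox["x"] + bbox["w"], bbox["y"] + bbox["h"])
```

For the heading node in the example above, `bbox_corners` yields (72, 88, 552, 120).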