REST API · v1

Developer API Reference

Everything you need to integrate text redaction, document redaction, file conversion, and OCR into your pipeline. All endpoints are REST-based, JSON-native, and authenticated via API key.

Base URL

https://api.yourdomain.com

Version: v1

Format: REST · JSON

Auth: Bearer token

Text · 3 endpoints: Plain Text Redaction

Document · 2 endpoints: Document Redaction

Convert · 2 endpoints: File Conversion

OCR · 2 endpoints: OCR Readable PDFs

Authentication & Credits: Access & Billing


Authentication

Every request requires a Bearer token in the Authorization header. Keys are scoped to your account and can be rotated from the dashboard at any time.

Authorization: Bearer YOUR_API_KEY

1. Sign up at dashboard.yourdomain.com.

2. Navigate to Settings → API Keys.

3. Click "Generate New Key" and copy the key immediately.

4. Send it as a Bearer token in the Authorization header of every request.
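A minimal sketch of attaching the key with Python's standard library. It only builds the request object; sending it with `urllib.request.urlopen` is left to the caller. The helper `authed_request` and the environment variable name `YOURDOMAIN_API_KEY` are hypothetical, chosen for illustration.

```python
import os
import urllib.request
from typing import Optional

API_BASE = "https://api.yourdomain.com"


def authed_request(path: str, api_key: str, method: str = "GET",
                   data: Optional[bytes] = None,
                   content_type: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request carrying the Bearer token."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type is not None:
        headers["Content-Type"] = content_type
    return urllib.request.Request(API_BASE + path, data=data,
                                  headers=headers, method=method)


# Keep the key out of source code; YOURDOMAIN_API_KEY is a hypothetical name.
api_key = os.environ.get("YOURDOMAIN_API_KEY", "YOUR_API_KEY")
req = authed_request("/v1/redact/entities", api_key)
# urllib.request.urlopen(req) would actually send it.
```

Reading the key from the environment keeps it out of version control and makes rotation from the dashboard a configuration change rather than a code change.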


Credits & Billing

The API runs on a credit system. Each call consumes credits based on the operation type and input size. Credits never expire and roll over month to month. Purchase additional credits anytime from the dashboard.

Operation             Unit          Credits
Text Redaction        Per request   1
Document Redaction    Per page      1
File Conversion       Per file      1
OCR (PDF)             Per page      1
OCR (image)           Per file      2
Batch jobs            Per item      1
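Because cost is a fixed rate per unit, a job's credit cost can be estimated before submission. The sketch below copies the rates from the table; the operation keys and both helper functions are hypothetical labels, not API identifiers.

```python
from typing import Iterable, Tuple

# Per-unit costs from the table above; the keys are hypothetical labels.
CREDIT_COSTS = {
    "text_redaction": 1,      # per request
    "document_redaction": 1,  # per page
    "file_conversion": 1,     # per file
    "ocr_pdf": 1,             # per page
    "ocr_image": 2,           # per file
    "batch_item": 1,          # per item
}


def estimate_credits(operation: str, units: int = 1) -> int:
    """Credits for `units` of one operation (pages, files, requests, or items)."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return CREDIT_COSTS[operation] * units


def estimate_total(jobs: Iterable[Tuple[str, int]]) -> int:
    """Sum the estimate over (operation, units) pairs, e.g. before a large run."""
    return sum(estimate_credits(op, n) for op, n in jobs)
```

For example, a 12-page scanned PDF plus one OCR'd image comes to `estimate_total([("ocr_pdf", 12), ("ocr_image", 1)])`, or 14 credits. Comparing that figure with your remaining balance before submitting helps avoid 402 responses.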

Plan         Allowance
Free         500 / mo
Starter      10k / mo
Pro          100k / mo
Enterprise   Unlimited

HTTP Status Codes

200   Success
400   Bad Request
401   Invalid API Key
402   Insufficient Credits
413   File Too Large
422   Unsupported Format
429   Rate Limited
500   Server Error
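One way a client might act on these codes is sketched below. Treating only 429 and 500 as retryable, and retrying with capped exponential backoff, is a client-side assumption; the reference does not document a retry policy or say whether 429 responses carry a Retry-After header.

```python
# Status codes documented above.
STATUS_TEXT = {
    200: "Success",
    400: "Bad Request",
    401: "Invalid API Key",
    402: "Insufficient Credits",
    413: "File Too Large",
    422: "Unsupported Format",
    429: "Rate Limited",
    500: "Server Error",
}

# Assumption: only rate limiting and server errors are worth retrying;
# the other 4xx codes mean the request itself (key, credits, file) must change.
RETRYABLE = {429, 500}


def should_retry(status: int) -> bool:
    return status in RETRYABLE


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds: base, 2*base, 4*base, ..., capped at `cap`."""
    return min(cap, base * (2 ** attempt))


def describe(status: int) -> str:
    return f"{status} {STATUS_TEXT.get(status, 'Unknown status')}"
```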


Plain Text Redaction

Text · 3 endpoints

Feed in raw text and choose entity types, or let the engine auto-detect every sensitive token. Output modes: blackout, hashed, or synthetic replacement. Supports 50+ entity types, including PII, PHI, financial identifiers, and custom regex patterns.

POST /v1/redact/text

Analyzes the input text for sensitive entities and returns the redacted version. Specify target entity types, or omit them to let auto-detection sweep all 50+ types. Three output modes are supported: blackout (███), hash (SHA-256 prefix), and synthetic replacement (realistic fake data).

Request Parameters

Parameter   Type       Required   Description
text        string     Yes        The raw text content to redact.
entities    string[]   No         Entity types to target, e.g. ["EMAIL","SSN","CREDIT_CARD"]. Omit to auto-detect all types.
mode        enum       No         Redaction mode: "blackout" | "hash" | "synthetic". Defaults to "blackout".
language    string     No         ISO 639-1 language code of the input text. Defaults to "en".

curl -X POST https://api.yourdomain.com/v1/redact/text \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "text": "John Smith, SSN 123-45-6789, reachable at john@example.com",
    "entities": ["NAME", "SSN", "EMAIL"],
    "mode": "synthetic",
    "language": "en"
  }'

200 Response

{
  "id": "req_01HXYZ123ABC",
  "status": "success",
  "redacted_text": "Alex Johnson, SSN 987-65-4321, reachable at alex@random.net",
  "entities_found": [
    { "type": "NAME",  "original": "John Smith",       "start": 0,  "end": 10 },
    { "type": "SSN",   "original": "123-45-6789",      "start": 16, "end": 27 },
    { "type": "EMAIL", "original": "john@example.com", "start": 42, "end": 58 }
  ],
  "mode": "synthetic",
  "credits_used": 1,
  "processing_ms": 48
}
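The same call can be built from Python's standard library. The sketch below also includes a helper that checks the returned `start`/`end` offsets against the submitted text, assuming they are zero-based, end-exclusive character offsets (the reading under which the `NAME` span 0 to 10 covers "John Smith"). Both function names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_text_redaction_request(api_key: str, text: str,
                                 entities: Optional[List[str]] = None,
                                 mode: str = "blackout",
                                 language: str = "en") -> urllib.request.Request:
    """Build the POST /v1/redact/text call; omitting `entities` means auto-detect."""
    body: Dict[str, object] = {"text": text, "mode": mode, "language": language}
    if entities is not None:
        body["entities"] = entities
    return urllib.request.Request(
        "https://api.yourdomain.com/v1/redact/text",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def offsets_match(original: str, entities_found: List[Dict]) -> bool:
    """Check that each start/end span (assumed zero-based, end-exclusive)
    slices the submitted text back to the entity's "original" value."""
    return all(original[e["start"]:e["end"]] == e["original"]
               for e in entities_found)
```

Running `offsets_match` on every response is a cheap guard before using the offsets to highlight or re-insert values in your own copy of the text.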

POST /v1/redact/text/batch

Submit up to 1,000 text items in a single request. Each item is processed independently, with optional per-item overrides. Results are returned in submission order, each tagged with its zero-based index. Ideal for bulk record-scrubbing pipelines.

Request Parameters

Parameter          Type       Required   Description
items              object[]   Yes        Up to 1,000 objects, each with a "text" field and optional per-item "entities" and "mode" overrides.
default_mode       enum       No         Redaction mode applied to every item without its own "mode": "blackout" | "hash" | "synthetic".
default_entities   string[]   No         Entity list applied to every item without its own "entities".

curl -X POST https://api.yourdomain.com/v1/redact/text/batch \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "default_mode": "blackout",
    "default_entities": ["EMAIL", "PHONE", "NAME"],
    "items": [
      { "text": "Contact Alice at alice@corp.com" },
      { "text": "Bob mobile: +1-555-867-5309", "mode": "hash" },
      { "text": "Patient ID 9182736 — DOB 1984-03-15", "entities": ["PATIENT_ID","DOB"] }
    ]
  }'

200 Response

{
  "batch_id": "batch_02HABC456DEF",
  "status": "success",
  "total": 3,
  "results": [
    { "index": 0, "redacted_text": "Contact ███████ at ████████████", "entities_found": 2, "credits_used": 1 },
    { "index": 1, "redacted_text": "Bob mobile: a3f9c1d2...",         "entities_found": 1, "credits_used": 1 },
    { "index": 2, "redacted_text": "Patient ID ███████ — DOB ██████████", "entities_found": 2, "credits_used": 1 }
  ],
  "total_credits_used": 3,
  "processing_ms": 112
}
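Inputs larger than the 1,000-item limit have to be split across several batch requests. A minimal sketch, assuming each response's `index` is zero-based within its own request, so a result's position in the full input is the chunk offset plus that index. `batch_payloads` is a hypothetical helper.

```python
from typing import Dict, Iterator, List, Optional, Tuple

MAX_BATCH_ITEMS = 1000  # documented per-request limit


def batch_payloads(texts: List[str], default_mode: str = "blackout",
                   default_entities: Optional[List[str]] = None,
                   size: int = MAX_BATCH_ITEMS) -> Iterator[Tuple[int, Dict]]:
    """Yield (offset, payload) pairs for POST /v1/redact/text/batch.

    `offset` is the position of the chunk's first item in `texts`, so a
    result's global position is offset + result["index"].
    """
    if not 1 <= size <= MAX_BATCH_ITEMS:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_ITEMS}")
    for offset in range(0, len(texts), size):
        payload: Dict = {
            "default_mode": default_mode,
            "items": [{"text": t} for t in texts[offset:offset + size]],
        }
        if default_entities is not None:
            payload["default_entities"] = default_entities
        yield offset, payload
```

Each payload is JSON-serialisable as-is and can be sent with the same headers as the single-item endpoint. 2,500 records become three requests of 1,000, 1,000, and 500 items.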

GET /v1/redact/entities

Returns the full catalog of entity types the redaction engine can detect, with each type's code, display label, and category (PII, PHI, FINANCIAL, or a region-scoped variant such as PII_US).

curl -X GET https://api.yourdomain.com/v1/redact/entities \
  -H "Authorization: Bearer YOUR_API_KEY"

200 Response

{
  "entities": [
    { "code": "NAME",        "label": "Person Name",        "category": "PII"       },
    { "code": "EMAIL",       "label": "Email Address",      "category": "PII"       },
    { "code": "PHONE",       "label": "Phone Number",       "category": "PII"       },
    { "code": "SSN",         "label": "Social Security No", "category": "PII_US"    },
    { "code": "CREDIT_CARD", "label": "Credit Card Number", "category": "FINANCIAL" },
    { "code": "IBAN",        "label": "Bank Account IBAN",  "category": "FINANCIAL" },
    { "code": "PATIENT_ID",  "label": "Patient Identifier", "category": "PHI"       },
    { "code": "DOB",         "label": "Date of Birth",      "category": "PII"       }
  ],
  "total": 54
}
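A client can use the catalog to group entity codes by category, or to catch a misspelled code before sending it in `entities`. The sketch below works on a trimmed copy of the sample response; the helper names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_category(catalog: Dict) -> Dict[str, List[str]]:
    """Map each category (e.g. "PII", "FINANCIAL") to its entity codes."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entity in catalog["entities"]:
        groups[entity["category"]].append(entity["code"])
    return dict(groups)


def unknown_entities(catalog: Dict, requested: Iterable[str]) -> List[str]:
    """Return requested codes the catalog does not list, in request order."""
    known = {entity["code"] for entity in catalog["entities"]}
    return [code for code in requested if code not in known]


# A trimmed copy of the sample response above.
sample = json.loads("""
{"entities": [
  {"code": "NAME", "label": "Person Name", "category": "PII"},
  {"code": "SSN", "label": "Social Security No", "category": "PII_US"},
  {"code": "CREDIT_CARD", "label": "Credit Card Number", "category": "FINANCIAL"},
  {"code": "IBAN", "label": "Bank Account IBAN", "category": "FINANCIAL"}
], "total": 54}
""")
```

Because the sample shows only some of the 54 entries, run `unknown_entities` against the full catalog fetched from the endpoint, not a hard-coded list.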


Document Redaction

Document · 2 endpoints

Upload a PDF, DOCX, or image and receive a layout-preserving, pixel-perfect redacted copy in which data classified as sensitive is covered by solid black bounding boxes. An immutable audit trail is attached to every job. Supports multi-page documents and mixed-format archives.

POST /v1/redact/document

Upload a PDF, DOCX, PNG, or JPG as multipart/form-data. The engine parses the document, runs entity detection across all text layers (including embedded OCR for scanned pages), draws bounding-box redactions, and returns the redacted file plus a signed audit manifest.

Request Parameters

Parameter       Type       Required   Description
file            file       Yes        The document to redact. Accepted MIME types: application/pdf, application/vnd.openxmlformats-officedocument.wordprocessingml.document, image/png, image/jpeg.
entities        string[]   No         Entity types to target. Omit for full auto-detection.
audit_trail     boolean    No         Attach a signed JSON audit manifest to the response. Defaults to true.
output_format   enum       No         Output format override: "pdf" | "docx". Defaults to the input format.

curl -X POST https://api.yourdomain.com/v1/redact/document \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/contract.pdf" \
  -F "entities=[\"NAME\",\"SSN\",\"CREDIT_CARD\"]" \
  -F "audit_trail=true" \
  -F "output_format=pdf"

200 Response

{
  "job_id": "job_03HDEF789GHI",
  "status": "complete",
  "input":  { "filename": "contract.pdf", "pages": 8, "size_bytes": 412304 },
  "output": {
    "download_url": "https://cdn.yourdomain.com/jobs/job_03HDEF789GHI/redacted.pdf",
    "expires_at": "2025-01-15T18:00:00Z",
    "size_bytes": 398211
  },
  "entities_redacted": 23,
  "audit_trail_url": "https://cdn.yourdomain.com/jobs/job_03HDEF789GHI/audit.json",
  "credits_used": 8,
  "processing_ms": 1840
}
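Python's standard library has no ready-made multipart/form-data encoder for `urllib`, so the sketch below builds one by hand. It mirrors the curl example, including sending `entities` as a JSON-encoded string. It does not escape quotes in field names or filenames. `encode_multipart` and `build_document_request` are hypothetical helpers; a third-party HTTP client with built-in multipart support would be a reasonable alternative.

```python
import json
import urllib.request
import uuid
from typing import Dict, Tuple


def encode_multipart(fields: Dict[str, str],
                     files: Dict[str, Tuple[str, bytes, str]]) -> Tuple[bytes, str]:
    """Encode text fields and (filename, content, mime) files as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(value.encode("utf-8") + b"\r\n")
    for name, (filename, content, mime) in files.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'.encode())
        parts.append(content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_document_request(api_key: str, filename: str,
                           pdf_bytes: bytes) -> urllib.request.Request:
    body, content_type = encode_multipart(
        # Mirrors the curl example: entities travel as a JSON-encoded string.
        {"entities": json.dumps(["NAME", "SSN", "CREDIT_CARD"]),
         "audit_trail": "true", "output_format": "pdf"},
        {"file": (filename, pdf_bytes, "application/pdf")},
    )
    return urllib.request.Request(
        "https://api.yourdomain.com/v1/redact/document", data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST")
```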

GET /v1/redact/document/{job_id}

Retrieve the current status and result of a previously submitted document redaction job. Large documents (50+ pages) are processed asynchronously; poll this endpoint, or register a webhook_url on submission to be notified on completion.

Request Parameters

Parameter   Type     Required   Description
job_id      string   Yes        Path parameter. The job ID returned by POST /v1/redact/document.

curl -X GET https://api.yourdomain.com/v1/redact/document/job_03HDEF789GHI \
  -H "Authorization: Bearer YOUR_API_KEY"

200 Response

{
  "job_id": "job_03HDEF789GHI",
  "status": "complete",
  "progress_pct": 100,
  "output": {
    "download_url": "https://cdn.yourdomain.com/jobs/job_03HDEF789GHI/redacted.pdf",
    "expires_at": "2025-01-15T18:00:00Z"
  },
  "credits_used": 8
}
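A minimal polling loop for this endpoint is sketched below. `fetch_status` is a hypothetical callable that performs the GET and returns the decoded JSON. The `"failed"` status is an assumption, since this reference only shows `"complete"` and `"processing"`. Injecting `sleep` keeps the loop testable without real delays.

```python
import time
from typing import Callable, Dict


def wait_for_job(fetch_status: Callable[[str], Dict], job_id: str,
                 interval: float = 2.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the job reports "complete", or raise after about `timeout` seconds.

    `fetch_status` is a hypothetical callable that performs
    GET /v1/redact/document/{job_id} and returns the decoded JSON.
    """
    waited = 0.0
    while True:
        job = fetch_status(job_id)
        status = job.get("status")
        if status == "complete":
            return job
        # Assumption: a terminal failure is reported as "failed"; this
        # reference only shows "complete" and "processing".
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job}")
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Registering a webhook avoids polling altogether. The loop is mainly useful for scripts and for jobs submitted without one.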


File Conversion

Convert · 2 endpoints

Convert between PDF, DOCX, HTML, Markdown, and plain text. Every conversion preserves heading hierarchy, table structure, hyperlinks, footnotes, and embedded assets. Metadata is stripped from the output by default. Image-to-document conversion uses OCR. Batch-convert ZIP archives in a single call.

POST /v1/convert

Upload a file and specify the target format. The engine parses the source, maps it to the target schema, strips metadata, and returns a download URL. Supported routes: PDF↔DOCX, PDF↔HTML, PDF↔MD, PDF↔TXT, DOCX↔HTML, DOCX↔MD, HTML↔MD, IMG→PDF, IMG→DOCX.

Request Parameters

Parameter        Type      Required   Description
file             file      Yes        The source file. Accepted: PDF, DOCX, HTML, MD, TXT, PNG, JPG, WEBP.
to               enum      Yes        Target format: "pdf" | "docx" | "html" | "md" | "txt".
strip_metadata   boolean   No         Remove author, revision history, XMP/EXIF data, and custom properties from the output. Defaults to true.
pdf_compliance   enum      No         PDF output standard: "pdf-a-1b" | "pdf-a-2b" | "pdf-1.7". Defaults to "pdf-a-1b".

curl -X POST https://api.yourdomain.com/v1/convert \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/report.docx" \
  -F "to=pdf" \
  -F "strip_metadata=true" \
  -F "pdf_compliance=pdf-a-1b"

200 Response

{
  "job_id": "conv_04HJKL012MNO",
  "status": "complete",
  "from": "docx",
  "to":   "pdf",
  "output": {
    "download_url": "https://cdn.yourdomain.com/conv/conv_04HJKL012MNO/report.pdf",
    "expires_at": "2025-01-15T18:00:00Z",
    "size_bytes": 187432,
    "pages": 4,
    "pdf_compliance": "pdf-a-1b"
  },
  "metadata_stripped": true,
  "credits_used": 1,
  "processing_ms": 312
}
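Not every source/target pair is a listed route (TXT→DOCX, for instance, is absent), so a client can reject an unsupported pair before uploading and avoid a likely 422. The sketch below encodes the listed routes. Inferring the source format from the file extension, and treating `jpeg` as an alias for JPG, are assumptions.

```python
from typing import Set, Tuple

# Two-way routes listed above.
_BIDIRECTIONAL = [("pdf", "docx"), ("pdf", "html"), ("pdf", "md"), ("pdf", "txt"),
                  ("docx", "html"), ("docx", "md"), ("html", "md")]
# Image inputs accepted by /v1/convert; "jpeg" is an assumed alias for JPG.
_IMAGE_EXTS = {"png", "jpg", "jpeg", "webp"}

SUPPORTED_ROUTES: Set[Tuple[str, str]] = set()
for a, b in _BIDIRECTIONAL:
    SUPPORTED_ROUTES.update({(a, b), (b, a)})
for ext in _IMAGE_EXTS:
    SUPPORTED_ROUTES.update({(ext, "pdf"), (ext, "docx")})


def source_format(filename: str) -> str:
    """Lower-cased file extension, or "" when there is none."""
    return filename.rsplit(".", 1)[1].lower() if "." in filename else ""


def is_supported(filename: str, to: str) -> bool:
    """True if converting `filename` to the `to` format is a listed route."""
    return (source_format(filename), to) in SUPPORTED_ROUTES
```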

POST /v1/convert/batch

Submit a ZIP archive of files along with a target format. The engine processes all files concurrently and returns a consolidated ZIP of converted outputs. Stream per-file progress via server-sent events (SSE) at the returned sse_url, or receive a webhook callback on completion.

Request Parameters

Parameter        Type      Required   Description
archive          file      Yes        A ZIP archive containing the files to convert. Max 500 files and 500 MB total per archive.
to               enum      Yes        Target format for all files: "pdf" | "docx" | "html" | "md" | "txt".
strip_metadata   boolean   No         Strip metadata from all outputs. Defaults to true.
webhook_url      string    No         URL that receives a POST when the batch job completes.

curl -X POST https://api.yourdomain.com/v1/convert/batch \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "archive=@/path/to/documents.zip" \
  -F "to=pdf" \
  -F "strip_metadata=true" \
  -F "webhook_url=https://yourserver.com/webhooks/conversion"

200 Response

{
  "batch_id": "batch_05HPQR345STU",
  "status": "processing",
  "total_files": 47,
  "completed": 0,
  "failed": 0,
  "estimated_ms": 14200,
  "poll_url": "https://api.yourdomain.com/v1/convert/batch/batch_05HPQR345STU",
  "sse_url":  "https://api.yourdomain.com/v1/convert/batch/batch_05HPQR345STU/stream"
}
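The archive limits can be checked locally before a large upload. The sketch below uses Python's `zipfile` module. It reads "500 MB total" as the archive's own size in MiB, which is an assumption: the reference does not say whether the limit applies to compressed or uncompressed bytes. `preflight_zip` is a hypothetical helper.

```python
import io
import zipfile

MAX_FILES = 500
# Assumption: "500 MB total" is read here as the archive's own size in MiB;
# the reference does not say whether it means compressed or uncompressed bytes.
MAX_ARCHIVE_BYTES = 500 * 1024 * 1024


def preflight_zip(archive: bytes) -> int:
    """Return the number of files in the archive, or raise ValueError if a
    documented limit would be exceeded. Directory entries are not counted."""
    if len(archive) > MAX_ARCHIVE_BYTES:
        raise ValueError("archive exceeds 500 MB")
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        files = [info for info in zf.infolist() if not info.is_dir()]
    if len(files) > MAX_FILES:
        raise ValueError(f"{len(files)} files; the limit is {MAX_FILES}")
    return len(files)
```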


OCR Readable PDFs

OCR · 2 endpoints

Transform scanned, image-heavy, or locked PDFs into fully searchable, indexable, machine-readable documents. Features OCR in 40+ languages, handwriting recognition, and per-block confidence scoring. Output is either a searchable PDF with an embedded text layer or a structured JSON AST.

POST /v1/ocr

Upload a scanned PDF or image. The engine renders each page, runs the OCR pass, reconstructs reading order, detects headings and tables, and returns a searchable PDF with an embedded text layer, or a structured JSON document if output_format is set to "json".

Request Parameters

Parameter              Type      Required   Description
file                   file      Yes        Source file. Accepted: PDF (scanned or image-based), PNG, JPG, WEBP, TIFF.
language               string    No         Primary language as an ISO 639-1 code. Defaults to "en". For multilingual documents, pass a comma-separated list such as "en,fr,de".
output_format          enum      No         "pdf" (searchable PDF with text layer) | "json" (structured AST). Defaults to "pdf".
handwriting            boolean   No         Enable the handwriting recognition model. Increases processing time by roughly 30%. Defaults to false.
confidence_threshold   number    No         Minimum OCR confidence (0.0–1.0). Blocks below this value are flagged in the response. Defaults to 0.85.

curl -X POST https://api.yourdomain.com/v1/ocr \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/scanned_report.pdf" \
  -F "language=en" \
  -F "output_format=pdf" \
  -F "handwriting=false" \
  -F "confidence_threshold=0.85"

200 Response

{
  "job_id": "ocr_06HVWX678YZA",
  "status": "complete",
  "input": { "filename": "scanned_report.pdf", "pages": 12, "size_bytes": 8341200, "type": "scanned" },
  "output": {
    "download_url": "https://cdn.yourdomain.com/ocr/ocr_06HVWX678YZA/searchable.pdf",
    "expires_at": "2025-01-15T18:00:00Z",
    "size_bytes": 9102440
  },
  "ocr": {
    "language_detected": "en",
    "confidence_avg": 0.962,
    "confidence_min": 0.831,
    "low_confidence_pages": [7],
    "words_extracted": 4821,
    "handwriting_detected": false
  },
  "credits_used": 12,
  "processing_ms": 6840
}
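Malformed options can be rejected client-side before a large scan is uploaded. The sketch below validates them and returns the string values that map one-to-one onto the `-F` fields in the curl example. The regex checks only the shape of a two-letter code, not whether the engine supports that language. `ocr_form_fields` is a hypothetical helper.

```python
import re
from typing import Dict, Sequence

# Shape check only: two lowercase letters. It does not verify that the code
# is a real ISO 639-1 language or one of the 40+ the engine supports.
_ISO_639_1 = re.compile(r"[a-z]{2}")


def ocr_form_fields(languages: Sequence[str] = ("en",), output_format: str = "pdf",
                    handwriting: bool = False,
                    confidence_threshold: float = 0.85) -> Dict[str, str]:
    """Validate OCR options and return them as multipart form values."""
    codes = [code.strip().lower() for code in languages]
    bad = [code for code in codes if not _ISO_639_1.fullmatch(code)]
    if not codes or bad:
        raise ValueError(f"expected ISO 639-1 codes, got {bad or 'none'}")
    if output_format not in ("pdf", "json"):
        raise ValueError('output_format must be "pdf" or "json"')
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    return {
        "language": ",".join(codes),
        "output_format": output_format,
        "handwriting": "true" if handwriting else "false",
        "confidence_threshold": str(confidence_threshold),
    }
```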

POST /v1/ocr/extract

Runs the full OCR pass and returns a structured JSON abstract syntax tree. Nodes are typed (heading, paragraph, table, list) and carry bounding-box coordinates, confidence scores, and reading-order indices. Ideal for feeding into CMS pipelines, vector stores, or LLM pre-processors.

Request Parameters

Parameter      Type      Required   Description
file           file      Yes        Source file. Accepted: PDF, PNG, JPG, WEBP, TIFF.
language       string    No         Primary language as an ISO 639-1 code. Defaults to "en".
include_bbox   boolean   No         Include bounding-box coordinates for each node. Defaults to true.
handwriting    boolean   No         Enable handwriting recognition. Defaults to false.

curl -X POST https://api.yourdomain.com/v1/ocr/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/invoice.png" \
  -F "language=en" \
  -F "include_bbox=true"

200 Response

{
  "job_id": "ocr_07HBCD901EFG",
  "status": "complete",
  "page_count": 1,
  "nodes": [
    {
      "type": "heading", "level": 1,
      "text": "Invoice #INV-2024-0042",
      "page": 1, "order": 0,
      "bbox": { "x": 72, "y": 88, "w": 480, "h": 32 },
      "confidence": 0.991
    },
    {
      "type": "table",
      "page": 1, "order": 4,
      "rows": 4, "cols": 3,
      "bbox": { "x": 72, "y": 200, "w": 468, "h": 120 },
      "confidence": 0.974
    }
  ],
  "credits_used": 2,
  "processing_ms": 920
}
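Downstream consumers usually want the nodes in reading order, plus a list of nodes worth a human check. The sketch below sorts by `(page, order)`, filters by confidence, and renders a rough Markdown string. It assumes paragraph and list nodes carry a `text` field like headings do. The sample table node has no cell content, so the sketch emits a placeholder for it. All function names are hypothetical.

```python
from typing import Dict, List


def reading_order(doc: Dict) -> List[Dict]:
    """Nodes sorted by page, then by the engine's reading-order index."""
    return sorted(doc["nodes"], key=lambda node: (node["page"], node["order"]))


def low_confidence(doc: Dict, threshold: float = 0.85) -> List[Dict]:
    """Nodes whose confidence falls below `threshold` (missing scores count as 1.0)."""
    return [n for n in reading_order(doc) if n.get("confidence", 1.0) < threshold]


def to_markdown(doc: Dict) -> str:
    """Rough Markdown rendering: headings keep their level, text nodes pass
    through, and nodes without a "text" field (such as the sample table)
    become placeholders."""
    blocks = []
    for node in reading_order(doc):
        if node["type"] == "heading":
            blocks.append("#" * node.get("level", 1) + " " + node["text"])
        elif "text" in node:
            blocks.append(node["text"])
        else:
            blocks.append(f"[{node['type']}]")
    return "\n\n".join(blocks)
```

Sorting on the `order` field, not on array position, keeps the output correct even if a client or intermediate store reorders the node list.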
