REST API · v1

Developer API Reference

Everything you need to integrate text redaction, document redaction, file conversion, and OCR into your pipeline. All endpoints are REST-based, return JSON, and are authenticated with an API key.

Base URL: https://api.yourdomain.com
Version: v1
Format: REST · JSON
Auth: Bearer Token
Text (3 endpoints): Plain Text Redaction
Document (2 endpoints): Document Redaction
Convert (2 endpoints): File Conversion
OCR (2 endpoints): OCR Readable PDFs
Authentication & Credits

Access & Billing

Authentication

Every request requires a Bearer token in the Authorization header. Keys are scoped to your account and can be rotated from the dashboard at any time.

Authorization: Bearer ••••••••••••••••••••••••••••••
01 Sign up at dashboard.yourdomain.com.
02 Navigate to Settings → API Keys.
03 Click "Generate New Key" and copy the key immediately.
04 Pass it as a Bearer token in the Authorization header of every request.
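For clients written in code rather than curl, the same header can be attached programmatically. The sketch below uses only Python's standard library; `build_request` and `load_api_key` are hypothetical helpers, and the environment variable name `REDACT_API_KEY` is an arbitrary choice, not something the API defines.

```python
import json
import os
import urllib.request

# Base URL as documented above; the env var name below is a hypothetical choice.
BASE_URL = "https://api.yourdomain.com"


def build_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request that carries the Bearer token."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def load_api_key(env_var: str = "REDACT_API_KEY") -> str:
    """Read the key from the environment so it never lands in source control."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"set {env_var} to your API key")
    return key
```

Sending is then a matter of `urllib.request.urlopen(build_request("/v1/redact/text", {"text": "..."}, load_api_key()))`. Keeping the key out of the code also makes rotation from the dashboard a configuration change rather than a redeploy.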

Credits & Billing

The API runs on a credit system. Each call consumes credits based on the operation type and input size. Credits never expire and roll over month to month. Purchase additional credits anytime from the dashboard.

Operation            Unit          Credits
Text Redaction       Per request   1
Document Redaction   Per page      1
File Conversion      Per file      1
OCR (PDF)            Per page      1
OCR (Image)          Per file      2
Batch jobs           Per item      1
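Because every operation is billed at a fixed rate per unit, a job's cost can be estimated before submission. A minimal sketch, assuming the rates in the table above; the operation keys are hypothetical labels, not identifiers the API uses.

```python
# Credits per unit, transcribed from the table above.
# The dictionary keys are hypothetical labels chosen for this sketch.
CREDIT_RATES = {
    "text_redaction": 1,      # per request
    "document_redaction": 1,  # per page
    "file_conversion": 1,     # per file
    "ocr_pdf": 1,             # per page
    "ocr_image": 2,           # per file
    "batch_item": 1,          # per item
}


def estimate_credits(operation: str, units: int = 1) -> int:
    """Credits for `units` of one operation (pages, files, requests, or items)."""
    if operation not in CREDIT_RATES:
        raise ValueError(f"unknown operation: {operation}")
    if units < 0:
        raise ValueError("units must be non-negative")
    return CREDIT_RATES[operation] * units


def estimate_total(jobs) -> int:
    """Sum the estimates over an iterable of (operation, units) pairs."""
    return sum(estimate_credits(op, n) for op, n in jobs)
```

For example, an 8-page document redaction plus one image OCR is `estimate_total([("document_redaction", 8), ("ocr_image", 1)])`, which is 10 credits. The `credits_used` field in each response remains the authoritative figure.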
Free: 500 / mo
Starter: 10k / mo
Pro: 100k / mo
Enterprise: Unlimited
HTTP Status Codes
200  Success
400  Bad Request
401  Invalid API Key
402  Insufficient Credits
413  File Too Large
422  Unsupported Format
429  Rate Limited
500  Server Error
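Only some of these codes are worth retrying automatically. The sketch below assumes that 429 and 500 are transient and that the other error codes require a change to the request or account; that split, the backoff constants, and the caller-supplied `send` function are all assumptions rather than documented API behavior.

```python
import random
import time

# Assumption: 429 (rate limited) and 500 (server error) are transient and
# worth retrying; 400, 401, 402, 413 and 422 need a change to the request
# or account, so retrying them unchanged would fail again.
RETRYABLE_STATUSES = {429, 500}


def should_retry(status: int) -> bool:
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0, jitter: bool = True) -> float:
    """Exponential backoff for 0-based `attempt`; full jitter picks a random delay up to the ceiling."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling) if jitter else ceiling


def call_with_retries(send, max_retries: int = 4, sleep=time.sleep):
    """Call `send()` (caller-supplied, returns (status, body)) until it succeeds or retries run out."""
    attempt = 0
    while True:
        status, body = send()
        if not should_retry(status) or attempt >= max_retries:
            return status, body
        sleep(backoff_delay(attempt))
        attempt += 1
```

Jitter spreads retries from many clients over time, which matters most after a 429.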

Plain Text Redaction

Text · 3 ENDPOINTS

Feed in raw text and choose the entity types to target, or let the engine auto-detect every sensitive token. Output modes: blackout, hashed, or synthetic replacement. Supports 50+ entity types, including PII, PHI, and financial identifiers, plus custom regex patterns.

POST /v1/redact/text

Analyzes the input text for sensitive entities and returns the redacted version. Specify target entity types or let auto-detection run the full 50+ type sweep. Supports three output modes: blackout (███), hash (SHA-256 prefix), or synthetic replacement (realistic fake data).

Request Parameters
Param: text
Type: string
Required: Yes
Desc: The raw text content to redact.
Param: entities
Type: string[]
Required: No
Desc: Array of entity types to target. Omit to auto-detect all. e.g. ["EMAIL","SSN","CREDIT_CARD"]
Param: mode
Type: enum
Required: No
Desc: Redaction mode. One of: "blackout" | "hash" | "synthetic". Defaults to "blackout".
Param: language
Type: string
Required: No
Desc: ISO 639-1 language code for the input text. Defaults to "en".
curl -X POST https://api.yourdomain.com/v1/redact/text \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"text": "John Smith, SSN 123-45-6789, reachable at john@example.com",
"entities": ["NAME", "SSN", "EMAIL"],
"mode": "synthetic",
"language": "en"
}'
200 Response
{
"id": "req_01HXYZ123ABC",
"status": "success",
"redacted_text": "Alex Johnson, SSN 987-65-4321, reachable at alex@random.net",
"entities_found": [
{ "type": "NAME", "original": "John Smith", "start": 0, "end": 10 },
{ "type": "SSN", "original": "123-45-6789", "start": 16, "end": 27 },
{ "type": "EMAIL", "original": "john@example.com", "start": 42, "end": 58 }
],
"mode": "synthetic",
"credits_used": 1,
"processing_ms": 48
}
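The `start`/`end` values in `entities_found` appear to be zero-based, end-exclusive character offsets (the NAME span 0–10 covers exactly "John Smith"). Under that assumption, a client can verify the offsets against its own copy of the text and, if needed, re-apply a blackout locally. The helper names below are hypothetical, and the local blackout is a sketch, not the service's own rendering.

```python
def find_spans(text: str, originals) -> list:
    """Locate each value in order, returning zero-based, end-exclusive (start, end) pairs."""
    spans, cursor = [], 0
    for value in originals:
        start = text.index(value, cursor)  # raises ValueError if the value is absent
        spans.append((start, start + len(value)))
        cursor = start + len(value)
    return spans


def mismatched_entities(text: str, entities_found) -> list:
    """Entities whose text[start:end] slice differs from the reported 'original'."""
    return [e for e in entities_found if text[e["start"]:e["end"]] != e["original"]]


def blackout(text: str, spans, block: str = "█") -> str:
    """Overwrite each span with one block character per original character (length-preserving)."""
    chars = list(text)
    for start, end in spans:
        chars[start:end] = block * (end - start)
    return "".join(chars)
```

A non-empty result from `mismatched_entities` signals that the offsets and the text you hold have drifted apart (for example, after normalizing whitespace) and should not be used for slicing.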

POST /v1/redact/text/batch

Submit up to 1,000 text items in a single request. Each item is processed independently with optional per-item overrides. Responses are returned in the same order as submitted. Ideal for bulk record scrubbing pipelines.

Request Parameters
Param: items
Type: object[]
Required: Yes
Desc: Array of objects, each with a "text" field and optional per-item "entities" and "mode" overrides.
Param: default_mode
Type: enum
Required: No
Desc: Default redaction mode applied to all items. One of: "blackout" | "hash" | "synthetic".
Param: default_entities
Type: string[]
Required: No
Desc: Default entity list applied to all items unless overridden per-item.
curl -X POST https://api.yourdomain.com/v1/redact/text/batch \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"default_mode": "blackout",
"default_entities": ["EMAIL", "PHONE", "NAME"],
"items": [
{ "text": "Contact Alice at alice@corp.com" },
{ "text": "Bob mobile: +1-555-867-5309", "mode": "hash" },
{ "text": "Patient ID 9182736 — DOB 1984-03-15", "entities": ["PATIENT_ID","DOB"] }
]
}'
200 Response
{
"batch_id": "batch_02HABC456DEF",
"status": "success",
"total": 3,
"results": [
{ "index": 0, "redacted_text": "Contact ███████ at ████████████", "entities_found": 2, "credits_used": 1 },
{ "index": 1, "redacted_text": "Bob mobile: a3f9c1d2...", "entities_found": 1, "credits_used": 1 },
{ "index": 2, "redacted_text": "Patient ID ███████ — DOB ██████████", "entities_found": 2, "credits_used": 1 }
],
"total_credits_used": 3,
"processing_ms": 112
}
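Workloads larger than the 1,000-item limit have to be split across several batch requests. A minimal sketch of the splitting and of stitching the results back together; it assumes each response's `index` is relative to its own batch, and the function names are hypothetical.

```python
MAX_BATCH_ITEMS = 1000  # documented per-request limit


def chunk(items: list, size: int = MAX_BATCH_ITEMS) -> list:
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_batch_payloads(texts, default_mode="blackout", default_entities=None, size=MAX_BATCH_ITEMS):
    """One request body per chunk, each shaped like the batch example above."""
    payloads = []
    for part in chunk([{"text": t} for t in texts], size):
        body = {"default_mode": default_mode, "items": part}
        if default_entities is not None:
            body["default_entities"] = list(default_entities)
        payloads.append(body)
    return payloads


def merge_results(responses, size=MAX_BATCH_ITEMS) -> list:
    """Concatenate per-batch results in submission order, re-basing each
    batch-relative 'index' (an assumption) to a global position."""
    merged = []
    for batch_no, resp in enumerate(responses):
        for result in sorted(resp["results"], key=lambda r: r["index"]):
            merged.append({**result, "index": batch_no * size + result["index"]})
    return merged
```

Sorting by `index` is defensive; the reference states results already come back in submission order.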

GET /v1/redact/entities

Returns the full catalog of entity types the redaction engine can detect, with each type's code, display label, and category (PII, PHI, FINANCIAL). Region-specific types carry the region in their category, e.g. PII_US.

curl -X GET https://api.yourdomain.com/v1/redact/entities \
-H "Authorization: Bearer YOUR_API_KEY"
200 Response
{
"entities": [
{ "code": "NAME", "label": "Person Name", "category": "PII" },
{ "code": "EMAIL", "label": "Email Address", "category": "PII" },
{ "code": "PHONE", "label": "Phone Number", "category": "PII" },
{ "code": "SSN", "label": "Social Security No", "category": "PII_US" },
{ "code": "CREDIT_CARD", "label": "Credit Card Number", "category": "FINANCIAL" },
{ "code": "IBAN", "label": "Bank Account IBAN", "category": "FINANCIAL" },
{ "code": "PATIENT_ID", "label": "Patient Identifier", "category": "PHI" },
{ "code": "DOB", "label": "Date of Birth", "category": "PII" }
],
"total": 54
}
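The catalog can drive the `entities` parameter instead of a hard-coded list, for instance to target every PHI or financial type. A small sketch over the response shape shown above; the helper names are hypothetical, and prefix matching on `category` assumes region-specific categories follow the `PII_US` pattern.

```python
from collections import defaultdict


def group_by_category(catalog: dict) -> dict:
    """Map each category (e.g. "PII", "PII_US", "PHI", "FINANCIAL") to its entity codes."""
    groups = defaultdict(list)
    for entity in catalog["entities"]:
        groups[entity["category"]].append(entity["code"])
    return dict(groups)


def codes_with_prefix(catalog: dict, *prefixes: str) -> list:
    """Codes whose category starts with any prefix, so "PII" also matches "PII_US"."""
    return [e["code"] for e in catalog["entities"] if e["category"].startswith(prefixes)]
```

The list returned by `codes_with_prefix(catalog, "PHI", "FINANCIAL")` can be passed directly as `entities` to the text or document endpoints.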

Document Redaction

Document · 2 ENDPOINTS

Upload a PDF, DOCX, or image and receive a layout-preserving, pixel-perfect redacted copy in which sensitive data is covered by solid black bounding boxes. An immutable audit trail is attached to every job by default. Supports multi-page documents and mixed-format archives.

POST /v1/redact/document

Upload a PDF, DOCX, PNG, or JPG via multipart/form-data. The engine parses the document, runs entity detection across all text layers (including embedded OCR for scanned pages), draws bounding-box redactions, and returns the redacted file plus a signed audit manifest.

Request Parameters
Param: file
Type: file
Required: Yes
Desc: The document to redact. Accepted MIME types: application/pdf, application/vnd.openxmlformats-officedocument.wordprocessingml.document, image/png, image/jpeg.
Param: entities
Type: string[]
Required: No
Desc: Entity types to target. Omit for full auto-detection. In multipart requests, send the array as a JSON string, as in the curl example.
Param: audit_trail
Type: boolean
Required: No
Desc: Attach a signed JSON audit manifest to the response. Defaults to true.
Param: output_format
Type: enum
Required: No
Desc: Output format override. Defaults to matching the input format. Options: "pdf" | "docx".
curl -X POST https://api.yourdomain.com/v1/redact/document \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@/path/to/contract.pdf" \
-F "entities=[\"NAME\",\"SSN\",\"CREDIT_CARD\"]" \
-F "audit_trail=true" \
-F "output_format=pdf"
200 Response
{
"job_id": "job_03HDEF789GHI",
"status": "complete",
"input": { "filename": "contract.pdf", "pages": 8, "size_bytes": 412304 },
"output": {
"download_url": "https://cdn.yourdomain.com/jobs/job_03HDEF789GHI/redacted.pdf",
"expires_at": "2025-01-15T18:00:00Z",
"size_bytes": 398211
},
"entities_redacted": 23,
"audit_trail_url": "https://cdn.yourdomain.com/jobs/job_03HDEF789GHI/audit.json",
"credits_used": 8,
"processing_ms": 1840
}
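Outside of curl, the upload has to be encoded as multipart/form-data by hand or with a third-party library. A standard-library sketch that serializes the text fields the way the curl example sends them (booleans as "true"/"false", `entities` as a compact JSON array) and builds the request body; `document_fields` and `encode_multipart` are hypothetical helpers.

```python
import json
import uuid

ALLOWED_OUTPUT = {"pdf", "docx"}


def document_fields(entities=None, audit_trail=True, output_format=None) -> dict:
    """Text fields for /v1/redact/document, serialized like the curl example."""
    fields = {"audit_trail": "true" if audit_trail else "false"}
    if entities:
        # Compact JSON array, e.g. ["NAME","SSN"], matching the -F entities=... example.
        fields["entities"] = json.dumps(list(entities), separators=(",", ":"))
    if output_format is not None:
        if output_format not in ALLOWED_OUTPUT:
            raise ValueError(f"output_format must be one of {sorted(ALLOWED_OUTPUT)}")
        fields["output_format"] = output_format
    return fields


def encode_multipart(fields: dict, filename: str, data: bytes, content_type: str, file_field: str = "file"):
    """Return (body, Content-Type header value) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n').encode("utf-8")
    )
    parts.append(data)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return body_and_type(parts, boundary)


def body_and_type(parts: list, boundary: str):
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and header can be sent with `urllib.request.Request(..., data=body, headers={"Content-Type": ctype, "Authorization": ...})`. A random boundary avoids collisions with bytes inside the uploaded file.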

GET /v1/redact/document/{job_id}

Retrieve the current status and result of a previously submitted document redaction job. Large documents (50+ pages) are processed asynchronously: poll this endpoint, or register a webhook_url on submission to be notified on completion.

Request Parameters
Param: job_id
Type: string
Required: Yes
Desc: The job ID returned by the POST /v1/redact/document endpoint (path parameter).
curl -X GET https://api.yourdomain.com/v1/redact/document/job_03HDEF789GHI \
-H "Authorization: Bearer YOUR_API_KEY"
200 Response
{
"job_id": "job_03HDEF789GHI",
"status": "complete",
"progress_pct": 100,
"output": {
"download_url": "https://cdn.yourdomain.com/jobs/job_03HDEF789GHI/redacted.pdf",
"expires_at": "2025-01-15T18:00:00Z"
},
"credits_used": 8
}
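Polling this endpoint amounts to a loop with a fixed interval and an overall deadline. A minimal sketch, assuming the caller supplies `fetch_status` (for example, a GET on this path that returns the parsed JSON). Only `"complete"` appears in this reference; treating `"failed"` as terminal is an assumption.

```python
import time

# Only "complete" appears in this reference; "failed" is an assumed terminal status.
TERMINAL_STATUSES = {"complete", "failed"}


def wait_for_job(fetch_status, job_id: str, interval: float = 2.0, timeout: float = 600.0,
                 sleep=time.sleep, clock=time.monotonic) -> dict:
    """Poll fetch_status(job_id) until the job reaches a terminal status,
    raising TimeoutError once `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting. For very large jobs, the webhook route avoids polling altogether.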

File Conversion

Convert · 2 ENDPOINTS

Convert between PDF, DOCX, HTML, Markdown, and plain text. Conversions preserve heading hierarchy, table structure, hyperlinks, footnotes, and embedded assets wherever the target format can represent them. Metadata is stripped by default. Images convert to documents via OCR. Batch-convert ZIP archives in a single call.

POST /v1/convert

Upload a file and specify the target format. The engine parses the source, maps it to the target schema, strips metadata (unless strip_metadata is false), and returns a download URL. Supported routes: PDF↔DOCX, PDF↔HTML, PDF↔MD, PDF↔TXT, DOCX↔HTML, DOCX↔MD, HTML↔MD, IMG→PDF, IMG→DOCX.

Request Parameters
Param: file
Type: file
Required: Yes
Desc: The source file. Accepted: PDF, DOCX, HTML, MD, TXT, PNG, JPG, WEBP.
Param: to
Type: enum
Required: Yes
Desc: Target format. One of: "pdf" | "docx" | "html" | "md" | "txt".
Param: strip_metadata
Type: boolean
Required: No
Desc: Remove author, revision history, XMP/EXIF, and custom properties from output. Defaults to true.
Param: pdf_compliance
Type: enum
Required: No
Desc: PDF output standard. One of: "pdf-a-1b" | "pdf-a-2b" | "pdf-1.7". Defaults to "pdf-a-1b".
curl -X POST https://api.yourdomain.com/v1/convert \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@/path/to/report.docx" \
-F "to=pdf" \
-F "strip_metadata=true" \
-F "pdf_compliance=pdf-a-1b"
200 Response
{
"job_id": "conv_04HJKL012MNO",
"status": "complete",
"from": "docx",
"to": "pdf",
"output": {
"download_url": "https://cdn.yourdomain.com/conv/conv_04HJKL012MNO/report.pdf",
"expires_at": "2025-01-15T18:00:00Z",
"size_bytes": 187432,
"pages": 4,
"pdf_compliance": "pdf-a-1b"
},
"metadata_stripped": true,
"credits_used": 1,
"processing_ms": 312
}
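Not every pair of accepted input and target formats is a supported route (there is no TXT→DOCX, for example), so checking the route client-side avoids spending an upload on a 422. A sketch built from the route list above; "IMG" is taken to mean the PNG, JPG, and WEBP inputs the `file` parameter accepts, and mapping `.jpeg` to `jpg` is an assumption.

```python
# Routes transcribed from the list above; "img" covers the PNG, JPG and WEBP
# inputs accepted by the file parameter (treating .jpeg as .jpg is an assumption).
BIDIRECTIONAL = [("pdf", "docx"), ("pdf", "html"), ("pdf", "md"), ("pdf", "txt"),
                 ("docx", "html"), ("docx", "md"), ("html", "md")]
IMAGE_FORMATS = {"png", "jpg", "webp"}

ROUTES = {pair for a, b in BIDIRECTIONAL for pair in ((a, b), (b, a))}
ROUTES |= {(img, target) for img in IMAGE_FORMATS for target in ("pdf", "docx")}


def source_format(filename: str) -> str:
    """Lower-cased file extension, or "" if the name has none."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return "jpg" if ext == "jpeg" else ext


def check_route(filename: str, to: str) -> str:
    """Return the source format, or raise before uploading an unsupported conversion."""
    src = source_format(filename)
    if (src, to) not in ROUTES:
        raise ValueError(f"unsupported conversion: {src or '?'} -> {to}")
    return src
```

Inferring the source format from the extension is a simplification; a stricter client could sniff the file's magic bytes instead.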

POST /v1/convert/batch

Submit a ZIP archive containing multiple files and a target format. The engine processes all files concurrently and returns a consolidated ZIP of converted outputs. Stream per-file progress via SSE at the returned sse_url, or receive a webhook callback on completion.

Request Parameters
Param: archive
Type: file
Required: Yes
Desc: A ZIP archive containing the files to convert. Max 500 files per archive, max 500 MB total.
Param: to
Type: enum
Required: Yes
Desc: Target format for all files. One of: "pdf" | "docx" | "html" | "md" | "txt".
Param: strip_metadata
Type: boolean
Required: No
Desc: Strip metadata from all outputs. Defaults to true.
Param: webhook_url
Type: string
Required: No
Desc: URL to POST to when the batch job completes.
curl -X POST https://api.yourdomain.com/v1/convert/batch \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "archive=@/path/to/documents.zip" \
-F "to=pdf" \
-F "strip_metadata=true" \
-F "webhook_url=https://yourserver.com/webhooks/conversion"
200 Response
{
"batch_id": "batch_05HPQR345STU",
"status": "processing",
"total_files": 47,
"completed": 0,
"failed": 0,
"estimated_ms": 14200,
"poll_url": "https://api.yourdomain.com/v1/convert/batch/batch_05HPQR345STU",
"sse_url": "https://api.yourdomain.com/v1/convert/batch/batch_05HPQR345STU/stream"
}
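The `archive` field can be assembled in memory, with the documented limits checked before upload. A sketch using Python's `zipfile` module; reading "500 MB" as 500 × 1024² bytes is an assumption, and because the reference does not say whether the limit applies to the compressed archive or its contents, both are checked.

```python
import io
import zipfile

MAX_FILES = 500
# "500 MB" read as 500 * 1024**2 bytes (an assumption); checked against both the
# uncompressed total and the finished archive, since the reference does not say which.
MAX_BYTES = 500 * 1024 * 1024


def build_archive(files: dict) -> bytes:
    """Zip {archive_name: bytes} in memory for the `archive` field, enforcing the documented limits."""
    if len(files) > MAX_FILES:
        raise ValueError(f"{len(files)} files exceeds the {MAX_FILES}-file limit")
    if sum(len(data) for data in files.values()) > MAX_BYTES:
        raise ValueError("uncompressed contents exceed 500 MB")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    archive = buf.getvalue()
    if len(archive) > MAX_BYTES:
        raise ValueError("archive exceeds 500 MB")
    return archive
```

Collections that exceed either limit can be split into several archives, each submitted as its own batch.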

OCR Readable PDFs

OCR · 2 ENDPOINTS

Transform scanned, image-heavy, or locked PDFs into fully searchable, indexable, machine-readable documents. OCR covers 40+ languages and includes handwriting recognition and per-block confidence scoring. Output is either a searchable PDF with an embedded text layer or a structured JSON AST.

POST /v1/ocr

Upload a scanned PDF or image. The engine renders each page, runs the OCR pass, reconstructs reading order, detects headings and tables, and returns a searchable PDF with an embedded text layer, or a structured JSON document if output_format is set to "json".

Request Parameters
Param: file
Type: file
Required: Yes
Desc: Source file. Accepted: PDF (scanned or image-based), PNG, JPG, WEBP, TIFF.
Param: language
Type: string
Required: No
Desc: Primary language as ISO 639-1 code. Defaults to "en". For multilingual documents, pass a comma-separated list, e.g. "en,fr,de".
Param: output_format
Type: enum
Required: No
Desc: One of: "pdf" (searchable PDF with text layer) | "json" (structured AST). Defaults to "pdf".
Param: handwriting
Type: boolean
Required: No
Desc: Enable handwriting recognition model. Increases processing time ~30%. Defaults to false.
Param: confidence_threshold
Type: number
Required: No
Desc: Minimum OCR confidence (0.0–1.0). Blocks below this are flagged in the response. Defaults to 0.85.
curl -X POST https://api.yourdomain.com/v1/ocr \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@/path/to/scanned_report.pdf" \
-F "language=en" \
-F "output_format=pdf" \
-F "handwriting=false" \
-F "confidence_threshold=0.85"
200 Response
{
"job_id": "ocr_06HVWX678YZA",
"status": "complete",
"input": { "filename": "scanned_report.pdf", "pages": 12, "size_bytes": 8341200, "type": "scanned" },
"output": {
"download_url": "https://cdn.yourdomain.com/ocr/ocr_06HVWX678YZA/searchable.pdf",
"expires_at": "2025-01-15T18:00:00Z",
"size_bytes": 9102440
},
"ocr": {
"language_detected": "en",
"confidence_avg": 0.962,
"confidence_min": 0.831,
"low_confidence_pages": [7],
"words_extracted": 4821,
"handwriting_detected": false
},
"credits_used": 12,
"processing_ms": 6840
}
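Validating the form fields locally catches mistakes such as a full language name or an out-of-range threshold before any credits are spent. A sketch that mirrors the parameter table and the curl example's serialization; the helper names are hypothetical, and `pages_to_review` simply reads the `low_confidence_pages` field shown in the response.

```python
import re

# ISO 639-1 codes are two letters; multiple languages are comma-separated ("en,fr,de").
_LANGS = re.compile(r"[a-z]{2}(,[a-z]{2})*")


def ocr_fields(language: str = "en", output_format: str = "pdf",
               handwriting: bool = False, confidence_threshold: float = 0.85) -> dict:
    """Validate and serialize the /v1/ocr form fields the way the curl example sends them."""
    if not _LANGS.fullmatch(language):
        raise ValueError(f"language must be ISO 639-1 code(s), got {language!r}")
    if output_format not in ("pdf", "json"):
        raise ValueError("output_format must be 'pdf' or 'json'")
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    return {
        "language": language,
        "output_format": output_format,
        "handwriting": "true" if handwriting else "false",
        "confidence_threshold": str(confidence_threshold),
    }


def pages_to_review(response: dict) -> list:
    """Pages the service flagged as falling below the requested confidence threshold."""
    return sorted(response.get("ocr", {}).get("low_confidence_pages", []))
```

In the example response, `pages_to_review` would return `[7]`, which is a natural candidate for manual review or a re-run with `handwriting` enabled.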

POST /v1/ocr/extract

Runs the full OCR pass and returns a structured JSON abstract syntax tree. Nodes are typed (heading, paragraph, table, list) with bounding-box coordinates, confidence scores, and reading-order indices. Ideal for feeding into CMS pipelines, vector stores, or LLM pre-processors.

Request Parameters
Param: file
Type: file
Required: Yes
Desc: Source file. Accepted: PDF, PNG, JPG, WEBP, TIFF.
Param: language
Type: string
Required: No
Desc: Primary language as ISO 639-1 code. Defaults to "en".
Param: include_bbox
Type: boolean
Required: No
Desc: Include bounding-box coordinates for each node. Defaults to true.
Param: handwriting
Type: boolean
Required: No
Desc: Enable handwriting recognition. Defaults to false.
curl -X POST https://api.yourdomain.com/v1/ocr/extract \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@/path/to/invoice.png" \
-F "language=en" \
-F "include_bbox=true"
200 Response
{
"job_id": "ocr_07HBCD901EFG",
"status": "complete",
"page_count": 1,
"nodes": [
{
"type": "heading", "level": 1,
"text": "Invoice #INV-2024-0042",
"page": 1, "order": 0,
"bbox": { "x": 72, "y": 88, "w": 480, "h": 32 },
"confidence": 0.991
},
{
"type": "table",
"page": 1, "order": 4,
"rows": 4, "cols": 3,
"bbox": { "x": 72, "y": 200, "w": 468, "h": 120 },
"confidence": 0.974
}
],
"credits_used": 2,
"processing_ms": 920
}
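Before feeding the AST to a vector store or LLM, it is often flattened back into plain text in reading order. A sketch that sorts nodes by `(page, order)` and drops low-confidence blocks; it assumes text-bearing nodes expose a `text` field (the table node in the example has none, so it is skipped), and the helper names are hypothetical.

```python
def reading_order_text(doc: dict, min_confidence: float = 0.0) -> str:
    """Join text-bearing nodes by (page, order), skipping nodes below min_confidence.
    Nodes without a "text" field, such as the table node in the example, are skipped;
    a node without a "confidence" field counts as 0.0."""
    nodes = [n for n in doc["nodes"] if "text" in n and n.get("confidence", 0.0) >= min_confidence]
    nodes.sort(key=lambda n: (n["page"], n["order"]))
    return "\n\n".join(n["text"] for n in nodes)


def nodes_of_type(doc: dict, node_type: str) -> list:
    """All nodes of one type, e.g. "table", for separate structured handling."""
    return [n for n in doc["nodes"] if n["type"] == node_type]
```

Tables are better handled through `nodes_of_type(doc, "table")` and their own structure than by flattening them into prose.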