Word/PDF Documents
Rekognita not only recognizes documents but also generates restructured DOCX and PDF files that preserve the original structure, tables, and images.
How it works
Rekognita parses the input document, builds a graph structure, and then generates a clean DOCX or PDF with correct formatting:
- Headings preserve their hierarchy (H1, H2, H3...)
- Tables preserve columns, rows, and headers
- Images and charts are inserted with their captions
- Lists preserve numbering and nesting
- Footnotes and references are restored
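One way to picture the structure-preserving step is as turning a flat stream of recognized blocks into a tree keyed on heading level. The sketch below is purely illustrative: `Block`, `Node`, `build_tree`, and `outline` are hypothetical names, and nothing here is meant to describe Rekognita's internal graph format.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One recognized element; `level` is meaningful only for headings (1 = H1)."""
    kind: str  # e.g. "heading", "paragraph", "table", "image", "list"
    text: str
    level: int = 0


@dataclass
class Node:
    block: Block
    children: list["Node"] = field(default_factory=list)


def build_tree(blocks: list[Block]) -> Node:
    """Nest each block under the nearest preceding heading of a lower level."""
    root = Node(Block("root", "", level=0))
    stack = [root]  # currently open headings, outermost first
    for block in blocks:
        node = Node(block)
        if block.kind == "heading":
            # Close any open heading at the same or a deeper level.
            while len(stack) > 1 and stack[-1].block.level >= block.level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            # Body content attaches to the innermost open heading.
            stack[-1].children.append(node)
    return root


def outline(node: Node, depth: int = 0) -> list[str]:
    """Flatten the tree into indented lines for inspection."""
    lines = []
    for child in node.children:
        lines.append("  " * depth + f"{child.block.kind}: {child.block.text}")
        lines.extend(outline(child, depth + 1))
    return lines
```

With a tree like this, a generator can walk the nodes in order and emit each heading at its original level, with tables and images kept under the section they belong to.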
API Request
POST /v1/documents/convert HTTP/1.1
Host: api.rekognita.com
Authorization: Bearer rk_sk_your_key
Content-Type: multipart/form-data
file=@scanned_document.pdf
output_format=docx # or "pdf"
model=rekognita-accurate
Response
{
  "id": "doc_abc123",
  "status": "completed",
  "output_format": "docx",
  "download_url": "/v1/documents/doc_abc123/download",
  "expires_at": "2025-01-15T12:00:00Z",
  "pages": 5,
  "metadata": {
    "processing_time_ms": 3200,
    "model": "rekognita-accurate",
    "tables_found": 3,
    "images_found": 2
  }
}
Using the SDK
from rekognita import RekognitaClient
client = RekognitaClient()
# Convert a scan to DOCX
result = client.documents.convert(
    file="scanned_invoice.pdf",
    output_format="docx",
    model="rekognita-accurate",
)
# Download the generated DOCX
result.download("output/restructured_invoice.docx")
print(f"Saved: {result.pages} pages, {result.metadata.tables_found} tables")
# Or get bytes
docx_bytes = result.content_bytes
with open("output.docx", "wb") as f:
    f.write(docx_bytes)
Additional Parameters
| Parameter | Type | Description |
|---|---|---|
| output_format | string | "docx" or "pdf" |
| include_images | boolean | Include images (default: true) |
| page_range | string | Page range, e.g. "1-5" |
| template | string | Template ID for output styling |
| language | string | Document language for OCR (default: auto-detect) |
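For readers calling the endpoint without the SDK, the parameters in this table would presumably travel as extra multipart form fields next to `file`. The following sketch builds such a request with the Python standard library only and does not send it. The helper name `build_convert_request`, the `https` scheme, and the `true`/`false` encoding of booleans are assumptions and are not confirmed by this page.

```python
import urllib.request
import uuid

# Host and path are taken from the request shown earlier; https is assumed.
API_URL = "https://api.rekognita.com/v1/documents/convert"


def build_convert_request(file_name, file_bytes, api_key, **params):
    """Return an unsent Request carrying the file plus one form field per parameter."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean encoding
        field_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(field_part.encode())
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Example: request a PDF of pages 1-5 without images.
req = build_convert_request(
    "scanned_document.pdf",
    b"%PDF-1.4 placeholder bytes",
    "rk_sk_your_key",
    output_format="pdf",
    page_range="1-5",
    include_images=False,
)
```

Passing `req` to `urllib.request.urlopen` would send it. The response body should then be JSON in the shape shown in the Response section.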
Comparison with Competitors
Many OCR tools output only flat text. Rekognita generates complete documents with restored structure, ready for use without manual editing.