JSON Schema — Extracting Structured Data
Define a JSON schema, and Rekognita will extract exactly the fields you need from any document. This is well suited to automating the processing of invoices, medical records, contracts, and more.
How it works
- You define a JSON schema with the fields you want to extract
- Rekognita parses the document and finds the corresponding data
- Rekognita returns the structured data in your format, along with the coordinates (bounding box) of each field
Example: Invoice
Schema
{
  "invoice_number": "string",
  "date": "string",
  "due_date": "string",
  "vendor": {
    "name": "string",
    "address": "string"
  },
  "total_amount": "number",
  "currency": "string",
  "line_items": [
    {
      "description": "string",
      "quantity": "number",
      "unit_price": "number",
      "amount": "number"
    }
  ]
}

Result
{
  "data": {
    "invoice_number": "INV-2847",
    "date": "2024-12-15",
    "due_date": "2025-01-14",
    "vendor": {
      "name": "Acme Corporation",
      "address": "123 Business Ave, Suite 400, New York, NY 10001"
    },
    "total_amount": 848.00,
    "currency": "USD",
    "line_items": [
      {
        "description": "Document Processing API — Pro Plan",
        "quantity": 1,
        "unit_price": 499.00,
        "amount": 499.00
      },
      {
        "description": "Additional API calls (10,000)",
        "quantity": 10,
        "unit_price": 25.00,
        "amount": 250.00
      }
    ]
  },
  "citations": {
    "invoice_number": { "page": 1, "bbox": [350, 42, 480, 58] },
    "total_amount": { "page": 1, "bbox": [400, 320, 490, 338] }
  },
  "confidence": 0.97
}

SDK Example
from rekognita import RekognitaClient

client = RekognitaClient()

result = client.documents.extract(
    file="invoice.pdf",
    schema={
        "invoice_number": "string",
        "total_amount": "number",
        "line_items": [{
            "description": "string",
            "quantity": "number",
            "price": "number"
        }]
    },
    model="rekognita-accurate"
)

print(result.data["invoice_number"])  # "INV-2847"
print(result.data["total_amount"])    # 848.00
print(result.confidence)              # 0.97

# Coordinates of each field for auditing
for field, citation in result.citations.items():
    print(f"{field}: page {citation.page}, bbox {citation.bbox}")

Supported Field Types
| Type | Description | Example |
|---|---|---|
| string | Text value | "INV-2847" |
| number | Numeric value | 848.00 |
| boolean | Yes/No | true |
| date | Date (ISO 8601) | "2024-12-15" |
| array | Array of objects | [{...}, {...}] |
| object | Nested object | {"name": "...", "address": "..."} |
| enum | One of the specified values | "paid" \| "pending" \| "overdue" |
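If you want to double-check extracted data on your side, the types in this table can be verified with a few lines of plain Python. The sketch below is not part of the Rekognita SDK: `find_mismatches` is a hypothetical helper, the sample schema and data are made up for illustration, and it assumes that `date` and `boolean` are written in a schema the same way as `"string"` and `"number"` in the examples above. Enum is left out because this page does not show how allowed values are declared.

```python
from datetime import date


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_iso_date(value):
    # Accept only strings that parse as an ISO 8601 calendar date
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


SCALAR_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": _is_number,
    "boolean": lambda v: isinstance(v, bool),
    "date": _is_iso_date,
}


def find_mismatches(schema, value, path="$"):
    """Return the paths where `value` does not match the shorthand schema."""
    if isinstance(schema, dict):  # object: check every declared key
        if not isinstance(value, dict):
            return [path]
        problems = []
        for key, sub_schema in schema.items():
            if key not in value:
                problems.append(f"{path}.{key}")
            else:
                problems.extend(find_mismatches(sub_schema, value[key], f"{path}.{key}"))
        return problems
    if isinstance(schema, list):  # array: every item must match schema[0]
        if not isinstance(value, list):
            return [path]
        problems = []
        for index, item in enumerate(value):
            problems.extend(find_mismatches(schema[0], item, f"{path}[{index}]"))
        return problems
    check = SCALAR_CHECKS.get(schema)  # scalar: look up the type name
    if check is None or not check(value):
        return [path]
    return []


# Illustrative schema and data (not real extraction output)
schema = {
    "invoice_number": "string",
    "date": "date",
    "paid": "boolean",
    "line_items": [{"description": "string", "quantity": "number"}],
}
data = {
    "invoice_number": "INV-2847",
    "date": "2024-12-15",
    "paid": True,
    "line_items": [{"description": "Pro Plan", "quantity": "1"}],
}

print(find_mismatches(schema, data))  # ['$.line_items[0].quantity']
```

Here the only reported path is the quantity that arrived as the string "1" instead of a number. Returning a list of paths rather than a single true/false makes it easier to log or display which fields need a second look.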
Citations & Auditing
Every extracted field has a citation: its page number and bounding-box coordinates on the original page. This allows you to:
- Visually verify the results
- Build an audit trail for compliance
- Show the data source to users
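As one way to build such an audit trail, the sketch below turns the `citations` block of a result into one log record per field. It works on a plain dictionary shaped like the JSON result in the invoice example, not on the SDK's result object. The helper name `audit_records` and the record layout are assumptions made for illustration, and the bounding box is assumed to be ordered as left, top, right, bottom, which this page does not state.

```python
import json


def audit_records(result, document_name):
    """Build one audit record per cited field from an extraction result dict."""
    records = []
    for field, citation in result.get("citations", {}).items():
        # Assumed order: left, top, right, bottom (not confirmed by the docs)
        x0, y0, x1, y1 = citation["bbox"]
        records.append({
            "document": document_name,
            "field": field,
            "value": result["data"].get(field),
            "page": citation["page"],
            "bbox": [x0, y0, x1, y1],
            # Size of the highlight rectangle, handy for drawing an overlay
            "width": x1 - x0,
            "height": y1 - y0,
        })
    return records


# Subset of the invoice result shown earlier on this page
result = {
    "data": {"invoice_number": "INV-2847", "total_amount": 848.00},
    "citations": {
        "invoice_number": {"page": 1, "bbox": [350, 42, 480, 58]},
        "total_amount": {"page": 1, "bbox": [400, 320, 490, 338]},
    },
    "confidence": 0.97,
}

# One JSON line per field, suitable for appending to a log file
for record in audit_records(result, "invoice.pdf"):
    print(json.dumps(record))
```

Writing one JSON line per field keeps the log append-only and easy to search, and the stored page and box are enough to highlight the source region when a reviewer opens the document.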