Ocr

Overview

OCR API

Available Operations

process

OCR

Example Usage

from mistralai.client import Mistral
import os


with Mistral(
    api_key=os.getenv("MISTRAL_API_KEY", ""),
) as mistral:

    res = mistral.ocr.process(model="CX-9", document={
        "type": "document_url",
        "document_url": "https://upset-labourer.net/",
    }, bbox_annotation_format={
        "type": "text",
    }, document_annotation_format={
        "type": "text",
    })

    # Handle response
    print(res)
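
The parameter descriptions in this section state that only json_schema is valid for bbox_annotation_format and document_annotation_format, while the example above passes "type": "text". The sketch below builds a json_schema format dictionary that mirrors the "book" example from the parameter table. The helper name json_schema_format is hypothetical and is not part of the SDK, and deriving the schema title by capitalizing the name is an assumption made for illustration.

```python
from typing import Any


def json_schema_format(
    name: str, properties: dict[str, Any], required: list[str]
) -> dict[str, Any]:
    """Build a json_schema response format dict (hypothetical helper, not an SDK call)."""
    return {
        "type": "json_schema",
        "json_schema": {
            "schema": {
                "properties": properties,
                "required": list(required),
                # Assumption: the schema title is the capitalized format name.
                "title": name.capitalize(),
                "type": "object",
                "additionalProperties": False,
            },
            "name": name,
            "strict": True,
        },
    }


# Same fields as the "book" example in the parameter table.
book_format = json_schema_format(
    name="book",
    properties={
        "name": {"title": "Name", "type": "string"},
        "authors": {
            "items": {"type": "string"},
            "title": "Authors",
            "type": "array",
        },
    },
    required=["name", "authors"],
)

# The dict could then be passed in place of {"type": "text"} in the call above,
# for example: document_annotation_format=book_format
```

Building the dictionary in one place keeps the bounding box format and the document format consistent when both use the same schema.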

Parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| model | Nullable[str] | ✔️ | N/A | |
| document | models.DocumentUnion | ✔️ | Document to run OCR on | |
| pages | OptionalNullable[models.Pages] | | Specific pages to process. Accepts a list of integers or a string of comma-separated numbers and ranges (e.g. '0,1,2' or '0-5' or '0,2-4'). Page numbers start from 0. | |
| include_image_base64 | OptionalNullable[bool] | | Include image URLs in response | |
| image_limit | OptionalNullable[int] | | Max images to extract | |
| image_min_size | OptionalNullable[int] | | Minimum height and width of image to extract | |
| bbox_annotation_format | OptionalNullable[models.ResponseFormat] | | Structured output class for extracting useful information from each extracted bounding box / image from document. Only json_schema is valid for this field. | Example 1: `{"type": "text"}` Example 2: `{"type": "json_object"}` Example 3: `{"type": "json_schema", "json_schema": {"schema": {"properties": {"name": {"title": "Name", "type": "string"}, "authors": {"items": {"type": "string"}, "title": "Authors", "type": "array"}}, "required": ["name", "authors"], "title": "Book", "type": "object", "additionalProperties": false}, "name": "book", "strict": true}}` |
| document_annotation_format | OptionalNullable[models.ResponseFormat] | | Structured output class for extracting useful information from the entire document. Only json_schema is valid for this field. | Example 1: `{"type": "text"}` Example 2: `{"type": "json_object"}` Example 3: `{"type": "json_schema", "json_schema": {"schema": {"properties": {"name": {"title": "Name", "type": "string"}, "authors": {"items": {"type": "string"}, "title": "Authors", "type": "array"}}, "required": ["name", "authors"], "title": "Book", "type": "object", "additionalProperties": false}, "name": "book", "strict": true}}` |
| document_annotation_prompt | OptionalNullable[str] | | Optional prompt to guide the model in extracting structured output from the entire document. A document_annotation_format must be provided. | |
| table_format | OptionalNullable[models.TableFormat] | | N/A | |
| extract_header | Optional[bool] | | N/A | |
| extract_footer | Optional[bool] | | N/A | |
| confidence_scores_granularity | OptionalNullable[models.ConfidenceScoresGranularity] | | Granularity for confidence scores: 'word' (per-word scores) or 'page' (aggregate only). Defaults to None (no confidence scores) to keep response payload small. | |
| retries | Optional[utils.RetryConfig] | | Configuration to override the default retry behavior of the client. | |
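
The pages parameter accepts either a list of integers or a string such as '0,2-4'. As an illustration of that string format, the sketch below expands a pages string into the equivalent list of zero-based page numbers. The function expand_pages is hypothetical and is not part of the SDK, and treating ranges as inclusive at both ends is an assumption, since the description does not say so explicitly.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a pages string such as '0,2-4' into a list of page numbers.

    Hypothetical helper for illustration only; ranges are assumed inclusive.
    """
    pages: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # Tolerate stray commas or surrounding whitespace.
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


print(expand_pages("0,2-4"))
```

Under the inclusive-range assumption, the resulting list can be passed as pages in place of the string form.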

Response

models.OCRResponse

Errors

| Error Type | Status Code | Content Type |
| --- | --- | --- |
| errors.HTTPValidationError | 422 | application/json |
| errors.SDKError | 4XX, 5XX | \*/\* |