OCR for Documents

OCR (Optical Character Recognition) in LibreChat is an optional enhancement for text extraction from files.

Upload as Text

The “Upload as Text” feature (available from the chat) extracts text from uploaded files as follows:

  • Files matching fileConfig.ocr.supportedMimeTypes are processed with OCR when an OCR service is configured
  • If OCR is not configured, extraction falls back to text parsing
  • OCR is especially useful for images with text, scanned documents, and complex PDFs
  • Processing priority: OCR > STT > text parsing
  • See the Upload as Text documentation for details.
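
The priority order above can be pictured as a small decision function. The following is a minimal sketch, not LibreChat’s actual implementation: the function name `choose_extraction_method` and its boolean parameters are hypothetical, and it assumes each service is used only when it is configured and the file matches that service’s supported types.

```python
def choose_extraction_method(
    ocr_configured: bool,
    matches_ocr: bool,
    stt_configured: bool = False,
    matches_stt: bool = False,
) -> str:
    """Pick an extraction method following OCR > STT > text parsing.

    Hypothetical helper for illustration only.
    """
    if ocr_configured and matches_ocr:
        return "ocr"
    if stt_configured and matches_stt:
        return "stt"
    # Text parsing needs no external service, so it is always the fallback.
    return "text_parsing"
```

For example, a scanned PDF uploaded while no OCR service is configured would fall through to `"text_parsing"` in this sketch.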

File Context (for agents)

When you upload files through the Agent Builder’s File Context section:

  1. Text is extracted using text parsing by default (or OCR/STT if configured and the file matches)
  2. The extracted text is stored as part of the agent’s system instructions
  3. The agent can reference this context in all conversations
  4. The OCR service is optional: the feature works without it using text parsing

Files uploaded as “File Context” are processed to extract text, which is then added to the Agent’s system instructions. This is ideal for documents, code files, PDFs, or images with text where you need the full text content to be included in the agent’s instructions.

Note: Because the extracted text becomes part of the agent’s system instructions, it counts toward the model’s token limit in every conversation (see Limitations).
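
As a rough illustration of the File Context flow described above, the sketch below appends extracted text to an agent’s base instructions. The function name, the `FileContext` type, and the delimiter format are all assumptions made for this example; this page does not specify how LibreChat formats the stored text.

```python
from dataclasses import dataclass


@dataclass
class FileContext:
    filename: str
    text: str  # text already extracted by OCR, STT, or text parsing


def build_system_instructions(base: str, files: list[FileContext]) -> str:
    """Append each file's extracted text to the base instructions.

    The "--- File: name ---" delimiter is made up for this example.
    """
    sections = [base.strip()]
    for f in files:
        sections.append(f"--- File: {f.filename} ---\n{f.text.strip()}")
    return "\n\n".join(sections)
```

Every file added this way lengthens the instructions, which is why large documents can run into the token limits mentioned under Limitations.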

Optional OCR Configuration

Both Agent File Context and Upload as Text work out-of-the-box using text parsing. To enhance extraction quality for images and scanned documents, you can optionally configure an OCR service:

# librechat.yaml
endpoints:
  agents:
    capabilities:
      - "context"  # Enables both agent file context and upload as text
      - "ocr"      # Optionally enhances both with OCR
 
ocr:
  strategy: "mistral_ocr"
  apiKey: "${OCR_API_KEY}"
  baseURL: "https://api.mistral.ai/v1"
  mistralModel: "mistral-ocr-latest"

Note: The context capability is enabled by default. You only need to configure OCR (the ocr capability) if you want enhanced extraction quality for images and scanned documents.

Overview of OCR Capabilities

OCR functionality in LibreChat allows you to:

  • Extract text from images and documents
  • Maintain document structure and formatting
  • Process complex layouts including multi-column text
  • Handle tables, equations, and other specialized content
  • Work with multilingual content

OCR Strategies

LibreChat supports multiple OCR strategies for different deployment needs. Choose the one that best fits your infrastructure and compliance requirements.

1. Mistral OCR (Default)

The default strategy uses Mistral’s cloud API service directly. This is the simplest setup and requires only an API key from Mistral.

Environment Variables:

# `.env`
OCR_API_KEY=your-mistral-api-key
# OCR_BASEURL=https://api.mistral.ai/v1 # this is the default value

Configuration:

# `librechat.yaml`
ocr:
  mistralModel: "mistral-ocr-latest"       # Optional: Specify Mistral model, defaults to "mistral-ocr-latest"
  apiKey: "your-mistral-api-key"           # Optional: Defaults to OCR_API_KEY env variable
  baseURL: "https://api.mistral.ai/v1"     # Optional: Defaults to OCR_BASEURL env variable, or Mistral's API if no variable set
  strategy: "mistral_ocr"                  # Optional: Defaults to "mistral_ocr"
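
The defaults described in the comments above can be summarized as a resolution order. This is only a sketch of that order under stated assumptions: `resolve_ocr_config` is a hypothetical helper, and the `ocr` and `env` dictionaries stand in for the parsed librechat.yaml section and the process environment.

```python
DEFAULT_BASE_URL = "https://api.mistral.ai/v1"


def resolve_ocr_config(ocr: dict, env: dict) -> dict:
    """Fill in missing OCR settings using the documented defaults.

    Hypothetical helper; explicit YAML values win over environment variables.
    """
    return {
        "strategy": ocr.get("strategy") or "mistral_ocr",
        "mistralModel": ocr.get("mistralModel") or "mistral-ocr-latest",
        # apiKey falls back to the OCR_API_KEY environment variable.
        "apiKey": ocr.get("apiKey") or env.get("OCR_API_KEY"),
        # baseURL falls back to OCR_BASEURL, then to Mistral's API.
        "baseURL": ocr.get("baseURL") or env.get("OCR_BASEURL") or DEFAULT_BASE_URL,
    }
```

With an empty `ocr` section and only `OCR_API_KEY` set, this sketch yields the Mistral defaults for every other field.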

Key Features:

  • Document Structure Preservation: Maintains formatting like headers, paragraphs, lists, and tables
  • Multilingual Support: Processes text in multiple languages and scripts
  • Complex Layout Handling: Handles multi-column text and mixed content
  • Mathematical Expression Recognition: Accurately processes equations and formulas
  • High-Speed Processing: Processes up to 2000 pages per minute

Considerations:

  • Cost: Using Mistral OCR may incur costs as it’s a paid API service (though free trials may be available)
  • Data Privacy: Data sent to Mistral OCR is processed in Mistral’s cloud environment and is subject to their terms of service
  • Document Limitations:
    • Maximum file size: 50 MB
    • Maximum document length: 1,000 pages
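
If you want to catch oversized documents before they are sent, a pre-flight check along these lines may help. It is a sketch, not part of LibreChat: the function name is hypothetical, and treating 50 MB as 50 × 1024 × 1024 bytes is an assumption, since the limit above does not say which definition of megabyte applies.

```python
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # 50 MB, assuming 1 MB = 1024 * 1024 bytes
MAX_PAGES = 1000


def check_mistral_ocr_limits(size_bytes: int, page_count: int) -> list[str]:
    """Return a list of limit violations; an empty list means the file fits."""
    problems = []
    if size_bytes > MAX_FILE_SIZE_BYTES:
        size_mb = size_bytes / (1024 * 1024)
        problems.append(f"file is {size_mb:.1f} MB; the limit is 50 MB")
    if page_count > MAX_PAGES:
        problems.append(f"document has {page_count} pages; the limit is {MAX_PAGES}")
    return problems
```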

2. Azure Mistral OCR

For organizations using Azure AI Foundry, you can deploy Mistral OCR models to your Azure infrastructure. Currently, the Mistral OCR 2503 model is available for Azure deployment.

Configuration:

# `librechat.yaml`
ocr:
  mistralModel: "deployed-mistral-ocr-2503"              # Should match your Azure deployment name
  apiKey: "${AZURE_MISTRAL_OCR_API_KEY}"                 # Reference to your Azure API key in .env
  baseURL: "https://your-deployed-endpoint.models.ai.azure.com/v1"  # Your Azure endpoint
  strategy: "azure_mistral_ocr"                          # Use Azure strategy
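
The `${AZURE_MISTRAL_OCR_API_KEY}` value above is a placeholder that refers to a variable defined in .env. The sketch below shows one way such placeholders could be expanded. It is illustrative only: `expand_env_placeholders` is a hypothetical name, and leaving unknown variables untouched is an assumption, since this page does not describe how LibreChat handles a missing variable.

```python
import re

# Matches ${NAME}, where NAME is a typical environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_placeholders(value: str, env: dict[str, str]) -> str:
    """Replace each ${NAME} in value with env[NAME].

    Names missing from env are left as-is (an assumption for this sketch).
    """
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)
```

Keeping the key in .env and referencing it this way avoids committing secrets to librechat.yaml.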

Azure Model Information: You can explore the latest Mistral OCR model available on Azure AI Foundry here (requires Azure subscription):

https://ai.azure.com/explore/models/mistral-ocr-2503

3. Google Vertex AI Mistral OCR

For organizations using Google Cloud Platform, you can deploy Mistral OCR models to your Google Cloud Vertex AI infrastructure.

Environment Variables:

# `.env`
# Set GOOGLE_SERVICE_KEY_FILE using only ONE of the options below
# Option 1: File path
GOOGLE_SERVICE_KEY_FILE=/path/to/your/service-account-key.json
 
# Option 2: URL to fetch the key
GOOGLE_SERVICE_KEY_FILE=https://your-secure-server.com/service-account-key.json
 
# Option 3: Base64 encoded JSON
GOOGLE_SERVICE_KEY_FILE=eyJ0eXBlIjogInNlcnZpY2VfYWNjb3VudCIsICJwcm9qZWN0X2lkIjogInlvdXItcHJvamVjdC1pZCIsIC4uLn0=
 
# Option 4: Raw JSON string
GOOGLE_SERVICE_KEY_FILE='{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "...",
  "client_id": "...",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "..."
}'

Configuration:

# `librechat.yaml`
ocr:
  mistralModel: "mistral-ocr-2505"      # Model name as deployed in Vertex AI
  strategy: "vertexai_mistral_ocr"      # Use Google Vertex AI strategy

Setup Requirements:

  1. Deploy a Mistral OCR model to Google Vertex AI (e.g., mistral-ocr-2505)
  2. Create a service account with appropriate permissions to access the Vertex AI endpoint
  3. Download the service account JSON key file
  4. Set the GOOGLE_SERVICE_KEY_FILE environment variable using one of the supported methods
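
Because GOOGLE_SERVICE_KEY_FILE accepts four different kinds of value, it can help to see how they might be told apart. The sketch below is one possible approach under stated assumptions: `classify_service_key` is a hypothetical function, and the detection order (URL, raw JSON, Base64, then file path) is chosen for this example rather than taken from LibreChat’s code.

```python
import base64
import json


def classify_service_key(value: str) -> str:
    """Return "url", "json", "base64", or "path" for a key value."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        return "url"
    if value.startswith("{"):
        json.loads(value)  # raises ValueError if the JSON is malformed
        return "json"
    try:
        # validate=True rejects characters outside the Base64 alphabet,
        # which rules out most file paths (they contain "." or "-").
        decoded = base64.b64decode(value, validate=True).decode("utf-8")
        if isinstance(json.loads(decoded), dict):
            return "base64"
    except ValueError:
        pass  # not Base64-encoded JSON, so treat it as a file path
    return "path"
```

Anything that is not a URL, raw JSON, or Base64-encoded JSON is treated as a file path in this sketch.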

4. Custom OCR (Planned)

Support for custom OCR providers and user-defined strategies is planned for future releases.

Detailed Configuration

For additional, detailed configuration options, see the OCR Config Object Structure.

OCR Processing Configuration

Control which file types are processed with OCR using fileConfig:

fileConfig:
  ocr:
    supportedMimeTypes:
      - "^image/(jpeg|gif|png|webp|heic|heif)$"
      - "^application/pdf$"
      - "^application/vnd\\.openxmlformats-officedocument\\.(wordprocessingml\\.document|presentationml\\.presentation|spreadsheetml\\.sheet)$"
      - "^application/vnd\\.ms-(word|powerpoint|excel)$"
      - "^application/epub\\+zip$"

Files matching these patterns will use OCR when:

  • Uploaded to agent file context (always, if OCR is configured)
  • Uploaded as text in chat (if OCR is configured; otherwise falls back to text parsing)
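
To see which files the patterns above would select, you can test MIME types against them directly. This is a minimal sketch: `is_ocr_candidate` is a hypothetical helper, and it uses Python’s `re` module, which is assumed to behave the same as LibreChat’s own regex engine for these simple patterns.

```python
import re

# Patterns copied from the fileConfig example above. In a YAML double-quoted
# string, "\\." denotes the regex \. (a literal dot).
SUPPORTED_MIME_PATTERNS = [
    r"^image/(jpeg|gif|png|webp|heic|heif)$",
    r"^application/pdf$",
    r"^application/vnd\.openxmlformats-officedocument\."
    r"(wordprocessingml\.document|presentationml\.presentation|spreadsheetml\.sheet)$",
    r"^application/vnd\.ms-(word|powerpoint|excel)$",
    r"^application/epub\+zip$",
]


def is_ocr_candidate(mime_type: str, patterns: list[str] = SUPPORTED_MIME_PATTERNS) -> bool:
    """Return True if the MIME type matches any supported pattern."""
    return any(re.search(pattern, mime_type) for pattern in patterns)
```

For instance, `image/png` and `application/pdf` match, while `text/plain` and `image/svg+xml` do not and would be handled by text parsing instead.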

For more details on file processing configuration, see File Config Object Structure.

Use Cases for Agent File Context

Agent File Context is ideal for:

  • Persistent Agent Knowledge: Add documentation, policies, or reference materials to an agent’s system instructions
  • Specialized Agents: Create agents with domain-specific knowledge from documents
  • Document-Based Assistants: Build agents that always reference specific manuals or guides
  • Code Files: Include code examples or libraries in agent instructions
  • Structured Data: Add CSV, JSON, or other structured data for the agent to reference

When OCR is configured, File Context also handles:

  • Scanned Document Processing: Extract and store text from images or scanned PDFs
  • Image Text Extraction: Extract text from screenshots or photos of documents

For temporary document questions in chat, see Upload as Text.

Limitations

  • Text extraction accuracy may vary depending on file type, image quality, document complexity, and text clarity
  • Some specialized formatting or unusual layouts might not be perfectly preserved
  • Very large documents may be truncated due to token limitations of the underlying AI models
  • For best results with images and scanned documents, configure an OCR service

Future Enhancements

LibreChat plans to expand OCR capabilities in future releases:

  • Support for custom OCR providers
  • A user_provided strategy option that will allow users to choose their preferred OCR service
  • Integration with open-source OCR solutions
  • Enhanced document processing options
  • More granular control over OCR settings
  • Mistral plans to make their OCR API available through their cloud partners, such as GCP and AWS, and to offer enterprise self-hosting for organizations with stringent data privacy requirements
  • Parsed image content: LibreChat does not currently include the images parsed during OCR in its responses, even though services like Mistral’s OCR API may return them. This may be supported in a future update.

For more information on configuring OCR, see the OCR Config Object Structure.