OCR for Documents
OCR (Optical Character Recognition) in LibreChat is an optional enhancement for text extraction from files.
Upload as Text
The “Upload as Text” feature (from the chat) works the same way as Agent File Context:
- Files matching `fileConfig.ocr.supportedMimeTypes` use OCR if available
- Falls back to text parsing if OCR is not configured
- Especially useful for images with text, scanned documents, and complex PDFs
- Processing priority: OCR > STT > text parsing
- See the Upload as Text documentation for details.
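The processing priority above can be sketched as a small decision function. This is an illustration of the stated order (OCR > STT > text parsing), not LibreChat's actual implementation; `UploadContext` and `choose_extraction_method` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class UploadContext:
    """Hypothetical summary of one uploaded file and the server setup."""
    matches_ocr_types: bool   # MIME type matches fileConfig.ocr.supportedMimeTypes
    matches_stt_types: bool   # MIME type is one the STT service accepts (assumed)
    ocr_configured: bool
    stt_configured: bool


def choose_extraction_method(ctx: UploadContext) -> str:
    """Return 'ocr', 'stt', or 'text' following the priority OCR > STT > text parsing."""
    if ctx.ocr_configured and ctx.matches_ocr_types:
        return "ocr"
    if ctx.stt_configured and ctx.matches_stt_types:
        return "stt"
    # Text parsing is the fallback and needs no external service.
    return "text"
```

For example, a scanned PDF uploaded to a server without OCR configured would fall through to `"text"`.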
File Context (for agents)
When you upload files through the Agent Builder’s File Context section:
- Text is extracted using text parsing by default (OCR/STT if configured and file matches)
- Extracted text is stored as part of the agent’s system instructions
- Agent can reference this context in all conversations
- OCR service is optional - the feature works without it using text parsing
Files uploaded as “File Context” are processed to extract text, which is then added to the Agent’s system instructions. This is ideal for documents, code files, PDFs, or images with text where you need the full text content to be included in the agent’s instructions.
Note: The extracted text is included in the agent’s system instructions.
Optional OCR Configuration
Both Agent File Context and Upload as Text work out-of-the-box using text parsing. To enhance extraction quality for images and scanned documents, you can optionally configure an OCR service:
# librechat.yaml
endpoints:
  agents:
    capabilities:
      - "context" # Enables both agent file context and upload as text
      - "ocr" # Optionally enhances both with OCR
ocr:
  strategy: "mistral_ocr"
  apiKey: "${OCR_API_KEY}"
  baseURL: "https://api.mistral.ai/v1"
  mistralModel: "mistral-ocr-latest"
Note: The `context` capability is enabled by default. You only need to configure OCR (the `ocr` capability) if you want enhanced extraction quality for images and scanned documents.
Overview of OCR Capabilities
OCR functionality in LibreChat allows you to:
- Extract text from images and documents
- Maintain document structure and formatting
- Process complex layouts including multi-column text
- Handle tables, equations, and other specialized content
- Work with multilingual content
OCR Strategies
LibreChat supports multiple OCR strategies to meet different deployment needs and requirements. Choose the strategy that best fits your infrastructure and compliance requirements.
1. Mistral OCR (Default)
The default strategy uses Mistral’s cloud API service directly. This is the simplest setup and requires only an API key from Mistral.
Environment Variables:
# `.env`
OCR_API_KEY=your-mistral-api-key
# OCR_BASEURL=https://api.mistral.ai/v1 # this is the default value
Configuration:
# `librechat.yaml`
ocr:
  mistralModel: "mistral-ocr-latest" # Optional: Specify Mistral model, defaults to "mistral-ocr-latest"
  apiKey: "your-mistral-api-key" # Optional: Defaults to OCR_API_KEY env variable
  baseURL: "https://api.mistral.ai/v1" # Optional: Defaults to OCR_BASEURL env variable, or Mistral's API if no variable set
  strategy: "mistral_ocr" # Optional: Defaults to "mistral_ocr"
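The defaults described in the comments above can be summarized in a short sketch. It only mirrors the documented fallback order (explicit YAML value, then environment variable, then built-in default). It is not LibreChat's code, `resolve_ocr_config` and `ResolvedOCR` are hypothetical names, and it does not expand `${VAR}` references inside YAML values.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

MISTRAL_API = "https://api.mistral.ai/v1"  # documented default base URL


@dataclass
class ResolvedOCR:
    strategy: str
    mistral_model: str
    base_url: str
    api_key: Optional[str]


def resolve_ocr_config(
    ocr: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> ResolvedOCR:
    """Apply the documented defaults to a parsed `ocr:` mapping."""
    return ResolvedOCR(
        strategy=ocr.get("strategy", "mistral_ocr"),
        mistral_model=ocr.get("mistralModel", "mistral-ocr-latest"),
        # Explicit value first, then OCR_BASEURL, then Mistral's API.
        base_url=ocr.get("baseURL") or env.get("OCR_BASEURL") or MISTRAL_API,
        # Explicit value first, then OCR_API_KEY; None means no key was found.
        api_key=ocr.get("apiKey") or env.get("OCR_API_KEY"),
    )
```

With an empty `ocr:` block and only `OCR_API_KEY` set, the result uses the `mistral_ocr` strategy against Mistral's API, which matches the minimal setup described for this strategy.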
Key Features:
- Document Structure Preservation: Maintains formatting like headers, paragraphs, lists, and tables
- Multilingual Support: Processes text in multiple languages and scripts
- Complex Layout Handling: Handles multi-column text and mixed content
- Mathematical Expression Recognition: Accurately processes equations and formulas
- High-Speed Processing: Processes up to 2000 pages per minute
Considerations:
- Cost: Using Mistral OCR may incur costs as it’s a paid API service (though free trials may be available)
- Data Privacy: Data processed through Mistral OCR is subject to Mistral’s cloud environment and their terms of service
- Document Limitations:
  - Maximum file size: 50 MB
  - Maximum document length: 1,000 pages
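A pre-flight check against the limits listed above might look like the following sketch. It assumes 50 MB means 50 × 1024 × 1024 bytes, which the source does not specify, and `check_mistral_ocr_limits` is a hypothetical helper, not part of LibreChat or the Mistral SDK.

```python
from typing import List, Optional

MAX_FILE_BYTES = 50 * 1024 * 1024  # 50 MB, assuming binary megabytes
MAX_PAGES = 1000


def check_mistral_ocr_limits(size_bytes: int, page_count: Optional[int] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means the file fits."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_FILE_BYTES}")
    # The page count may be unknown (for example, for images), so it is optional.
    if page_count is not None and page_count > MAX_PAGES:
        problems.append(f"document has {page_count} pages, limit is {MAX_PAGES}")
    return problems
```

Running a check like this before upload lets a client reject oversized files without spending an API call.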
2. Azure Mistral OCR
For organizations using Azure AI Foundry, you can deploy Mistral OCR models to your Azure infrastructure. Currently, the Mistral OCR 2503 model is available for Azure deployment.
Configuration:
# `librechat.yaml`
ocr:
  mistralModel: "deployed-mistral-ocr-2503" # Should match your Azure deployment name
  apiKey: "${AZURE_MISTRAL_OCR_API_KEY}" # Reference to your Azure API key in .env
  baseURL: "https://your-deployed-endpoint.models.ai.azure.com/v1" # Your Azure endpoint
  strategy: "azure_mistral_ocr" # Use Azure strategy
Azure Model Information: You can explore the latest Mistral OCR model available on Azure AI Foundry here (requires Azure subscription):
https://ai.azure.com/explore/models/mistral-ocr-2503
3. Google Vertex AI Mistral OCR
For organizations using Google Cloud Platform, you can deploy Mistral OCR models to your Google Cloud Vertex AI infrastructure.
Environment Variables:
# `.env`
# Option 1: File path
GOOGLE_SERVICE_KEY_FILE=/path/to/your/service-account-key.json
# Option 2: URL to fetch the key
GOOGLE_SERVICE_KEY_FILE=https://your-secure-server.com/service-account-key.json
# Option 3: Base64 encoded JSON
GOOGLE_SERVICE_KEY_FILE=eyJ0eXBlIjogInNlcnZpY2VfYWNjb3VudCIsICJwcm9qZWN0X2lkIjogInlvdXItcHJvamVjdC1pZCIsIC4uLn0=
# Option 4: Raw JSON string
GOOGLE_SERVICE_KEY_FILE='{
"type": "service_account",
"project_id": "your-project-id",
"private_key_id": "...",
"private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
"client_email": "...",
"client_id": "...",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "..."
}'
Configuration:
# `librechat.yaml`
ocr:
  mistralModel: "mistral-ocr-2505" # Model name as deployed in Vertex AI
  strategy: "vertexai_mistral_ocr" # Use Google Vertex AI strategy
Setup Requirements:
- Deploy a Mistral OCR model to Google Vertex AI (e.g., mistral-ocr-2505)
- Create a service account with appropriate permissions to access the Vertex AI endpoint
- Download the service account JSON key file
- Set the `GOOGLE_SERVICE_KEY_FILE` environment variable using one of the supported methods
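To make the four accepted forms of `GOOGLE_SERVICE_KEY_FILE` concrete, here is a sketch of how a value could be classified. How LibreChat actually detects the format is not stated in this document, so the heuristics and the name `classify_service_key` are assumptions made for illustration.

```python
import base64
import binascii


def classify_service_key(value: str) -> str:
    """Guess which of the four documented forms a GOOGLE_SERVICE_KEY_FILE value uses.

    Returns one of: 'json', 'url', 'base64', 'path'.
    """
    text = value.strip()
    if text.startswith("{"):
        return "json"  # Option 4: raw JSON string
    if text.startswith(("http://", "https://")):
        return "url"  # Option 2: URL to fetch the key
    try:
        decoded = base64.b64decode(text, validate=True)
        # Option 3: Base64 that decodes to a JSON object
        if decoded.lstrip().startswith(b"{"):
            return "base64"
    except (binascii.Error, ValueError):
        pass
    return "path"  # Option 1: anything else is treated as a file path
```

Checking for raw JSON and URLs before attempting Base64 decoding avoids misreading short paths that happen to be valid Base64.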
4. Custom OCR (Planned)
Support for custom OCR providers and user-defined strategies is planned for future releases.
Detailed Configuration
For additional, detailed configuration options, see the OCR Config Object Structure.
OCR Processing Configuration
Control which file types are processed with OCR using `fileConfig`:
fileConfig:
  ocr:
    supportedMimeTypes:
      - "^image/(jpeg|gif|png|webp|heic|heif)$"
      - "^application/pdf$"
      - "^application/vnd\\.openxmlformats-officedocument\\.(wordprocessingml\\.document|presentationml\\.presentation|spreadsheetml\\.sheet)$"
      - "^application/vnd\\.ms-(word|powerpoint|excel)$"
      - "^application/epub\\+zip$"
Files matching these patterns will use OCR when:
- Uploaded to agent file context (always, if OCR is configured)
- Uploaded as text in chat (if OCR is configured; otherwise falls back to text parsing)
For more details on file processing configuration, see File Config Object Structure.
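To see which uploads the patterns above would select, the following sketch tests MIME types against them with Python's `re` module. LibreChat itself runs on JavaScript, so this only approximates the matching behavior; the patterns are copied from the configuration with the YAML double backslashes reduced to single ones, and `is_ocr_candidate` is a hypothetical helper.

```python
import re

# Patterns from fileConfig.ocr.supportedMimeTypes, as regular expressions.
SUPPORTED_MIME_PATTERNS = [
    r"^image/(jpeg|gif|png|webp|heic|heif)$",
    r"^application/pdf$",
    r"^application/vnd\.openxmlformats-officedocument\."
    r"(wordprocessingml\.document|presentationml\.presentation|spreadsheetml\.sheet)$",
    r"^application/vnd\.ms-(word|powerpoint|excel)$",
    r"^application/epub\+zip$",
]


def is_ocr_candidate(mime_type: str) -> bool:
    """True if the MIME type matches at least one supported pattern."""
    return any(re.search(pattern, mime_type) for pattern in SUPPORTED_MIME_PATTERNS)


if __name__ == "__main__":
    for mime in ("image/png", "image/bmp", "application/pdf", "text/plain"):
        print(mime, is_ocr_candidate(mime))
```

Because each pattern is anchored with `^` and `$`, a type such as `image/bmp` is not matched even though it shares the `image/` prefix.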
Use Cases for Agent File Context
Agent File Context is ideal for:
- Persistent Agent Knowledge: Add documentation, policies, or reference materials to an agent’s system instructions
- Specialized Agents: Create agents with domain-specific knowledge from documents
- Document-Based Assistants: Build agents that always reference specific manuals or guides
- Code Files: Include code examples or libraries in agent instructions
- Structured Data: Add CSV, JSON, or other structured data for the agent to reference
When OCR is configured, File Context also handles:
- Scanned Document Processing: Extract and store text from images or scanned PDFs
- Image Text Extraction: Extract text from screenshots or photos of documents
For temporary document questions in chat, see Upload as Text.
Limitations
- Text extraction accuracy may vary depending on file type, image quality, document complexity, and text clarity
- Some specialized formatting or unusual layouts might not be perfectly preserved
- Very large documents may be truncated due to token limitations of the underlying AI models
- For best results with images and scanned documents, configure an OCR service
Future Enhancements
LibreChat plans to expand OCR capabilities in future releases:
- Support for custom OCR providers
- A `user_provided` strategy option that will allow users to choose their preferred OCR service
- Integration with open-source OCR solutions
- Enhanced document processing options
- More granular control over OCR settings
- Mistral plans to make their OCR API available through their cloud partners, such as GCP and AWS, and enterprise self-hosting for organizations with stringent data privacy requirements (source)
- LibreChat currently does not include the parsed image content from the OCR process in its responses, even though services like Mistral’s OCR API may provide these in the result. This feature may be supported in future updates.
For more information on configuring OCR, see the OCR Config Object Structure.