File Context via Optical Character Recognition (OCR)

LibreChat’s OCR (Optical Character Recognition) feature enables AI agents to extract and process text from images and documents. With the extracted text available, an agent can analyze and respond to information that would otherwise be locked inside visual content.

Overview

OCR functionality in LibreChat allows agents to:

  • Extract text from images and documents
  • Maintain document structure and formatting
  • Process complex layouts including multi-column text
  • Handle tables, equations, and other specialized content
  • Work with multilingual content

Availability

Currently, OCR is only available as an agent capability. This means you must use an agent via the Agents endpoint to leverage OCR functionality.

Configuration

OCR can be enabled in the LibreChat configuration file (librechat.yaml). The OCR configuration defines two strategies, only one of which is currently available:

  1. Mistral OCR (Default and currently the only available option)
  2. Custom OCR (Planned for future releases)

Basic Configuration Example

If using the Mistral OCR API, you only need the following environment variables to get started:

# `.env`
OCR_API_KEY=your-mistral-api-key
# OCR_BASEURL=https://api.mistral.ai/v1 # this is the default value

The ocr block in librechat.yaml accepts the optional fields shown below. For the full set of configuration options, see the OCR Config Object Structure.

# `librechat.yaml`
ocr:
  mistralModel: "mistral-ocr-latest"  # Optional: Specify Mistral model, defaults to "mistral-ocr-latest"
  apiKey: "your-mistral-api-key"        # Optional: Defaults to OCR_API_KEY env variable
  baseURL: "https://api.mistral.ai/v1"  # Optional: Defaults to OCR_BASEURL env variable, or Mistral's API if no variable set
  strategy: "mistral_ocr"               # Optional: Defaults to "mistral_ocr" (only option currently available)
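The fallback order described in the comments above can be sketched in a few lines of Python. This is an illustration only, not LibreChat’s actual implementation: the helper name resolve_ocr_config and the returned dictionary shape are hypothetical, and the only facts taken from this page are the field names, the two environment variables, and the documented defaults.

```python
import os
from typing import Mapping, Optional

# Default documented on this page for the Mistral OCR API.
MISTRAL_DEFAULT_BASE_URL = "https://api.mistral.ai/v1"


def resolve_ocr_config(
    yaml_ocr: Optional[Mapping[str, str]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: merge the `ocr` block with its documented fallbacks.

    Precedence assumed here: librechat.yaml value, then environment
    variable (where one exists), then the built-in default.
    """
    yaml_ocr = yaml_ocr or {}
    env = os.environ if env is None else env
    return {
        "strategy": yaml_ocr.get("strategy") or "mistral_ocr",
        "mistralModel": yaml_ocr.get("mistralModel") or "mistral-ocr-latest",
        # apiKey falls back to OCR_API_KEY; it stays None if neither is set.
        "apiKey": yaml_ocr.get("apiKey") or env.get("OCR_API_KEY"),
        # baseURL falls back to OCR_BASEURL, then to the Mistral default.
        "baseURL": yaml_ocr.get("baseURL")
        or env.get("OCR_BASEURL")
        or MISTRAL_DEFAULT_BASE_URL,
    }
```

Under these assumptions, an empty ocr block combined with only OCR_API_KEY in the environment resolves to the Mistral defaults for every other field, which matches the minimal setup described earlier.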

Mistral OCR

LibreChat currently uses Mistral’s OCR API as its default and only OCR provider. Mistral positions the service as offering state-of-the-art document understanding.
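For orientation, the sketch below shows roughly what a request to the OCR endpoint and the handling of its response might look like. It rests on assumptions: the request body (a model plus a document object of type document_url) and the response shape (a pages list whose entries carry index and markdown) follow Mistral’s public API documentation as understood at the time of writing and should be verified against the current reference. The function names are hypothetical, no network call is made, and this is not how LibreChat itself is implemented.

```python
from typing import Any, Dict


def build_ocr_request(document_url: str, model: str = "mistral-ocr-latest") -> Dict[str, Any]:
    """Build the JSON body assumed to be accepted by POST {baseURL}/ocr."""
    return {
        "model": model,
        # Assumed document descriptor: a publicly reachable or signed URL.
        "document": {"type": "document_url", "document_url": document_url},
    }


def pages_to_text(response: Dict[str, Any]) -> str:
    """Join per-page markdown from an assumed OCR response, in page order."""
    pages = sorted(response.get("pages", []), key=lambda p: p.get("index", 0))
    return "\n\n".join(p.get("markdown", "") for p in pages)
```

The body would be sent with an Authorization: Bearer header carrying the value of OCR_API_KEY. Keeping payload construction and response flattening as pure functions makes them easy to check without calling the paid API.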

Key Features of Mistral OCR

  • Document Structure Preservation: Maintains formatting like headers, paragraphs, lists, and tables
  • Multilingual Support: Processes text in multiple languages and scripts
  • Complex Layout Handling: Handles multi-column text and mixed content
  • Mathematical Expression Recognition: Accurately processes equations and formulas
  • High-Speed Processing: Processes up to 2000 pages per minute

Important Considerations

  • Cost: Using Mistral OCR may incur costs as it’s a paid API service (though free trials may be available)
  • Data Privacy: Files processed through Mistral OCR are sent to Mistral’s cloud environment and are subject to Mistral’s terms of service
  • Document Limitations:
    • Maximum file size: 50 MB
    • Maximum document length: 1,000 pages
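Because the API is billed and enforces these limits, it can be useful to reject oversized files before uploading them. The following is a minimal sketch of such a pre-flight check. The function name is hypothetical, and interpreting 50 MB as 50 × 1024 × 1024 bytes is an assumption, since the limit is stated here only in megabytes.

```python
from typing import List, Optional

MAX_FILE_BYTES = 50 * 1024 * 1024  # 50 MB, assuming binary megabytes
MAX_PAGES = 1000                   # documented maximum document length


def check_document_limits(size_bytes: int, page_count: Optional[int] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    page_count may be None when the page count is unknown (for example,
    for a single image), in which case only the size is checked.
    """
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    if page_count is not None and page_count > MAX_PAGES:
        problems.append(f"document has {page_count} pages; limit is {MAX_PAGES}")
    return problems
```

Returning a list rather than raising on the first violation lets a caller report both problems at once.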

Future Plans

  • Mistral plans to make its OCR API available through cloud partners such as GCP and AWS, and to offer enterprise self-hosting for organizations with stringent data privacy requirements (source).
  • LibreChat will continue to support Mistral OCR and explore additional OCR providers, including open-source solutions, for enhanced functionality.
  • LibreChat currently does not include the parsed image content from the OCR process in its responses, even though services like Mistral’s OCR API may provide these in the result. This feature may be supported in future updates.

Using File Context (OCR) in LibreChat

LibreChat provides two main ways to use OCR functionality:

1. Upload as Text in Chat

In any chat conversation, you can use OCR to extract text from images or documents:

  1. Click the attachment icon in the chat input
  2. Select “Upload as Text” from the menu
  3. Choose an image or document file
  4. The OCR system will process the file and insert the extracted text into your message

Figure: Upload as Text option in the attachment menu

2. File Context for Agents

When working with agents, you can add documents as context using OCR:

  1. Open the Agent Builder panel or edit an existing agent
  2. In the File Context section, click “Upload File Context”
  3. Select a document or image file
  4. The OCR system will extract text from the file and add it to the agent’s instructions

Figure: File Context using OCR for agents

Files uploaded as “Context” are processed using OCR to extract text, which is then added to the Agent’s instructions. This is ideal for documents, images with text, or PDFs where you need the full text content of a file to be available to the agent.

Note that OCR is performed at the time of upload. The result is not stored as a separate file; it is saved purely as text in the database.
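Conceptually, adding file context amounts to appending the extracted text to the agent’s instructions. The sketch below illustrates that idea only. The function name, the filename and text keys, and the "Context from" label format are all hypothetical and do not reflect LibreChat’s internal prompt format.

```python
from typing import Dict, List


def build_agent_instructions(base_instructions: str, context_files: List[Dict[str, str]]) -> str:
    """Append OCR-extracted text from each context file to the instructions.

    Each entry in context_files is assumed to look like
    {"filename": "...", "text": "..."} (hypothetical shape).
    """
    sections = [base_instructions.strip()]
    for f in context_files:
        sections.append(f'Context from "{f["filename"]}":\n{f["text"].strip()}')
    # Drop empty sections so blank base instructions leave no stray gaps.
    return "\n\n".join(s for s in sections if s)
```

Because the text is stored rather than re-extracted, a step like this would run on every request without incurring further OCR cost, but the full text counts against the model’s context window each time.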

Example Use Cases

  • Document Analysis: Extract and analyze text from scanned documents, PDFs, or images
  • Data Extraction: Pull specific information from forms, receipts, or invoices
  • Research Assistance: Process academic papers, articles, or books
  • Language Translation: Extract text from foreign language documents for translation
  • Content Digitization: Convert printed materials into digital, searchable text

Limitations

  • OCR accuracy may vary depending on image quality, document complexity, and text clarity
  • Some specialized formatting or unusual layouts might not be perfectly preserved
  • Very large documents may be truncated due to token limitations of the underlying AI models
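The truncation mentioned in the last point can be approximated as follows. This is a rough sketch under a stated assumption: it estimates tokens at about four characters each, a common heuristic for English text, rather than using a real tokenizer, and the function name is hypothetical. Actual limits depend on the model in use.

```python
from typing import Tuple


def truncate_to_token_budget(text: str, max_tokens: int, chars_per_token: int = 4) -> Tuple[str, bool]:
    """Cut text to an approximate token budget.

    Returns the (possibly shortened) text and a flag indicating whether
    anything was cut, so the caller can warn the user.
    """
    max_chars = max_tokens * chars_per_token
    if len(text) <= max_chars:
        return text, False
    return text[:max_chars], True
```

Returning the flag alongside the text makes it possible to tell users that only part of a long document reached the model.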

Future Enhancements

LibreChat plans to expand OCR capabilities in future releases:

  • Support for custom OCR providers
  • A user_provided strategy option that will allow users to choose their preferred OCR service
  • Integration with open-source OCR solutions
  • Enhanced document processing options
  • More granular control over OCR settings

For more information on configuring OCR, see the OCR Config Object Structure.