RAG API
Configure the Retrieval-Augmented Generation (RAG) API for document indexing and retrieval using LangChain and FastAPI. This API integrates with LibreChat to provide context-aware responses based on user-uploaded files.
The RAG API indexes user-uploaded files and retrieves relevant passages to augment your prompts, giving LibreChat context-aware responses grounded in your documents. It runs as a separate FastAPI service backed by a PostgreSQL + pgvector database.
New to RAG?
The RAG API Presentation explains the concept in more detail and links to a helpful video. This page covers setup and configuration.
Availability
RAG works with Agents, as well as Custom Endpoints, OpenAI, Azure OpenAI, Anthropic, and Google.
OpenAI Assistants have their own RAG implementation through the "Retrieval" capability (details here). Pairing the RAG API with the Assistants API would still be worthwhile, since OpenAI charges for both file storage and Retrieval; this integration is planned for a future update.
Docker Quick Start
For Docker, the RAG API is already wired up in both the default docker-compose.yml and deploy-compose.yml files, including the RAG_API_URL value. You only need to make sure you are running the latest image and compose files. See the Updating LibreChat guide for Docker if you are unsure how to update.
Shared .env file
With the default Docker setup, the .env file is shared between LibreChat and the RAG API. Define the RAG variables in that same file. The full list lives in the RAG API README.
Pick the embeddings provider you want to use.
Use RAG with OpenAI embeddings. This is the default configuration.
Set the RAG API URL. Add the following to your .env file:
RAG_API_URL=http://host.docker.internal:8000
Provide an OpenAI API key (if needed). If your OpenAI API key is set to user_provided, add a key for embeddings. Skip this step if you already supply the OpenAI key in your .env file.
RAG_OPENAI_API_KEY=sk-your-openai-api-key-example
Start the containers.
docker compose up -d
Use RAG with Hugging Face embeddings.
Configure the provider. Add the following to your .env file:
RAG_API_URL=http://host.docker.internal:8000
EMBEDDINGS_PROVIDER=huggingface
HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxx
Switch to the full RAG API image. Update your docker-compose.override.yml file:
version: '3.4'
services:
  rag_api:
    image: registry.librechat.ai/danny-avila/librechat-rag-api-dev:latest
Start the containers.
docker compose up -d
Use RAG with Ollama local embeddings.
Prerequisite
You need Ollama and the nomic-embed-text embedding model. Pull it with ollama pull nomic-embed-text.
Configure the provider. Add the following to your .env file:
RAG_API_URL=http://host.docker.internal:8000
EMBEDDINGS_PROVIDER=ollama
OLLAMA_BASE_URL=http://host.docker.internal:11434
EMBEDDINGS_MODEL=nomic-embed-text
Switch to the full RAG API image. Update your docker-compose.override.yml file:
version: '3.4'
services:
  rag_api:
    image: registry.librechat.ai/danny-avila/librechat-rag-api-dev:latest
    # If running on Linux
    # extra_hosts:
    #   - "host.docker.internal:host-gateway"
Start the containers.
docker compose up -d
Lite vs. full image
Docker uses the "lite" image of the RAG API by default (registry.librechat.ai/danny-avila/librechat-rag-api-dev-lite:latest), which only supports remote embeddings from OpenAI or a remote HuggingFace/Ollama service you have configured.
For local embeddings, switch the image in the compose file to the full build, registry.librechat.ai/danny-avila/librechat-rag-api-dev:latest. Make this change in your Docker Compose Override File. See docker-compose.override.yml.example at the root of the project for an example.
If you want a compose file that includes only the PostgreSQL + pgvector database and the Python API, see rag.yml at the root of the project.
Database storage
The default compose files store the pgvector/PostgreSQL data in the Docker-managed pgdata2 volume. This is intentional: the database files don't need to be edited directly from the host, and a managed volume avoids common ownership and permission problems. User-facing, editable files (uploads, logs, images, MongoDB data, and NGINX config) are bind-mounted to project folders where direct host access is useful.
Local Setup
A non-container setup is more hands-on. Follow the instructions in the RAG API repo.
Set RAG_API_URL in your LibreChat .env file to wherever the API is reachable from your setup. This differs from Docker, where the value is already set in the default docker-compose.yml file.
Configuration
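Set RAG API options through environment variables in an .env file accessible to the API. Most are optional, aside from the credentials and paths required by your chosen provider. In the default setup, the only requirement is an OpenAI API key for embeddings: set RAG_OPENAI_API_KEY, or rely on OPENAI_API_KEY if it already holds a real key rather than user_provided.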
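To make the provider requirements concrete, here is a minimal sketch of a pre-flight check for a .env file. It is a hypothetical helper, not part of LibreChat or the RAG API. It assumes that the variables each provider walkthrough above sets are the ones that provider needs; the RAG API itself may fall back to defaults for some of them.

```python
# Hypothetical pre-flight check; not part of LibreChat or the RAG API.
# Each tuple is a group of alternatives: one usable value per group is enough.
REQUIRED_BY_PROVIDER = {
    "openai": [("RAG_OPENAI_API_KEY", "OPENAI_API_KEY")],
    "huggingface": [("HF_TOKEN",)],
    # Assumption: both values from the Ollama walkthrough are treated as required.
    "ollama": [("OLLAMA_BASE_URL",), ("EMBEDDINGS_MODEL",)],
}


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank and commented-out lines."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def _usable(value):
    """A value counts only if it is set and is not the user_provided placeholder."""
    return bool(value) and value != "user_provided"


def missing_variables(env):
    """Return the unmet variable groups for the configured embeddings provider."""
    provider = env.get("EMBEDDINGS_PROVIDER", "openai")  # documented default
    missing = []
    for group in REQUIRED_BY_PROVIDER.get(provider, []):
        if not any(_usable(env.get(name)) for name in group):
            missing.append(" or ".join(group))
    return missing
```

Running `missing_variables(parse_env(open(".env").read()))` before starting the containers returns an empty list when nothing in the checked set is missing. Providers not listed in the table of this sketch (such as azure) are simply not checked.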
Environment Variables
| Key | Type | Description | Example |
|---|---|---|---|
| RAG_API_URL | string | URL of the RAG API service. | RAG_API_URL=http://host.docker.internal:8000 |
| RAG_OPENAI_API_KEY | string | OpenAI API key for embeddings. Overrides OPENAI_API_KEY for RAG. | # RAG_OPENAI_API_KEY=sk-your-key |
| RAG_OPENAI_BASEURL | string | Custom OpenAI base URL for RAG embeddings. | # RAG_OPENAI_BASEURL= |
| RAG_USE_FULL_CONTEXT | boolean | Fetch entire file context instead of top 4 results. Default: false. | # RAG_USE_FULL_CONTEXT=true |
| EMBEDDINGS_PROVIDER | string | Embeddings provider: openai, azure, huggingface, huggingfacetei, or ollama. Default: openai. | # EMBEDDINGS_PROVIDER=openai |
| EMBEDDINGS_MODEL | string | Embeddings model to use. Default depends on provider. | # EMBEDDINGS_MODEL=text-embedding-3-small |
| RAG_PORT | number | Port where RAG API runs. Default: 8000. | # RAG_PORT=8000 |
| RAG_HOST | string | Hostname for RAG API. Default: 0.0.0.0. | # RAG_HOST=0.0.0.0 |
| COLLECTION_NAME | string | Vector store collection name. Default: testcollection. | # COLLECTION_NAME=testcollection |
| CHUNK_SIZE | number | Size of text chunks. Default: 1500. | # CHUNK_SIZE=1500 |
| CHUNK_OVERLAP | number | Overlap between chunks. Default: 100. | # CHUNK_OVERLAP=100 |
| OLLAMA_BASE_URL | string | Ollama base URL when using Ollama embeddings. | # OLLAMA_BASE_URL=http://host.docker.internal:11434 |
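To illustrate how CHUNK_SIZE and CHUNK_OVERLAP interact, the sketch below splits text into fixed-size windows that share an overlap. It is a simplification for intuition only: it counts plain characters and ignores sentence or paragraph boundaries, and the function name `chunk_text` is hypothetical. The splitter the RAG API actually uses may measure and cut differently.

```python
def chunk_text(text, chunk_size=1500, chunk_overlap=100):
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# With the defaults, a 4000-character document yields windows starting
# at offsets 0, 1400, and 2800.
sample = "".join(str(i % 10) for i in range(4000))
pieces = chunk_text(sample)
```

A larger overlap repeats more text across neighboring chunks, which helps when a relevant passage straddles a boundary but increases the number of chunks stored.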
Credential precedence
OPENAI_API_KEY works for RAG embeddings, but RAG_OPENAI_API_KEY overrides it to avoid credential conflicts.
For the complete list of variables and their descriptions, see the RAG API repo.
Usage
Once the RAG API is running, it integrates with LibreChat automatically. When a user uploads files to a conversation, the API indexes them and uses them for context-aware responses.
Upload files to the conversation. If RAG_API_URL is not configured or not reachable, the upload fails.
Chat as usual. As the user interacts with the model, the RAG API retrieves relevant passages from the indexed files based on the input and uses them to augment the prompt.
Control when files are queried. By default, the vector store is queried on every new message in a conversation that has a file attached. Craft your prompts accordingly.
Toggle Resend Files off in the conversation settings to query files only when they are explicitly attached to a message.
Reuse indexed files. Upload a file once, then attach it to any new message or conversation from the Side Panel.
Files must be in "Host" storage. Files in "OpenAI" storage are treated differently and are exclusive to Assistants, so a file you want to use with RAG must not have been uploaded while the Assistants endpoint was selected and active. View and manage your files from the Side Panel.
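The retrieve-then-augment flow described in this section can be sketched in a few lines. This is a toy model, not the RAG API's code: the cosine-similarity ranking, the two-dimensional vectors, and the prompt layout are all assumptions made for illustration. Only the default of four results comes from the RAG_USE_FULL_CONTEXT description.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=4):
    """chunks is a list of (text, vector) pairs; return the k most similar texts."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def augment_prompt(question, passages):
    """Prepend retrieved passages to the question (hypothetical prompt layout)."""
    context = "\n\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy index: in practice the vectors come from the configured embeddings model.
index = [
    ("a", [1.0, 0.0]),
    ("b", [0.0, 1.0]),
    ("c", [1.0, 1.0]),
    ("d", [-1.0, 0.0]),
    ("e", [0.9, 0.1]),
]
passages = top_k([1.0, 0.0], index)
prompt = augment_prompt("What does the file say?", passages)
```

Because only the best-ranked passages reach the model, a prompt that names the topic you care about tends to pull in more relevant chunks than a vague follow-up.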
Troubleshooting
If you run into issues setting up or using the RAG API:
- Confirm all required environment variables are set correctly in your .env file.
- Make sure the vector database is configured and accessible.
- Verify that the OpenAI API key or other provider credentials are valid.
- Check both the LibreChat and RAG API logs for errors or warnings.
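One quick way to rule out a networking problem is to test whether anything is listening at the address in RAG_API_URL. The sketch below only opens a TCP connection, so it makes no assumption about the RAG API's HTTP routes. The function names are hypothetical, and the check must run from the same place LibreChat runs (for example, inside its container) for a name like host.docker.internal to resolve.

```python
import socket
from urllib.parse import urlparse


def host_and_port(url):
    """Extract (host, port) from a URL, falling back to the scheme's default port."""
    parsed = urlparse(url)
    if parsed.hostname is None:
        raise ValueError(f"not a valid URL: {url!r}")
    default_port = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default_port


def is_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `is_reachable("http://host.docker.internal:8000")` returning False points at the URL, the port mapping, or a stopped container rather than at credentials or the embeddings provider.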
If the problem persists, refer to the RAG API documentation or ask the LibreChat community on GitHub Discussions or Discord.