A production-ready Retrieval-Augmented Generation (RAG) system with multilingual support for scientific research and knowledge exploration. Built with robust error handling, structured logging, and enterprise-grade features.
- PDF extraction with PyMuPDF (context managers for resource safety)
- Intelligent text cleaning (preserves structure, removes noise)
- Semantic chunking with Indic script-aware sentence splitting
- Persistent vector storage via ChromaDB
- Dense + sparse search: BGE-M3 dense vectors fused with BM25 lexical search via Reciprocal Rank Fusion (RRF)
- Cross-encoder reranking: BAAI/bge-reranker-v2-m3 scores the top candidates for precision
- Faithfulness verification: NLI-based claim-level grounding check flags unsupported assertions
- Retrieves 30 candidates, reranks to top 12, verifies citations against source chunks
- 10+ Indian languages + English (Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia)
- Unicode script-based language detection (no misclassification of short Indic queries)
- Two RAG strategies:
  - Strategy A: Direct multilingual reasoning (recommended)
  - Strategy B: Translation-enhanced reasoning with NLLB-200 (sentence-batched to prevent truncation)
- Cross-lingual semantic search with BGE-M3 embeddings (1024d, strong on Indic scripts)
- Google Gemini 3.5 Flash integration with automatic retry (tenacity, 3 attempts with exponential backoff)
- Optimized system prompt: grounding-first, no mandatory section padding
- Smart citation extraction with range validation
- Low temperature (0.1) for deterministic grounded responses
- Thread-safe model initialization: double-checked locking on all singletons
- Warm-up at startup: models are loaded via FastAPI lifespan, so the first request is never cold
- Session TTL eviction: stale chat sessions are cleaned automatically
- Admin-gated destructive ops: purge endpoints require ADMIN_API_KEY
- API key authentication, Prometheus metrics, env-driven CORS
- Pydantic v2 validation and type safety
- purge.py - CLI utility to safely clear PDFs, database, or model cache
- Web-based document management - Upload, ingest, and purge via UI
- Comprehensive ingestion pipeline with progress tracking
- Evaluation framework with nDCG@10, Recall@20, and CI gating
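The hybrid retrieval step above fuses a dense ranking and a BM25 ranking with Reciprocal Rank Fusion. The following is a minimal sketch of that idea, not the code in bm25_search.py: the function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (a commonly used default for RRF) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse several ranked lists of IDs with Reciprocal Rank Fusion.

    Each ID receives 1 / (k + rank) from every list it appears in
    (rank starts at 1); IDs are returned by descending total score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Ties are broken by ID so the output is deterministic.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_n is None else fused[:top_n]


# Hypothetical chunk IDs returned by the two retrievers.
dense_hits = ["c3", "c1", "c7"]
bm25_hits = ["c1", "c9", "c3"]
fused = rrf_fuse([dense_hits, bm25_hits])
print(fused)
```

Chunks that appear in both lists accumulate two contributions and rise to the top, which is why RRF needs no score normalization between the dense and lexical retrievers.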
- Python 3.11+
- Google Gemini API key
- 8GB+ RAM recommended
# Clone the repository
git clone https://github.com/DNSdecoded/IndicRAG.git
cd IndicRAG
# Create virtual environment
python -m venv .venv
# Activate (Windows)
.venv\Scripts\activate
# Activate (macOS/Linux)
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.example .env
# Edit .env and add your API key
# LLM_API_KEY=your_gemini_api_key_here
# Optional: Configure API authentication
# API_KEYS=key1,key2,key3

# Place PDFs in papers/ directory
# Then ingest them:
python ingest.py
# Or specify a directory:
python ingest.py path/to/pdfs

# With pre-flight checks
python start_server.py
# Skip checks (for production)
python start_server.py --skip-checks
# Development mode with auto-reload
python start_server.py --dev

That's it! Access the API at:
- Interactive docs: http://localhost:8080/api/docs
- Web Interface: http://localhost:8080
Open http://localhost:8080 and:
- Ask Questions - Enter queries in any supported language
- Manage Documents - Expand the panel to:
  - Upload PDFs via drag-and-drop
  - View uploaded papers list
  - Ingest papers into the vector store
  - Purge papers or database (with confirmation)
import requests

response = requests.post('http://localhost:8080/query', json={
    "question": "యాంటెన్నాతో ml ను ఎలా అమలు చేయవచ్చు?",  # Telugu
    "strategy": "A",
    "top_k": 5
})
result = response.json()
print(result['answer'])
print(f"Citations: {len(result['citations'])}")

import rag

result = rag.answer_question(
    "मधुमेह का इलाज क्या है?",  # Hindi: diabetes treatment
    strategy="B",
    top_k=8
)
print(f"Answer ({result['language_name']}): {result['answer']}")
print(f"Used {result['chunks_used']} document chunks")

Ask a question and get an AI-powered answer with citations.
Request:
{
  "question": "What is quantum computing?",
  "strategy": "A",
  "top_k": 5
}

Response:
{
  "answer": "Quantum computing is...",
  "language": "en",
  "language_name": "English",
  "chunks_used": 4,
  "citations": [
    {"number": "1", "title": "Quantum Computing Basics", "section": "Introduction"}
  ],
  "processing_time": 1.23
}

Ingest a PDF document (returns extracted title).
Get vector store statistics.
Health check endpoint.
Upload a PDF file (multipart form).
List all uploaded PDFs with sizes.
Delete all uploaded PDF files.
Clear the vector database (all chunks).
Safely clear indexed data:
# Delete all PDFs
python purge.py --papers
# Clear vector database
python purge.py --db
# Remove cached models (will re-download)
python purge.py --models
# Clear everything (with confirmation)
python purge.py --all
# Non-interactive mode
python purge.py --all --yes

# Test with example queries
python examples/example_query.py

IndicRAG/
├── api_server.py          # FastAPI app with auth, lifespan warm-up, session TTL
├── config.py              # All configuration constants and prompts
├── rag.py                 # Core RAG pipeline (retrieval, rerank, generate, verify)
├── embeddings.py          # BGE-M3 multilingual embeddings (thread-safe)
├── rerank.py              # Cross-encoder reranker (bge-reranker-v2-m3)
├── bm25_search.py         # BM25 lexical index + RRF fusion
├── verify.py              # NLI-based faithfulness verification
├── vector_store.py        # ChromaDB wrapper (thread-safe)
├── translation.py         # NLLB-200 translation, sentence-batched
├── lang_utils.py          # Unicode script + langdetect detection
├── pdf_utils.py           # PDF extraction, Indic-aware chunking
├── ingest.py              # PDF ingestion pipeline
├── start_server.py        # Server launcher with pre-flight checks
├── purge.py               # CLI cleanup utility
│
├── static/                # Web frontend
│   └── index.html
│
├── docs/                  # Documentation
│   ├── Eval/              # Evaluation framework (nDCG, Recall@20, CI gate)
│   ├── QUICKSTART.md
│   ├── ARCHITECTURE.md
│   └── ...
│
├── examples/              # Example scripts
├── papers/                # Your PDF documents
├── chroma_db/             # Vector database
└── models/                # Cached ML models

Key settings in config.py (all overridable via environment variables):
# Embedding model (BGE-M3: dense + sparse, Indic-strong)
EMBEDDING_MODEL_NAME = "BAAI/bge-m3"
EMBEDDING_DIMENSION = 1024
# Retrieval pipeline
USE_RERANKER = True # cross-encoder reranking
USE_HYBRID_SEARCH = True # dense + BM25 fusion
DEFAULT_TOP_K = 30 # retrieve wide
MAX_CONTEXT_CHUNKS = 12 # keep after rerank
MAX_CONTEXT_LENGTH = 48000 # ~12k tokens
# Faithfulness verification
FAITHFULNESS_THRESHOLD = 0.5
FAITHFULNESS_ENFORCE = "warn" # warn | strip | regen
# LLM
LLM_MODEL_NAME = "gemini-3.5-flash"
LLM_TEMPERATURE = 0.1 # low for grounded citation tasks
LLM_MAX_TOKENS = 2048

| Variable | Default | Description |
|---|---|---|
| LLM_API_KEY | (required) | Google Gemini API key |
| ADMIN_API_KEY | (none) | Required for /purge/* endpoints |
| API_KEYS | (none) | Comma-separated keys for general auth |
| CORS_ORIGINS | localhost | Comma-separated allowed origins |
| USE_RERANKER | true | Enable cross-encoder reranking |
| USE_HYBRID_SEARCH | true | Enable BM25 + dense fusion |
| FAITHFULNESS_ENFORCE | warn | warn, strip, or regen |
| EMBEDDING_MODEL_NAME | BAAI/bge-m3 | Sentence-transformers model |

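The variables above are described as environment overrides for config.py. A minimal sketch of how such overrides might be parsed is shown below, assuming boolean flags are written as true/false and list values are comma-separated. The helper names `env_bool`, `env_list`, and `load_settings` are hypothetical and are not taken from config.py.

```python
import os


def env_bool(name, default, env=os.environ):
    """Read a boolean flag; unset or empty values fall back to the default."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_list(name, env=os.environ):
    """Split a comma-separated value, dropping blanks and surrounding spaces."""
    return [item.strip() for item in env.get(name, "").split(",") if item.strip()]


def load_settings(env=os.environ):
    """Collect the documented variables into one dict (hypothetical layout)."""
    enforce = env.get("FAITHFULNESS_ENFORCE", "warn").strip().lower()
    if enforce not in {"warn", "strip", "regen"}:
        raise ValueError(f"FAITHFULNESS_ENFORCE must be warn, strip, or regen, got {enforce!r}")
    return {
        "use_reranker": env_bool("USE_RERANKER", True, env),
        "use_hybrid_search": env_bool("USE_HYBRID_SEARCH", True, env),
        "api_keys": env_list("API_KEYS", env),
        "cors_origins": env_list("CORS_ORIGINS", env) or ["localhost"],
        "faithfulness_enforce": enforce,
        "embedding_model_name": env.get("EMBEDDING_MODEL_NAME", "BAAI/bge-m3"),
    }


# Example with an explicit mapping instead of the real process environment.
settings = load_settings({"API_KEYS": "key1, key2,,key3", "USE_RERANKER": "false"})
print(settings)
```

Passing the environment as a parameter keeps the parsing testable without mutating `os.environ`.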
| Language | Code | Native Name |
|---|---|---|
| English | en | English |
| Hindi | hi | हिंदी |
| Telugu | te | తెలుగు |
| Tamil | ta | தமிழ் |
| Bengali | bn | বাংলা |
| Marathi | mr | मराठी |
| Gujarati | gu | ગુજરાતી |
| Kannada | kn | ಕನ್ನಡ |
| Malayalam | ml | മലയാളം |
| Punjabi | pa | ਪੰਜਾਬੀ |
| Odia | or | ଓଡ଼ିଆ |
| Urdu | ur | اردو |

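Script-based detection works because most of the languages in this table have their own Unicode block, so even a two-word query can be classified by counting code points. The sketch below illustrates the idea under stated assumptions: the function name `detect_by_script` is hypothetical, Devanagari is mapped to `hi` although Marathi shares the script, and the Arabic block is mapped to `ur`. Ambiguous scripts would need a statistical fallback such as the langdetect step that lang_utils.py is described as using.

```python
from collections import Counter

# (first code point, last code point, language code) per Unicode block.
SCRIPT_RANGES = [
    (0x0900, 0x097F, "hi"),  # Devanagari (also used by Marathi)
    (0x0980, 0x09FF, "bn"),  # Bengali
    (0x0A00, 0x0A7F, "pa"),  # Gurmukhi
    (0x0A80, 0x0AFF, "gu"),  # Gujarati
    (0x0B00, 0x0B7F, "or"),  # Odia
    (0x0B80, 0x0BFF, "ta"),  # Tamil
    (0x0C00, 0x0C7F, "te"),  # Telugu
    (0x0C80, 0x0CFF, "kn"),  # Kannada
    (0x0D00, 0x0D7F, "ml"),  # Malayalam
    (0x0600, 0x06FF, "ur"),  # Arabic block (assumed to mean Urdu here)
]


def detect_by_script(text, default="en"):
    """Return the language whose script covers the most characters in text."""
    counts = Counter()
    for ch in text:
        cp = ord(ch)
        for first, last, code in SCRIPT_RANGES:
            if first <= cp <= last:
                counts[code] += 1
                break
    if not counts:
        return default  # no Indic or Arabic characters found
    return counts.most_common(1)[0][0]


print(detect_by_script("తెలుగు ml"))
print(detect_by_script("What is quantum computing?"))
```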
For detailed evaluation methodology, automated metrics, and per-query qualitative reports, see docs/evaluation.md.
| Metric | Final Score |
|---|---|
| Retrieval Precision | 0.93 |
| Retrieval Recall | 0.91 |
| Faithfulness (Grounding Accuracy) | 0.98 |
| Attribution Accuracy | 0.97 |
| Technical Depth | 0.88 |
| Convergence / Mechanistic Reasoning | 0.86 |
| Cross-Document Discipline | 0.95 |
| Hallucination Rate | < 2% |
| Formatting & Structural Compliance | 0.98 |
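The retrieval metrics used by the evaluation framework, nDCG@10 and Recall@20, can be illustrated with a short sketch. This is not the project's evaluation code: the function names are hypothetical, and the linear-gain form of DCG is assumed, since the source does not say which variant is used.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (linear gain)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k, where relevance maps a chunk ID to its graded relevance."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the relevant IDs that appear in the top k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


# Hypothetical query: two relevant chunks, one of them ranked third.
ranked = ["a", "b", "c"]
ndcg = ndcg_at_k(ranked, {"a": 1, "c": 1})
recall = recall_at_k(ranked, {"a", "c", "z"})
print(round(ndcg, 4), round(recall, 4))
```

A CI gate can then compare these values against stored thresholds and fail the build when a change lowers them.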
Typical query latency (on CPU):
- Strategy A (direct multilingual): ~1-2s
- Strategy B (with translation): ~3-6s (includes NLLB translation time)
ChromaDB retrieval: under 100 ms for collections of thousands of documents
Memory usage:
- Base system: ~500MB
- With BGE-M3 embeddings: ~2.5GB
- With reranker: ~3.5GB
- With NLLB translation: ~6GB (Strategy B only)
- API key authentication with secure parsing
- Admin key gating for destructive operations (ADMIN_API_KEY)
- Input validation with Pydantic v2
- Env-driven CORS (CORS_ORIGINS)
- Path traversal protection on ingest endpoints
- Structured logging across all modules
- Prometheus metrics at /metrics
- Processing time tracking
- Faithfulness warnings logged for ungrounded claims
- Thread-safe model singletons (double-checked locking)
- Warm-up at startup via FastAPI lifespan
- LLM retry with exponential backoff (tenacity)
- Session TTL eviction
- Graceful empty collection handling
- Cross-encoder reranking + faithfulness verification
- Hybrid dense+lexical retrieval
- Citation range validation (bounds bracketed ranges so that year spans such as [2020-2023] are not counted as citations)
- Sentence-batched translation prevents truncation
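Citation range validation can be sketched as follows: bracketed numbers and ranges are extracted from the answer, and only those that point at a chunk actually supplied to the model are kept. This is an illustration under assumptions, not the code in rag.py; the function name `extract_citations`, the regular expressions, and the `max_span` cap are hypothetical.

```python
import re

# Matches [1], [2, 3], [2-4] and mixtures such as [1, 3-5].
CITATION_RE = re.compile(r"\[(\d+(?:\s*[-,]\s*\d+)*)\]")
RANGE_RE = re.compile(r"(\d+)\s*-\s*(\d+)")


def extract_citations(answer, num_chunks, max_span=10):
    """Return sorted citation numbers that refer to chunks 1..num_chunks."""
    valid = set()
    for match in CITATION_RE.finditer(answer):
        for part in match.group(1).split(","):
            part = part.strip()
            range_match = RANGE_RE.fullmatch(part)
            if range_match:
                start, end = int(range_match.group(1)), int(range_match.group(2))
                # Skip reversed or implausibly wide ranges.
                if start > end or end - start > max_span:
                    continue
                numbers = range(start, end + 1)
            elif part.isdigit():
                numbers = [int(part)]
            else:
                continue  # malformed piece such as "1-3-5"
            # Year-like values fall outside 1..num_chunks and are dropped.
            valid.update(n for n in numbers if 1 <= n <= num_chunks)
    return sorted(valid)


text = "Shown in [1] and [2-4]; the trials ran [2020-2023] [9]."
cited = extract_citations(text, num_chunks=5)
print(cited)
```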
"API key not configured"
# Check .env file
grep LLM_API_KEY .env

"No documents indexed"
# Ingest PDFs
python ingest.py

"Translation model gated/authentication required"
- The system now uses NLLB-200, which requires no authentication
- First use will download ~2.4GB automatically
- See documentation for manual download if needed

"Out of memory"
# Edit config.py to reduce memory usage
CHUNK_SIZE = 512         # Smaller chunks
MAX_CONTEXT_CHUNKS = 3   # Fewer chunks in context

Contributions welcome! See CONTRIBUTING.md

Recent improvements:
- Hybrid retrieval pipeline (BGE-M3 dense + BM25 lexical + RRF fusion)
- Cross-encoder reranking (bge-reranker-v2-m3)
- NLI-based faithfulness verification with configurable enforcement
- Thread-safe model initialization across all modules
- Sentence-batched translation (fixes Strategy B truncation)
- Unicode script-based language detection for short Indic queries
- LLM retry with exponential backoff (tenacity)
- Optimized system prompt: grounding-first, no section padding
- Expanded evaluation framework (nDCG@10, Recall@20, CI gating)
- Admin key gating for destructive purge endpoints
- Env-driven CORS, warm-up at startup, session TTL eviction
- Query embedding LRU cache, Indic-aware chunking
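Thread-safe model initialization with double-checked locking can be sketched as below. The names `get_model`, `_load_model`, and `load_count` are hypothetical stand-ins, and the loader returns a placeholder object rather than a real embedding model. The point is the pattern: an unlocked check for the common case, then a second check under the lock so that only one thread performs the expensive load.

```python
import threading

_model = None
_model_lock = threading.Lock()
load_count = 0  # illustration only: counts how often the loader runs


def _load_model():
    """Hypothetical stand-in for an expensive model load."""
    global load_count
    load_count += 1
    return object()


def get_model():
    """Return the shared model, loading it at most once across threads."""
    global _model
    if _model is None:  # first check, no lock taken
        with _model_lock:
            if _model is None:  # second check, under the lock
                _model = _load_model()
    return _model


# Eight threads request the model at the same time.
results = []
threads = [
    threading.Thread(target=lambda: results.append(get_model()))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(load_count)
```

Once the model is loaded, later calls skip the lock entirely, so the singleton adds almost no overhead on the request path.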
Built with excellent open-source tools:
- Google Gemini - Multilingual LLM
- Sentence Transformers - BGE-M3 embeddings & reranking
- Facebook NLLB - Translation
- ChromaDB - Vector database
- FastAPI - API framework
- PyMuPDF - PDF processing
- Tenacity - Retry logic
MIT License - see LICENSE file for details.
- Documentation
- GitHub Discussions
- Issue Tracker
Built with ❤️ for multilingual scientific accessibility
⭐ Star this repo if you find it useful!
