A local-first Retrieval-Augmented Generation (RAG) application that lets you upload PDF documents and ask natural-language questions about them, powered by Groq, LangChain, and ChromaDB. Embeddings and the vector store run on your machine; answer generation calls the Groq API.
```
Your Question
      │
      ▼
Embed question into a vector
      │
      ▼
Search ChromaDB for similar chunks ←── Your PDFs (pre-indexed)
      │
      ▼
Build prompt: question + relevant chunks
      │
      ▼
Groq LLaMA 3 generates a grounded answer
      │
      ▼
Answer + source citations shown to you
```
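The flow above can be sketched end to end in plain Python. This is a stdlib-only illustration, not the app's actual code: the toy two-dimensional vectors stand in for real MiniLM embeddings, the linear scan stands in for ChromaDB's index, and the returned prompt is what would be handed to the Groq LLM.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(question_vec, index, top_k=2):
    """Rank stored chunks by similarity to the question vector (toy ChromaDB)."""
    ranked = sorted(index, key=lambda item: cosine(question_vec, item["vec"]), reverse=True)
    return ranked[:top_k]

def build_prompt(question, chunks):
    """Assemble the grounded prompt that would be sent to the LLM."""
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Toy index: in the real app these vectors come from all-MiniLM-L6-v2.
index = [
    {"vec": [1.0, 0.0], "text": "Invoices are due in 30 days.", "source": "policy.pdf p.2"},
    {"vec": [0.0, 1.0], "text": "Refunds require a receipt.", "source": "policy.pdf p.5"},
]

top = retrieve([0.9, 0.1], index, top_k=1)
print(build_prompt("When are invoices due?", top))
```

The key design point the sketch preserves: the model only sees retrieved chunks plus the question, which is what keeps answers grounded and citable.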
- Upload PDFs via a clean web UI — no terminal needed
- Automatic indexing — chunking, embedding, and storage triggered on upload
- Chat interface with persistent message history
- Source citations — every answer shows which file and page it came from
- Adjustable retrieval — control how many chunks are used per answer
| Component | Tool |
|---|---|
| Web UI | Streamlit |
| LLM (answer generation) | Groq — LLaMA 3.3 70B |
| Embeddings | HuggingFace all-MiniLM-L6-v2 |
| Vector database | ChromaDB |
| PDF loader | LangChain + PyPDF |
| Package manager | uv |
```
RagMind/
├── app.py            # Streamlit entry point — multi-page navigation
├── upload.py         # Page 1: Upload PDFs and trigger indexing
├── chat.py           # Page 2: Chat interface with source display
├── index.py          # Core indexing pipeline (load → chunk → embed → store)
├── query.py          # Core query pipeline (CLI version)
├── .env              # API keys and configuration
├── .gitignore        # Excludes chroma_db/, data/, .env
├── pyproject.toml    # uv project file with dependencies
├── data/             # Your uploaded PDF files (git ignored)
└── chroma_db/        # ChromaDB vector store (git ignored)
```
```
uv run streamlit run app.py
```

Open http://localhost:8501 in your browser.
Navigate to the Upload Docs page, drag and drop your PDF files, then click Save & Index Documents.
The app will:
- Save your PDFs to `./data/`
- Split them into overlapping chunks
- Generate embeddings locally (first run downloads ~90 MB model, then cached)
- Store everything in ChromaDB
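The "overlapping chunks" step (handled in the app by LangChain's text splitter) boils down to a sliding window: each chunk starts `chunk_size - overlap` characters after the previous one, so text cut at a boundary still appears whole in a neighboring chunk. A minimal stdlib-only sketch:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size chunks whose tails repeat at the
    head of the next chunk, so no sentence is lost at a boundary."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# 450 characters with size 200 / overlap 50 → windows at 0, 150, 300.
doc = "x" * 450
pieces = chunk_text(doc, chunk_size=200, overlap=50)
print([len(p) for p in pieces])  # → [200, 200, 150]
```

LangChain's `RecursiveCharacterTextSplitter` is smarter than this sketch (it prefers to break on paragraph and sentence boundaries), but the size/overlap trade-off is the same: larger overlap costs storage and embedding time, smaller overlap risks splitting an answer across two chunks.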
Navigate to the Ask Questions page and start chatting. Each answer includes:
- The generated response grounded in your documents
- Expandable Sources panel showing which file, page, and text excerpt was used
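The Sources panel is straightforward to assemble from the metadata stored with each chunk. A sketch of the idea (the `file`/`page`/`text` keys here are illustrative, not the app's exact schema):

```python
def format_sources(chunks, excerpt_len=80):
    """Render retrieved chunks as citation lines: file, page, short excerpt."""
    lines = []
    for c in chunks:
        excerpt = c["text"][:excerpt_len].rstrip()
        if len(c["text"]) > excerpt_len:
            excerpt += "..."
        lines.append(f'{c["file"]} (page {c["page"]}): "{excerpt}"')
    return "\n".join(lines)

chunks = [
    {"file": "report.pdf", "page": 3, "text": "Quarterly revenue grew 12% year over year."},
]
print(format_sources(chunks))
```

In the Streamlit app, each such line would sit inside an expander so citations stay out of the way until you want to verify an answer.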
| Option | Description |
|---|---|
| Show source chunks | Toggle source citations on/off |
| Chunks to retrieve | How many passages to pull per question (1–8) |
| Clear chat history | Reset the conversation |
If you prefer the terminal over the web UI:

```
# Index your PDFs first (add PDFs to ./data manually)
uv run python index.py

# Then ask questions interactively
uv run python query.py
```

All settings are in `.env`:
```
# Groq API
GROQ_API_KEY=gsk_...
GROQ_MODEL=llama-3.3-70b-versatile

# Embeddings
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Retrieval
RETRIEVAL_TOP_K=4   # number of chunks retrieved per question
```

- LangChain — RAG framework
- Groq — ultra-fast LLM inference
- ChromaDB — local vector database
- Streamlit — web UI framework
- HuggingFace — open source embedding models