A Retrieval-Augmented Generation (RAG) system built in pure Python. Add YouTube videos directly from the UI, then ask questions and get grounded answers with source citations.
```
YouTube URL → transcript download → chunking → embeddings → FAISS index
                                                                      ↓
user question → embed query → vector search → top-k chunks → GPT-5.4 → answer
```
- Add — paste a YouTube URL into the chat. The app downloads the transcript and video metadata automatically, then chunks, embeds, and indexes it locally.
- Query — ask a question in the same chat. It's embedded, searched against the FAISS index, and the top matching chunks are sent to OpenAI GPT-5.4 to synthesize a grounded answer.
- Web UI — a Flask app serves a chat interface at http://localhost:5000. The input field detects YouTube URLs automatically and switches between "Add" and "Send" mode.
Retrieval is entirely local. OpenAI is only used for the final answer generation step.
New to RAG? Read TUTORIAL.md for a ground-up explanation of how each component works.
```
pure-python-RAG/
├── transcripts/            # Transcript .txt files (downloaded or manually added)
│   └── <video_id>.txt
├── data/                   # Auto-created by ingest (index + chunks)
│   ├── index.faiss
│   └── chunks.jsonl
├── src/
│   ├── transcript.py       # YouTube transcript downloader
│   ├── chunking.py         # Text chunking logic
│   ├── embeddings.py       # sentence-transformers wrapper
│   ├── retriever.py        # FAISS search
│   ├── llm.py              # OpenAI answer generation
│   ├── ingest.py           # Ingestion pipeline
│   └── query.py            # End-to-end query pipeline
├── templates/
│   └── index.html          # Web chat UI
├── app.py                  # CLI entrypoint (ingest + serve)
├── requirements.txt
└── .env                    # API key config
```
- Python 3.10+
- An OpenAI API key
1. Clone the repo and create a virtual environment
```
git clone https://github.com/mikepfeiffer/pure-python-RAG.git
cd pure-python-RAG
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
```

2. Install dependencies

```
pip install -r requirements.txt
```

On first run, sentence-transformers will download the all-MiniLM-L6-v2 model (~90 MB). It is cached locally after the first download.
3. Configure your API key
Create a .env file in the project root:
```
OPENAI_API_KEY=sk-...
```

4. Run the server

```
python app.py serve
```

Open your browser to http://localhost:5000.
To run on a different port:

```
python app.py serve 8080
```

Paste any YouTube URL into the input field. The border turns purple and the button changes to "Add" — press it (or hit Enter) to download and index the transcript. The UI shows a progress card while it works, then a confirmation with the video title and channel when done.
Supported URL formats:

```
https://www.youtube.com/watch?v=VIDEO_ID
https://youtu.be/VIDEO_ID
```
Type any question and press Enter (or click Send). Answers include clickable source chips showing which video each claim came from, linked back to the original YouTube video.
Use Shift+Enter for a newline in your question.
Type /transcripts in the chat to see a list of all videos currently in the index. Each entry shows the video title, chunk count, and a link to the original YouTube video.
```
python app.py ingest        # Process all .txt files in transcripts/ and rebuild the index
python app.py transcripts   # Print a table of all indexed transcripts
python app.py serve         # Start the web server (default port 5000)
python app.py serve 8080    # Start on a custom port
```

If you have existing .txt transcript files or want to bulk-load before starting the server:

```
python app.py ingest
```

This processes all .txt files in transcripts/ and rebuilds the index. The web UI's Add flow does this automatically, so you only need this command for manual imports.
Downloaded transcripts are saved automatically with the correct format. If you add files manually, use this structure:
```
Title: OWASP's Top 10 Ways to Attack LLMs: AI Vulnerabilities Exposed
VideoURL: https://www.youtube.com/watch?v=gUNXZMcd2jU
Speaker: IBM

You know what's catching a lot of teams off guard right now? How easy
it is for an LLM to leak something that it shouldn't, or be steered
into doing something you never intended...
```
Header fields (all optional):
| Field | Description |
|---|---|
| `Title` | Video title — shown in citations |
| `VideoURL` | Linked in source chips in the UI |
| `Speaker` | Channel or speaker name |
Files are named <video_id>.txt when downloaded via the UI.
src/transcript.py handles the download flow:
- Extracts the video ID from the URL
- Calls YouTube's oEmbed API to fetch the video title and channel name (no API key required)
- Uses `youtube-transcript-api` to fetch the auto-generated or manual transcript
- Writes a formatted `.txt` file to `transcripts/` with the metadata header
- Triggers a full re-ingest to rebuild the FAISS index
If a video has no available transcript (captions disabled, private video, etc.), the UI shows an error card with the reason.
Transcripts are split using a sliding window:
| Parameter | Default | Notes |
|---|---|---|
| Chunk size | 900 chars | ~150 words, works well for conversational text |
| Overlap | 120 chars | Preserves context across chunk boundaries |
To adjust, edit src/chunking.py or pass different values to chunk_text().
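A minimal sliding-window chunker matching these defaults might look like the following. This is a sketch, not necessarily the exact implementation of `chunk_text()` in `src/chunking.py`:

```python
def chunk_text(text: str, chunk_size: int = 900, overlap: int = 120) -> list[str]:
    """Split text into overlapping fixed-size windows.

    Each chunk is at most `chunk_size` characters; consecutive chunks share
    `overlap` characters, so a sentence cut at one boundary still appears
    intact in the neighboring chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

With the defaults, the window advances 780 characters per step, so every chunk boundary is covered by 120 characters of shared context.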
The project uses all-MiniLM-L6-v2 from sentence-transformers:
- Runs entirely locally — no API calls for retrieval
- 384-dimensional embeddings
- Fast and accurate for semantic search on English text
- Model is cached in `~/.cache/torch/sentence_transformers/` after the first download
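Retrieval over these embeddings is a nearest-neighbor search. FAISS does this efficiently at scale; the underlying math — assuming cosine similarity via inner product over L2-normalized vectors, a common setup with MiniLM embeddings — can be illustrated in plain Python (purely illustrative; `src/retriever.py` uses FAISS):

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def top_k(query: list[float], vectors: list[list[float]], k: int = 5) -> list[int]:
    """Return indices of the k stored vectors most similar to the query."""
    q = normalize(query)
    scores = [sum(a * b for a, b in zip(q, normalize(v))) for v in vectors]
    # Rank all vectors by similarity, highest first, and keep the top k
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]
```

FAISS performs the same ranking over 384-dimensional vectors with optimized SIMD code, which is why retrieval stays fast even with many indexed transcripts.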
Retrieved chunks are assembled into a prompt and sent to gpt-5.4. The prompt instructs the model to:
- Answer only from the provided transcript excerpts
- Cite the video title for each major claim
- Acknowledge when the answer isn't supported by the transcripts
To change the model, edit src/llm.py.
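The prompt assembly described above might look roughly like this. It is a sketch: the exact wording in `src/llm.py` will differ, and the API call is shown only in outline:

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a grounded-answer prompt from retrieved chunks.

    Each chunk dict is expected to carry at least 'video_title' and 'text'
    (see the chunk schema stored in data/chunks.jsonl).
    """
    excerpts = "\n\n".join(f"[{c['video_title']}]\n{c['text']}" for c in chunks)
    return (
        "Answer the question using ONLY the transcript excerpts below.\n"
        "Cite the video title for each major claim. If the excerpts do not\n"
        "support an answer, say so explicitly.\n\n"
        f"Excerpts:\n{excerpts}\n\nQuestion: {question}"
    )

# The assembled prompt is then sent to the OpenAI chat completions API, roughly:
#   client.chat.completions.create(model="gpt-5.4",
#                                  messages=[{"role": "user", "content": prompt}])
```

Prefixing each excerpt with its video title is what lets the model attribute claims to specific videos in its answer.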
Each chunk stored in data/chunks.jsonl has the following shape:
```json
{
  "chunk_id": "gUNXZMcd2jU_0003",
  "video_id": "gUNXZMcd2jU",
  "video_title": "OWASP's Top 10 Ways to Attack LLMs: AI Vulnerabilities Exposed",
  "source_file": "gUNXZMcd2jU.txt",
  "chunk_index": 3,
  "text": "In the case of prompt injection, the user basically has control over the system. The reason this occurs is that LLMs are not very good at separating input from instructions...",
  "url": "https://www.youtube.com/watch?v=gUNXZMcd2jU"
}
```

This file is plain JSONL and can be inspected directly to debug retrieval quality.
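Because the file is plain JSONL, a short stdlib script can summarize it when debugging retrieval. This snippet is hypothetical and not part of the repo:

```python
import json
from collections import Counter
from pathlib import Path


def chunks_per_video(path: str = "data/chunks.jsonl") -> Counter:
    """Count indexed chunks per video by reading the JSONL file line by line."""
    counts = Counter()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            chunk = json.loads(line)  # one chunk record per line
            counts[chunk["video_title"]] += 1
    return counts

# Example: print the counts sorted by size
#   for title, n in chunks_per_video().most_common():
#       print(f"{n:4d}  {title}")
```

A video with very few chunks, or chunks whose text looks truncated, is a quick signal that ingestion went wrong for that transcript.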
"No transcripts indexed yet" error
Add a YouTube video first by pasting a URL into the chat, or run python app.py ingest if you have files in transcripts/.
Transcript download fails
- The video may have captions disabled or be private/age-restricted
- Try a different video to confirm the setup is working
Poor answer quality
- Inspect `data/chunks.jsonl` to verify the chunks look reasonable
- Increase `top_k` in `src/query.py` to retrieve more context (default: 5)
OpenAI API errors
- Verify `OPENAI_API_KEY` is set correctly in `.env`
- Check your OpenAI account has available quota
| Package | Purpose |
|---|---|
| `sentence-transformers` | Local embedding model |
| `faiss-cpu` | Vector similarity search |
| `openai` | GPT-5.4 answer generation |
| `flask` | Web server |
| `youtube-transcript-api` | YouTube transcript fetching |
| `python-dotenv` | `.env` file loading |
| `numpy` | Array handling for FAISS |
| `tqdm` | Progress bars during ingest |
This project is open source and available under the MIT License.


