PaperRoute

A document processing daemon that watches a directory for new PDFs and images, extracts content via parallel OCR + Vision LLM, classifies documents by recipient and category, and indexes everything into a RAG system for chat-based retrieval.

How it works

Drop a PDF or image into inbox/ and PaperRoute will:

1. Extract text using both OCR (deterministic) and a Vision LLM (structural) in parallel
2. Reconcile the two extractions into a clean markdown document via LLM
3. Classify the document: match it to a recipient based on configured tags, determine a category, and generate a descriptive subject
4. Save the markdown (with YAML frontmatter) and a copy of the original to output/{recipient}/{category}/{date}-{subject}.md
5. Index into a RAG system for semantic search

Then ask questions about your documents through a chat interface.
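
The save step can be illustrated with a minimal sketch of how the output/{recipient}/{category}/{date}-{subject}.md path and the YAML frontmatter might be assembled. The function names, the frontmatter field names, the ISO date format, and the slug rules below are assumptions made for illustration, not PaperRoute's actual implementation.

```python
import json
import re
from datetime import date
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_output_path(output_dir: Path, recipient: str, category: str,
                      doc_date: date, subject: str) -> Path:
    """Compose output/{recipient}/{category}/{date}-{subject}.md (assumed naming)."""
    filename = f"{doc_date.isoformat()}-{slugify(subject)}.md"
    return output_dir / recipient / category / filename


def render_markdown(body: str, recipient: str, category: str,
                    doc_date: date, subject: str) -> str:
    """Prefix the markdown body with a YAML frontmatter block.

    json.dumps produces double-quoted strings, which are also valid YAML scalars.
    The field names are hypothetical.
    """
    fields = {
        "recipient": recipient,
        "category": category,
        "date": doc_date.isoformat(),
        "subject": subject,
    }
    lines = [f"{key}: {json.dumps(value, ensure_ascii=False)}"
             for key, value in fields.items()]
    return "---\n" + "\n".join(lines) + "\n---\n\n" + body + "\n"
```

For example, build_output_path(Path("output"), "Common", "Invoices", date(2025, 1, 31), "Water bill, January") yields output/Common/Invoices/2025-01-31-water-bill-january.md under these assumptions.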

Requirements

Python 3.14+
uv package manager
DeepFellow instance (provides OCR, Vision, LLM, and RAG APIs)

Quick start

# Install dependencies
uv sync

# Configure
cp config-example.yaml config.yaml  # Edit with your DeepFellow settings
export DEEPFELLOW_API_KEY=your-key

# Start the watcher daemon
uv run python -m docproc.watcher

# In another terminal, start the chat frontend
uv run python chat/app.py

Drop files into inbox/ and query them at http://localhost:7860.

Configuration

Edit config.yaml:

directories:
  watch: "./inbox"
  output: "./output"

deepfellow:
  base_url: "http://localhost:8000/v1"
  api_key: "${DEEPFELLOW_API_KEY}"
  vision_model: "gpt-4-vision"
  llm_model: "deepseek"

recipients:
  - name: "Piotr Zalewa"
    tags: ["aquarium", "fish", "reef"]
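
The api_key value above uses a ${DEEPFELLOW_API_KEY} placeholder. One way such placeholders could be resolved after the YAML has been parsed into plain Python objects is sketched below. The helper name expand_env is hypothetical, and whether PaperRoute resolves the placeholder this way is an assumption.

```python
import os


def expand_env(value):
    """Recursively expand ${VAR} placeholders in strings nested in dicts and lists.

    os.path.expandvars leaves references to undefined variables unchanged.
    """
    if isinstance(value, str):
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value
```

Applied to the parsed configuration, this would replace the api_key placeholder with the value exported in the quick start, leaving all other settings untouched.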

Documents that don't match any recipient's tags are filed under "Common". Categories are not predefined — the LLM determines them from document content.
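
The fallback rule can be pictured with a deliberately simplified sketch. PaperRoute's classifier is an LLM, so the keyword counting below is only a stand-in that shows the "no tag match means Common" behaviour. The Recipient class and match_recipient function are hypothetical names.

```python
import re
from collections import Counter
from dataclasses import dataclass

FALLBACK_RECIPIENT = "Common"


@dataclass(frozen=True)
class Recipient:
    name: str
    tags: tuple[str, ...]


def match_recipient(text: str, recipients: list[Recipient]) -> str:
    """Return the recipient whose tags occur most often in the text.

    Falls back to "Common" when no tag appears at all. This keyword count
    is a simplification; the real classification is done by an LLM.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    best_name, best_score = FALLBACK_RECIPIENT, 0
    for recipient in recipients:
        score = sum(counts[tag.lower()] for tag in recipient.tags)
        if score > best_score:
            best_name, best_score = recipient.name, score
    return best_name
```

With the example configuration, a document mentioning "reef" or "aquarium" would be routed to "Piotr Zalewa", while an unrelated electricity bill would land under "Common".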

Processing pipeline

inbox/ (new file)
  │
  ├─→ OCR (DeepFellow easyOCR) ──┐
  │                               ├─→ Reconciler (LLM) → Classifier (LLM) → Save + Index
  └─→ Vision (LLM) ──────────────┘
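
The fan-out and fan-in at the start of the diagram can be sketched with a thread pool. The run_ocr, run_vision, and reconcile functions are placeholders invented for this example, since the DeepFellow client API is not described here. A real implementation would call the OCR, Vision, and LLM endpoints instead, and may well use a different concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_ocr(path: Path) -> str:
    """Placeholder for the OCR request (hypothetical)."""
    return f"[ocr text of {path.name}]"


def run_vision(path: Path) -> str:
    """Placeholder for the Vision LLM request (hypothetical)."""
    return f"[vision markdown of {path.name}]"


def reconcile(ocr_text: str, vision_text: str) -> str:
    """Placeholder for the LLM reconciler; here it only joins both inputs."""
    return f"{vision_text}\n\n{ocr_text}"


def extract(path: Path) -> str:
    """Run both extractions concurrently, then reconcile their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        ocr_future = pool.submit(run_ocr, path)
        vision_future = pool.submit(run_vision, path)
        # result() blocks until each branch finishes and re-raises its errors.
        return reconcile(ocr_future.result(), vision_future.result())
```

Threads suit this step because both branches spend most of their time waiting on network responses, so the total latency approaches that of the slower branch rather than the sum of the two.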

Supported file types

PDF (.pdf)
PNG (.png)
JPEG (.jpg, .jpeg)
TIFF (.tiff, .tif)
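
A watcher would typically filter incoming files against this list before processing them. A minimal sketch follows; the constant and function names are hypothetical, and the case-insensitive comparison is an assumption.

```python
from pathlib import Path

# Extensions taken from the list above.
SUPPORTED_EXTENSIONS = frozenset({".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".tif"})


def is_supported(path: Path) -> bool:
    """Return True when the file extension is one of the supported types."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS
```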

License

MIT
