Paddle Matrix - Video Subtitle OCR Service

Paddle Matrix is a high-performance HTTP service powered by PaddleOCR designed to extract hardcoded subtitles from videos and generate standard SRT subtitle files. It provides a robust API for video subtitle extraction with support for multiple languages and video formats.

✨ Features

🎯 Auto Subtitle Detection: Intelligently identifies subtitle regions within video frames without manual specification.
🌍 Multi-language Support: Robust support for Chinese, English, Japanese, Korean, and more.
📹 Wide Format Support: Compatible with MP4, AVI, MOV, MKV, WebM, FLV, WMV, and other mainstream video formats.
📄 SRT Generation: Automatically generates standard SubRip Subtitle (SRT) files with precise timestamps.
⚡ Sync/Async Processing:
- Synchronous: Real-time processing for short videos.
- Asynchronous: Background task processing for long videos with status polling.
🔍 Detailed Debug Info: Integrated debug info (raw OCR data, padding, original boxes) displayed in the Web UI for troubleshooting.
🐳 Docker Ready: One-click deployment using Docker and Docker Compose.
🖥️ Web UI: Includes a simple built-in web interface for file uploads and testing with interactive debug panel.
🍎 macOS Standalone App: Build a native macOS application with bundled Python runtime and OCR models.

🚀 Quick Start

🐳 Using Docker (Recommended)

The easiest way to run Paddle Matrix is using Docker Compose.

# Clone the repository
git clone https://github.com/mistbit/paddle-matrix.git
cd paddle-matrix

# Start the service
docker-compose up -d

The service will be available at http://localhost:8000.

🍎 macOS Standalone App

For macOS users, you can build a standalone application that runs without installing Python or any dependencies.

Prerequisites:

Python 3.10 (for building only)
Homebrew OpenSSL: brew install openssl@3

Build & Run:

# Build the macOS app
./build_app.sh

# Run the app
open "dist/Paddle Matrix.app"

# Or install to Applications
cp -r "dist/Paddle Matrix.app" /Applications/

The app includes:

✅ Python 3.10 runtime
✅ All dependencies (FastAPI, PaddleOCR, OpenCV, etc.)
✅ Pre-bundled OCR models (no download on first run)
✅ Native desktop window

🐍 Local Installation

If you prefer running it locally without Docker:

Prerequisites:

Python: 3.10 or higher
FFmpeg: Required for video frame extraction.
- Ubuntu/Debian: sudo apt install ffmpeg
- macOS: brew install ffmpeg
- Windows: Download from FFmpeg website and add to PATH.

Installation Steps:

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start the service
# Option A: Using helper script
chmod +x manage.sh
./manage.sh start

# Option B: Using Uvicorn directly
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

🖥️ Web Interface

Access the built-in web UI at http://localhost:8000/ to upload videos and test the extraction directly from your browser.

Upload: Drag and drop video files.
Preview: Real-time preview of detection results.
Debug: View bounding boxes and OCR confidence scores.
Download: Export results as SRT files.

📖 API Documentation

Interactive API documentation (Swagger UI) is available at http://localhost:8000/docs.

Key Endpoints

Synchronous Extraction
- POST /api/v1/subtitle/extract
- Upload a video and wait for the SRT content. Best for short clips.
- Params: video (file), language (default: auto), sample_interval.
Asynchronous Extraction
- POST /api/v1/subtitle/extract/async
- Upload a video and get a task_id. Suitable for long videos.
Check Task Status
- GET /api/v1/subtitle/status/{task_id}
Download Subtitles
- GET /api/v1/subtitle/download/{task_id}

cURL Example

# Synchronous extraction (Chinese)
curl -X POST "http://localhost:8000/api/v1/subtitle/extract" \
  -H "Content-Type: multipart/form-data" \
  -F "video=@my_video.mp4" \
  -F "language=ch" > output.srt

🧠 Algorithm & Technical Details

1. Intelligent Subtitle Region Detection

Paddle Matrix uses a proprietary "Anchor Discovery Mechanism" to automatically locate subtitle regions without manual ROI (Region of Interest) specification.

Multi-Strategy Detection Pipeline:
1. Bottom ROI Priority: Scans the bottom 35% of the video first, covering 90% of subtitle scenarios.
2. Global Scan: Falls back to full-frame scanning if no text is found in the bottom region.
3. Temporal Subtitle Bands: Utilizes morphological operations and vertical projection analysis.
Stability Clustering & Optimized Padding:
- Performs Y-axis coordinate clustering on detection results.
- Enhanced Dynamic Padding: Automatically calculates optimized padding (x_pad: 8%, y_pad: 30%) to ensure text integrity.

2. OCR Engine Integration

Powered by Baidu's open-source PaddleOCR deep learning framework.

Models & Architecture: Uses PP-OCRv3/v4 ultra-lightweight models.
Dynamic Multi-Language Loading: Supports on-demand loading of language models (ch, en, japan, korean, etc.).
Preprocessing Optimization: Built-in OpenCV image preprocessing pipeline (BGR -> RGB, enhancement).

3. Subtitle Sequence Merger Algorithm

We designed the SubtitleMerger algorithm to transform fragmented OCR results into smooth SRT subtitles.

Similarity-Based Deduplication: Uses SequenceMatcher to merge text when similarity > 0.8.
Voting Mechanism: "Confidence + Frequency" weighted voting system selects the best text content.
Timeline Smoothing: Automatically merges micro-gaps and estimates reasonable end times.

⚙️ Configuration

Copy .env.example to .env to configure the application.

Variable	Description	Default
`APP_NAME`	Application Name	Video Subtitle OCR Service
`DEBUG`	Enable debug mode	`False`
`PADDLEOCR_LANG`	Default OCR Language	`ch`
`VIDEO_SAMPLE_INTERVAL`	Frame sampling interval (sec)	`1.0`
`SUBTITLE_MERGE_THRESHOLD`	Text similarity threshold	`0.8`

🤝 Contributing

Contributions are welcome! Please follow these steps:

Fork the repository.
Create a new branch (git checkout -b feature/amazing-feature).
Commit your changes (git commit -m 'feat(core): add amazing feature').
Push to the branch (git push origin feature/amazing-feature).
Open a Pull Request.

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
app		app
docs/images		docs/images
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
README_CN.md		README_CN.md
app_launcher.py		app_launcher.py
build_app.sh		build_app.sh
docker-compose.dev.yml		docker-compose.dev.yml
docker-compose.yml		docker-compose.yml
manage.sh		manage.sh
paddle_matrix.spec		paddle_matrix.spec
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Paddle Matrix - Video Subtitle OCR Service

📚 Table of Contents

✨ Features

🚀 Quick Start

🐳 Using Docker (Recommended)

🍎 macOS Standalone App

🐍 Local Installation

🖥️ Web Interface

📖 API Documentation

Key Endpoints

cURL Example

🧠 Algorithm & Technical Details

1. Intelligent Subtitle Region Detection

2. OCR Engine Integration

3. Subtitle Sequence Merger Algorithm

⚙️ Configuration

🤝 Contributing

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Paddle Matrix - Video Subtitle OCR Service

📚 Table of Contents

✨ Features

🚀 Quick Start

🐳 Using Docker (Recommended)

🍎 macOS Standalone App

🐍 Local Installation

🖥️ Web Interface

📖 API Documentation

Key Endpoints

cURL Example

🧠 Algorithm & Technical Details

1. Intelligent Subtitle Region Detection

2. OCR Engine Integration

3. Subtitle Sequence Merger Algorithm

⚙️ Configuration

🤝 Contributing

📄 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages