Paddle Matrix is a high-performance HTTP service, powered by PaddleOCR, that extracts hardcoded subtitles from videos and generates standard SRT subtitle files. It provides a robust API for video subtitle extraction, with support for multiple languages and video formats.
- Features
- Quick Start
- Web Interface
- API Documentation
- Algorithm & Technical Details
- Configuration
- Contributing
- License
## Features

- 🎯 Auto Subtitle Detection: Intelligently identifies subtitle regions within video frames without manual specification.
- 🌍 Multi-language Support: Robust support for Chinese, English, Japanese, Korean, and more.
- 📹 Wide Format Support: Compatible with MP4, AVI, MOV, MKV, WebM, FLV, WMV, and other mainstream video formats.
- 📄 SRT Generation: Automatically generates standard SubRip Subtitle (SRT) files with precise timestamps.
- ⚡ Sync/Async Processing:
  - Synchronous: Real-time processing for short videos.
  - Asynchronous: Background task processing for long videos with status polling.
- 🔍 Detailed Debug Info: Raw OCR data, padding, and original bounding boxes are displayed in the Web UI for troubleshooting.
- 🐳 Docker Ready: One-click deployment using Docker and Docker Compose.
- 🖥️ Web UI: Includes a simple built-in web interface for file uploads and testing, with an interactive debug panel.
- 🍎 macOS Standalone App: Build a native macOS application with bundled Python runtime and OCR models.
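For reference, the generated files follow the standard SubRip layout: a numeric index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the subtitle text, separated by blank lines:

```srt
1
00:00:01,000 --> 00:00:03,500
你好，世界

2
00:00:04,000 --> 00:00:06,200
Hello, world
```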
## Quick Start

The easiest way to run Paddle Matrix is using Docker Compose.
```bash
# Clone the repository
git clone https://github.com/mistbit/paddle-matrix.git
cd paddle-matrix

# Start the service
docker-compose up -d
```

The service will be available at http://localhost:8000.
On macOS, you can build a standalone application that runs without installing Python or any dependencies.
Prerequisites:
- Python 3.10 (for building only)
- Homebrew OpenSSL: `brew install openssl@3`
Build & Run:
```bash
# Build the macOS app
./build_app.sh

# Run the app
open "dist/Paddle Matrix.app"

# Or install to Applications
cp -r "dist/Paddle Matrix.app" /Applications/
```

The app includes:
- ✅ Python 3.10 runtime
- ✅ All dependencies (FastAPI, PaddleOCR, OpenCV, etc.)
- ✅ Pre-bundled OCR models (no download on first run)
- ✅ Native desktop window
If you prefer running it locally without Docker:
Prerequisites:
- Python: 3.10 or higher
- FFmpeg: Required for video frame extraction.
  - Ubuntu/Debian: `sudo apt install ffmpeg`
  - macOS: `brew install ffmpeg`
  - Windows: Download from the FFmpeg website and add it to your PATH.
Installation Steps:
```bash
# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Start the service
# Option A: Using helper script
chmod +x manage.sh
./manage.sh start

# Option B: Using Uvicorn directly
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

Access the built-in web UI at http://localhost:8000/ to upload videos and test the extraction directly from your browser.
## Web Interface

- Upload: Drag and drop video files.
- Preview: Real-time preview of detection results.
- Debug: View bounding boxes and OCR confidence scores.
- Download: Export results as SRT files.
## API Documentation

Interactive API documentation (Swagger UI) is available at http://localhost:8000/docs.
- Synchronous Extraction: `POST /api/v1/subtitle/extract`. Upload a video and wait for the SRT content; best for short clips. Params: `video` (file), `language` (default: `auto`), `sample_interval`.
- Asynchronous Extraction: `POST /api/v1/subtitle/extract/async`. Upload a video and receive a `task_id`; suitable for long videos.
- Check Task Status: `GET /api/v1/subtitle/status/{task_id}`
- Download Subtitles: `GET /api/v1/subtitle/download/{task_id}`
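As an illustration, the asynchronous flow (submit, poll the status endpoint, then download) can be driven from a small Python helper. This is a sketch, not project code: the terminal status names `completed`/`failed` and the JSON `status` field are assumptions, and the polling loop takes any zero-argument callable so it does not depend on a running server.

```python
import time

def poll_until_done(fetch_status, interval=2.0, timeout=600.0, _sleep=time.sleep):
    """Poll fetch_status() until it reports a terminal state.

    fetch_status is any zero-argument callable returning a status string,
    e.g. one that GETs /api/v1/subtitle/status/{task_id} and reads the
    JSON "status" field (field name assumed here).
    """
    waited = 0.0
    while waited < timeout:
        status = fetch_status()
        if status in ("completed", "failed"):  # assumed terminal states
            return status
        _sleep(interval)
        waited += interval
    raise TimeoutError("subtitle extraction did not finish in time")

# Example wiring against the HTTP API (requires the `requests` package,
# a running service, and a task_id from POST /api/v1/subtitle/extract/async):
#
#   import requests
#   base = "http://localhost:8000/api/v1/subtitle"
#   fetch = lambda: requests.get(f"{base}/status/{task_id}").json()["status"]
#   if poll_until_done(fetch) == "completed":
#       srt = requests.get(f"{base}/download/{task_id}").text
```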
```bash
# Synchronous extraction (Chinese)
curl -X POST "http://localhost:8000/api/v1/subtitle/extract" \
  -H "Content-Type: multipart/form-data" \
  -F "video=@my_video.mp4" \
  -F "language=ch" > output.srt
```

## Algorithm & Technical Details

Paddle Matrix uses a custom "Anchor Discovery Mechanism" to automatically locate subtitle regions without manual ROI (Region of Interest) specification.
- Multi-Strategy Detection Pipeline:
- Bottom ROI Priority: Scans the bottom 35% of the video first, covering 90% of subtitle scenarios.
- Global Scan: Falls back to full-frame scanning if no text is found in the bottom region.
- Temporal Subtitle Bands: Utilizes morphological operations and vertical projection analysis.
- Stability Clustering & Optimized Padding:
- Performs Y-axis coordinate clustering on detection results.
  - Enhanced Dynamic Padding: Automatically calculates optimized padding (`x_pad`: 8%, `y_pad`: 30%) to ensure text integrity.
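To make the numbers concrete, here is a small sketch (not the project's actual code) of the bottom-35% ROI cut-off and the 8%/30% padding applied to a detected text box. Whether the percentages are relative to the box or to the frame is an assumption; relative-to-box is shown here, and the frame size and coordinates are made up for illustration.

```python
def bottom_roi(frame_h, ratio=0.35):
    """Top y-coordinate of the bottom ROI that is scanned first."""
    return int(frame_h * (1.0 - ratio))

def pad_box(x1, y1, x2, y2, frame_w, frame_h, x_pad=0.08, y_pad=0.30):
    """Expand a detected text box by x_pad/y_pad of its own size,
    clamped to the frame, so glyph edges are not clipped."""
    w, h = x2 - x1, y2 - y1
    dx, dy = int(w * x_pad), int(h * y_pad)
    return (max(0, x1 - dx), max(0, y1 - dy),
            min(frame_w, x2 + dx), min(frame_h, y2 + dy))

# On a 1920x1080 frame the bottom ROI starts at y = 702, and a box
# (500, 950, 1420, 1000) grows to (427, 935, 1493, 1015).
```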
Powered by Baidu's open-source PaddleOCR deep learning framework.
- Models & Architecture: Uses PP-OCRv3/v4 ultra-lightweight models.
- Dynamic Multi-Language Loading: Supports on-demand loading of language models (`ch`, `en`, `japan`, `korean`, etc.).
- Preprocessing Optimization: Built-in OpenCV image preprocessing pipeline (BGR -> RGB, enhancement).
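On-demand loading can be sketched as a small cache keyed by language code; the service's real loader may differ, and the `PaddleOCR(lang=...)` constructor mentioned in the docstring is the upstream paddleocr API.

```python
_engines = {}

def get_engine(lang, factory):
    """Return a cached OCR engine for `lang`, creating it on first use.

    `factory` builds the engine, e.g. `lambda code: PaddleOCR(lang=code)`
    with the paddleocr package. Language codes follow PaddleOCR
    conventions: ch, en, japan, korean, ...
    """
    if lang not in _engines:
        _engines[lang] = factory(lang)  # loaded once, then reused
    return _engines[lang]
```

Caching matters because each model load pulls large weights into memory; repeated requests for the same language should hit the cache.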
We designed the SubtitleMerger algorithm to transform fragmented OCR results into smooth SRT subtitles.
- Similarity-Based Deduplication: Uses `SequenceMatcher` to merge text when similarity > 0.8.
- Voting Mechanism: A "confidence + frequency" weighted voting system selects the best text content.
- Timeline Smoothing: Automatically merges micro-gaps and estimates reasonable end times.
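A stripped-down sketch of the similarity test behind the deduplication step: `SequenceMatcher` comes from the Python standard library, and the 0.8 cut-off matches the `SUBTITLE_MERGE_THRESHOLD` default. The real SubtitleMerger layers the voting and timeline logic on top.

```python
from difflib import SequenceMatcher

def should_merge(a, b, threshold=0.8):
    """True when two OCR readings are near-duplicates of one subtitle."""
    return SequenceMatcher(None, a, b).ratio() > threshold

# Consecutive frames often OCR the same line with small glitches:
should_merge("Hello world", "Hello worid")  # True  (ratio ~0.91)
should_merge("Hello world", "Goodbye")      # False
```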
## Configuration

Copy `.env.example` to `.env` to configure the application.
| Variable | Description | Default |
|---|---|---|
| `APP_NAME` | Application name | `Video Subtitle OCR Service` |
| `DEBUG` | Enable debug mode | `False` |
| `PADDLEOCR_LANG` | Default OCR language | `ch` |
| `VIDEO_SAMPLE_INTERVAL` | Frame sampling interval (seconds) | `1.0` |
| `SUBTITLE_MERGE_THRESHOLD` | Text similarity threshold | `0.8` |
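A minimal sketch of reading these variables with the standard library, using the defaults from the table; the project's actual settings loader (and any type coercion it performs) may differ.

```python
import os

def load_settings(env=os.environ):
    """Read service settings, falling back to the documented defaults."""
    return {
        "app_name": env.get("APP_NAME", "Video Subtitle OCR Service"),
        "debug": env.get("DEBUG", "False").lower() == "true",
        "paddleocr_lang": env.get("PADDLEOCR_LANG", "ch"),
        "video_sample_interval": float(env.get("VIDEO_SAMPLE_INTERVAL", "1.0")),
        "subtitle_merge_threshold": float(env.get("SUBTITLE_MERGE_THRESHOLD", "0.8")),
    }
```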
## Contributing

Contributions are welcome! Please follow these steps:
1. Fork the repository.
2. Create a new branch (`git checkout -b feature/amazing-feature`).
3. Commit your changes (`git commit -m 'feat(core): add amazing feature'`).
4. Push to the branch (`git push origin feature/amazing-feature`).
5. Open a Pull Request.
## License

This project is licensed under the Apache License 2.0; see the LICENSE file for details.

