A Python FastAPI server with a pluggable detector abstraction layer for deepfake detection. Supports multiple models running in parallel across image, video, and audio media types.
```
POST /detect (multipart file upload)
│
├─ detect media type (image/video)
│
├─ route to all registered detectors for that type
│
└─ return combined results from all detectors
```
Every detector implements BaseDetector (load, detect, supported_media_types) and is registered in app/detectors/registry.py. New models plug in by implementing the interface and adding one line to the registry.
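A minimal sketch of that contract, assuming the method names from the description above; the real signatures in app/detectors/base.py and the registry layout in app/detectors/registry.py may differ:

```python
from abc import ABC, abstractmethod

class BaseDetector(ABC):
    """Illustrative version of the BaseDetector interface."""
    name: str = "base"

    @abstractmethod
    def load(self) -> None:
        """Load model weights (called once at startup)."""

    @abstractmethod
    def detect(self, path: str) -> dict:
        """Return a result dict for one media file."""

    @abstractmethod
    def supported_media_types(self) -> set[str]:
        """Media types this detector handles, e.g. {"image"}."""

# Registry keyed by media type; /detect fans out over REGISTRY[media].
REGISTRY: dict[str, list[BaseDetector]] = {}

def register(detector: BaseDetector) -> None:
    """The 'one line in the registry' step: load, then index by media type."""
    detector.load()
    for media in detector.supported_media_types():
        REGISTRY.setdefault(media, []).append(detector)

class DummyImageDetector(BaseDetector):
    name = "dummy"
    def load(self) -> None:
        pass
    def detect(self, path: str) -> dict:
        return {"model": self.name, "fake_score": 0.5}
    def supported_media_types(self) -> set[str]:
        return {"image"}

register(DummyImageDetector())
results = [d.detect("photo.jpg") for d in REGISTRY["image"]]
```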
| Model | Media | Architecture | Source |
|---|---|---|---|
| ViT Deep-Fake Detector v2 | Image | ViT-base (HuggingFace pipeline) | prithivMLmods/Deep-Fake-Detector-v2-Model |
| Frame Sampler | Video | Samples 20 video frames, runs ViT image detector on each, averages scores | Uses the image model above |
| VoiceGen | Audio (from video) | Dual RawNet2 encoders with domain-agnostic feature disentanglement, SAM optimization, 59M params | Purdue-M2/AI-Synthesized-Voice-Generalization |
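The Frame Sampler's strategy (20 evenly spaced frames, per-frame image scores, averaged) can be sketched as follows; the index math and the toy scorer are illustrative stand-ins, not the code in app/detectors/frame_sampler.py:

```python
# Pick n evenly spaced frame indices from a clip of frame_count frames.
def sample_indices(frame_count: int, n: int = 20) -> list[int]:
    n = min(n, frame_count)
    step = frame_count / n
    return [int(i * step) for i in range(n)]

# Score a video as the mean of per-frame image-detector scores.
def video_score(frame_count: int, score_frame) -> float:
    indices = sample_indices(frame_count)
    scores = [score_frame(i) for i in indices]
    return sum(scores) / len(scores)

# Toy per-frame scorer: flags only the second half of a 300-frame clip.
fake_after_150 = lambda i: 1.0 if i >= 150 else 0.0
print(video_score(300, fake_after_150))  # → 0.5
```

Averaging keeps a single manipulated segment from being diluted to zero, but it also means short fake inserts lower the score less than full-clip fakes.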
- Python 3.11+
- ~600MB disk for model weights (downloaded on first run)
```
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Starts the server, sends a file, prints results, and shuts down:
```
python cli.py path/to/video.mp4
python cli.py path/to/image.jpg
```

Start the server:
```
uvicorn app.main:app --host 0.0.0.0 --port 8000
```

Detect a file:
```
curl -X POST http://localhost:8000/detect \
  -F "file=@video.mp4"
```

Health check:
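The same upload can be made from Python with only the standard library; this sketch builds the multipart/form-data body by hand and targets the same URL and "file" field as the curl example (everything else is generic multipart plumbing):

```python
import urllib.request
import uuid

def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    body = head.encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

payload, content_type = build_multipart("file", "video.mp4", b"\x00\x01")
request = urllib.request.Request(
    "http://localhost:8000/detect",
    data=payload,
    headers={"Content-Type": content_type},
    method="POST",
)
# with urllib.request.urlopen(request) as resp:  # needs the server running
#     print(resp.read().decode())
```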
```
curl http://localhost:8000/health
```

Evaluates all models against labeled test files in media/ (filenames prefixed with fake- or real-):
```
python benchmark.py
```

Outputs per-model accuracy and writes detailed results to benchmark_results.csv.
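The benchmark's scoring rule can be sketched like this: ground truth comes from the fake-/real- filename prefix, and accuracy is the fraction of files whose predicted label matches. Function names and the sample predictions here are illustrative, not benchmark.py's:

```python
from pathlib import Path

def label_from_name(path: str) -> str:
    """Ground-truth label from the fake-/real- filename prefix."""
    return "fake" if Path(path).name.startswith("fake-") else "real"

def accuracy(predictions: dict[str, str]) -> float:
    """Fraction of files where the predicted label matches the prefix."""
    hits = sum(label_from_name(p) == pred for p, pred in predictions.items())
    return hits / len(predictions)

preds = {
    "media/fake-clip.mp4": "fake",
    "media/real-clip.mp4": "real",
    "media/fake-talk.mp4": "real",  # one miss
    "media/real-talk.mp4": "real",
}
print(accuracy(preds))  # → 0.75
```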
Environment variables:
| Variable | Default | Description |
|---|---|---|
| HF_TOKEN | — | HuggingFace token (for gated models) |
| DEVICE | cpu | cpu or cuda |
| HOST | 0.0.0.0 | Server bind address |
| PORT | 8000 | Server port |
```
deep-fake-detection/
├── app/
│   ├── main.py            # FastAPI app, /detect and /health endpoints
│   ├── config.py          # Settings from env vars
│   ├── schemas.py         # Pydantic request/response models
│   └── detectors/
│       ├── base.py            # BaseDetector abstract class
│       ├── registry.py        # Detector registry and media type routing
│       ├── hf_image.py        # ViT image detector (HuggingFace)
│       ├── frame_sampler.py   # Video → frame sampling → image detector
│       └── voice_gen/         # VoiceGen dual-RawNet2 audio detector
├── cli.py                 # CLI tool
├── benchmark.py           # Model evaluation script
├── media/                 # Test files (fake-*.mp4, real-*.mp4)
├── requirements.txt
└── Dockerfile
```
Collection of deepfake detectors from TrueMedia.org covering image, video, and audio:
- DistilDIRE (image) — distilled diffusion-based detector, 3.2x faster than DIRE, handles GAN and diffusion outputs
- UniversalFakeDetectV2 (image) — CLIP-ViT feature spaces with nearest-neighbor/linear probing
- GenConViT (video) — ConvNeXt + Swin Transformer hybrid with CNN autoencoder and VAE
- StyleFlow (video) — style-latent flow anomaly detection with StyleGRU + supervised contrastive learning
- FTCN (video) — temporal convolution network for long-term coherence detection
- Transcript Based Detector (audio) — speech recognition + LLM analysis for factual coherence
Repository: https://github.com/truemediaorg/ml-models
Note: TrueMedia model weights require a formal request to aerin@truemedia.org with affiliation and intended use.
- CAEL (image) — Cross-modal Appearance-Edge Learning transformer with multi-grained fusion, 158.63M params, 99.88% within-dataset ACC but 65.04% cross-dataset AUC
- Dataset: GenFace (515K forged + 100K real faces covering GANs and diffusion methods)
- Repository: https://github.com/Jenine-321/GenFace
- Skipped for now: redundant with existing ViT image detector, weak cross-dataset generalization
- GenD CLIP L/14 (video) — CLIP ViT-L/14 + linear probe, 20-frame averaging. yermandy/GenD_CLIP_L_14
- D3 (video) — Dual-branch CLIP ViT-L/14 (shuffled + original patches) + attention head. BigAandSmallq/D3
- AASIST (audio) — Graph Attention Network with SincConv, 297K params. clovaai/aasist
- UniversalFakeDetect (image + video) — Frozen CLIP ViT-L/14 + linear probe, 769 params. WisconsinAIVision/UniversalFakeDetect
- GenD DINOv3 L (image + video) — DINOv3 ViT-L/16 + linear probe, 300M params. yermandy/GenD_DINOv3_L
- Wav2Vec2 Voice Detector (audio) — Fine-tuned Wav2Vec2-XLSR, 300M params. garystafford/wav2vec2-deepfake-voice-detector
- WaveSpect (audio) — hybrid waveform + CQT spectrogram analysis for synthetic audio detection. No public weights or code available yet.
- FakeBrAccent / XGBoost (audio) — XGBoost/CNN on Brazilian-accented speech dataset. Too narrow (Portuguese-only, accent-specific) for general use.
- BRSpeech-DF (audio) — Brazilian Portuguese deepfake speech dataset. Could be used to fine-tune AASIST but no pretrained weights provided.
- SDD-APALLM (audio) — CQT spectrograms + LLM prompting. Interesting but no released code/weights.
- F-SAT / DeepFakeVox-HQ (audio) — frequency-selective adversarial training for robustness. Code and dataset forthcoming.
- deitfake-v2 (image) — DeiT-based image classifier on HuggingFace. Only 2 classes, redundant with existing ViT detector.
- DFD-FCG (video) — frequency-aware CLIP with graph learning. Uses same CLIP ViT-L/14 backbone as GenD/D3; weights not publicly available.
- FakeVLM (video/image) — vision-language model for explainable deepfake detection, 7B+ params. Too heavy for real-time pipeline.
- DTAD (video) — temporal artifact detection. Code available but limited documentation and unclear weight availability.