chromium-screenshots

The missing screenshot service for Vision AI & Auth. Inject auth. Extract DOM. Zero-drift capture. Pixel-perfect Chromium.

⚡ Why this exists

Taking screenshots for Vision AI is hard. If you take a screenshot and then scrape the HTML in a separate step, the page state drifts in between. Elements move. Popups appear. Your bounding boxes no longer match the pixels.

chromium-screenshots guarantees Zero-Drift. It extracts the DOM coordinates (ground truth) and the screenshot (pixels) from the exact same render frame.

Visual Proof

| Feature | Standard Tools | chromium-screenshots |
| --- | --- | --- |
| Data Extraction | ❌ Image Only | ✅ Image + DOM + Bounding Boxes |
| Quality Control | ❌ None (hope it loaded) | ✅ Quality Score (Good/Low/Poor) |
| Auth Injection | ❌ Cookies only | ✅ Cookies + LocalStorage + SessionStorage |
| AI Integration | ❌ Manual API calls | ✅ Native MCP Server (Claude/Gemini) |
| SPA Support | ❌ Fails on hydration | ✅ Waits for selectors/network idle |

🤖 Standardized AI Integration

This tool is a "visual cortex" for your AI agents. It implements the Model Context Protocol (MCP), allowing tools like Claude Desktop to natively control the browser.

screenshot: Returns base64 data for immediate analysis ("What does this button say?").
screenshot_to_file: Saves to disk to preserve context window tokens.
extract_dom: Returns text + coordinates for ground-truth verification.

Comparison with Alternatives

While many tools exist for browser automation and content extraction, chromium-screenshots is specifically designed to provide high-fidelity observation for AI agents, rather than just raw data or static images.

| Tool Category | Examples | Screenshot | Structural Data | Quality Metric | Primary Focus |
| --- | --- | --- | --- | --- | --- |
| Agent Observation | This Repo | ✅ | ✅ (Atomic DOM) | ✅ | AI Reliability & Context |
| LLM RAG Scrapers | Firecrawl, Jina | ✅ | ❌ (Markdown) | ❌ | Text extraction for reading |
| Screenshot APIs | ScreenshotOne, ApiFlash | ✅ | ❌ (HTML) | ⚠️ (Basic) | Marketing & Archiving |
| Performance Audit | Lighthouse CI | ✅ | ✅ (Full DOM) | ✅ | Speed & SEO Audits (Slow) |
| Visual Testing | Percy, Chromatic | ✅ | ✅ (Snapshot) | ✅ | Regression Testing (Diffs) |

🚀 Quick Start

Docker (Recommended)

Run the containerized service. No local dependencies are required beyond Docker itself.

docker compose up -d

The API is now active at http://localhost:8000.
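
Because `docker compose up -d` returns before the service inside the container has finished starting, a script that calls the API right away may hit a refused connection. The snippet below is a minimal sketch of a readiness wait, assuming the default `localhost:8000` address from above. The helper name `wait_for_port` is illustrative and not part of this project, and a successful TCP connection only shows that the port is open, not that Chromium is fully warmed up.

```python
import socket
import time


def wait_for_port(host="localhost", port=8000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Call `wait_for_port()` after starting the container and only send requests once it returns `True`.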

Python (Local)

pip install -r requirements.txt
playwright install chromium
uvicorn app.main:app --reload

💡 Common Recipes

1. Vision AI Ground Truth

Capture screenshot + DOM data + Quality Score in one call.

curl -X POST "http://localhost:8000/screenshot" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "extract_dom": {
      "enabled": true,
      "selectors": ["span.titleline > a"],
      "max_elements": 50
    }
  }' -o hn_capture.png
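
For callers that prefer Python over curl, the same request can be sketched with the standard library alone. The endpoint and JSON fields are copied from the curl command above; the function names are illustrative, and the sketch assumes, as the `-o hn_capture.png` flag suggests, that the response body is the PNG image itself.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/screenshot"  # address from the Quick Start


def build_capture_request(url, selectors, max_elements=50):
    """Build the POST request from the curl recipe (no network I/O happens here)."""
    payload = {
        "url": url,
        "extract_dom": {
            "enabled": True,
            "selectors": list(selectors),
            "max_elements": max_elements,
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_capture(request, out_path, timeout=60):
    """Send the request and write the response body (assumed to be PNG bytes) to disk."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    with open(out_path, "wb") as fh:
        fh.write(body)
    return len(body)
```

With the service running, `save_capture(build_capture_request("https://news.ycombinator.com", ["span.titleline > a"]), "hn_capture.png")` mirrors the curl call. Keeping request construction separate from sending makes the payload easy to inspect or unit-test offline.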

2. The "Impossible" Auth Shot

Inject localStorage to capture authenticated dashboards in apps that keep their session there (for example, Wasp or Firebase).

curl -X POST "http://localhost:8000/screenshot" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/dashboard",
    "localStorage": {
      "wasp:sessionId": "secret_session_token",
      "theme": "dark"
    },
    "wait_for_selector": ".dashboard-grid"
  }' -o dashboard.png
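
Hardcoding a session token in a shell command tends to leak it into history and logs. One possible pattern, sketched below, reads the token from an environment variable and builds the same JSON body as the curl recipe. The variable name `SCREENSHOT_SESSION_TOKEN` and both helper names are hypothetical; only the payload fields come from the recipe above.

```python
import os


def token_from_env(var_name="SCREENSHOT_SESSION_TOKEN", environ=None):
    """Read the session token from an environment variable (the name is hypothetical)."""
    environ = os.environ if environ is None else environ
    token = environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set; export it before capturing")
    return token


def build_auth_payload(url, session_token, wait_for_selector, theme="dark"):
    """Build the same JSON body as the curl recipe, with the token passed in."""
    return {
        "url": url,
        "localStorage": {
            "wasp:sessionId": session_token,
            "theme": theme,
        },
        "wait_for_selector": wait_for_selector,
    }
```

The resulting dictionary can be serialized with `json.dumps` and posted to `/screenshot` exactly like the curl body.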

3. Vision AI Optimization

Get quality metrics and model compatibility hints for Vision AI integrations.

curl -X POST "http://localhost:8000/screenshot/json" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "extract_dom": {
      "enabled": true,
      "include_metrics": true,
      "include_vision_hints": true,
      "target_vision_model": "claude"
    }
  }' | jq '{quality: .dom_extraction.quality, hints: .vision_hints}'
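
The jq filter above can also be expressed in Python when the result feeds an agent pipeline. This is a sketch only: the field names `dom_extraction.quality` and `vision_hints` are taken from the jq expression, while the sample response body and the assumption that `quality` is a plain string label are illustrative and not confirmed by the API reference.

```python
import json


def summarize_capture(response_text):
    """Pick out the same two fields as the jq filter; missing keys become None."""
    data = json.loads(response_text)
    dom = data.get("dom_extraction") or {}
    return {"quality": dom.get("quality"), "hints": data.get("vision_hints")}


def is_acceptable(summary, accepted=("good",)):
    """Gate on the quality label, compared case-insensitively.

    Assumes `quality` is a plain string; adjust if the API returns an object.
    """
    quality = summary.get("quality")
    return isinstance(quality, str) and quality.lower() in accepted


# Hypothetical response body, for illustration only.
sample = '{"dom_extraction": {"quality": "GOOD"}, "vision_hints": {"note": "example"}}'
summary = summarize_capture(sample)
print(summary["quality"], is_acceptable(summary))
```

A gate like `is_acceptable` lets an agent retry or skip a capture instead of sending a poorly loaded page to a vision model.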

📚 Documentation

Detailed references for core features:

API Reference - Full endpoint and parameter guide.
DOM Extraction - How to use ground-truth element coordinates.
Quality Assessment - Understanding extraction quality and warnings.
MCP Server - Integration with Claude Desktop & AI agents.

🧠 How It Works

The Zero-Drift Flow:

1. Inject Auth: Set cookies & localStorage.
2. Navigate: Load page and wait for networkidle.
3. Freeze: Pause execution.
4. Extract: Scrape DOM positions & Text (JS evaluation).
5. Audit: Run Quality Detection engine (count elements, check visibility).
6. Capture: Take screenshot.
7. Return: Send Image + JSON together.

sequenceDiagram
    participant U as 👤 User / Agent
    participant A as ⚡ API / MCP
    participant B as 🕸️ Chromium
    participant Q as 🔍 Quality Engine

    U->>A: POST /screenshot (extract_dom=true)
    A->>B: Create Context & Inject Auth
    B->>B: Navigate & Wait
    
    rect rgb(30, 30, 30)
        note right of B: Critical Section
        B->>B: Extract DOM (JS)
        B->>Q: Assess Quality
        Q-->>B: Quality: GOOD
        B->>B: Capture Pixels
    end
    
    B-->>A: Result (Image + Metadata)
    A-->>U: Return
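
The critical section of the flow can be approximated directly with Playwright's Python API. The following is a rough sketch under stated assumptions, not this project's implementation: it runs DOM extraction and the screenshot back to back with no waits in between, and it omits the freeze and quality-audit steps, whose mechanics are not described here. All function names are illustrative.

```python
import json

# JavaScript evaluated in the page: text and viewport bounding box per match.
EXTRACT_JS = """
(selector) => Array.from(document.querySelectorAll(selector)).map((el) => {
  const r = el.getBoundingClientRect();
  return { text: el.innerText, x: r.x, y: r.y, width: r.width, height: r.height };
})
""".strip()


def local_storage_init_script(items):
    """Return JS that seeds localStorage before the page's own scripts run."""
    calls = "".join(
        f"localStorage.setItem({json.dumps(key)}, {json.dumps(value)});"
        for key, value in items.items()
    )
    # try/catch: documents without storage access (e.g. about:blank) may throw.
    return "try {" + calls + "} catch (e) {}"


def capture_same_frame(page, selector):
    """Extract DOM boxes, then capture pixels, with nothing scheduled in between."""
    elements = page.evaluate(EXTRACT_JS, selector)
    image = page.screenshot()  # viewport capture, matching getBoundingClientRect
    return {"elements": elements, "image": image}


def run(url, selector, local_storage=None):
    """End-to-end sketch using Playwright's sync API (pip install playwright)."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            context = browser.new_context()
            if local_storage:
                context.add_init_script(local_storage_init_script(local_storage))
            page = context.new_page()
            page.goto(url, wait_until="networkidle")
            return capture_same_frame(page, selector)
        finally:
            browser.close()
```

`capture_same_frame` accepts any object with `evaluate` and `screenshot` methods, so the ordering can be checked with a stub page before a real browser is involved.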

License

MIT License
