The missing screenshot service for Vision AI & Auth. Inject auth. Extract DOM. Zero-drift capture. Pixel-perfect Chromium.
Taking screenshots for Vision AI is hard. If you take a screenshot and then scrape the HTML separately, the page state drifts. Elements move. Popups appear. Your bounding boxes don't match the pixels.
chromium-screenshots guarantees Zero-Drift. It extracts the DOM coordinates (ground truth) and the screenshot (pixels) from the exact same render frame.
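To put a number on drift, the sketch below uses intersection-over-union (IoU) to score how well a scraped bounding box lines up with where the element was actually rendered. The `Box` type and the 40px shift (e.g. a cookie banner appearing between scrape and screenshot) are illustrative assumptions, not part of this service's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in CSS pixels: top-left corner plus size."""
    x: float
    y: float
    width: float
    height: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes: 1.0 = identical, 0.0 = disjoint."""
    # Overlap rectangle (empty if the boxes do not intersect)
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.width, b.x + b.width)
    bottom = min(a.y + a.height, b.y + b.height)
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


# Box recorded by a separate DOM scrape, then the same element after the page
# shifted down 40px before the screenshot was taken.
scraped = Box(x=10, y=100, width=200, height=40)
rendered = Box(x=10, y=140, width=200, height=40)
print(iou(scraped, rendered))  # 0.0: the label no longer overlaps its pixels
print(iou(scraped, scraped))   # 1.0: zero drift
```

A single layout shift the height of the element is enough to make a scraped box point at entirely different pixels, which is the failure mode single-frame extraction avoids.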
| Feature | Standard Tools | chromium-screenshots |
|---|---|---|
| Data Extraction | ❌ Image Only | ✅ Image + DOM + Bounding Boxes |
| Quality Control | ❌ None (hope it loaded) | ✅ Quality Score (Good/Low/Poor) |
| Auth Injection | ❌ Cookies only | ✅ Cookies + LocalStorage + SessionStorage |
| AI Integration | ❌ Manual API calls | ✅ Native MCP Server (Claude/Gemini) |
| SPA Support | ❌ Fails on hydration | ✅ Waits for selectors/network idle |
This tool is a "visual cortex" for your AI agents. It implements the Model Context Protocol (MCP), allowing tools like Claude Desktop to natively control the browser.
- `screenshot`: Returns base64 image data for immediate analysis ("What does this button say?").
- `screenshot_to_file`: Saves the image to disk to preserve context-window tokens.
- `extract_dom`: Returns text + coordinates for ground-truth verification.
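Under MCP, the host (for example Claude Desktop) invokes these tools by sending JSON-RPC 2.0 `tools/call` requests to the server. The sketch below only builds such messages to show their shape; the argument names (`url`, `output_path`) are assumptions for illustration, not the server's documented tool schema.

```python
import json
from typing import Any, Dict


def tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments; check the MCP Server docs for each tool's real schema.
print(tool_call(1, "screenshot", {"url": "https://news.ycombinator.com"}))
print(tool_call(2, "screenshot_to_file",
                {"url": "https://news.ycombinator.com", "output_path": "/tmp/hn.png"}))
print(tool_call(3, "extract_dom", {"url": "https://news.ycombinator.com"}))
```

In practice the MCP client library in the host handles this framing; the point is that `screenshot_to_file` lets the agent receive a short path instead of a large base64 payload.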
While many tools exist for browser automation and content extraction, chromium-screenshots is specifically designed to provide high-fidelity observation for AI agents, rather than just raw data or static images.
| Tool Category | Examples | Screenshot | Structural Data | Quality Metric | Primary Focus |
|---|---|---|---|---|---|
| Agent Observation | This Repo | ✅ | ✅ (Atomic DOM) | ✅ | AI Reliability & Context |
| LLM RAG Scrapers | Firecrawl, Jina | ✅ | ❌ (Markdown) | ❌ | Text extraction for reading |
| Screenshot APIs | ScreenshotOne, ApiFlash | ✅ | ❌ (HTML) | ❌ | Marketing & Archiving |
| Performance Audit | Lighthouse CI | ✅ | ✅ (Full DOM) | ✅ | Speed & SEO Audits (Slow) |
| Visual Testing | Percy, Chromatic | ✅ | ✅ (Snapshot) | ✅ | Regression Testing (Diffs) |
Run the containerized service. Nothing beyond Docker is required.

```bash
docker compose up -d
```

The API is now active at http://localhost:8000.
To run locally without Docker:

```bash
pip install -r requirements.txt
playwright install chromium
uvicorn app.main:app --reload
```

Capture screenshot + DOM data + Quality Score in one call:
```bash
curl -X POST "http://localhost:8000/screenshot" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "extract_dom": {
      "enabled": true,
      "selectors": ["span.titleline > a"],
      "max_elements": 50
    }
  }' -o hn_capture.png
```

Inject `localStorage` to capture authenticated dashboards (Wasp/Firebase):
```bash
curl -X POST "http://localhost:8000/screenshot" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://app.example.com/dashboard",
    "localStorage": {
      "wasp:sessionId": "secret_session_token",
      "theme": "dark"
    },
    "wait_for_selector": ".dashboard-grid"
  }' -o dashboard.png
```

Get quality metrics and model-compatibility hints for Vision AI integrations:
```bash
curl -X POST "http://localhost:8000/screenshot/json" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://news.ycombinator.com",
    "extract_dom": {
      "enabled": true,
      "include_metrics": true,
      "include_vision_hints": true,
      "target_vision_model": "claude"
    }
  }' | jq '{quality: .dom_extraction.quality, hints: .vision_hints}'
```

Detailed references for core features:
- API Reference - Full endpoint and parameter guide.
- DOM Extraction - How to use ground-truth element coordinates.
- Quality Assessment - Understanding extraction quality and warnings.
- MCP Server - Integration with Claude Desktop & AI agents.
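For scripted use, the `/screenshot/json` request shown earlier can be sent from Python with only the standard library. This is a minimal sketch: `build_request`, `capture_json`, and `summarize` are hypothetical helper names, and the only response fields it reads (`dom_extraction.quality`, `vision_hints`) are the ones the jq filter relies on; see the API Reference for the full schema.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:8000"


def build_request(path: str, payload: Dict[str, Any],
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a JSON POST to the service (nothing is sent yet)."""
    return urllib.request.Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def capture_json(payload: Dict[str, Any], base_url: str = BASE_URL) -> Dict[str, Any]:
    """Call /screenshot/json and decode the JSON body (requires a running service)."""
    req = build_request("/screenshot/json", payload, base_url)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def summarize(result: Dict[str, Any]) -> Dict[str, Any]:
    """Python equivalent of the jq filter in the curl example."""
    return {
        "quality": (result.get("dom_extraction") or {}).get("quality"),
        "hints": result.get("vision_hints"),
    }


# Same body as the curl example. The localStorage / wait_for_selector fields
# from the auth example can be added to this dict the same way.
payload = {
    "url": "https://news.ycombinator.com",
    "extract_dom": {
        "enabled": True,
        "include_metrics": True,
        "include_vision_hints": True,
        "target_vision_model": "claude",
    },
}
# print(summarize(capture_json(payload)))  # uncomment with the service running
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any browser work happens on the server.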
The Zero-Drift Flow:
- Inject Auth: Set `cookies` & `localStorage`.
- Navigate: Load the page and wait for `networkidle`.
- Freeze: Pause execution.
- Extract: Scrape DOM positions & text (JS evaluation).
- Audit: Run the Quality Detection engine (count elements, check visibility).
- Capture: Take the screenshot.
- Return: Send image + JSON together.
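The Audit step is what replaces "hope it loaded" with a Quality Score. The sketch below shows one plausible shape for such a check and is not this service's actual engine: the `Element` fields, the three-level grade, and the thresholds (`min_elements`, `min_visible_ratio`) are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Quality(Enum):
    GOOD = "good"
    LOW = "low"
    POOR = "poor"


@dataclass
class Element:
    text: str
    width: float
    height: float
    visible: bool


def assess_quality(elements: List[Element], min_elements: int = 5,
                   min_visible_ratio: float = 0.5) -> Quality:
    """Grade an extraction by element count and visibility (illustrative thresholds)."""
    if not elements:
        return Quality.POOR  # nothing matched: page not loaded or selector wrong
    # An element only counts if it is visible and actually occupies pixels
    visible = [e for e in elements if e.visible and e.width > 0 and e.height > 0]
    ratio = len(visible) / len(elements)
    if len(visible) >= min_elements and ratio >= min_visible_ratio:
        return Quality.GOOD
    if visible:
        return Quality.LOW  # something rendered, but too little to trust
    return Quality.POOR
```

Running the grade before capture lets a caller retry or wait longer instead of feeding a half-hydrated page to a vision model.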
```mermaid
sequenceDiagram
    participant U as 👤 User / Agent
    participant A as ⚡ API / MCP
    participant B as 🕸️ Chromium
    participant Q as 🔍 Quality Engine
    U->>A: POST /screenshot (extract_dom=true)
    A->>B: Create Context & Inject Auth
    B->>B: Navigate & Wait
    rect rgb(30, 30, 30)
        note right of B: Critical Section
        B->>B: Extract DOM (JS)
        B->>Q: Assess Quality
        Q-->>B: Quality: GOOD
        B->>B: Capture Pixels
    end
    B-->>A: Result (Image + Metadata)
    A-->>U: Return
```
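A minimal sketch of the same flow using Playwright's Python sync API, assuming `playwright` is installed and `playwright install chromium` has been run. It is not the service's implementation: `capture`, `local_storage_init_script`, and `EXTRACT_JS` are hypothetical names, and it skips the Freeze and Audit steps, approximating zero drift only by running extraction and capture back-to-back on the same page state.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Runs in the page: text + viewport-relative boxes for every match.
EXTRACT_JS = """
(selector) => Array.from(document.querySelectorAll(selector)).map((el) => {
  const r = el.getBoundingClientRect();
  return { text: (el.innerText || "").trim(), x: r.x, y: r.y, width: r.width, height: r.height };
})
"""


def local_storage_init_script(origin: str, items: Dict[str, str]) -> str:
    """JS that seeds localStorage before any page script runs, only on `origin`."""
    return (
        f"if (location.origin === {json.dumps(origin)}) {{\n"
        f"  try {{\n"
        f"    for (const [k, v] of Object.entries({json.dumps(items)})) localStorage.setItem(k, v);\n"
        f"  }} catch (e) {{ /* storage unavailable, e.g. on about:blank */ }}\n"
        f"}}"
    )


def capture(url: str, selector: str, origin: str, storage: Dict[str, str],
            cookies: Optional[List[Dict[str, Any]]] = None) -> Tuple[bytes, List[Dict[str, Any]]]:
    """Inject auth, load, extract DOM boxes, then screenshot the same viewport."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        context = browser.new_context(viewport={"width": 1280, "height": 800})
        if cookies:
            context.add_cookies(cookies)  # each cookie needs "url", or "domain" + "path"
        context.add_init_script(script=local_storage_init_script(origin, storage))
        page = context.new_page()
        page.goto(url, wait_until="networkidle")
        page.wait_for_selector(selector)
        # getBoundingClientRect() is viewport-relative, so these boxes line up
        # with a viewport screenshot (not full_page=True) taken right after.
        elements = page.evaluate(EXTRACT_JS, selector)
        png = page.screenshot()
        browser.close()
    return png, elements
```

Usage, mirroring the dashboard example: `png, boxes = capture("https://app.example.com/dashboard", ".dashboard-grid", "https://app.example.com", {"wasp:sessionId": "secret_session_token"})`. Seeding storage through an init script means the values exist before the app's own JavaScript reads them, which is what lets an SPA boot straight into its authenticated state.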
