Stop writing loops to classify data. classify turns CSV classification into a single command, handles batching automatically, and gives you prompt caching for free.
- Automatic batching - Point at your CSV, get classified data back
- Structured outputs - Define your schema, get valid JSON every time
- Prompt caching - System prompt cached across all rows for significant savings
- Cost estimation - See estimated costs before submitting
- Reasoning support - Get explanations for each classification
- Progress tracking - Check status, download results when ready
Requires Python 3.12+
# Install as an isolated tool (recommended)
uv tool install git+https://github.com/alfranz/classify.git
# Or install in current environment
git clone https://github.com/alfranz/classify.git
cd classify
uv pip install -e .
# Or run without installing
uvx --from git+https://github.com/alfranz/classify.git classify --help

Set your API key:
export ANTHROPIC_API_KEY=your_api_key_here

Try the included example:
# Check the example and see cost estimate
classify check examples/example_config.yaml
# Submit the batch
classify run examples/example_config.yaml
# Check status (processing takes ~30-60 minutes)
classify status <batch_id>
# Download and merge results when done
classify pull <batch_id>

Create your own config:

classify init my_config.yaml

This generates a template like:
settings:
  reasoning: true        # Add explanations for each field
  batch_size: 10000      # Max requests per batch
  model: claude-sonnet-4-5-20250929

input:
  file: data.csv
  columns: [title, description, author]
  id_column: user_id     # Optional: use existing column as ID (must be unique)

prompt:
  system: "You are an expert at categorizing content."
  template: |
    Categorize this content:
    Title: {title}
    Description: {description}
    Author: {author}

output:
  fields:
    - name: category
      type: string
      description: "The content category"
      enum: ["cooking", "tech", "sports", "other"]
    - name: score
      type: integer
      description: "Score from 1-5"
    - name: confidence
      type: number
      description: "Confidence from 0.0 to 1.0"
    - name: is_flagged
      type: boolean
      description: "Whether the item should be flagged for review"

Validate and estimate costs:

classify check my_config.yaml

This shows:
- CSV validation (sample rows)
- Token counts per request
- Detailed cost breakdown with caching
- Total estimated cost
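The caching arithmetic behind that estimate can be sketched as follows. All prices below are made-up placeholders, not Anthropic's actual rates, and `estimate_cost` is an illustrative helper, not part of the tool:

```python
# Illustrative cost sketch: the shared system prompt + schema is written to
# the cache once, then each subsequent row pays only the cheaper cache-read
# rate. Every price here is a placeholder for illustration.
PRICE_INPUT = 3.00        # $ per million input tokens (placeholder)
PRICE_CACHE_WRITE = 3.75  # $ per million tokens written to cache (placeholder)
PRICE_CACHE_READ = 0.30   # $ per million tokens read from cache (placeholder)

def estimate_cost(rows: int, cached_tokens: int, row_tokens: int) -> float:
    """Rough input-side cost for `rows` requests sharing one cached prefix."""
    write = cached_tokens * PRICE_CACHE_WRITE / 1_000_000            # first request
    reads = (rows - 1) * cached_tokens * PRICE_CACHE_READ / 1_000_000
    unique = rows * row_tokens * PRICE_INPUT / 1_000_000             # per-row data
    return write + reads + unique

with_cache = estimate_cost(10_000, cached_tokens=1_500, row_tokens=200)
no_cache = 10_000 * (1_500 + 200) * PRICE_INPUT / 1_000_000
print(f"with caching: ${with_cache:.2f}  without: ${no_cache:.2f}")
```

The larger the shared prompt relative to each row's data, the bigger the savings.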
classify run my_config.yaml

You'll get a batch ID like batch_abc123def456.
classify status batch_abc123def456

Batches typically complete in 30-60 minutes.
# Auto-merge with original data (creates <input>_classified.csv)
classify pull batch_abc123def456
# Or specify custom output name
classify pull batch_abc123def456 --output my_results.csv
# Get raw API results without merging (for debugging)
classify pull batch_abc123def456 --raw

By default, pull automatically merges the classification columns with your original CSV.
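Conceptually, the merge step joins result fields back onto the input rows by their ID. A minimal sketch with the stdlib `csv` module (`merge_results` and the column names are hypothetical, not the tool's actual code):

```python
import csv
import io

def merge_results(input_csv: str, results: dict[str, dict]) -> str:
    """Join classification fields onto the original rows by a shared ID."""
    reader = csv.DictReader(io.StringIO(input_csv))
    extra = sorted({k for r in results.values() for k in r})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames) + extra)
    writer.writeheader()
    for row in reader:
        # id_column from the config ("user_id" in the template above)
        row.update(results.get(row["user_id"], {}))
        writer.writerow(row)
    return out.getvalue()

original = "user_id,title\n1,Pasta recipe\n2,GPU review\n"
classified = {"1": {"category": "cooking"}, "2": {"category": "tech"}}
print(merge_results(original, classified))
```

Rows with no matching result simply keep empty classification columns.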
Your CSV (10,000 rows)
↓
[classify]
↓
Claude's Batch API
- Prompt caching (cheaper cache hits)
- No rate limits
- ~1 hour processing
↓
Classified CSV
Each row becomes a separate API request with:
- Cached: System prompt + schema (same for all rows)
- Input: Your row data (unique per row)
- Output: Structured classification result
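The `{column}` placeholders in the prompt template behave like Python format fields. A sketch of how one row becomes a request (the dict structure is illustrative, not the exact API payload):

```python
# The template from the example config; {title} etc. are filled per row.
template = (
    "Categorize this content:\n"
    "Title: {title}\n"
    "Description: {description}\n"
    "Author: {author}"
)
system_prompt = "You are an expert at categorizing content."  # cached once

row = {"title": "Sourdough basics", "description": "A starter guide", "author": "Ada"}

request = {
    "system": system_prompt,         # identical for every row -> cache hit
    "user": template.format(**row),  # unique per row
}
print(request["user"])
```

Only the per-row part varies, which is what makes the cached prefix effective across the whole batch.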
# Initialize new config
classify init config.yaml
# Validate and estimate costs
classify check config.yaml
# Submit batch job
classify run config.yaml
classify run config.yaml --dry-run # Generate files without submitting
# Check status
classify status <batch_id>
# List all batches
classify list
# Download and merge results
classify pull <batch_id>
classify pull <batch_id> --output custom.csv # Custom output name
classify pull <batch_id> --raw # Raw results without merging

Define your output fields with:
- type: One of string, integer, number, or boolean
- description: What the field represents (include range constraints here for numbers)
- enum: Allowed values (for strings)
output:
  fields:
    - name: sentiment
      type: string
      description: "Overall sentiment"
      enum: ["positive", "negative", "neutral"]
    - name: score
      type: integer
      description: "Score from 1-10"
    - name: has_urgency
      type: boolean
      description: "Whether the content indicates urgency"

With reasoning: true, you also get {field}_reasoning columns explaining each classification.
- Start small: Test with 10-50 rows first to validate your config
- Use reasoning: Adds cost but dramatically improves accuracy and gives you explanations
- Preview before submitting: Run classify check to validate your config and see cost estimates
- Batch wisely: The default 10k batch size works well; split larger datasets into multiple batches
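To follow the "start small" tip, you can carve off a test sample before pointing the config at the full file. A quick stdlib sketch (`sample_csv` is a hypothetical helper, not a classify command):

```python
import csv
import io

def sample_csv(src_text: str, n: int = 50) -> str:
    """Return the header plus the first n data rows of a CSV as text."""
    rows = list(csv.reader(io.StringIO(src_text)))
    out = io.StringIO()
    csv.writer(out).writerows(rows[: n + 1])  # +1 keeps the header row
    return out.getvalue()

data = "id,title\n" + "".join(f"{i},item {i}\n" for i in range(500))
sampled = sample_csv(data, n=50)  # header + 50 data rows
```

Write the sampled text to, say, sample.csv, set input.file to it in the config, and run classify check / run on that before submitting the full dataset.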
MIT
Check the documentation for detailed walkthroughs and configuration reference.
