Crucible

Alpha software. Crucible works for the author's use case (autonomous ML research on RunPod). It may work for yours. APIs will change. Bug reports and PRs welcome.

Autonomous ML research on rental GPUs. LLM-driven hypothesis generation + fleet orchestration on RunPod/SSH.

You bring a training script. Crucible decides what experiments to run, provisions the compute, executes them across tiers, and learns from the results.

Why Crucible?

No single existing tool combines fleet orchestration on rental GPUs with autonomous experiment design. The closest alternatives:

SkyPilot provisions GPUs across 20+ clouds but doesn't decide what experiments to run
Optuna/Ax optimize hyperparameters mathematically but don't provision compute or reason about architectures
AI Scientist generates hypotheses but runs single-machine with a 42% failure rate and no fleet management
W&B/MLflow track experiments but don't execute them autonomously

Crucible connects these concerns into one loop: analyze → hypothesize → provision → execute → reflect → promote or kill.

Origins

Born from OpenAI Parameter Golf (March–April 2026), a competition to train the best 16MB language model on 8xH100s in 10 minutes. The autonomous research infrastructure we built for the competition turned out to be general-purpose. Crucible extracts and generalizes it.

What Works Today

Fleet orchestration on RunPod (provision, bootstrap, dispatch, collect, destroy)
Generic SSH provider for any machine you can SSH into
Experiment execution with live output parsing, OOM retry, tier presets
Claude-driven autonomous research loop (hypothesis → batch → execute → reflect)
MCP server so Claude can control experiments via tool use
Model zoo with transformer components (RMSNorm, RoPE, GQA, SmearGate, etc.)
Analysis: leaderboard, sensitivity analysis, Pareto frontier
YAML project configuration (crucible.yaml)
Experiment notes (attach freeform observations to runs with YAML frontmatter)
Research tracks (group projects by research direction in the Crucible Hub)
Crucible Hub (~/.crucible-hub/) for cross-project knowledge sharing, git-synced
Research briefing (LLM session orientation with project context and findings)
REST API server (crucible serve) — 10 FastAPI endpoints wrapping MCP tools
W&B bridge with image logging and run annotation support
77 MCP tools for AI agent integration (fleet, design, context, notes, hub, tracks, briefing, architecture composition, tree search, training generalization)
Interactive TUI for browsing experiment designs grouped by status

What's Coming

SkyPilot provider (20+ cloud support)
Optuna/Ax integration (mathematical HPO alongside LLM-driven search)
Code-level search (LLM modifies training scripts, not just configs)
Research strategy plugins (custom loop phases)
PyPI release

Quick Start

# Install from source
pip install -e ".[all]"

# Initialize a project
crucible init

# Edit crucible.yaml — point at your training script

# Run a smoke test
crucible run experiment --preset smoke

# Or go autonomous
crucible research start --budget-hours 10 --tier proxy

Core Concepts

crucible.yaml

Like docker-compose for ML experiments:

name: my-project
provider:
  type: runpod
  gpu_types: ["NVIDIA GeForce RTX 4090"]
training:
  - backend: torch
    script: train.py
presets:
  smoke: { MAX_WALLCLOCK_SECONDS: "60", ITERATIONS: "400" }
  proxy: { MAX_WALLCLOCK_SECONDS: "1800", ITERATIONS: "6000" }
researcher:
  model: claude-sonnet-4-6-20250514
  budget_hours: 10.0

Training Contract

Crucible doesn't own your training code. Any script that reads env vars and prints parseable output works:

Input (env vars):

ITERATIONS, MAX_WALLCLOCK_SECONDS, TRAIN_BATCH_TOKENS
MODEL_FAMILY, NUM_LAYERS, MODEL_DIM, etc.
RUN_ID, RUN_BACKEND, RUN_PRESET

Output (stdout patterns):

step:{step}/{total} train_loss:{loss}
step:{step}/{total} val_loss:{loss} val_bpb:{bpb}
Serialized model ... {N} bytes

Experiment Tiers

Experiments earn their way to expensive compute:

Tier	Duration	Use Case
smoke	~1 min	Quick validation
proxy	~30 min	Main exploration
medium	~1 hr	Extended runs
promotion	~2 hrs	Best candidates

Fleet Management

crucible fleet provision --count 4
crucible fleet bootstrap
crucible fleet status
crucible fleet destroy

Autonomous Researcher

Claude-powered: analyze results, generate hypotheses, design batches, execute, reflect, promote or kill.

crucible research start --budget-hours 10 --tier proxy --dry-run

MCP Integration

crucible mcp serve  # starts stdio MCP server for Claude (77 tools)
crucible serve      # starts REST API server (FastAPI, 10 endpoints)

CLI Reference

crucible init
crucible fleet {status|provision|destroy|bootstrap|sync|monitor}
crucible run {experiment|queue|enqueue|dispatch|collect|day|night}
crucible analyze {rank|sensitivity|pareto|export|summary}
crucible research {start|status}
crucible data {download|sync|status}
crucible mcp serve
crucible models list
crucible hub {status|sync|findings}
crucible track {create|list|switch}
crucible note {add|get|search}
crucible serve [--port PORT]
crucible tui
crucible store {list|diff|get}

Installation

pip install crucible-ml[all]        # everything
pip install crucible-ml             # minimal (orchestration only)
pip install crucible-ml[torch]      # model zoo
pip install crucible-ml[anthropic]  # autonomous researcher
pip install crucible-ml[mcp]        # MCP server

Validated Workflow (Tested 2026-03-21)

This exact sequence was run and confirmed working on 2 RunPod pods:

cd /path/to/your-ml-project
crucible fleet provision --count 2 --name-prefix crucible-test
crucible fleet bootstrap --train-shards 1
crucible run enqueue --spec experiments.json --limit 3
crucible run dispatch
crucible fleet monitor --watch 60
crucible run collect
crucible analyze rank --top 10
crucible fleet destroy

Contributing

See CONTRIBUTING.md. Highest-impact areas:

Compute providers: Modal, Lambda, SkyPilot backends
Search strategies: Optuna, Ax integration
Training script examples: Show Crucible working with your framework
Bug reports: File issues, we'll fix them

Roadmap

See ROADMAP.md for the full plan — what works, what's next, what we won't build, and honest competitive assessment.

Project Structure

src/crucible/
├── core/          # Config, env, I/O, types, logging, finding, hub
├── fleet/         # Provider-abstracted fleet management
│   └── providers/ # RunPod, SSH backends
├── runner/        # Experiment execution, output parsing, presets, tracking, notes
├── models/        # Model zoo (components + architectures)
├── researcher/    # LLM-driven autonomous research loop, briefing
├── analysis/      # Leaderboard, sensitivity, Pareto frontier
├── data/          # Manifest-driven HuggingFace data pipeline
├── mcp/           # MCP server for Claude agent integration (77 tools)
├── training/      # Training backends (torch) — factored from train_gpt.py
├── api/           # Lightweight REST API server (FastAPI)
├── tui/           # Interactive experiment design browser (Textual)
└── cli/           # CLI entry points

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
data		data
docs		docs
examples		examples
scripts		scripts
specs		specs
src/crucible		src/crucible
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.mcp.json		.mcp.json
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
crucible.yaml		crucible.yaml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
train_gpt.py		train_gpt.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Crucible

Why Crucible?

Origins

What Works Today

What's Coming

Quick Start

Core Concepts

crucible.yaml

Training Contract

Experiment Tiers

Fleet Management

Autonomous Researcher

MCP Integration

CLI Reference

Installation

Validated Workflow (Tested 2026-03-21)

Contributing

Roadmap

Project Structure

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Crucible

Why Crucible?

Origins

What Works Today

What's Coming

Quick Start

Core Concepts

crucible.yaml

Training Contract

Experiment Tiers

Fleet Management

Autonomous Researcher

MCP Integration

CLI Reference

Installation

Validated Workflow (Tested 2026-03-21)

Contributing

Roadmap

Project Structure

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages