feat: git-native artifact storage (configurable repo for eval run checkpoints) #761

## Objective

Add a git-native storage backend for eval run artifacts, inspired by [entireio/cli](https://github.com/entireio/cli). Eval results (metadata, transcripts, scores) are committed to a configurable branch and remote, making runs self-contained, versionable, and independent of cloud storage.

## Design

### Storage model

Each eval run produces one commit containing:

```
<runId[:2]>/<runId[2:]>/
  metadata.json        # run config, model, timestamps, scores, source commit/repo
  transcript.jsonl     # full session transcript (redacted)
  summary.json         # condensed stats for indexing
```

Sharding by the first two hex characters of the run ID (16 × 16 = 256 buckets) keeps any single directory from growing too large.
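Below is a minimal sketch of the path builder in TypeScript. It assumes run IDs are lowercase hex strings like the `runId` in the cross-repo linking example. `shardPath` is a hypothetical name, and the optional `prefix` stands in for the `path` setting described under Configuration:

```typescript
// Hypothetical helper: maps a hex run ID to its sharded directory,
// optionally nested under the configured `path` prefix.
function shardPath(runId: string, prefix?: string): string {
  // Assumption: run IDs are lowercase hex with at least 3 characters, so the
  // first two characters pick one of 16 * 16 = 256 buckets and the rest is non-empty.
  if (!/^[0-9a-f]{3,}$/.test(runId)) {
    throw new Error(`run ID must be lowercase hex: ${runId}`);
  }
  const shard = `${runId.slice(0, 2)}/${runId.slice(2)}`;
  // Strip leading/trailing slashes so ".agentv/runs/" and ".agentv/runs" behave the same.
  const cleanPrefix = (prefix ?? "").replace(/^\/+|\/+$/g, "");
  return cleanPrefix ? `${cleanPrefix}/${shard}` : shard;
}

console.log(shardPath("a3b2c4d5e6f78901"));                  // a3/b2c4d5e6f78901
console.log(shardPath("a3b2c4d5e6f78901", ".agentv/runs/")); // .agentv/runs/a3/b2c4d5e6f78901
```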

### Branch creation logic

```
Branch exists?
  YES → commit to it (orphan, regular, main — doesn't matter)
  NO  → is it the repo's default branch?
          YES → error (refuse to create main/master)
          NO  → create as orphan, commit to it
```

Once a branch exists, all operations are identical regardless of how it was created.
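The decision above fits in one small function. This TypeScript sketch is hypothetical; it assumes the caller has already determined whether the branch exists on the remote and which branch is the remote's default, for example with `git ls-remote --exit-code <remote> refs/heads/<branch>` (exit status 2 when no matching ref exists) and `git ls-remote --symref <remote> HEAD` (which reports the ref the remote's `HEAD` points to):

```typescript
// Hypothetical decision function mirroring the branch creation logic.
type BranchAction = "commit" | "create-orphan";

function resolveBranchAction(
  branch: string,
  branchExists: boolean,
  defaultBranch: string, // e.g. "main", as reported for the remote's HEAD
): BranchAction {
  if (branchExists) {
    // Orphan, regular or default branch: once it exists we simply commit to it.
    return "commit";
  }
  if (branch === defaultBranch) {
    // Never create main/master implicitly.
    throw new Error(`refusing to create default branch "${branch}"`);
  }
  // A new, non-default branch starts as an orphan with no shared history.
  return "create-orphan";
}
```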

### Configuration

```yaml
# .agentv/config.yaml
artifacts:
  backend: git                    # "git" | "local" (default: local)
  git:
    remote: agentv-evals          # git remote name or URL
    branch: agentv/checkpoints/v1 # branch name (this value is the default)
    path: .agentv/runs            # optional subdirectory prefix (useful on shared branches like main)
```

- **Default backend**: `local` (current behavior, no change)
- **`git` backend**: commits to configured branch, pushes to configured remote
- Remote can be the same repo (`origin`) or a separate repo — user's choice
- **`path`**: when committing to a shared branch like `main`, scopes artifacts under a subdirectory to avoid polluting the root
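A sketch of how the config loader might normalize this section, assuming the YAML has already been parsed into a plain object. The types and `resolveArtifactsConfig` are hypothetical. The proposal gives defaults for `backend` and `branch` but not for `remote`, so this sketch treats `remote` as required for the git backend; that is an assumption. Since `git push` and `git fetch` accept either a configured remote name or a URL, the value can be passed to git unchanged:

```typescript
// Hypothetical shape of the parsed `artifacts` section of .agentv/config.yaml.
interface RawArtifactsConfig {
  backend?: string;
  git?: { remote?: string; branch?: string; path?: string };
}

type ArtifactsConfig =
  | { backend: "local" }
  | { backend: "git"; git: { remote: string; branch: string; path?: string } };

const DEFAULT_BRANCH = "agentv/checkpoints/v1";

function resolveArtifactsConfig(raw: RawArtifactsConfig | undefined): ArtifactsConfig {
  const backend = raw?.backend ?? "local"; // no config keeps current behavior
  if (backend === "local") return { backend: "local" };
  if (backend !== "git") throw new Error(`unknown artifacts.backend: ${backend}`);

  // Assumption: no default remote, so fail fast instead of guessing one.
  const remote = raw?.git?.remote;
  if (!remote) throw new Error("artifacts.git.remote is required when backend is git");

  return {
    backend: "git",
    git: { remote, branch: raw?.git?.branch ?? DEFAULT_BRANCH, path: raw?.git?.path },
  };
}
```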

### Example configurations

**Dedicated eval repo (recommended)**
```yaml
artifacts:
  backend: git
  git:
    remote: git@github.com:org/agentv-evals.git
    branch: agentv/checkpoints/v1
```

**Same repo, orphan branch**
```yaml
artifacts:
  backend: git
  git:
    remote: origin
    branch: agentv/checkpoints/v1
```

**Same repo, main branch (mixed human + machine artifacts)**
```yaml
artifacts:
  backend: git
  git:
    remote: origin
    branch: main
    path: .agentv/runs
```

### Write flow

1. Eval run completes → runner has result payload
2. Git storage backend:
   - Branch exists → fetch latest
   - Branch doesn't exist and isn't default branch → create as orphan
   - Branch doesn't exist and is default branch → error
   - Build tree object with sharded path (under `path` prefix if configured)
   - Commit with message `Run: <runId>` and trailers (`AgentV-Eval`, `AgentV-Model`, `Source-Commit`)
   - Push to configured remote
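3. On a rejected push (another run pushed first): fetch the new tip, rebuild the commit on top of it, and retry. As long as run IDs are unique, each run writes only to its own sharded path, so the rebuilt commit cannot conflict and the retried push is a fast-forward.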
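A sketch of the commit step, assuming the message and trailer layout listed above; the helpers are hypothetical and only build strings and argument vectors for `git`, and mapping `AgentV-Eval` to the eval file path is an assumption. One way to build the tree without touching the user's working tree or checkout is a temporary index: point `GIT_INDEX_FILE` at a scratch file, run `git read-tree <parent>` (or `git read-tree --empty` for a new orphan branch), add each blob produced by `git hash-object -w --stdin` with `git update-index --add --cacheinfo 100644,<blob>,<path>`, then run `git write-tree`.

```typescript
// Hypothetical trailer values; which value goes in AgentV-Eval is an assumption.
interface RunTrailers {
  evalFile: string;     // AgentV-Eval
  model: string;        // AgentV-Model
  sourceCommit: string; // Source-Commit
}

function buildCommitMessage(runId: string, t: RunTrailers): string {
  // Git trailers are "Key: value" lines in the final paragraph,
  // separated from the subject line by a blank line.
  return [
    `Run: ${runId}`,
    "",
    `AgentV-Eval: ${t.evalFile}`,
    `AgentV-Model: ${t.model}`,
    `Source-Commit: ${t.sourceCommit}`,
  ].join("\n") + "\n";
}

// `git commit-tree` without -p creates a parentless root commit, which is how a
// new orphan branch gets its first commit without a checkout.
function commitTreeArgs(tree: string, parent: string | null, message: string): string[] {
  const args = ["commit-tree"];
  if (parent) args.push("-p", parent);
  args.push("-m", message, tree);
  return args;
}

// Push the new commit to a fully qualified ref. Without --force, git rejects a
// non-fast-forward update, which is the signal to fetch and retry.
function pushArgs(remote: string, commit: string, branch: string): string[] {
  return ["push", remote, `${commit}:refs/heads/${branch}`];
}
```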

### Read flow

- `agentv results list` → `git log <branch> --oneline`
- `agentv results show <runId>` → `git show <branch>:<path>/<runId[:2]>/<runId[2:]>/metadata.json`
- Dashboard / web UI reads from the git remote directly
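A sketch of the read side as argument vectors for `git`, plus a parser that turns the log into run IDs. All three functions are hypothetical. `ref` may be a local branch or a remote-tracking ref such as `origin/agentv/checkpoints/v1` after a fetch. The parser relies on the `Run: <runId>` subject from the write flow, so on a shared branch like `main` it skips unrelated commits; `--no-decorate` is added here so ref names never appear between the hash and the subject:

```typescript
// Hypothetical argv builder for `agentv results list`.
function listArgs(ref: string): string[] {
  return ["log", "--oneline", "--no-decorate", ref];
}

// Hypothetical argv builder for `agentv results show`: "<rev>:<path>" reads a
// file straight from the commit's tree, no checkout needed.
function showArgs(ref: string, runId: string, file: string, prefix = ""): string[] {
  const shard = `${runId.slice(0, 2)}/${runId.slice(2)}`;
  const base = prefix.replace(/^\/+|\/+$/g, "");
  const dir = base ? `${base}/${shard}` : shard;
  return ["show", `${ref}:${dir}/${file}`];
}

// Extract run IDs from lines shaped like "<abbrev-sha> Run: <runId>".
function parseRunIds(logOutput: string): string[] {
  const ids: string[] = [];
  for (const line of logOutput.split("\n")) {
    const m = /^[0-9a-f]+ Run: ([0-9a-f]+)$/.exec(line.trim());
    if (m) ids.push(m[1]);
  }
  return ids;
}
```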

### Cross-repo linking

Each `metadata.json` includes:
```json
{
  "sourceRepo": "org/repo",
  "sourceCommit": "abc123def",
  "evalFile": "evals/my-eval.yaml",
  "runId": "a3b2c4d5e6f78901",
  "model": "claude-sonnet-4-6",
  "scores": { ... },
  "timestamp": "2026-03-25T12:00:00Z"
}
```

This addresses the multi-repo eval problem: runs from different codebases all land in one eval results repo, each carrying its provenance (source repo, source commit, and eval file).
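A sketch of the metadata shape and of one way to fill `sourceRepo`, assuming the runner reads the source repo's remote URL (for example via `git remote get-url origin`). The interface and `repoSlugFromRemoteUrl` are hypothetical; `scores` stays open because the proposal does not fix its schema, and for nested groups the function returns only the last two path segments:

```typescript
// Hypothetical typing of metadata.json, following the example above.
interface RunMetadata {
  sourceRepo: string;             // "org/repo"
  sourceCommit: string;           // commit SHA of the evaluated code
  evalFile: string;               // eval file path inside sourceRepo
  runId: string;
  model: string;
  scores: Record<string, unknown>;
  timestamp: string;              // ISO 8601, UTC
}

// Derive "org/repo" from scp-style (git@host:org/repo.git) or URL-style
// (https://host/org/repo.git) remotes; returns null for anything else.
function repoSlugFromRemoteUrl(url: string): string | null {
  const m = /[:/]([^/:]+\/[^/]+?)(?:\.git)?\/?$/.exec(url.trim());
  return m ? m[1] : null;
}
```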

## Why git-native

- **No cloud dependency** — works offline, self-hosted, air-gapped
- **Familiar tooling** — `git log`, `git show`, `git diff` for querying results
- **Access control** — inherits git remote permissions
- **Auditability** — immutable append-only history
- **CI-friendly** — runners just need git push access to the eval repo
- **Separation of concerns** — eval data scales independently of source code

## Why separate repo (recommended default)

- Source repo stays lean (eval transcripts are large, append-only)
- Different retention policies (prune old runs without touching code)
- Scoped CI permissions (eval runners don't need code repo write access)
- Natural home for cross-repo evals

Using `main` on the same repo is fully supported for teams that prefer a single repo with human-editable artifacts alongside automated results.

## Implementation plan

### Phase 1: Git storage backend
1. Add `artifacts.git` config schema to config loader
2. Implement `GitArtifactStore` class with `write(runResult)` and `list()`/`get(runId)` methods
3. Branch creation logic: exists → use it, new + non-default → orphan, new + default → error
4. Sharded path builder: `runId` → `<id[:2]>/<id[2:]>/`
5. Commit with trailers, push to remote
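A sketch of the store interface from step 2, with an in-memory implementation that lays files out the way the git backend would (sharded paths, three files per run). Only the method names `write`, `list` and `get` come from the plan; `RunResult`, the return types and `MemoryArtifactStore` are assumptions, and an in-memory double like this could let the runner wiring be tested without a repository:

```typescript
// Hypothetical run payload produced by the runner.
interface RunResult {
  runId: string;
  metadata: Record<string, unknown>;
  transcript: unknown[];            // one entry per JSONL line
  summary: Record<string, unknown>;
}

interface ArtifactStore {
  write(run: RunResult): Promise<void>;
  list(): Promise<string[]>;                      // run IDs, newest first
  get(runId: string): Promise<RunResult | undefined>;
}

class MemoryArtifactStore implements ArtifactStore {
  readonly files = new Map<string, string>();     // path -> file content
  private readonly order: string[] = [];
  private readonly prefix: string;

  constructor(prefix = "") {
    this.prefix = prefix.replace(/\/+$/, "");
  }

  private dir(runId: string): string {
    const shard = `${runId.slice(0, 2)}/${runId.slice(2)}`;
    return this.prefix ? `${this.prefix}/${shard}` : shard;
  }

  async write(run: RunResult): Promise<void> {
    const dir = this.dir(run.runId);
    // Append-only: an existing run is never overwritten.
    if (this.files.has(`${dir}/metadata.json`)) {
      throw new Error(`run ${run.runId} already exists`);
    }
    this.files.set(`${dir}/metadata.json`, JSON.stringify(run.metadata, null, 2));
    const jsonl = run.transcript.map((e) => JSON.stringify(e)).join("\n") + "\n";
    this.files.set(`${dir}/transcript.jsonl`, jsonl);
    this.files.set(`${dir}/summary.json`, JSON.stringify(run.summary, null, 2));
    this.order.unshift(run.runId);
  }

  async list(): Promise<string[]> {
    return [...this.order];
  }

  async get(runId: string): Promise<RunResult | undefined> {
    const dir = this.dir(runId);
    const meta = this.files.get(`${dir}/metadata.json`);
    if (meta === undefined) return undefined;
    const transcript = (this.files.get(`${dir}/transcript.jsonl`) ?? "")
      .split("\n")
      .filter((l) => l.length > 0)
      .map((l) => JSON.parse(l));
    const summary = JSON.parse(this.files.get(`${dir}/summary.json`) ?? "{}");
    return { runId, metadata: JSON.parse(meta), transcript, summary };
  }
}
```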

### Phase 2: CLI integration
6. Wire `GitArtifactStore` into eval runner via backend config
7. `agentv results list` — read from git branch
8. `agentv results show <runId>` — read metadata/transcript from git branch

### Phase 3: Concurrency & robustness
9. Fetch-rebase-retry loop for concurrent pushes
10. Graceful handling of a missing remote, auth failures, and network errors (fall back to the local backend with a warning)
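A sketch of the Phase 3 control flow with the git steps injected, so the retry and fallback logic can be exercised without a repository. `GitWriteSteps`, `writeWithRetry`, `writeOrFallBack` and the default of 5 attempts are all hypothetical; a real implementation might also add a randomized backoff between attempts:

```typescript
// Hypothetical injected git operations.
interface GitWriteSteps {
  fetchTip(): Promise<string | null>;                // remote branch tip, null if absent
  commitOnto(parent: string | null): Promise<string>; // builds this run's commit
  push(commit: string): Promise<"ok" | "rejected">;  // "rejected" = non-fast-forward
}

async function writeWithRetry(steps: GitWriteSteps, maxAttempts = 5): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Re-fetch on every attempt: another runner may have pushed in between.
    const tip = await steps.fetchTip();
    // Each run only adds its own sharded path, so rebuilding on the new tip
    // cannot conflict; this plays the role of the "rebase" step.
    const commit = await steps.commitOnto(tip);
    if ((await steps.push(commit)) === "ok") return commit;
  }
  throw new Error(`push still rejected after ${maxAttempts} attempts`);
}

// Any failure (missing remote, auth, network, retries exhausted) saves the run
// locally and warns, so the results are not lost.
async function writeOrFallBack(
  steps: GitWriteSteps,
  writeLocal: () => Promise<void>,
  warn: (msg: string) => void = console.warn,
): Promise<"git" | "local"> {
  try {
    await writeWithRetry(steps);
    return "git";
  } catch (err) {
    warn(`git artifact push failed, saved locally instead: ${String(err)}`);
    await writeLocal();
    return "local";
  }
}
```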

### Phase 4: Dashboard integration
11. Dashboard reads results from git remote (extends #563)

## Prior art

- [entireio/cli](https://github.com/entireio/cli) — two-tier model with shadow branches + orphan checkpoint branch, `checkpoint_remote` for separate repo support
- Git notes — similar concept but limited to annotating existing commits

## Acceptance signals

- [ ] `artifacts.backend: git` config option is respected
- [ ] Branch creation follows the exists/orphan/error logic
- [ ] Eval results written to sharded paths on the branch
- [ ] `path` prefix respected when configured (for shared branches)
- [ ] Push to configured remote after each run
- [ ] `agentv results list/show` reads from the git branch
- [ ] Concurrent runs don't corrupt the branch
- [ ] Existing `local` backend unchanged (default)

## Non-goals

- Shadow branches / mid-run checkpointing (entireio's Tier 1) — not needed since we write after run completion
- Git hooks integration — eval runs are triggered by CLI, not git commit
- Transcript deduplication across runs: git already stores identical files once as a single blob and delta-compresses similar objects when packing

## Related

- #333 — session recording and replay
- #563 — self-hosted dashboard (could read from this branch)
- #700 — post-processing patterns
