feat: wire data_diff tool for deterministic data validation#107

Closed
suryaiyer95 wants to merge 10000 commits into main from feat/data-validation-mode

@suryaiyer95 suryaiyer95 commented Mar 11, 2026

What does this PR do?

Adds a data_diff tool and data-diff agent mode that wraps the Rust reladiff engine for deterministic table-to-table data validation. Tested end-to-end on Snowflake with up to 1M rows.

Pipeline:

LLM (data-diff mode) → data_diff tool (TS) → Bridge.call("data_diff.run")
→ JSON-RPC → server.py → run_data_diff() → ReladiffSession (Rust)
→ cooperative loop (SQL tasks ↔ ConnectionRegistry) → structured result
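
The cooperative loop in the last step of the pipeline can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: a Python generator stands in for the Rust `ReladiffSession`, an in-memory SQLite connection stands in for the `ConnectionRegistry`, and the names `fake_session`, `run_loop`, and `demo` are hypothetical. The idea shown is that the session emits SQL tasks, the orchestrator executes them and feeds the rows back, and the session finishes with a structured result.

```python
import sqlite3
from typing import Generator


def fake_session(source: str, target: str) -> Generator[str, list, dict]:
    """Hypothetical stand-in for the diff engine: yields SQL, receives rows."""
    src_rows = yield f"SELECT id, val FROM {source} ORDER BY id"
    tgt_rows = yield f"SELECT id, val FROM {target} ORDER BY id"
    src, tgt = dict(src_rows), dict(tgt_rows)
    # The structured result is the generator's return value.
    return {
        "only_in_source": sorted(src.keys() - tgt.keys()),
        "only_in_target": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


def run_loop(session: Generator[str, list, dict], conn: sqlite3.Connection) -> dict:
    """Drive the session: run each SQL task it emits and send the rows back."""
    try:
        task = next(session)
        while True:
            rows = conn.execute(task).fetchall()
            task = session.send(rows)
    except StopIteration as done:
        return done.value


def demo() -> dict:
    """Build two small tables in SQLite (assumed stand-in warehouse) and diff them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE a (id INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE b (id INTEGER PRIMARY KEY, val TEXT);
        INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
        INSERT INTO b VALUES (1, 'x'), (2, 'Y'), (4, 'w');
        """
    )
    return run_loop(fake_session("a", "b"), conn)


if __name__ == "__main__":
    print(demo())
```

Keeping SQL execution on the orchestrator side means the engine never needs its own database drivers or credentials; it only describes the work it wants done.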

Files changed:

  • data-diff-run.ts — TypeScript tool calling Bridge.call("data_diff.run")
  • data_diff.py — Python orchestrator driving the cooperative state machine loop
  • server.py — Registers data_diff.run in JSON-RPC dispatcher
  • protocol.ts — DataDiffRunParams/DataDiffRunResult bridge protocol types
  • agent.ts — data-diff agent mode with SQL/warehouse tool permissions
  • data-diff.txt — System prompt for data-diff agent
  • SKILL.md — /data-validate skill for guided validation workflows
  • guard.py — Updated docstrings (no longer requires API keys)
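
For readers unfamiliar with the bridge, registering a method such as `data_diff.run` in a JSON-RPC dispatcher might look like the sketch below. It is an assumption-laden illustration, not the contents of `server.py`: the `HANDLERS` table, the `register` decorator, the `handle` function, the stub handler, and the `source`/`target` parameter names are all hypothetical. Only the JSON-RPC 2.0 envelope fields and the `-32601` "Method not found" code come from the JSON-RPC specification.

```python
import json
from typing import Any, Callable

# Hypothetical method table mapping JSON-RPC method names to handlers.
HANDLERS: dict[str, Callable[[dict], Any]] = {}


def register(method: str) -> Callable:
    """Decorator that adds a handler to the method table."""
    def wrap(fn: Callable[[dict], Any]) -> Callable[[dict], Any]:
        HANDLERS[method] = fn
        return fn
    return wrap


@register("data_diff.run")
def run_data_diff_stub(params: dict) -> dict:
    # Stub only: a real handler would drive the diff session here.
    return {"source": params["source"], "target": params["target"], "status": "stub"}


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request string and return the response string."""
    req = json.loads(raw)
    handler = HANDLERS.get(req["method"])
    if handler is None:
        # -32601 is the JSON-RPC 2.0 code for "Method not found".
        body = {"error": {"code": -32601, "message": "Method not found"}}
    else:
        body = {"result": handler(req.get("params", {}))}
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"), **body})


if __name__ == "__main__":
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "data_diff.run",
        "params": {"source": "db.a", "target": "db.b"},
    }
    print(handle(json.dumps(request)))
```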

Type of change

  • New feature (non-breaking change which adds functionality)

How did you verify your code works?

End-to-end tested on Snowflake across all 4 algorithms and at scale (up to 1M rows, <12s).

Issue for this PR

Internal feature — data validation mode for altimate-code CLI.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes
