feat: MCP Server Architecture for Checkpoint-Based Workflow Execution #200

Draft
nhorton wants to merge 38 commits into main from mcp-variant

nhorton commented Feb 4, 2026

Summary

This PR introduces a major architectural shift from skill-file-based workflow execution to a Model Context Protocol (MCP) server that guides agents through workflows via checkpoint calls with quality gate enforcement.

Key changes:

  • New MCP Server (deepwork serve) with three tools: get_workflows, start_workflow, finished_step
  • Quality Gates that evaluate step outputs against criteria using a Claude Code subprocess
  • Nested Workflow Support with stack-based execution and abort_workflow capability
  • Simplified Skill Generation - single /deepwork entry point instead of per-step skills
  • Rules System Removed - entire rules subsystem (parser, queue, pattern matcher, hooks) deleted

Why This Change?

The previous architecture relied heavily on skill files with embedded instructions and rules-based hooks. This had several limitations:

  1. Complex rules evaluation at every agent stop event
  2. Difficult to track workflow state across steps
  3. No structured quality enforcement
  4. Hard to resume or debug workflows

The MCP approach provides:

  1. Centralized state - Session state persisted and visible in .deepwork/tmp/
  2. Quality gates - Automated validation before proceeding to next step
  3. Structured checkpoints - Clear handoff points between steps
  4. Resumability - Sessions can be loaded and resumed
  5. Observability - All state changes logged and inspectable
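The checkpoint idea above can be sketched in a few lines of plain Python. This is only an illustration, not the code in `src/deepwork/mcp/`: the `Step` and `Session` classes, the status strings, and the substring-based gate are all assumptions made for the example. In the PR the gate is evaluated by a review agent, not by string matching.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    # Hypothetical stand-in for quality criteria: each entry is a substring
    # that the step output must contain for the gate to pass.
    quality_criteria: list[str] = field(default_factory=list)


@dataclass
class Session:
    steps: list[Step]
    index: int = 0
    log: list[str] = field(default_factory=list)

    def finished_step(self, output: str) -> dict:
        """Checkpoint call: gate the current step's output, then advance."""
        step = self.steps[self.index]
        failed = [c for c in step.quality_criteria if c not in output]
        if failed:
            # Gate failed: stay on the same step and report what is missing.
            self.log.append(f"{step.name}: needs_work")
            return {"status": "needs_work", "failed_criteria": failed}
        self.log.append(f"{step.name}: passed")
        self.index += 1
        if self.index == len(self.steps):
            return {"status": "workflow_complete"}
        return {"status": "next_step", "step": self.steps[self.index].name}
```

The point of the sketch is that the agent cannot move past a step until the checkpoint accepts its output, and every decision is recorded in a log that can be inspected afterwards.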

Changes by Area

New MCP Module (src/deepwork/mcp/)

  • server.py - FastMCP server definition
  • tools.py - MCP tool implementations
  • state.py - Workflow session state management
  • schemas.py - Pydantic models for I/O
  • quality_gate.py - Quality gate with review agent

New CLI Command

  • deepwork serve - Starts MCP server (stdio or SSE transport)

Updated deepwork_jobs Standard Job

  • New steps: iterate, errata, test, fix_jobs, fix_settings
  • Streamlined define, implement, learn steps

Removed Components

  • Entire rules system (rules_parser.py, rules_queue.py, pattern_matcher.py, rules_check.py)
  • Command executor (command_executor.py)
  • deepwork_rules standard job
  • Per-step skill templates
  • Many hook scripts
  • commit and manual_tests jobs

Documentation

  • New doc/mcp_interface.md - MCP tool reference
  • New doc/reference/calling_claude_in_print_mode.md - Claude CLI subprocess guide
  • Updated doc/architecture.md with Part 4: MCP Server Architecture
  • Updated README.md to remove rules references

Test plan

  • Run deepwork install --platform claude in a test project
  • Verify MCP server starts with deepwork serve
  • Test workflow execution via /deepwork skill
  • Verify quality gate evaluation works
  • Run existing test suite: uv run pytest

🤖 Generated with Claude Code

nhorton and others added 25 commits February 3, 2026 12:14
- Add configurable quality_gate settings to config.yml (agent_review_command,
  default_timeout, default_max_attempts)
- Update installer to create quality_gate config section with defaults
- Refactor QualityGate to separate system instructions from user payload
- Use -s flag to pass instructions as system prompt to review agent
- Change file separator format to 20 dashes for clearer delineation
- Remove step_instructions from QualityGate interface (not useful for review)
- Add quality_review_override_reason to finished_step to skip quality gate
- Add JSON schema validation for quality gate responses
- Add comprehensive integration tests with mock review agent subprocess
- Remove block_bash_with_instructions hook (commit skill not available)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
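A rough sketch of two ideas from the commit above: delimiting files in the review payload with 20-dash separators, and validating the review agent's JSON reply. The exact payload layout and the response fields (`passed`, `feedback`) are assumptions for illustration, and the hand-rolled check only stands in for the JSON schema validation the commit mentions.

```python
import json

SEPARATOR = "-" * 20  # 20 dashes, as described in the commit message


def build_review_payload(files: dict[str, str]) -> str:
    """Join file contents into one user payload with separator header lines.

    The header layout (separator, path, separator) is a guess, not the
    project's actual format.
    """
    parts = []
    for path, content in files.items():
        parts.append(f"{SEPARATOR} {path} {SEPARATOR}\n{content}")
    return "\n".join(parts)


def validate_review_response(raw: str) -> dict:
    """Minimal structural check of the reviewer's reply (hypothetical fields)."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    if not isinstance(data.get("passed"), bool):
        raise ValueError("'passed' must be a boolean")
    if not isinstance(data.get("feedback"), str):
        raise ValueError("'feedback' must be a string")
    return data
```

Keeping the system instructions separate from this payload, as the commit describes, means the payload only has to carry the files under review.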
- Update e2e tests for Claude Code integration
- Add quality_criteria to fruits job fixture
- Fix test assertions for updated install flow
- Minor sync.py adjustments

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The rules system was removed in commit 6b3e1a2. This cleans up
stale documentation references to rules_check in hook-related code.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- StateManager now uses a session stack instead of single active session
- Starting a workflow while one is active pushes onto the stack
- Completing a workflow pops from stack and resumes parent
- Added abort_workflow tool with explanation parameter
- All tool responses include stack field [{workflow, step}, ...]
- Added logging to all MCP tool calls with stack info
- Updated server instructions to document nesting and abort

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
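The push/pop behaviour described in this commit might look roughly like the following. `SessionStack` is a hypothetical, simplified stand-in and not the project's `StateManager`; the method names mirror the tools mentioned above, but the return shapes are assumptions apart from the `[{workflow, step}, ...]` stack format.

```python
from dataclasses import dataclass


@dataclass
class WorkflowSession:
    workflow: str
    step: str


class SessionStack:
    """Illustrative stack of active workflow sessions (innermost last)."""

    def __init__(self) -> None:
        self._stack: list[WorkflowSession] = []

    def stack(self) -> list[dict]:
        # Same shape as the `stack` field described in the commit message.
        return [{"workflow": s.workflow, "step": s.step} for s in self._stack]

    def start_workflow(self, workflow: str, first_step: str) -> list[dict]:
        # Starting while another workflow is active nests the new one on top.
        self._stack.append(WorkflowSession(workflow, first_step))
        return self.stack()

    def complete_workflow(self) -> list[dict]:
        # Completing pops the current workflow; the parent (if any) resumes.
        self._stack.pop()
        return self.stack()

    def abort_workflow(self, explanation: str) -> dict:
        if not self._stack:
            raise RuntimeError("no active workflow to abort")
        aborted = self._stack.pop()
        return {
            "aborted": aborted.workflow,
            "explanation": explanation,
            "stack": self.stack(),
        }
```

Returning the full stack from every call lets the agent see which parent workflow it will return to without a separate query.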
- Add `from None` to raise in except clause (B904)
- Remove unused variables in tests (F841)
- Rename unused loop variable to underscore prefix (B007)
- Apply ruff formatting to 14 files

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
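For readers unfamiliar with B904, the pattern it asks for is an explicit `from` clause when raising inside an `except` block. The function below is a made-up example, not code from this PR:

```python
def parse_timeout(raw: str) -> int:
    """Convert a config value to an int, raising a domain-specific error."""
    try:
        return int(raw)
    except ValueError:
        # `from None` suppresses the chained ValueError traceback, which
        # satisfies B904 when the original exception adds no useful context.
        raise RuntimeError(f"invalid timeout: {raw!r}") from None
```

Using `from err` instead would keep the original exception as the explicit cause; either form satisfies the lint rule.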
Replace flake-utils with uv2nix/pyproject-nix for proper Python
dependency management in Nix. This provides hermetic builds directly
from uv.lock and supports editable installs for development.

Key changes:
- Use uv2nix to generate Python package set from uv.lock
- Add pyproject-build-systems for build dependency resolution
- Add editables to build-system requires (needed by hatchling for
  editable wheel builds)
- Remove .venv management from shell hook (Nix handles it now)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix quality_gate.py to handle Claude CLI --output-format json wrapper
  objects by extracting the 'result' field before parsing
- Add tests for wrapper object handling with strong comments explaining
  the mock design
- Remove deprecated 'exposed' field from learn step in deepwork_jobs
- Add 'learn' workflow to make orphaned step accessible via MCP
- Add 'update' workflow to update job for MCP compatibility
- Migrate stop_hooks to quality_criteria in update job
- Clean up settings.json by removing obsolete Skill permissions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
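The wrapper handling in the first bullet of this commit could be sketched as below. This assumes the `result` field holds the reviewer's JSON as a string and that unwrapped output should still be accepted; the function name is hypothetical and the real `quality_gate.py` may differ.

```python
import json


def extract_review_json(stdout: str) -> dict:
    """Return the reviewer's JSON, unwrapping a CLI wrapper object if present."""
    data = json.loads(stdout)
    if isinstance(data, dict) and "result" in data:
        result = data["result"]
        # Assumption: the wrapper carries the model's reply as a JSON string.
        if isinstance(result, str):
            return json.loads(result)
        return result
    # No wrapper: treat the output itself as the review response.
    return data
```

A call such as `extract_review_json(completed.stdout)` would then feed the schema validation step, where `completed` is the finished subprocess result.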
Document the major architectural changes including:
- New MCP server with checkpoint-based workflow execution
- Removal of the rules system
- Simplified skill generation
- New deepwork_jobs steps

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
nhorton changed the title from "Mcp variant" to "feat: MCP Server Architecture for Checkpoint-Based Workflow Execution" on Feb 5, 2026
Mark 0.7.0 as alpha prerelease so that `uv add deepwork` continues
to install the stable 0.5.1 by default, requiring explicit version
specification for the new alpha.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
nhorton and others added 4 commits February 5, 2026 13:41
Doc specs were never enforced programmatically — the infrastructure
to parse them exists but was never wired into quality gates. Remove
all doc spec guidance from job instructions to avoid misleading users
into creating artifacts that have no effect.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
nhorton and others added 6 commits February 5, 2026 16:28
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
step_expected_outputs is now an array of ExpectedOutput objects (name, type,
description, syntax_for_finished_step_tool) instead of a plain list of names.
This tells agents exactly what format to use when calling finished_step —
"filepath" for file outputs and "array of filepaths for all individual files"
for files outputs — eliminating the string-vs-list type mismatch errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
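As an illustration of the structure this commit describes, here is a dataclass version of `ExpectedOutput`. The PR lists Pydantic models in `schemas.py`, so the dataclass is a dependency-free stand-in, and the literal type values `"file"` and `"files"` are inferred from the commit message instead of confirmed by it.

```python
from dataclasses import asdict, dataclass

# Syntax hints quoted from the commit message, keyed by assumed type names.
_SYNTAX_BY_TYPE = {
    "file": "filepath",
    "files": "array of filepaths for all individual files",
}


@dataclass
class ExpectedOutput:
    name: str
    type: str  # assumed to be "file" or "files"
    description: str
    syntax_for_finished_step_tool: str = ""

    def __post_init__(self) -> None:
        if self.type not in _SYNTAX_BY_TYPE:
            raise ValueError(f"unknown output type: {self.type!r}")
        # Derive the hint from the type unless one was given explicitly.
        if not self.syntax_for_finished_step_tool:
            self.syntax_for_finished_step_tool = _SYNTAX_BY_TYPE[self.type]
```

Serializing with `asdict(ExpectedOutput("report", "file", "Final report"))` yields a dictionary that spells out the expected argument format for the agent.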

2 participants