feat: MCP Server Architecture for Checkpoint-Based Workflow Execution #200
Draft
- Add configurable quality_gate settings to config.yml (agent_review_command, default_timeout, default_max_attempts)
- Update installer to create quality_gate config section with defaults
- Refactor QualityGate to separate system instructions from user payload
- Use -s flag to pass instructions as system prompt to review agent
- Change file separator format to 20 dashes for clearer delineation
- Remove step_instructions from QualityGate interface (not useful for review)
- Add quality_review_override_reason to finished_step to skip quality gate
- Add JSON schema validation for quality gate responses
- Add comprehensive integration tests with mock review agent subprocess
- Remove block_bash_with_instructions hook (commit skill not available)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
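
For reviewers, the 20-dash file separator could look roughly like the sketch below. Only the separator length comes from the commit message. The function name `build_review_payload` and the exact layout (path placed between two separators) are assumptions for illustration, not the code in `quality_gate.py`.

```python
SEPARATOR = "-" * 20  # 20 dashes, as stated in the commit message


def build_review_payload(files: dict[str, str]) -> str:
    """Join file contents into a single user payload.

    Hypothetical layout: each file is introduced by a header line of the
    form '<20 dashes> <path> <20 dashes>', followed by its content.
    """
    sections = []
    for path, content in files.items():
        header = f"{SEPARATOR} {path} {SEPARATOR}"
        sections.append(f"{header}\n{content}")
    return "\n".join(sections)


payload = build_review_payload({"a.md": "alpha", "b.md": "beta"})
```

Keeping the review instructions out of this payload matches the commit's split between system instructions and user payload: the payload carries only the files under review.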
- Update e2e tests for Claude Code integration
- Add quality_criteria to fruits job fixture
- Fix test assertions for updated install flow
- Minor sync.py adjustments
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The rules system was removed in commit 6b3e1a2. This cleans up stale documentation references to rules_check in hook-related code.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- StateManager now uses a session stack instead of single active session
- Starting a workflow while one is active pushes onto the stack
- Completing a workflow pops from stack and resumes parent
- Added abort_workflow tool with explanation parameter
- All tool responses include stack field [{workflow, step}, ...]
- Added logging to all MCP tool calls with stack info
- Updated server instructions to document nesting and abort
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add `from None` to raise in except clause (B904)
- Remove unused variables in tests (F841)
- Rename unused loop variable to underscore prefix (B007)
- Apply ruff formatting to 14 files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
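
As a reminder of what the B904 fix changes, here is a small standalone example. The function `parse_timeout` is hypothetical and not taken from this PR. It only shows the `raise ... from None` pattern that the lint rule asks for inside an `except` clause.

```python
def parse_timeout(raw: str) -> int:
    """Convert a config value to an int, hiding the low-level error."""
    try:
        return int(raw)
    except ValueError:
        # `from None` suppresses the implicit exception chain, so the
        # traceback shows only the RuntimeError raised here.
        raise RuntimeError(f"invalid timeout: {raw!r}") from None


try:
    parse_timeout("abc")
except RuntimeError as exc:
    caught = exc
```

When the original error is useful to the reader of the traceback, `raise ... from err` is the other form B904 accepts.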
Replace flake-utils with uv2nix/pyproject-nix for proper Python dependency management in Nix. This provides hermetic builds directly from uv.lock and supports editable installs for development.
Key changes:
- Use uv2nix to generate Python package set from uv.lock
- Add pyproject-build-systems for build dependency resolution
- Add editables to build-system requires (needed by hatchling for editable wheel builds)
- Remove .venv management from shell hook (Nix handles it now)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix quality_gate.py to handle Claude CLI --output-format json wrapper objects by extracting the 'result' field before parsing
- Add tests for wrapper object handling with strong comments explaining the mock design
- Remove deprecated 'exposed' field from learn step in deepwork_jobs
- Add 'learn' workflow to make orphaned step accessible via MCP
- Add 'update' workflow to update job for MCP compatibility
- Migrate stop_hooks to quality_criteria in update job
- Clean up settings.json by removing obsolete Skill permissions
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
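
The wrapper handling in the first item might look something like the following sketch. It assumes the wrapper is a JSON object whose `result` field holds the review agent's reply as a JSON string. The function name `parse_review_output` and the `passed` key in the sample data are hypothetical and are not taken from `quality_gate.py`.

```python
import json


def parse_review_output(stdout: str):
    """Parse review agent output, unwrapping a 'result' field if present."""
    data = json.loads(stdout)
    if isinstance(data, dict) and "result" in data:
        inner = data["result"]
        # Assumption: the wrapped reply is itself JSON text; if it is
        # already a parsed value, return it unchanged.
        return json.loads(inner) if isinstance(inner, str) else inner
    # No wrapper: the output is already the review payload.
    return data


wrapped = json.dumps({"type": "result", "result": json.dumps({"passed": True})})
unwrapped = parse_review_output(wrapped)
bare = parse_review_output('{"passed": false}')
```

Accepting both shapes keeps the parser usable with a mock review agent that prints the bare payload as well as with a CLI that wraps it.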
Document the major architectural changes including:
- New MCP server with checkpoint-based workflow execution
- Removal of the rules system
- Simplified skill generation
- New deepwork_jobs steps
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Mark 0.7.0 as alpha prerelease so that `uv add deepwork` continues to install the stable 0.5.1 by default, requiring explicit version specification for the new alpha.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Doc specs were never enforced programmatically: the infrastructure to parse them exists but was never wired into quality gates. Remove all doc spec guidance from job instructions to avoid misleading users into creating artifacts that have no effect.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
step_expected_outputs is now an array of ExpectedOutput objects (name, type, description, syntax_for_finished_step_tool) instead of a plain list of names. This tells agents exactly what format to use when calling finished_step: "filepath" for file outputs and "array of filepaths for all individual files" for files outputs. This eliminates the string-vs-list type mismatch errors.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
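
To make the new shape concrete, here is a rough sketch of an ExpectedOutput entry. The four field names and the two syntax strings come from the commit message. The dataclass form (the PR itself describes Pydantic models in `schemas.py`), the helper `make_expected_output`, and the type values `"file"` and `"files"` are assumptions for illustration.

```python
from dataclasses import asdict, dataclass

# Syntax hints quoted from the commit message, keyed by assumed type names.
SYNTAX_BY_TYPE = {
    "file": "filepath",
    "files": "array of filepaths for all individual files",
}


@dataclass
class ExpectedOutput:
    name: str
    type: str
    description: str
    syntax_for_finished_step_tool: str


def make_expected_output(name: str, output_type: str, description: str) -> ExpectedOutput:
    """Build an entry whose syntax hint matches its output type."""
    return ExpectedOutput(name, output_type, description, SYNTAX_BY_TYPE[output_type])


step_expected_outputs = [
    asdict(make_expected_output("report", "file", "Final written report")),
    asdict(make_expected_output("charts", "files", "One image per chart")),
]
```

Deriving the hint from the type in one place means an agent never sees a `files` output described with the single-path syntax.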
Summary
This PR introduces a major architectural shift from skill-file-based workflow execution to a Model Context Protocol (MCP) server that guides agents through workflows via checkpoint calls with quality gate enforcement.
Key changes:
- MCP server (`deepwork serve`) with three tools: `get_workflows`, `start_workflow`, `finished_step`
- `abort_workflow` capability
- `/deepwork` entry point instead of per-step skills
Why This Change?
The previous architecture relied heavily on skill files with embedded instructions and rules-based hooks. This had several limitations:
The MCP approach provides:
.deepwork/tmp/
Changes by Area
New MCP Module (`src/deepwork/mcp/`)
- `server.py` - FastMCP server definition
- `tools.py` - MCP tool implementations
- `state.py` - Workflow session state management
- `schemas.py` - Pydantic models for I/O
- `quality_gate.py` - Quality gate with review agent
New CLI Command
- `deepwork serve` - Starts MCP server (stdio or SSE transport)
Updated `deepwork_jobs` Standard Job
- `iterate`, `errata`, `test`, `fix_jobs`, `fix_settings`
- `define`, `implement`, `learn` steps
Removed Components
- Rules system (`rules_parser.py`, `rules_queue.py`, `pattern_matcher.py`, `rules_check.py`)
- `command_executor.py`
- `deepwork_rules` standard job
- `commit` and `manual_tests` jobs
Documentation
- `doc/mcp_interface.md` - MCP tool reference
- `doc/reference/calling_claude_in_print_mode.md` - Claude CLI subprocess guide
- `doc/architecture.md` with Part 4: MCP Server Architecture
- `README.md` to remove rules references
Test plan
- `deepwork install --platform claude` in a test project
- `deepwork serve`
- `/deepwork` skill
- `uv run pytest`
🤖 Generated with Claude Code