fix(runtime): 3623 close runtime integrity gaps across persistence, MCP, skills, and orchestration#3628
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
There was a problem hiding this comment.
Pull request overview
This PR addresses the runtime-integrity gaps tracked in #3623 by making persistence/telemetry more faithful, ensuring MCP lifecycle tools are honest about runtime availability, routing MCP skills operations through tau-skills, and aligning plan_executor documentation with its actual (non-executing) surface.
Changes:
- Persist action-history on primary prompt paths and record real turn/latency telemetry for tool executions in
tau-agent-core. - Replace MCP lifecycle “fake success” responses with explicit runtime-unavailable error contracts, and delegate MCP skills list/info/install behavior to
tau-skills. - Align
plan_executordocs/tests with its reporting + deadlock-analysis-only behavior, and add milestone/spec artifacts for M329/#3623 slices.
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| specs/milestones/m329/index.md | Adds milestone container defining scope/exit criteria for the runtime integrity wave. |
| specs/3623/spec.md | Defines epic binding spec for closing persistence/telemetry/MCP/skills/orchestration integrity gaps. |
| specs/3623/plan.md | Plans the work in slices with verification commands and risks. |
| specs/3623/tasks.md | Tracks ordered tasks + test mapping for the epic. |
| specs/3623/pr.md | PR handoff/review instructions + scoped staging list and verification notes. |
| specs/3624/spec.md | Spec for prompt-path persistence + tool telemetry fidelity. |
| specs/3624/plan.md | Plan for preserving prompt persistence fix and threading real telemetry. |
| specs/3624/tasks.md | Tasks + test mapping for action-history persistence/telemetry slice. |
| specs/3625/spec.md | Spec for honest MCP lifecycle tool behavior (runtime-unavailable contracts). |
| specs/3625/plan.md | Plan for replacing fake-success MCP lifecycle responses + docs/tests. |
| specs/3625/tasks.md | Tasks + test mapping for MCP lifecycle honesty slice. |
| specs/3626/spec.md | Spec for routing MCP skills list/info/install through tau-skills. |
| specs/3626/plan.md | Plan for skills parity and lockfile metadata writes via tau-skills. |
| specs/3626/tasks.md | Tasks + test mapping for skills parity slice. |
| specs/3627/spec.md | Spec for aligning plan_executor docs with reporting/deadlock-only reality. |
| specs/3627/plan.md | Plan for docs correction + regression tests in orchestrator. |
| specs/3627/tasks.md | Tasks + test mapping for plan_executor alignment slice. |
| crates/tau-agent-core/src/lib.rs | Saves action history at end of prompt/continue-turn; records tool telemetry with real turn + measured latency. |
| crates/tau-agent-core/src/tests/action_history.rs | Adds regressions proving prompt-path persistence and real turn/latency telemetry recording. |
| crates/tau-tools/src/mcp_server_runtime.rs | Adds runtime-unavailable helper for lifecycle tools; delegates skills operations to tau-skills; adds regressions for new contracts. |
| crates/tau-ops/src/mcp_sdk.rs | Updates generated MCP tool docs to mark lifecycle tools as runtime-dependent; adds regression test. |
| crates/tau-orchestrator/src/plan_executor.rs | Updates module/item docs to reporting/deadlock-only; adds regression test locking wording. |
| Cargo.lock | Adds tau-skills dependency where needed. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let install_sources = if name_override.is_some() { | ||
| let root = std::env::temp_dir().join(format!( | ||
| "tau-mcp-skill-install-{}", | ||
| std::time::SystemTime::now() | ||
| .duration_since(std::time::UNIX_EPOCH) | ||
| .map(|duration| duration.as_nanos()) | ||
| .unwrap_or(0) | ||
| )); | ||
| std::fs::create_dir_all(&root) | ||
| .with_context(|| format!("failed to create {}", root.display()))?; | ||
| let staged_path = root.join(&dest_name); | ||
| let content = std::fs::read_to_string(source_path) | ||
| .with_context(|| format!("failed to read skill source {}", source_path.display()))?; | ||
| std::fs::write(&staged_path, content) | ||
| .with_context(|| format!("failed to stage skill source {}", staged_path.display()))?; |
There was a problem hiding this comment.
name_override is interpolated into dest_name and then joined onto temp_dir()/written without any sanitization. If a caller passes an absolute path or ../ segments, root.join(&dest_name) can escape the staging dir (and even ignore root entirely for absolute paths), resulting in arbitrary file writes outside the skills area. Validate/sanitize the override to a safe filename (e.g., reject path separators/absolute paths, normalize to Path::new(name).file_name()), or use a safer staging approach that cannot escape the intended directory.
| if let Some(cached) = self.lookup_tool_result_cache(&call) { | ||
| pending.push((call, PendingToolResult::Ready(cached))); | ||
| pending.push(( | ||
| call, | ||
| PendingToolResult::Ready(TimedToolExecutionResult { | ||
| result: cached, | ||
| latency_ms: 1, | ||
| }), | ||
| )); |
There was a problem hiding this comment.
In the parallel tool-call path, cached tool results are assigned a hard-coded latency_ms: 1 rather than a measured value. This makes persisted telemetry look like a real timing measurement even though it’s a placeholder. Consider measuring the actual wall time for handling the tool call (even when served from cache), or explicitly representing cached results so latency isn’t fabricated.
| let latency_ms = timed_result.latency_ms; | ||
| let result = self | ||
| .maybe_execute_branch_followup(&call, result, &mut branch_slot_holders) | ||
| .maybe_execute_branch_followup( | ||
| &call, | ||
| timed_result.result, | ||
| &mut branch_slot_holders, | ||
| ) | ||
| .await; | ||
| self.store_tool_result_cache(&call, &result); | ||
| let is_error = self.record_tool_result(call, result); | ||
| let is_error = self.record_tool_result(turn, call, result, latency_ms); | ||
| if is_error { |
There was a problem hiding this comment.
Tool latency is measured inconsistently between the serial and parallel execution paths. In the parallel path, latency_ms is taken from execute_tool_call_inner before maybe_execute_branch_followup, but in the serial path latency includes follow-up time (and cache lookup). This can skew telemetry depending on max_parallel_tool_calls. Pick a consistent definition (tool-only vs tool+followup) and measure it the same way in both paths.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e6b98d536f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| match tau_skills::write_skills_lockfile(&state.skills_dir, &lock_path, &[lock_hint]) { | ||
| Ok(lockfile) => lockfile, |
There was a problem hiding this comment.
Skip lockfile write that breaks directory-skill installs
Calling write_skills_lockfile unconditionally here makes tau.skills_install fail in catalogs that contain multiple directory-backed skills (each has SKILL.md), because lockfile validation rejects duplicate file entries. In that case the install itself succeeds but this post-step returns an error ("skill installed but lockfile update failed"), so users get a failed tool call for a successful install. This regression is triggered whenever the skills dir has 2+ folder skills and then any install is attempted.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
|
CI follow-up on
Evidence used locally:
So this PR-specific CI follow-up is complete; the remaining blocker is pre-existing base-branch policy debt, not a regression from #3623. |
|
Follow-up created for the remaining non-3623 blocker:
This keeps the |
Summary
This PR closes the runtime-integrity gaps tracked in #3623 on top of
feat/radical-simplification-self-improvement.It fixes prompt-path action-history persistence, records real tool turn/latency telemetry, makes MCP lifecycle tools return explicit runtime-unavailable contracts instead of fake success, routes MCP skills operations through
tau-skills, and alignsplan_executordocumentation with its actual reporting/deadlock-analysis surface.Links
specs/milestones/m329/index.mdspecs/3623/spec.mdspecs/3623/plan.mdspecs/3623/tasks.mdSpec Verification (AC -> tests)
tests::action_history::regression_prompt_persists_action_history_store_when_enabled,tests::action_history::regression_prompt_json_persists_action_history_store_when_enabled,tests::action_history::regression_prompt_with_stream_persists_action_history_store_when_enabledtests::action_history::spec_3624_c03_tool_history_records_real_turn_and_latency_valuesmcp_server_runtime::tests::regression_training_trigger_reports_runtime_unavailable_error,mcp_server_runtime::tests::regression_agent_lifecycle_tools_report_runtime_unavailable_errors,mcp_sdk::tests::regression_generate_mcp_tool_docs_marks_runtime_dependent_lifecycle_toolsmcp_server_runtime::tests::regression_skills_list_and_info_follow_tau_skills_catalog_resolution,mcp_server_runtime::tests::regression_skills_install_writes_skills_lockfile_metadata,tau-skills install_skills,tau-skills skills_lockfileplan_executor::tests::regression_source_docs_describe_reporting_and_deadlock_helpers_only,cargo test -p tau-orchestrator plan_executor -- --test-threads=1TDD Evidence
RED:
crates/tau-agent-core/src/tests/action_history.rs.crates/tau-tools/src/mcp_server_runtime.rsand docs regression coverage incrates/tau-ops/src/mcp_sdk.rs.crates/tau-orchestrator/src/plan_executor.rs.GREEN:
CARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-agent-coreCARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-integration-tests --test agent_tool_memory_roundtripCARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-tools mcp_server_runtime -- --test-threads=1CARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-ops mcp_sdk -- --test-threads=1CARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-skills install_skills -- --nocaptureCARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-skills skills_lockfile -- --nocaptureCARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-orchestrator plan_executor -- --test-threads=1Verification:
rustfmt --check --edition 2021 --config skip_children=true crates/tau-agent-core/src/lib.rsrustfmt --check --edition 2021 crates/tau-agent-core/src/tests/action_history.rs crates/tau-tools/src/mcp_server_runtime.rs crates/tau-ops/src/mcp_sdk.rs crates/tau-orchestrator/src/plan_executor.rsTest Tiers
tau-agent-core,tau-tools,tau-ops,tau-skills,tau-orchestrator3624-3627cargo test -p tau-integration-tests --test agent_tool_memory_roundtripRisks / Rollback
feat/radical-simplification-self-improvement, notmaster, because this branch is exactly one commit on top of that feature branch and would otherwise carry 50 commits againstmaster.cargo fmt --checkandcargo clippy -- -D warningsremain blocked by pre-existing workspace drift and existingtau-extensionsdeprecation warnings outside this PR’s scope.e6b98d53.Docs
Updated spec and planning artifacts:
specs/milestones/m329/index.mdspecs/3623/specs/3624/specs/3625/specs/3626/specs/3627/