Skip to content

fix(runtime): 3623 close runtime integrity gaps across persistence, MCP, skills, and orchestration#3628

Open
njfio wants to merge 2 commits intofeat/radical-simplification-self-improvementfrom
codex/issue-3623-runtime-integrity-closure
Open

fix(runtime): 3623 close runtime integrity gaps across persistence, MCP, skills, and orchestration#3628
njfio wants to merge 2 commits intofeat/radical-simplification-self-improvementfrom
codex/issue-3623-runtime-integrity-closure

Conversation

@njfio
Copy link
Copy Markdown
Owner

@njfio njfio commented Mar 30, 2026

Summary

This PR closes the runtime-integrity gaps tracked in #3623 on top of feat/radical-simplification-self-improvement.

It fixes prompt-path action-history persistence, records real tool turn/latency telemetry, makes MCP lifecycle tools return explicit runtime-unavailable contracts instead of fake success, routes MCP skills operations through tau-skills, and aligns plan_executor documentation with its actual reporting/deadlock-analysis surface.

Links

Spec Verification (AC -> tests)

AC Status Test(s)
AC-1 Prompt-path action history persists consistently tests::action_history::regression_prompt_persists_action_history_store_when_enabled, tests::action_history::regression_prompt_json_persists_action_history_store_when_enabled, tests::action_history::regression_prompt_with_stream_persists_action_history_store_when_enabled
AC-2 Persisted telemetry reflects real execution data tests::action_history::spec_3624_c03_tool_history_records_real_turn_and_latency_values
AC-3 MCP stateful tools are honest about runtime effects mcp_server_runtime::tests::regression_training_trigger_reports_runtime_unavailable_error, mcp_server_runtime::tests::regression_agent_lifecycle_tools_report_runtime_unavailable_errors, mcp_sdk::tests::regression_generate_mcp_tool_docs_marks_runtime_dependent_lifecycle_tools
AC-4 MCP skills operations use the trusted skills surface mcp_server_runtime::tests::regression_skills_list_and_info_follow_tau_skills_catalog_resolution, mcp_server_runtime::tests::regression_skills_install_writes_skills_lockfile_metadata, tau-skills install_skills, tau-skills skills_lockfile
AC-5 Orchestration execution claims match reality plan_executor::tests::regression_source_docs_describe_reporting_and_deadlock_helpers_only, cargo test -p tau-orchestrator plan_executor -- --test-threads=1

TDD Evidence

RED:

  • Added failing prompt-path persistence and telemetry tests in crates/tau-agent-core/src/tests/action_history.rs.
  • Added failing MCP runtime-unavailable and skills-parity tests in crates/tau-tools/src/mcp_server_runtime.rs and docs regression coverage in crates/tau-ops/src/mcp_sdk.rs.
  • Added failing documentation regression coverage in crates/tau-orchestrator/src/plan_executor.rs.

GREEN:

  • CARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-agent-core
  • CARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-integration-tests --test agent_tool_memory_roundtrip
  • CARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-tools mcp_server_runtime -- --test-threads=1
  • CARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-ops mcp_sdk -- --test-threads=1
  • CARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-skills install_skills -- --nocapture
  • CARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-skills skills_lockfile -- --nocapture
  • CARGO_TARGET_DIR=/tmp/tau-3623-clean-target cargo test -p tau-orchestrator plan_executor -- --test-threads=1

Verification:

  • rustfmt --check --edition 2021 --config skip_children=true crates/tau-agent-core/src/lib.rs
  • rustfmt --check --edition 2021 crates/tau-agent-core/src/tests/action_history.rs crates/tau-tools/src/mcp_server_runtime.rs crates/tau-ops/src/mcp_sdk.rs crates/tau-orchestrator/src/plan_executor.rs

Test Tiers

Tier ✅/❌/N/A Tests N/A Why
Unit targeted crate tests across tau-agent-core, tau-tools, tau-ops, tau-skills, tau-orchestrator
Property N/A no new invariant/property surfaces were introduced
Contract/DbC N/A touched APIs do not use DbC/contracts in this slice
Snapshot N/A no snapshot-managed outputs involved
Functional prompt/tool/MCP/skills/plan-executor flows in targeted tests
Conformance AC/C-case-mapped regressions across issues 3624-3627
Integration cargo test -p tau-integration-tests --test agent_tool_memory_roundtrip
Fuzz N/A no parser/untrusted-input fuzz surface added
Mutation N/A not run for this slice; follow existing repo policy on critical-path mutation separately
Regression new regression tests in agent core, MCP runtime, MCP docs, and orchestrator docs
Performance N/A no performance-sensitive algorithm/runtime scheduling change

Risks / Rollback

  • Base branch is feat/radical-simplification-self-improvement, not master, because this branch is exactly one commit on top of that feature branch and would otherwise carry 50 commits against master.
  • Repo-level cargo fmt --check and cargo clippy -- -D warnings remain blocked by pre-existing workspace drift and existing tau-extensions deprecation warnings outside this PR’s scope.
  • Rollback is a clean revert of commit e6b98d53.

Docs

Updated spec and planning artifacts:

  • specs/milestones/m329/index.md
  • specs/3623/
  • specs/3624/
  • specs/3625/
  • specs/3626/
  • specs/3627/

Copilot AI review requested due to automatic review settings March 30, 2026 12:11
Copy link
Copy Markdown

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses the runtime-integrity gaps tracked in #3623 by making persistence/telemetry more faithful, ensuring MCP lifecycle tools are honest about runtime availability, routing MCP skills operations through tau-skills, and aligning plan_executor documentation with its actual (non-executing) surface.

Changes:

  • Persist action-history on primary prompt paths and record real turn/latency telemetry for tool executions in tau-agent-core.
  • Replace MCP lifecycle “fake success” responses with explicit runtime-unavailable error contracts, and delegate MCP skills list/info/install behavior to tau-skills.
  • Align plan_executor docs/tests with its reporting + deadlock-analysis-only behavior, and add milestone/spec artifacts for M329/#3623 slices.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
specs/milestones/m329/index.md Adds milestone container defining scope/exit criteria for the runtime integrity wave.
specs/3623/spec.md Defines epic binding spec for closing persistence/telemetry/MCP/skills/orchestration integrity gaps.
specs/3623/plan.md Plans the work in slices with verification commands and risks.
specs/3623/tasks.md Tracks ordered tasks + test mapping for the epic.
specs/3623/pr.md PR handoff/review instructions + scoped staging list and verification notes.
specs/3624/spec.md Spec for prompt-path persistence + tool telemetry fidelity.
specs/3624/plan.md Plan for preserving prompt persistence fix and threading real telemetry.
specs/3624/tasks.md Tasks + test mapping for action-history persistence/telemetry slice.
specs/3625/spec.md Spec for honest MCP lifecycle tool behavior (runtime-unavailable contracts).
specs/3625/plan.md Plan for replacing fake-success MCP lifecycle responses + docs/tests.
specs/3625/tasks.md Tasks + test mapping for MCP lifecycle honesty slice.
specs/3626/spec.md Spec for routing MCP skills list/info/install through tau-skills.
specs/3626/plan.md Plan for skills parity and lockfile metadata writes via tau-skills.
specs/3626/tasks.md Tasks + test mapping for skills parity slice.
specs/3627/spec.md Spec for aligning plan_executor docs with reporting/deadlock-only reality.
specs/3627/plan.md Plan for docs correction + regression tests in orchestrator.
specs/3627/tasks.md Tasks + test mapping for plan_executor alignment slice.
crates/tau-agent-core/src/lib.rs Saves action history at end of prompt/continue-turn; records tool telemetry with real turn + measured latency.
crates/tau-agent-core/src/tests/action_history.rs Adds regressions proving prompt-path persistence and real turn/latency telemetry recording.
crates/tau-tools/src/mcp_server_runtime.rs Adds runtime-unavailable helper for lifecycle tools; delegates skills operations to tau-skills; adds regressions for new contracts.
crates/tau-ops/src/mcp_sdk.rs Updates generated MCP tool docs to mark lifecycle tools as runtime-dependent; adds regression test.
crates/tau-orchestrator/src/plan_executor.rs Updates module/item docs to reporting/deadlock-only; adds regression test locking wording.
Cargo.lock Adds tau-skills dependency where needed.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1723 to +1737
let install_sources = if name_override.is_some() {
let root = std::env::temp_dir().join(format!(
"tau-mcp-skill-install-{}",
std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.map(|duration| duration.as_nanos())
.unwrap_or(0)
));
std::fs::create_dir_all(&root)
.with_context(|| format!("failed to create {}", root.display()))?;
let staged_path = root.join(&dest_name);
let content = std::fs::read_to_string(source_path)
.with_context(|| format!("failed to read skill source {}", source_path.display()))?;
std::fs::write(&staged_path, content)
.with_context(|| format!("failed to stage skill source {}", staged_path.display()))?;
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

name_override is interpolated into dest_name and then joined onto temp_dir()/written without any sanitization. If a caller passes an absolute path or ../ segments, root.join(&dest_name) can escape the staging dir (and even ignore root entirely for absolute paths), resulting in arbitrary file writes outside the skills area. Validate/sanitize the override to a safe filename (e.g., reject path separators/absolute paths, normalize to Path::new(name).file_name()), or use a safer staging approach that cannot escape the intended directory.

Copilot uses AI. Check for mistakes.
Comment on lines 3315 to +3322
if let Some(cached) = self.lookup_tool_result_cache(&call) {
pending.push((call, PendingToolResult::Ready(cached)));
pending.push((
call,
PendingToolResult::Ready(TimedToolExecutionResult {
result: cached,
latency_ms: 1,
}),
));
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the parallel tool-call path, cached tool results are assigned a hard-coded latency_ms: 1 rather than a measured value. This makes persisted telemetry look like a real timing measurement even though it’s a placeholder. Consider measuring the actual wall time for handling the tool call (even when served from cache), or explicitly representing cached results so latency isn’t fabricated.

Copilot uses AI. Check for mistakes.
Comment on lines +3342 to 3352
let latency_ms = timed_result.latency_ms;
let result = self
.maybe_execute_branch_followup(&call, result, &mut branch_slot_holders)
.maybe_execute_branch_followup(
&call,
timed_result.result,
&mut branch_slot_holders,
)
.await;
self.store_tool_result_cache(&call, &result);
let is_error = self.record_tool_result(call, result);
let is_error = self.record_tool_result(turn, call, result, latency_ms);
if is_error {
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tool latency is measured inconsistently between the serial and parallel execution paths. In the parallel path, latency_ms is taken from execute_tool_call_inner before maybe_execute_branch_followup, but in the serial path latency includes follow-up time (and cache lookup). This can skew telemetry depending on max_parallel_tool_calls. Pick a consistent definition (tool-only vs tool+followup) and measure it the same way in both paths.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e6b98d536f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1773 to +1774
match tau_skills::write_skills_lockfile(&state.skills_dir, &lock_path, &[lock_hint]) {
Ok(lockfile) => lockfile,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip lockfile write that breaks directory-skill installs

Calling write_skills_lockfile unconditionally here makes tau.skills_install fail in catalogs that contain multiple directory-backed skills (each has SKILL.md), because lockfile validation rejects duplicate file entries. In that case the install itself succeeds but this post-step returns an error ("skill installed but lockfile update failed"), so users get a failed tool call for a successful install. This regression is triggered whenever the skills dir has 2+ folder skills and then any install is attempted.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@njfio
Copy link
Copy Markdown
Owner Author

njfio commented Mar 30, 2026

CI follow-up on a21e27f5:

  • Docs Quality (links + architecture freshness) is now passing.
  • The remaining Quality (fmt + clippy + tests) failure is the oversized-file guard on crates/tau-coding-agent/src/live_rl_runtime.rs.
  • That file is unchanged between base feat/radical-simplification-self-improvement and this PR head, and it is 4460 lines on both sides.

Evidence used locally:

  • git diff --name-only feat/radical-simplification-self-improvement...HEAD -- crates/tau-coding-agent/src/live_rl_runtime.rs -> no diff
  • git show feat/radical-simplification-self-improvement:crates/tau-coding-agent/src/live_rl_runtime.rs | wc -l -> 4460
  • wc -l crates/tau-coding-agent/src/live_rl_runtime.rs -> 4460

So this PR-specific CI follow-up is complete; the remaining blocker is pre-existing base-branch policy debt, not a regression from #3623.

@njfio
Copy link
Copy Markdown
Owner Author

njfio commented Mar 30, 2026

Follow-up created for the remaining non-3623 blocker:

This keeps the live_rl_runtime.rs oversized-file failure owned and separate from the runtime-integrity changes in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants