docs: update tool safety and workflow docs#21
docs: update tool safety and workflow docs#21LeeLaugh-jason wants to merge 4 commits intoRethinkCraft:mainfrom
Conversation
- docs(readme): refresh phase 1 capability summary and recovery guidance in english - docs(readme): refresh phase 1 capability summary and recovery guidance in chinese - docs(book): clarify approval gates, recoverable failures, and git packaging tool behavior
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly updates the project's documentation, focusing on the agent's tool safety mechanisms and workflow. It clarifies how various tools, especially those interacting with Git, are approved and executed, and refines the agent's failure recovery strategies to provide more nuanced guidance rather than immediate termination for all issues. These changes aim to improve understanding of the system's boundaries and operational logic. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request updates the documentation, specifically the README files and the 04-tools-and-safety.md chapter, to reflect a more refined agent tool execution and safety model. Key changes include clarifying the approval process for repository-aware tools like git_add and git_commit (which now require both mutating and execution approvals), and detailing a more nuanced failure handling mechanism in the agent loop that distinguishes between recoverable failures (providing guidance) and fatal failures (leading to a stop). The review comments suggest minor formatting improvements for consistency and clarity, recommending the use of backticks for specific keywords and failure categories within the 04-tools-and-safety.md file.
| 定义默认工具集合,把名称、描述、参数 schema、类别和执行函数绑在一起。这里决定的是“模型能请求什么”。 | ||
| - `src/tool_registry.cpp` | ||
| 是策略门。它决定的是“当前这个请求在本轮运行里能不能执行”。 | ||
| 是策略门。它决定的是“当前这个请求在本轮运行里能不能执行”,并把缺失的是 mutating、execution,还是两者都缺失,显式写回结构化结果。 |
There was a problem hiding this comment.
For consistency with how these terms are used elsewhere in the document (e.g., lines 56-57), mutating and execution should be formatted as code using backticks.
| 是策略门。它决定的是“当前这个请求在本轮运行里能不能执行”,并把缺失的是 mutating、execution,还是两者都缺失,显式写回结构化结果。 | |
| 是策略门。它决定的是“当前这个请求在本轮运行里能不能执行”,并把缺失的是 `mutating`、`execution`,还是两者都缺失,显式写回结构化结果。 |
| 2. 再看 `src/tool_registry.cpp`,确认默认策略如何区分只读、变更和执行。 | ||
| 3. 然后看具体 executor,理解每类工具的真实风险差异。 | ||
| 4. 最后回到 `src/agent_loop.cpp`,看系统在工具失败后如何停下。 | ||
| 4. 最后回到 `src/agent_loop.cpp`,看系统如何区分 blocked、retryable、needs-inspection、fatal,并决定是补 guidance 还是停机。 |
There was a problem hiding this comment.
For clarity and consistency, the failure categories (blocked, retryable, needs-inspection, fatal) should be formatted as code using backticks. This helps distinguish them as specific system states/keywords.
| 4. 最后回到 `src/agent_loop.cpp`,看系统如何区分 blocked、retryable、needs-inspection、fatal,并决定是补 guidance 还是停机。 | |
| 4. 最后回到 `src/agent_loop.cpp`,看系统如何区分 `blocked`、`retryable`、`needs-inspection`、`fatal`,并决定是补 guidance 还是停机。 |
|
|
||
| ### 误解二:approval 是人工逐条审批 | ||
| 也不对。当前实现里的 approval 是运行前确定的策略开关,不是交互式确认流程。 | ||
| 也不对。当前实现里的 approval 是运行前确定的策略开关,不是交互式确认流程;而且像 `git_add` 与 `git_commit` 这样的仓库封装工具,可能同时缺 mutating 与 execution 两类审批。 |
There was a problem hiding this comment.
For consistency, mutating and execution should be formatted as code using backticks, similar to how they are used on lines 56-57 and how git_add is formatted in this same line.
| 也不对。当前实现里的 approval 是运行前确定的策略开关,不是交互式确认流程;而且像 `git_add` 与 `git_commit` 这样的仓库封装工具,可能同时缺 mutating 与 execution 两类审批。 | |
| 也不对。当前实现里的 approval 是运行前确定的策略开关,不是交互式确认流程;而且像 `git_add` 与 `git_commit` 这样的仓库封装工具,可能同时缺 `mutating` 与 `execution` 两类审批。 |
… isolation - Add core data structures for 4 Subagents (Change Analyst, Context Router, Patch Writer, Micro Reviewer) - Implement independent LLM client with per-subagent configuration - Create orchestrator pipeline for sequential workflow - Add comprehensive unit tests for data structures - Fix build integration and string literal issues - Optimize token consumption through local context processing
- fix(agent_loop): lower compaction threshold from 5 to 3 and fix boundary condition - fix(tests): increase max_context_bytes from 100 to 1000 in compaction tests - feat(agent_loop): implement subagent delegation with strict child limits - feat(agent_tools): add delegate_subagent tool to schema - test(agent_loop): add coverage for compaction and subagent scenarios
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a documentation generation pipeline and a subagent delegation system, allowing the agent to spawn bounded child sessions for specific tasks. It also implements a context compaction mechanism to summarize conversation history when context limits are reached. Technical feedback highlights the need for more robust JSON extraction to handle nested braces, consistency between documentation patch simulation and actual tool behavior, and the completion of support for all defined patch actions. Additionally, the compaction logic requires simplification and the removal of mixed-language comments to improve maintainability.
| std::string LlmClient::extract_json_from_response(const std::string& content) { | ||
| std::string trimmed; | ||
| for (char c : content) { | ||
| if (c != ' ' && c != '\n' && c != '\r' && c != '\t') { | ||
| trimmed += c; | ||
| } | ||
| } | ||
|
|
||
| if (!trimmed.empty() && trimmed[0] == '{') { | ||
| size_t brace_count = 0; | ||
| size_t end_pos = 0; | ||
| for (size_t i = 0; i < trimmed.size(); ++i) { | ||
| if (trimmed[i] == '{') ++brace_count; | ||
| else if (trimmed[i] == '}') { | ||
| --brace_count; | ||
| if (brace_count == 0) { | ||
| end_pos = i + 1; | ||
| break; | ||
| } | ||
| } | ||
| } | ||
| if (end_pos > 0) { | ||
| return trimmed.substr(0, end_pos); | ||
| } | ||
| } | ||
|
|
||
| std::regex json_block(R"(```(?:json)?\s*(\{[\s\S]*?\})\s*```)"); | ||
| std::smatch match; | ||
| if (std::regex_search(content, match, json_block)) { | ||
| return match[1].str(); | ||
| } | ||
|
|
||
| return content; | ||
| } |
There was a problem hiding this comment.
The logic in extract_json_from_response for finding a JSON object by counting braces is not robust. It doesn't handle cases where brace characters { or } appear inside string values within the JSON, which would cause it to find the wrong end of the JSON object and lead to parsing failures.
std::string LlmClient::extract_json_from_response(const std::string& content) {
// First, try to find a JSON block in markdown, which is the most reliable.
std::regex json_block(R"(```(?:json)?\s*(\{[\s\S]*?\})\s*```)");
std::smatch match;
if (std::regex_search(content, match, json_block)) {
return match[1].str();
}
// As a fallback, find the first opening brace and the last closing brace.
// This is more robust than brace counting without a proper parser.
auto first_brace = content.find('{');
auto last_brace = content.rfind('}');
if (first_brace != std::string::npos && last_brace != std::string::npos && last_brace > first_brace) {
return content.substr(first_brace, last_brace - first_brace + 1);
}
// If no JSON structure is found, return the original content and let the parser handle it.
return content;
}| auto after = original; | ||
| for (const auto& p : patch_result->patches) { | ||
| if (p.action == PatchAction::Replace || p.action == PatchAction::Delete) { | ||
| auto pos = after.find(p.old_text); | ||
| if (pos != std::string::npos) { | ||
| after.replace(pos, p.old_text.length(), p.new_text); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The patch simulation logic here for generating the after text is not consistent with the actual apply_patch tool. This simulation uses after.find(p.old_text), which will always replace the first occurrence of old_text.
However, the apply_patch tool (specifically apply_patch_single) will fail if there are multiple matches. This discrepancy means the review_patch function might be called with an after text that could never be generated by the real tool, potentially leading it to approve a patch that would fail to apply later.
To fix this, the simulation should also check for multiple occurrences of p.old_text and handle it consistently with apply_patch.
src/agent_loop.cpp
Outdated
| if (non_system_indices.size() == kCompactionRecentNonSystemMessages) { | ||
| // 当非系统消息数量正好等于保留数量时,至少保留一条消息用于压缩 | ||
| // 跳过第一条非系统消息,从第二条开始作为最近消息 | ||
| recent_start_index = non_system_indices[1]; | ||
| LOG_INFO("Compaction boundary case: using recent_start_index={} (second non-system message)", recent_start_index); | ||
| } else { | ||
| recent_start_index = non_system_indices[non_system_indices.size() - kCompactionRecentNonSystemMessages]; | ||
| LOG_INFO("Compaction normal case: recent_start_index={}", recent_start_index); | ||
| } |
There was a problem hiding this comment.
The special handling for non_system_indices.size() == kCompactionRecentNonSystemMessages is a bit confusing. It makes the number of messages to keep inconsistent (N-1 in this case vs. N otherwise) and includes a comment in Chinese in an otherwise English codebase, which harms maintainability.
Consider simplifying this logic. If the goal is to always compact when over the size limit, even with the minimum number of messages, perhaps this can be handled more explicitly. A single, consistent rule for how many messages to keep would be easier to understand and maintain.
| std::vector<PatchEntry> DocgenPipeline::to_patch_entries( | ||
| const PatchWriterResult& patches, | ||
| const std::string& original | ||
| ) { | ||
| std::vector<PatchEntry> entries; | ||
|
|
||
| for (const auto& p : patches.patches) { | ||
| if (p.action == PatchAction::Replace || p.action == PatchAction::Delete) { | ||
| entries.push_back({p.old_text, p.new_text}); | ||
| } | ||
| } | ||
|
|
||
| return entries; | ||
| } |
There was a problem hiding this comment.
- fix(docgen): improve JSON extraction robustness (match code block, fallback to braces) - fix(docgen): unify patch simulation with real tool behavior (count occurrences, reject multi-matches) - fix(agent): simplify compaction boundary logic, remove special case and Chinese comments - feat(docgen): support InsertBefore/InsertAfter actions in to_patch_entries - test: all relevant tests pass (compaction, apply_patch, docgen_types)
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a documentation generation pipeline and a subagent delegation system, enabling the agent to manage complex tasks via isolated child sessions. It also implements context compaction in the agent loop to handle long conversations and updates the tool safety documentation to reflect new approval gates and recovery guidance. Feedback focuses on improving code consistency through raw string literals for prompts, enhancing JSON extraction robustness to handle arrays, and streamlining logic by removing redundant checks and unreachable code in the delegation and pipeline implementations.
| inline constexpr std::string_view kPatchWriterPrompt = | ||
| "You are a precise document patch generator. Output ONLY valid JSON.\n\n" | ||
| "INPUT:\n" | ||
| "- original_text: the exact text block to modify (line-numbered)\n" | ||
| "- intent_summary: what change is needed\n\n" | ||
| "OUTPUT FORMAT (strict JSON):\n" | ||
| "{\n" | ||
| " \"patches\": [\n" | ||
| " {\n" | ||
| " \"action\": \"replace|insert_before|insert_after|delete\",\n" | ||
| " \"old_text\": \"exact text to find (for replace/delete)\",\n" | ||
| " \"new_text\": \"replacement content (empty for delete)\"\n" | ||
| " }\n" | ||
| " ],\n" | ||
| " \"rationale\": \"one-line explanation\"\n" | ||
| "}\n\n" | ||
| "RULES:\n" | ||
| "1. old_text MUST match EXACTLY (whitespace-sensitive) for replace/delete\n" | ||
| "2. Each patch operates on original_text independently (not sequential)\n" | ||
| "3. Use multiple patches only when truly independent\n" | ||
| "4. NO prose, NO markdown code fences, ONLY the JSON object\n" | ||
| "5. If no change needed, return {\"patches\": [], \"rationale\": \"no change required\"}"; | ||
|
|
There was a problem hiding this comment.
For consistency with other prompt definitions in this file (kChangeAnalystPrompt, kContextRouterPrompt), consider using a raw string literal R"(...)" for kPatchWriterPrompt. This improves readability and maintainability by avoiding the need for escaping characters and string concatenation.
inline constexpr std::string_view kPatchWriterPrompt = R"(You are a precise document patch generator. Output ONLY valid JSON.
INPUT:
- original_text: the exact text block to modify (line-numbered)
- intent_summary: what change is needed
OUTPUT FORMAT (strict JSON):
{
"patches": [
{
"action": "replace|insert_before|insert_after|delete",
"old_text": "exact text to find (for replace/delete)",
"new_text": "replacement content (empty for delete)"
}
],
"rationale": "one-line explanation"
}
RULES:
1. old_text MUST match EXACTLY (whitespace-sensitive) for replace/delete
2. Each patch operates on original_text independently (not sequential)
3. Use multiple patches only when truly independent
4. NO prose, NO markdown code fences, ONLY the JSON object
5. If no change needed, return {"patches": [], "rationale": "no change required"}
)";| inline constexpr std::string_view kMicroReviewerPrompt = | ||
| "You are a minimal document patch reviewer. Output ONLY valid JSON.\n\n" | ||
| "INPUT:\n" | ||
| "- before: original text\n" | ||
| "- after: modified text\n" | ||
| "- intent_summary: intended change\n\n" | ||
| "OUTPUT FORMAT (strict JSON):\n" | ||
| "{\n" | ||
| " \"verdict\": \"approve|reject\",\n" | ||
| " \"reason\": \"one sentence\",\n" | ||
| " \"issues\": [\"optional list of problems if rejected\"]\n" | ||
| "}\n\n" | ||
| "RULES:\n" | ||
| "1. approve if: change matches intent, no syntax errors, maintains document structure\n" | ||
| "2. reject if: wrong content, broken formatting, unrelated changes\n" | ||
| "3. NO prose, NO markdown code fences, ONLY the JSON object"; |
There was a problem hiding this comment.
For consistency with other prompt definitions in this file, consider using a raw string literal R"(...)" for kMicroReviewerPrompt. This improves readability and maintainability.
inline constexpr std::string_view kMicroReviewerPrompt = R"(You are a minimal document patch reviewer. Output ONLY valid JSON.
INPUT:
- before: original text
- after: modified text
- intent_summary: intended change
OUTPUT FORMAT (strict JSON):
{
"verdict": "approve|reject",
"reason": "one sentence",
"issues": ["optional list of problems if rejected"]
}
RULES:
1. approve if: change matches intent, no syntax errors, maintains document structure
2. reject if: wrong content, broken formatting, unrelated changes
3. NO prose, NO markdown code fences, ONLY the JSON object
)";| case PatchAction::InsertAfter: return "insert_after"; | ||
| case PatchAction::Delete: return "delete"; | ||
| } | ||
| return "replace"; |
There was a problem hiding this comment.
The fallback return "replace"; can hide bugs if a new value is added to the PatchAction enum but not handled in this switch statement. It would be safer to throw an exception for unhandled cases to ensure all enum values are explicitly handled.
| return "replace"; | |
| throw std::logic_error("Invalid PatchAction value"); |
| if (child_limit >= parent_limit) { | ||
| child_limit = parent_limit - 1; | ||
| } |
There was a problem hiding this comment.
This check if (child_limit >= parent_limit) appears to be redundant. Given that parent_limit must be greater than 1 (checked at the beginning of the function), child_limit is initialized to std::max(1, parent_limit / 2). For any integer parent_limit > 1, std::max(1, parent_limit / 2) will always be strictly less than parent_limit. Therefore, this condition will never be met and this block can be safely removed.
| std::string LlmClient::extract_json_from_response(const std::string& content) { | ||
| // 1. Try to extract JSON from Markdown code block first | ||
| std::regex json_block(R"(```(?:json)?\s*(\{[\s\S]*?\})\s*```)"); | ||
| std::smatch match; | ||
| if (std::regex_search(content, match, json_block)) { | ||
| return match[1].str(); | ||
| } | ||
|
|
||
| // 2. Fallback: find the outermost braces from first '{' to last '}' | ||
| const size_t first_brace = content.find('{'); | ||
| if (first_brace == std::string::npos) { | ||
| return content; // No opening brace found | ||
| } | ||
|
|
||
| const size_t last_brace = content.rfind('}'); | ||
| if (last_brace == std::string::npos || last_brace <= first_brace) { | ||
| return content; // No matching closing brace | ||
| } | ||
|
|
||
| // Return the substring including both braces | ||
| return content.substr(first_brace, last_brace - first_brace + 1); | ||
| } |
There was a problem hiding this comment.
The current implementation of extract_json_from_response only handles JSON objects ({...}). It would be more robust to handle top-level JSON arrays ([...]) as well, both in the regex and the fallback logic. While current prompts expect objects, this would future-proof the function.
std::string LlmClient::extract_json_from_response(const std::string& content) {
// 1. Try to extract JSON from Markdown code block first
std::regex json_block(R"(```(?:json)?\s*((?:\{[\s\S]*?\}|\[[\s\S]*?\]))\s*```)");
std::smatch match;
if (std::regex_search(content, match, json_block)) {
return match[1].str();
}
// 2. Fallback: find the outermost JSON structure
const size_t first_char_pos = content.find_first_of("{[");
if (first_char_pos == std::string::npos) {
return content; // No JSON structure found
}
const char first_char = content[first_char_pos];
const char last_char = (first_char == '{') ? '}' : ']';
const size_t last_char_pos = content.rfind(last_char);
if (last_char_pos == std::string::npos || last_char_pos <= first_char_pos) {
return content; // No matching closing character
}
// Return the substring including both braces/brackets
return content.substr(first_char_pos, last_char_pos - first_char_pos + 1);
}| default: | ||
| patches_skipped++; | ||
| break; |
Summary
git_addandgit_commitrequire both mutating and execution approvalsTesting