Skip to content

docs: update tool safety and workflow docs#21

Open
LeeLaugh-jason wants to merge 4 commits intoRethinkCraft:mainfrom
LeeLaugh-jason:docph1
Open

docs: update tool safety and workflow docs#21
LeeLaugh-jason wants to merge 4 commits intoRethinkCraft:mainfrom
LeeLaugh-jason:docph1

Conversation

@LeeLaugh-jason
Copy link
Copy Markdown
Collaborator

Summary

  • refresh README and README_zh Phase 1 descriptions to include repo-aware git packaging tools and recovery guidance
  • clarify the tools and safety chapter around approval gates, recoverable failures, and guidance vs fail-fast behavior
  • document why git_add and git_commit require both mutating and execution approvals

Testing

  • not run (documentation-only changes)

- docs(readme): refresh phase 1 capability summary and recovery guidance in english
- docs(readme): refresh phase 1 capability summary and recovery guidance in chinese
- docs(book): clarify approval gates, recoverable failures, and git packaging tool behavior
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly updates the project's documentation, focusing on the agent's tool safety mechanisms and workflow. It clarifies how various tools, especially those interacting with Git, are approved and executed, and refines the agent's failure recovery strategies to provide more nuanced guidance rather than immediate termination for all issues. These changes aim to improve understanding of the system's boundaries and operational logic.

Highlights

  • README Updates: Refreshed the Phase 1 descriptions in both English and Chinese READMEs to incorporate repository-aware git packaging tools and recovery guidance.
  • Tool Safety Clarification: Enhanced the tool safety documentation to detail approval gates, differentiate between recoverable and fatal failures, and explain the agent's guidance versus fail-fast behavior.
  • Git Tool Approval Logic: Documented the specific requirement for git_add and git_commit tools to satisfy both mutating and execution approvals due to their dual impact on repository state and potential for executing hooks.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the documentation, specifically the README files and the 04-tools-and-safety.md chapter, to reflect a more refined agent tool execution and safety model. Key changes include clarifying the approval process for repository-aware tools like git_add and git_commit (which now require both mutating and execution approvals), and detailing a more nuanced failure handling mechanism in the agent loop that distinguishes between recoverable failures (providing guidance) and fatal failures (leading to a stop). The review comments suggest minor formatting improvements for consistency and clarity, recommending the use of backticks for specific keywords and failure categories within the 04-tools-and-safety.md file.

定义默认工具集合,把名称、描述、参数 schema、类别和执行函数绑在一起。这里决定的是“模型能请求什么”。
- `src/tool_registry.cpp`
是策略门。它决定的是“当前这个请求在本轮运行里能不能执行”。
是策略门。它决定的是“当前这个请求在本轮运行里能不能执行”,并把缺失的是 mutating、execution,还是两者都缺失,显式写回结构化结果
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with how these terms are used elsewhere in the document (e.g., lines 56-57), mutating and execution should be formatted as code using backticks.

Suggested change
是策略门。它决定的是“当前这个请求在本轮运行里能不能执行”,并把缺失的是 mutatingexecution,还是两者都缺失,显式写回结构化结果。
是策略门。它决定的是“当前这个请求在本轮运行里能不能执行”,并把缺失的是 `mutating``execution`,还是两者都缺失,显式写回结构化结果。

2. 再看 `src/tool_registry.cpp`,确认默认策略如何区分只读、变更和执行。
3. 然后看具体 executor,理解每类工具的真实风险差异。
4. 最后回到 `src/agent_loop.cpp`,看系统在工具失败后如何停下
4. 最后回到 `src/agent_loop.cpp`,看系统如何区分 blocked、retryable、needs-inspection、fatal,并决定是补 guidance 还是停机
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For clarity and consistency, the failure categories (blocked, retryable, needs-inspection, fatal) should be formatted as code using backticks. This helps distinguish them as specific system states/keywords.

Suggested change
4. 最后回到 `src/agent_loop.cpp`,看系统如何区分 blockedretryableneeds-inspectionfatal,并决定是补 guidance 还是停机。
4. 最后回到 `src/agent_loop.cpp`,看系统如何区分 `blocked``retryable``needs-inspection``fatal`,并决定是补 guidance 还是停机。


### 误解二:approval 是人工逐条审批
也不对。当前实现里的 approval 是运行前确定的策略开关,不是交互式确认流程。
也不对。当前实现里的 approval 是运行前确定的策略开关,不是交互式确认流程;而且像 `git_add` 与 `git_commit` 这样的仓库封装工具,可能同时缺 mutating 与 execution 两类审批
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency, mutating and execution should be formatted as code using backticks, similar to how they are used on lines 56-57 and how git_add is formatted in this same line.

Suggested change
也不对。当前实现里的 approval 是运行前确定的策略开关,不是交互式确认流程;而且像 `git_add``git_commit` 这样的仓库封装工具,可能同时缺 mutating 与 execution 两类审批。
也不对。当前实现里的 approval 是运行前确定的策略开关,不是交互式确认流程;而且像 `git_add``git_commit` 这样的仓库封装工具,可能同时缺 `mutating``execution` 两类审批。

… isolation

- Add core data structures for 4 Subagents (Change Analyst, Context Router, Patch Writer, Micro Reviewer)
- Implement independent LLM client with per-subagent configuration
- Create orchestrator pipeline for sequential workflow
- Add comprehensive unit tests for data structures
- Fix build integration and string literal issues
- Optimize token consumption through local context processing
- fix(agent_loop): lower compaction threshold from 5 to 3 and fix boundary condition
- fix(tests): increase max_context_bytes from 100 to 1000 in compaction tests
- feat(agent_loop): implement subagent delegation with strict child limits
- feat(agent_tools): add delegate_subagent tool to schema
- test(agent_loop): add coverage for compaction and subagent scenarios
@LeeLaugh-jason
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a documentation generation pipeline and a subagent delegation system, allowing the agent to spawn bounded child sessions for specific tasks. It also implements a context compaction mechanism to summarize conversation history when context limits are reached. Technical feedback highlights the need for more robust JSON extraction to handle nested braces, consistency between documentation patch simulation and actual tool behavior, and the completion of support for all defined patch actions. Additionally, the compaction logic requires simplification and the removal of mixed-language comments to improve maintainability.

Comment on lines +117 to +150
std::string LlmClient::extract_json_from_response(const std::string& content) {
std::string trimmed;
for (char c : content) {
if (c != ' ' && c != '\n' && c != '\r' && c != '\t') {
trimmed += c;
}
}

if (!trimmed.empty() && trimmed[0] == '{') {
size_t brace_count = 0;
size_t end_pos = 0;
for (size_t i = 0; i < trimmed.size(); ++i) {
if (trimmed[i] == '{') ++brace_count;
else if (trimmed[i] == '}') {
--brace_count;
if (brace_count == 0) {
end_pos = i + 1;
break;
}
}
}
if (end_pos > 0) {
return trimmed.substr(0, end_pos);
}
}

std::regex json_block(R"(```(?:json)?\s*(\{[\s\S]*?\})\s*```)");
std::smatch match;
if (std::regex_search(content, match, json_block)) {
return match[1].str();
}

return content;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The logic in extract_json_from_response for finding a JSON object by counting braces is not robust. It doesn't handle cases where brace characters { or } appear inside string values within the JSON, which would cause it to find the wrong end of the JSON object and lead to parsing failures.

std::string LlmClient::extract_json_from_response(const std::string& content) {
    // First, try to find a JSON block in markdown, which is the most reliable.
    std::regex json_block(R"(```(?:json)?\s*(\{[\s\S]*?\})\s*```)");
    std::smatch match;
    if (std::regex_search(content, match, json_block)) {
        return match[1].str();
    }
    
    // As a fallback, find the first opening brace and the last closing brace.
    // This is more robust than brace counting without a proper parser.
    auto first_brace = content.find('{');
    auto last_brace = content.rfind('}');
    if (first_brace != std::string::npos && last_brace != std::string::npos && last_brace > first_brace) {
        return content.substr(first_brace, last_brace - first_brace + 1);
    }
    
    // If no JSON structure is found, return the original content and let the parser handle it.
    return content;
}

Comment on lines +58 to +66
auto after = original;
for (const auto& p : patch_result->patches) {
if (p.action == PatchAction::Replace || p.action == PatchAction::Delete) {
auto pos = after.find(p.old_text);
if (pos != std::string::npos) {
after.replace(pos, p.old_text.length(), p.new_text);
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The patch simulation logic here for generating the after text is not consistent with the actual apply_patch tool. This simulation uses after.find(p.old_text), which will always replace the first occurrence of old_text.

However, the apply_patch tool (specifically apply_patch_single) will fail if there are multiple matches. This discrepancy means the review_patch function might be called with an after text that could never be generated by the real tool, potentially leading it to approve a patch that would fail to apply later.

To fix this, the simulation should also check for multiple occurrences of p.old_text and handle it consistently with apply_patch.

Comment on lines +461 to +469
if (non_system_indices.size() == kCompactionRecentNonSystemMessages) {
// 当非系统消息数量正好等于保留数量时,至少保留一条消息用于压缩
// 跳过第一条非系统消息,从第二条开始作为最近消息
recent_start_index = non_system_indices[1];
LOG_INFO("Compaction boundary case: using recent_start_index={} (second non-system message)", recent_start_index);
} else {
recent_start_index = non_system_indices[non_system_indices.size() - kCompactionRecentNonSystemMessages];
LOG_INFO("Compaction normal case: recent_start_index={}", recent_start_index);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The special handling for non_system_indices.size() == kCompactionRecentNonSystemMessages is a bit confusing. It makes the number of messages to keep inconsistent (N-1 in this case vs. N otherwise) and includes a comment in Chinese in an otherwise English codebase, which harms maintainability.

Consider simplifying this logic. If the goal is to always compact when over the size limit, even with the minimum number of messages, perhaps this can be handled more explicitly. A single, consistent rule for how many messages to keep would be easier to understand and maintain.

Comment on lines +216 to +229
std::vector<PatchEntry> DocgenPipeline::to_patch_entries(
const PatchWriterResult& patches,
const std::string& original
) {
std::vector<PatchEntry> entries;

for (const auto& p : patches.patches) {
if (p.action == PatchAction::Replace || p.action == PatchAction::Delete) {
entries.push_back({p.old_text, p.new_text});
}
}

return entries;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The to_patch_entries function only handles PatchAction::Replace and PatchAction::Delete. However, the PatchAction enum and the kPatchWriterPrompt also define insert_before and insert_after actions. This function should be updated to handle all possible patch actions to fully support the feature.

- fix(docgen): improve JSON extraction robustness (match code block, fallback to braces)
- fix(docgen): unify patch simulation with real tool behavior (count occurrences, reject multi-matches)
- fix(agent): simplify compaction boundary logic, remove special case and Chinese comments
- feat(docgen): support InsertBefore/InsertAfter actions in to_patch_entries
- test: all relevant tests pass (compaction, apply_patch, docgen_types)
@LeeLaugh-jason
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a documentation generation pipeline and a subagent delegation system, enabling the agent to manage complex tasks via isolated child sessions. It also implements context compaction in the agent loop to handle long conversations and updates the tool safety documentation to reflect new approval gates and recovery guidance. Feedback focuses on improving code consistency through raw string literals for prompts, enhancing JSON extraction robustness to handle arrays, and streamlining logic by removing redundant checks and unreachable code in the delegation and pipeline implementations.

Comment on lines +79 to +101
inline constexpr std::string_view kPatchWriterPrompt =
"You are a precise document patch generator. Output ONLY valid JSON.\n\n"
"INPUT:\n"
"- original_text: the exact text block to modify (line-numbered)\n"
"- intent_summary: what change is needed\n\n"
"OUTPUT FORMAT (strict JSON):\n"
"{\n"
" \"patches\": [\n"
" {\n"
" \"action\": \"replace|insert_before|insert_after|delete\",\n"
" \"old_text\": \"exact text to find (for replace/delete)\",\n"
" \"new_text\": \"replacement content (empty for delete)\"\n"
" }\n"
" ],\n"
" \"rationale\": \"one-line explanation\"\n"
"}\n\n"
"RULES:\n"
"1. old_text MUST match EXACTLY (whitespace-sensitive) for replace/delete\n"
"2. Each patch operates on original_text independently (not sequential)\n"
"3. Use multiple patches only when truly independent\n"
"4. NO prose, NO markdown code fences, ONLY the JSON object\n"
"5. If no change needed, return {\"patches\": [], \"rationale\": \"no change required\"}";

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with other prompt definitions in this file (kChangeAnalystPrompt, kContextRouterPrompt), consider using a raw string literal R"(...)" for kPatchWriterPrompt. This improves readability and maintainability by avoiding the need for escaping characters and string concatenation.

inline constexpr std::string_view kPatchWriterPrompt = R"(You are a precise document patch generator. Output ONLY valid JSON.

INPUT:
- original_text: the exact text block to modify (line-numbered)
- intent_summary: what change is needed

OUTPUT FORMAT (strict JSON):
{
  "patches": [
    {
      "action": "replace|insert_before|insert_after|delete",
      "old_text": "exact text to find (for replace/delete)",
      "new_text": "replacement content (empty for delete)"
    }
  ],
  "rationale": "one-line explanation"
}

RULES:
1. old_text MUST match EXACTLY (whitespace-sensitive) for replace/delete
2. Each patch operates on original_text independently (not sequential)
3. Use multiple patches only when truly independent
4. NO prose, NO markdown code fences, ONLY the JSON object
5. If no change needed, return {"patches": [], "rationale": "no change required"}
)";

Comment on lines +102 to +117
inline constexpr std::string_view kMicroReviewerPrompt =
"You are a minimal document patch reviewer. Output ONLY valid JSON.\n\n"
"INPUT:\n"
"- before: original text\n"
"- after: modified text\n"
"- intent_summary: intended change\n\n"
"OUTPUT FORMAT (strict JSON):\n"
"{\n"
" \"verdict\": \"approve|reject\",\n"
" \"reason\": \"one sentence\",\n"
" \"issues\": [\"optional list of problems if rejected\"]\n"
"}\n\n"
"RULES:\n"
"1. approve if: change matches intent, no syntax errors, maintains document structure\n"
"2. reject if: wrong content, broken formatting, unrelated changes\n"
"3. NO prose, NO markdown code fences, ONLY the JSON object";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with other prompt definitions in this file, consider using a raw string literal R"(...)" for kMicroReviewerPrompt. This improves readability and maintainability.

inline constexpr std::string_view kMicroReviewerPrompt = R"(You are a minimal document patch reviewer. Output ONLY valid JSON.

INPUT:
- before: original text
- after: modified text
- intent_summary: intended change

OUTPUT FORMAT (strict JSON):
{
  "verdict": "approve|reject",
  "reason": "one sentence",
  "issues": ["optional list of problems if rejected"]
}

RULES:
1. approve if: change matches intent, no syntax errors, maintains document structure
2. reject if: wrong content, broken formatting, unrelated changes
3. NO prose, NO markdown code fences, ONLY the JSON object
)";

case PatchAction::InsertAfter: return "insert_after";
case PatchAction::Delete: return "delete";
}
return "replace";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The fallback return "replace"; can hide bugs if a new value is added to the PatchAction enum but not handled in this switch statement. It would be safer to throw an exception for unhandled cases to ensure all enum values are explicitly handled.

Suggested change
return "replace";
throw std::logic_error("Invalid PatchAction value");

Comment on lines +699 to +701
if (child_limit >= parent_limit) {
child_limit = parent_limit - 1;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This check if (child_limit >= parent_limit) appears to be redundant. Given that parent_limit must be greater than 1 (checked at the beginning of the function), child_limit is initialized to std::max(1, parent_limit / 2). For any integer parent_limit > 1, std::max(1, parent_limit / 2) will always be strictly less than parent_limit. Therefore, this condition will never be met and this block can be safely removed.

Comment on lines +117 to +138
std::string LlmClient::extract_json_from_response(const std::string& content) {
// 1. Try to extract JSON from Markdown code block first
std::regex json_block(R"(```(?:json)?\s*(\{[\s\S]*?\})\s*```)");
std::smatch match;
if (std::regex_search(content, match, json_block)) {
return match[1].str();
}

// 2. Fallback: find the outermost braces from first '{' to last '}'
const size_t first_brace = content.find('{');
if (first_brace == std::string::npos) {
return content; // No opening brace found
}

const size_t last_brace = content.rfind('}');
if (last_brace == std::string::npos || last_brace <= first_brace) {
return content; // No matching closing brace
}

// Return the substring including both braces
return content.substr(first_brace, last_brace - first_brace + 1);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of extract_json_from_response only handles JSON objects ({...}). It would be more robust to handle top-level JSON arrays ([...]) as well, both in the regex and the fallback logic. While current prompts expect objects, this would future-proof the function.

std::string LlmClient::extract_json_from_response(const std::string& content) {
    // 1. Try to extract JSON from Markdown code block first
    std::regex json_block(R"(```(?:json)?\s*((?:\{[\s\S]*?\}|\[[\s\S]*?\]))\s*```)");
    std::smatch match;
    if (std::regex_search(content, match, json_block)) {
        return match[1].str();
    }
    
    // 2. Fallback: find the outermost JSON structure
    const size_t first_char_pos = content.find_first_of("{[");
    if (first_char_pos == std::string::npos) {
        return content; // No JSON structure found
    }
    
    const char first_char = content[first_char_pos];
    const char last_char = (first_char == '{') ? '}' : ']';
    const size_t last_char_pos = content.rfind(last_char);
    
    if (last_char_pos == std::string::npos || last_char_pos <= first_char_pos) {
        return content; // No matching closing character
    }
    
    // Return the substring including both braces/brackets
    return content.substr(first_char_pos, last_char_pos - first_char_pos + 1);
}

Comment on lines +110 to +112
default:
patches_skipped++;
break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The if condition at lines 77-78 already ensures that p.action is one of the four handled cases. Therefore, the default case in this switch statement is unreachable and can be removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant