Dear maintainers,
First of all, thank you for your great work on this project! I have been studying the details and would like to ask a few questions regarding the <BEGIN TRAJECTORY><END TRAJECTORY> construct and the overall support for various agent behaviors. Your insights would be very helpful.
Q1. Does the TRAJECTORY include the Agent’s Response?
It seems that the TRAJECTORY primarily captures the agent’s input (e.g., user messages, agent thoughts, actions, and environment feedback) for safety checks. However, it is not clear whether the agent’s own responses are also part of the trajectory. If the current design supports including agent responses, how should they be inserted? For example, should they be placed between the <BEGIN TRAJECTORY> and <END TRAJECTORY> tags along with user inputs, or is there a separate mechanism?
Q2. Support for multi-turn agent behavior
The examples in the paper appear to show single-turn interactions. Does the current implementation support multi-turn dialogues in which the agent’s previous responses become part of the context for subsequent turns? In a multi-turn setting, the agent’s own previous responses would need to be included in the trajectory to provide full context. Is this supported, and if so, how should the trajectory be formatted to accommodate multiple turns?
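To make Q1 and Q2 concrete, here is a minimal sketch of how I would imagine serializing a multi-turn trajectory that includes the agent's own replies. Only the <BEGIN TRAJECTORY> and <END TRAJECTORY> tags come from the project; the `Step` class, the `render_trajectory` helper, the role labels, and the `[ROLE] content` line layout are all my own assumptions, not the documented format.

```python
# Hypothetical sketch: role labels and line layout are assumptions,
# not the project's documented trajectory format.
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    role: str      # e.g. "user", "thought", "action", "observation", "agent"
    content: str


def render_trajectory(steps: List[Step]) -> str:
    """Wrap every step, including the agent's own replies, in the trajectory tags."""
    lines = [f"[{s.role.upper()}] {s.content}" for s in steps]
    return "<BEGIN TRAJECTORY>\n" + "\n".join(lines) + "\n<END TRAJECTORY>"


# Two turns: the agent's reply from turn 1 stays in the context for turn 2.
steps = [
    Step("user", "Find flights to Paris."),
    Step("thought", "I should search for flights."),
    Step("action", 'search_flights(dest="Paris")'),
    Step("observation", "3 flights found."),
    Step("agent", "I found 3 flights. Should I book one?"),
    Step("user", "Book the cheapest one."),
]
print(render_trajectory(steps))
```

Is a flat, chronological layout like this what the model expects, or should each turn be wrapped in its own pair of tags?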
Q3. Handling of parallel tool calls
Some agents may invoke multiple tools in parallel within a single turn (e.g., calling several functions simultaneously). Does the current design account for such parallel tool calls? If yes, how should they be represented in the trajectory (e.g., as separate entries or combined)? Are there any special considerations for safety checks in such scenarios?
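To illustrate the two representations I have in mind for Q3, here is a small sketch. The `[ACTION]` label, the JSON layout, the tool names, and both helper functions are hypothetical and only meant to show the difference between separate and combined entries.

```python
# Hypothetical sketch: the "[ACTION]" label and JSON layout are assumptions.
import json
from typing import Dict, List

ToolCall = Dict[str, object]


def as_separate_entries(calls: List[ToolCall]) -> List[str]:
    """Option A: one trajectory line per parallel tool call."""
    return [f"[ACTION] {json.dumps(c, sort_keys=True)}" for c in calls]


def as_combined_entry(calls: List[ToolCall]) -> List[str]:
    """Option B: a single trajectory line holding all parallel calls as a JSON list."""
    return ["[ACTION] " + json.dumps(calls, sort_keys=True)]


# Two tools invoked in the same turn.
calls = [
    {"name": "get_weather", "arguments": {"city": "Paris"}},
    {"name": "get_time", "arguments": {"city": "Paris"}},
]

print("\n".join(as_separate_entries(calls)))
print("\n".join(as_combined_entry(calls)))
```

With option A the safety check would see each call as its own step, while option B keeps the calls grouped as one action. I am not sure which of these, if either, matches what the model was trained on.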
Q4. Compatibility with missing "thought" steps
Some agent architectures do not include an explicit "thought" field before taking actions; they may directly generate tool calls or responses. Is the model compatible with trajectories that lack a thought step? Would the safety mechanism still work correctly if the thought is absent, or is it required to have a thought for every turn?
I would greatly appreciate any clarification or pointers to relevant documentation or examples. Thank you for your time and for building such an interesting project!