Clarification on TRAJECTORY Content, Multi-turn Support, Parallel Tool Calls, and Handling of Missing Thoughts #1

@YuanBoXie

Dear maintainers,

First of all, thank you for your great work on this project! I have been studying the details and would like to ask a few questions regarding the <BEGIN TRAJECTORY><END TRAJECTORY> construct and the overall support for various agent behaviors. Your insights would be very helpful.

Q1. Does the TRAJECTORY include the Agent’s Response?
It seems that the TRAJECTORY primarily captures the agent’s inputs and intermediate steps (e.g., user messages, agent thoughts, actions, and environment feedback) for safety checks. However, it is not clear whether the agent’s own responses are also part of the trajectory. If the current design supports including agent responses, how should they be inserted? For example, should they be placed between the <BEGIN TRAJECTORY> and <END TRAJECTORY> tags along with user inputs, or is there a separate mechanism?

Q2. Support for multi-turn agent behavior
The examples in the paper appear to show single-turn interactions. Does the current implementation support multi-turn dialogues where the agent’s previous responses become part of the context for subsequent turns? In a multi-turn setting, the agent’s previous responses would naturally need to be included in the trajectory to provide full context. Is this supported, and if so, how should the trajectory be formatted to accommodate multiple turns?
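To make the multi-turn question concrete, here is a minimal sketch of how I would naively flatten several turns, including the agent's earlier responses, into one trajectory. Only the <BEGIN TRAJECTORY> and <END TRAJECTORY> markers come from the project; the `Turn` class, the `render_trajectory` helper, and the `[TURN n] USER:` / `AGENT:` labels are my own assumptions, not the project's format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    user: str      # user message for this turn
    response: str  # agent's final reply for this turn


def render_trajectory(turns: List[Turn]) -> str:
    """Flatten all turns, including prior agent responses, into one block.

    The per-line labels are hypothetical; only the BEGIN/END markers
    are taken from the project.
    """
    lines = ["<BEGIN TRAJECTORY>"]
    for i, turn in enumerate(turns, start=1):
        lines.append(f"[TURN {i}] USER: {turn.user}")
        lines.append(f"[TURN {i}] AGENT: {turn.response}")
    lines.append("<END TRAJECTORY>")
    return "\n".join(lines)


example = render_trajectory([
    Turn("Find flights to Paris.", "I found three options."),
    Turn("Book the cheapest one.", "The cheapest option is booked."),
])
print(example)
```

If the expected layout differs (for example, one trajectory block per turn rather than one block for the whole dialogue), a pointer to the correct layout would be very helpful.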

Q3. Handling of parallel tool calls
Some agents may invoke multiple tools in parallel within a single turn (e.g., calling several functions simultaneously). Does the current design account for such parallel tool calls? If yes, how should they be represented in the trajectory (e.g., as separate entries or combined)? Are there any special considerations for safety checks in such scenarios?
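To illustrate the two representations I am asking about, here is a small sketch. The `ACTION:` label, the JSON encoding, the helper names, and the tool names `get_weather` and `get_time` are all hypothetical and only serve to show the difference between separate and combined entries.

```python
import json
from typing import Dict, List

ToolCall = Dict[str, object]


def as_separate_entries(calls: List[ToolCall]) -> List[str]:
    """Option A (assumed): one ACTION line per tool call."""
    return [f"ACTION: {json.dumps(c, sort_keys=True)}" for c in calls]


def as_combined_entry(calls: List[ToolCall]) -> List[str]:
    """Option B (assumed): a single ACTION line holding the whole batch."""
    return [f"ACTION: {json.dumps(calls, sort_keys=True)}"]


# Two tool calls issued in parallel within the same turn (made-up tools).
calls = [
    {"name": "get_weather", "args": {"city": "Paris"}},
    {"name": "get_time", "args": {"city": "Paris"}},
]

separate = as_separate_entries(calls)
combined = as_combined_entry(calls)

for line in separate + combined:
    print(line)
```

With option A the safety check would see two consecutive actions without an observation in between; with option B it would see one action containing a list. I am unsure which of these, if either, the model was trained to handle.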

Q4. Compatibility with missing "thought" steps
Some agent architectures do not include an explicit "thought" field before taking actions; they may directly generate tool calls or responses. Is the model compatible with trajectories that lack a thought step? Would the safety mechanism still work correctly if the thought is absent, or is it required to have a thought for every turn?
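As a concrete case, this is roughly how I would serialize a step when the thought is optional. The `render_step` helper and the `THOUGHT:` / `ACTION:` / `OBSERVATION:` labels are my own assumptions rather than the project's format; the sketch simply omits the thought line when none exists instead of emitting an empty placeholder.

```python
from typing import List, Optional


def render_step(action: str, observation: str,
                thought: Optional[str] = None) -> List[str]:
    """Render one agent step; the THOUGHT line is skipped when absent.

    Labels are hypothetical and used only for illustration.
    """
    lines: List[str] = []
    if thought is not None:
        lines.append(f"THOUGHT: {thought}")
    lines.append(f"ACTION: {action}")
    lines.append(f"OBSERVATION: {observation}")
    return lines


with_thought = render_step("search('Paris')", "3 results", thought="I should search.")
without_thought = render_step("search('Paris')", "3 results")

print("\n".join(with_thought))
print("\n".join(without_thought))
```

My question is whether the second form (no thought line at all) is acceptable input, or whether an empty or placeholder thought should be inserted instead.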

I would greatly appreciate any clarification or pointers to relevant documentation or examples. Thank you for your time and for building such an interesting project!
