.Net: Fix non-streaming function calling text and usage aggregation #13429
Open

Cozmopolit wants to merge 3 commits into microsoft:main from
Preserve intermediate LLM text content (e.g., "Let me check that for you...") and aggregate token usage across all iterations in the auto function calling loop.

- Add StringBuilder for text aggregation across loop iterations
- Accumulate InputTokens/OutputTokens and store as "AggregatedUsage" metadata
- Apply aggregated state to final response (or filter-terminated response)
- Add 5 unit tests covering text aggregation, usage aggregation, single iteration, empty content, and filter termination scenarios

Fixes microsoft#13420
Motivation and Context
Fixes #13420
When using auto function invocation in non-streaming mode (`GetChatMessageContentAsync`), intermediate text content generated by the LLM before tool calls is silently discarded. Additionally, token usage is not aggregated across multiple API calls in the auto-invoke loop.

Problem: If the LLM responds with "Let me check that for you..." before requesting a tool call, and then provides a final answer after the tool result, only the final answer is returned. The intermediate text is lost.
Scenario: Users relying on non-streaming mode with auto function invocation expect to receive all text the LLM generated, not just the final response.
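For context, a minimal sketch of the scenario using the Semantic Kernel chat completion API; the model id, the weather plugin, and the environment variable are illustrative choices, not taken from the PR:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

var kernel = Kernel.CreateBuilder()
    .AddOpenAIChatCompletion("gpt-4o", Environment.GetEnvironmentVariable("OPENAI_API_KEY")!)
    .Build();

// A trivial plugin so the model has something to call.
kernel.Plugins.AddFromFunctions("Weather", new[]
{
    KernelFunctionFactory.CreateFromMethod(
        () => "18C and sunny", "GetWeather", "Gets the current weather")
});

var settings = new OpenAIPromptExecutionSettings
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};

var history = new ChatHistory();
history.AddUserMessage("What's the weather like?");

// The model may first answer "Let me check that for you...", then request
// Weather-GetWeather, then produce a final answer after the tool result.
// Before this fix, only the final answer came back from this single call.
var chat = kernel.GetRequiredService<IChatCompletionService>();
var result = await chat.GetChatMessageContentAsync(history, settings, kernel);

Console.WriteLine(result.Content);
Console.WriteLine(result.Metadata?.TryGetValue("AggregatedUsage", out var usage) == true
    ? $"AggregatedUsage: {usage}"
    : "no aggregated usage (single iteration)");
```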
Description
This PR modifies the non-streaming auto function calling loop in all three affected connectors to aggregate intermediate text content across loop iterations using a `StringBuilder`, and to accumulate token usage across all API calls.

Affected Connectors:

- `Connectors.OpenAI/Core/ClientCore.ChatCompletion.cs`
- `Connectors.Google/Core/Gemini/Clients/GeminiChatCompletionClient.cs`
- `Connectors.MistralAI/Client/MistralClient.cs`

Implementation approach:
- Track `aggregatedContent` (a `StringBuilder`) and token counters across loop iterations
- Join the intermediate text into the final response (with a `\n\n` separator) and sum the token counts
- Store the summed counts as `AggregatedUsage` metadata (only when multiple iterations occurred)
- Apply the aggregated state to the final response, or to the filter-terminated response, as sketched below
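The following is a simplified, self-contained sketch of that pattern; the loop shape, the `ModelReply` record, and the two delegates are stand-ins for each connector's internals, not the actual connector code:

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

sealed class AggregationLoopSketch
{
    internal sealed record ModelReply(string? Text, bool HasToolCalls, int InputTokens, int OutputTokens);

    internal static async Task<(string Text, IReadOnlyDictionary<string, object?>? Metadata)> RunAsync(
        Func<Task<ModelReply>> sendRequestAsync,  // one non-streaming API call
        Func<ModelReply, Task> invokeToolsAsync)  // executes the requested functions
    {
        StringBuilder? aggregated = null;
        int inputTokens = 0, outputTokens = 0, iterations = 0;

        ModelReply reply;
        while (true)
        {
            iterations++;
            reply = await sendRequestAsync();
            inputTokens += reply.InputTokens;
            outputTokens += reply.OutputTokens;

            if (!reply.HasToolCalls)
            {
                break; // final answer reached (a filter could also end the loop here)
            }

            // Preserve intermediate text such as "Let me check that for you..."
            if (!string.IsNullOrEmpty(reply.Text))
            {
                if (aggregated is null) { aggregated = new StringBuilder(); }
                else { aggregated.Append("\n\n"); }
                aggregated.Append(reply.Text);
            }

            await invokeToolsAsync(reply);
        }

        // Join intermediate text with the final answer using the "\n\n" separator.
        string finalText = aggregated is null
            ? reply.Text ?? string.Empty
            : aggregated.Append("\n\n").Append(reply.Text).ToString();

        // Expose summed usage only when the loop ran more than once.
        IReadOnlyDictionary<string, object?>? metadata = iterations > 1
            ? new Dictionary<string, object?>
            {
                ["AggregatedUsage"] = new { InputTokens = inputTokens, OutputTokens = outputTokens }
            }
            : null;

        return (finalText, metadata);
    }
}
```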
Out of Scope:

- `FunctionInvokingChatClient`

New Tests
Added 5 unit tests for the OpenAI connector in `FunctionCallingContentAggregationTests.cs`:

- `NonStreaming_IntermediateTextBeforeToolCall_IsAggregatedInFinalResponseAsync`
- `NonStreaming_TokenUsage_IsAggregatedAcrossAllIterationsAsync` (verifies the `AggregatedUsage` metadata contains the sum of all tokens; a sketch of this test follows the list)
- `NonStreaming_SingleIteration_NoAggregationMetadataAddedAsync`
- `NonStreaming_ToolCallWithoutIntermediateText_OnlyFinalTextReturnedAsync`
- `NonStreaming_FilterTerminatesEarly_AggregatedContentStillAppliedAsync`
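A rough sketch of the usage-aggregation test's shape; `CreateServiceWithCannedResponses`, `this._kernel`, and the HTTP stubbing behind them are hypothetical fixtures assumed to replay two canned OpenAI responses (a tool call first, then the final answer):

```csharp
[Fact]
public async Task NonStreaming_TokenUsage_IsAggregatedAcrossAllIterationsAsync()
{
    // Hypothetical helper: a chat service backed by a stubbed HttpClient.
    IChatCompletionService service = this.CreateServiceWithCannedResponses();

    var history = new ChatHistory();
    history.AddUserMessage("What's the weather like?");

    var settings = new OpenAIPromptExecutionSettings
    {
        FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
    };

    var result = await service.GetChatMessageContentAsync(history, settings, this._kernel);

    // Two API calls occurred, so the summed usage should be exposed.
    Assert.NotNull(result.Metadata);
    Assert.True(result.Metadata!.ContainsKey("AggregatedUsage"));
}
```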
Contribution Checklist