Skip to content

Normalize openapi_call/openapi_call_output types in evaluator preprocessing#4832

Open
Copilot wants to merge 6 commits intomainfrom
copilot/normalize-tool-call-types
Open

Normalize openapi_call/openapi_call_output types in evaluator preprocessing#4832
Copilot wants to merge 6 commits intomainfrom
copilot/normalize-tool-call-types

Conversation

Copy link
Contributor

Copilot AI commented Mar 5, 2026

Evaluators fail when given conversations containing OpenAPI tool responses because _normalize_function_call_types only handled function_call/function_call_outputtool_call/tool_result, not the openapi_call/openapi_call_output variants.

Changes

Core normalization (_normalize_function_call_types)

Added openapi_calltool_call and openapi_call_outputtool_result (with openapi_call_output field rename to tool_result), mirroring the existing function_call handling:

elif t == "openapi_call":
    item["type"] = "tool_call"
elif t == "openapi_call_output":
    item["type"] = "tool_result"
    if "openapi_call_output" in item:
        item["tool_result"] = item.pop("openapi_call_output")

Validation (UNSUPPORTED_TOOLS)

For all evaluators (_tool_call_success, _tool_call_accuracy, _tool_selection, _tool_input_accuracy, _tool_output_utilization, _intent_resolution, _task_completion, _task_adherence, _groundedness, _relevance, _retrieval, _fluency, _coherence, _customer_satisfaction, _deflection_rate, _response_completeness):

  • "openapi_call" remains in UNSUPPORTED_TOOLS — conversations containing openapi_call content are rejected as not supported.
  • Name resolution for openapi_call content types uses the "openapi_call" string, consistent with the existing pattern.
  • Intermediate response detection (_is_intermediate_response) only handles function_call and mcp_approval_requestopenapi_call is not added.
Original prompt

Tool Call Success evaluator fails with OpenAPI responses — openapi_call type not normalized to tool_call

Work Item Details

Note: Please focus on the descriptions and information that provide context about the task requirements, functionality, and implementation details. Dates, priorities, and administrative metadata are less relevant for coding tasks.

Repro Steps

     1. Call the Tool Call Success evaluator with a conversation containing an OpenAPI tool response (content items with

   "type": "openapi_call" and "type": "openapi_call_output")

   2. The evaluator fails because it expects tool_call / tool_result types but receives openapi_call /

  openapi_call_output

Acceptance Criteria

   - _normalize_function_call_types normalizes openapi_call → tool_call and openapi_call_output → tool_result (with

  corresponding field rename)

   - test_openapi passes for Tool Call Success (and all other evaluators that use this function)

Comments

Mohamed Hussein (Thu, 05 Mar 2026 01:08:02 GMT):
  Root Cause: _normalize_function_call_types currently only normalizes:

   - function_call → tool_call

   - function_call_output → tool_result

  It does not handle:

   - openapi_call → tool_call

   - openapi_call_output → tool_result

  Suggested Fix: Add openapi_call / openapi_call_output normalization to _normalize_function_call_types in the same

  way function_call / function_call_output are handled. This function exists in multiple evaluators:

   - _tool_call_success.py:611

   - _tool_call_accuracy.py:718

   - _tool_selection.py:987

   - _tool_input_accuracy.py:966

   - _tool_output_utilization.py:873

Work item: AB#5061130
Created via Azure DevOps


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix Tool Call Success evaluator for OpenAPI responses Normalize openapi_call/openapi_call_output types in evaluator preprocessing Mar 5, 2026
"code_interpreter_call",
"computer_call",
"azure_fabric",
"openapi_call",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't remove it from unsupported tools.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored "openapi_call" to UNSUPPORTED_TOOLS in all quality evaluators (coherence, fluency, customer_satisfaction, groundedness, relevance, intent_resolution, task_completion, task_adherence). Only the 5 tool evaluators mentioned in the issue have it removed. Commit: 96d5768

# Raise error in case of unsupported tools for evaluators that enabled check_for_unsupported_tools
if self.check_for_unsupported_tools:
if content_type == ContentType.TOOL_CALL or content_type == ContentType.OPENAPI_CALL:
name = (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't do that either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the name resolution change in all quality evaluators — they now keep the original "openapi_call" if content_type == ContentType.OPENAPI_CALL logic. Commit: 96d5768

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
"code_interpreter_call",
"computer_call",
"azure_fabric",
"openapi_call",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undo this in all evaluators.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored "openapi_call" to UNSUPPORTED_TOOLS in all 5 tool evaluators. Commit: 4c8f881

last_content = content[-1]
if (isinstance(last_content, dict) and
last_content.get("type") in ("function_call", "mcp_approval_request")):
last_content.get("type") in ("function_call", "mcp_approval_request", "openapi_call")):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the handling of the openapi_call in the _handle_intermediate_response method in all evaluators you added this in, we only handle function_call and mcp_approval_request

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed "openapi_call" from _is_intermediate_response in all 16 evaluator files — the check is now back to only ("function_call", "mcp_approval_request"). Commit: 48487e9

"code_interpreter_call",
"computer_call",
"azure_fabric",
"openapi_call",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openapi_call is still unsupported here, do not remove it

},
]

OPENAPI_CALL_ONLY_RESPONSE: List[Dict[str, Any]] = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed because we only handle mcp_approval_request-only and function_call_only responses. Remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed OPENAPI_CALL_ONLY_RESPONSE, OPENAPI_CALL_FULL_RESPONSE fixtures and test_openapi_call_response from all test files. Commit: 48487e9

…tests

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>
@github-actions
Copy link

github-actions bot commented Mar 5, 2026

Test Results for assets-test

677 tests   677 ✅  29s ⏱️
 13 suites    0 💤
 13 files      0 ❌

Results for commit d663fbd.

♻️ This comment has been updated with latest results.

@salma-elshafey
Copy link
Contributor

@copilot you should run the code health validation script to ensure it passes.

@salma-elshafey salma-elshafey marked this pull request as ready for review March 5, 2026 21:02
@salma-elshafey salma-elshafey requested review from a team as code owners March 5, 2026 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants