Normalize openapi_call/openapi_call_output types in evaluator preprocessing by Copilot · Pull Request #4832 · Azure/azureml-assets

Copilot · 2026-03-05T01:10:01Z

Evaluators fail when given conversations containing OpenAPI tool responses because _normalize_function_call_types only handled function_call/function_call_output → tool_call/tool_result, not the openapi_call/openapi_call_output variants.

Changes

Core normalization (`_normalize_function_call_types`)

Added openapi_call → tool_call and openapi_call_output → tool_result (with openapi_call_output field rename to tool_result), mirroring the existing function_call handling:

elif t == "openapi_call":
    item["type"] = "tool_call"
elif t == "openapi_call_output":
    item["type"] = "tool_result"
    if "openapi_call_output" in item:
        item["tool_result"] = item.pop("openapi_call_output")

Validation (`UNSUPPORTED_TOOLS`)

For all evaluators (_tool_call_success, _tool_call_accuracy, _tool_selection, _tool_input_accuracy, _tool_output_utilization, _intent_resolution, _task_completion, _task_adherence, _groundedness, _relevance, _retrieval, _fluency, _coherence, _customer_satisfaction, _deflection_rate, _response_completeness):

"openapi_call" remains in UNSUPPORTED_TOOLS — conversations containing openapi_call content are rejected as not supported.
Name resolution for openapi_call content types uses the "openapi_call" string, consistent with the existing pattern.
Intermediate response detection (_is_intermediate_response) only handles function_call and mcp_approval_request — openapi_call is not added.

Original prompt

Tool Call Success evaluator fails with OpenAPI responses — openapi_call type not normalized to tool_call

Work Item Details

Note: Please focus on the descriptions and information that provide context about the task requirements, functionality, and implementation details. Dates, priorities, and administrative metadata are less relevant for coding tasks.

Repro Steps

1. Call the Tool Call Success evaluator with a conversation containing an OpenAPI tool response (content items with

"type": "openapi_call" and "type": "openapi_call_output")

2. The evaluator fails because it expects tool_call / tool_result types but receives openapi_call /

openapi_call_output

Acceptance Criteria

- _normalize_function_call_types normalizes openapi_call → tool_call and openapi_call_output → tool_result (with

corresponding field rename)

- test_openapi passes for Tool Call Success (and all other evaluators that use this function)

Comments

Mohamed Hussein (Thu, 05 Mar 2026 01:08:02 GMT):
Root Cause: _normalize_function_call_types currently only normalizes:

- function_call → tool_call

- function_call_output → tool_result

It does not handle:

- openapi_call → tool_call

- openapi_call_output → tool_result

Suggested Fix: Add openapi_call / openapi_call_output normalization to _normalize_function_call_types in the same

way function_call / function_call_output are handled. This function exists in multiple evaluators:

- _tool_call_success.py:611

- _tool_call_accuracy.py:718

- _tool_selection.py:987

- _tool_input_accuracy.py:966

- _tool_output_utilization.py:873

Work item: AB#5061130
Created via Azure DevOps

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

m7md7sien · 2026-03-05T19:34:18Z

assets/evaluators/builtin/coherence/evaluator/_coherence.py

        "code_interpreter_call",
        "computer_call",
        "azure_fabric",
-        "openapi_call",


We shouldn't remove it from unsupported tools.

Restored "openapi_call" to UNSUPPORTED_TOOLS in all quality evaluators (coherence, fluency, customer_satisfaction, groundedness, relevance, intent_resolution, task_completion, task_adherence). Only the 5 tool evaluators mentioned in the issue have it removed. Commit: 96d5768

m7md7sien · 2026-03-05T19:35:40Z

assets/evaluators/builtin/coherence/evaluator/_coherence.py

                    # Raise error in case of unsupported tools for evaluators that enabled check_for_unsupported_tools
                    if self.check_for_unsupported_tools:
                        if content_type == ContentType.TOOL_CALL or content_type == ContentType.OPENAPI_CALL:
-                            name = (


We shouldn't do that either.

Reverted the name resolution change in all quality evaluators — they now keep the original "openapi_call" if content_type == ContentType.OPENAPI_CALL logic. Commit: 96d5768

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

m7md7sien · 2026-03-05T19:49:58Z

assets/evaluators/builtin/tool_call_success/evaluator/_tool_call_success.py

        "code_interpreter_call",
        "computer_call",
        "azure_fabric",
-        "openapi_call",


Undo this in all evaluators.

Restored "openapi_call" to UNSUPPORTED_TOOLS in all 5 tool evaluators. Commit: 4c8f881

assets/evaluators/builtin/tool_call_success/evaluator/_tool_call_success.py

salma-elshafey · 2026-03-05T20:00:40Z

assets/evaluators/builtin/coherence/evaluator/_coherence.py

                last_content = content[-1]
                if (isinstance(last_content, dict) and
-                        last_content.get("type") in ("function_call", "mcp_approval_request")):
+                        last_content.get("type") in ("function_call", "mcp_approval_request", "openapi_call")):


Remove the handling of the openapi_call in the _handle_intermediate_response method in all evaluators you added this in, we only handle function_call and mcp_approval_request

Removed "openapi_call" from _is_intermediate_response in all 16 evaluator files — the check is now back to only ("function_call", "mcp_approval_request"). Commit: 48487e9

salma-elshafey · 2026-03-05T20:03:13Z

assets/evaluators/builtin/tool_call_accuracy/evaluator/_tool_call_accuracy.py

        "code_interpreter_call",
        "computer_call",
        "azure_fabric",
-        "openapi_call",


openapi_call is still unsupported here, do not remove it

salma-elshafey · 2026-03-05T20:04:21Z

assets/evaluators/tests/test_evaluators_behavior/base_evaluator_behavior_test.py

        },
    ]

+    OPENAPI_CALL_ONLY_RESPONSE: List[Dict[str, Any]] = [


This is not needed because we only handle mcp_approval_request-only and function_call_only responses. Remove it.

Removed OPENAPI_CALL_ONLY_RESPONSE, OPENAPI_CALL_FULL_RESPONSE fixtures and test_openapi_call_response from all test files. Commit: 48487e9

…tests Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

github-actions · 2026-03-05T20:37:44Z

Test Results for assets-test

677 tests 677 ✅ 29s ⏱️
13 suites 0 💤
13 files 0 ❌

Results for commit d663fbd.

♻️ This comment has been updated with latest results.

salma-elshafey · 2026-03-05T20:38:54Z

@copilot you should run the code health validation script to ensure it passes.

Initial plan

b8c9664

Copilot AI assigned Copilot and m7md7sien Mar 5, 2026

Copilot started work on behalf of m7md7sien March 5, 2026 01:10 View session

Add openapi_call/openapi_call_output normalization to all evaluators

e64b0cb

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix Tool Call Success evaluator for OpenAPI responses~~ Normalize openapi_call/openapi_call_output types in evaluator preprocessing Mar 5, 2026

Copilot finished work on behalf of m7md7sien March 5, 2026 01:29

m7md7sien reviewed Mar 5, 2026

View reviewed changes

m7md7sien requested changes Mar 5, 2026

View reviewed changes

Copilot started work on behalf of m7md7sien March 5, 2026 19:36 View session

Restore openapi_call in UNSUPPORTED_TOOLS for quality evaluators

96d5768

Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

Copilot finished work on behalf of m7md7sien March 5, 2026 19:43

m7md7sien reviewed Mar 5, 2026

View reviewed changes

assets/evaluators/builtin/tool_call_success/evaluator/_tool_call_success.py Outdated Show resolved Hide resolved

m7md7sien requested changes Mar 5, 2026

View reviewed changes

Copilot started work on behalf of m7md7sien March 5, 2026 19:50 View session

salma-elshafey requested changes Mar 5, 2026

View reviewed changes

Restore openapi_call in UNSUPPORTED_TOOLS for all evaluators, update …

4c8f881

…tests Co-authored-by: m7md7sien <16615690+m7md7sien@users.noreply.github.com>

Copilot finished work on behalf of m7md7sien March 5, 2026 20:07

Copilot AI temporarily deployed to Testing March 5, 2026 20:36 Inactive

Copilot AI temporarily deployed to Testing March 5, 2026 20:37 Inactive

Copilot started work on behalf of salma-elshafey March 5, 2026 20:39 View session

Copilot AI temporarily deployed to Testing March 5, 2026 20:39 Inactive

Copilot AI temporarily deployed to Testing March 5, 2026 20:49 Inactive

Merge branch 'main' into copilot/normalize-tool-call-types

d663fbd

salma-elshafey marked this pull request as ready for review March 5, 2026 21:02

salma-elshafey requested review from a team as code owners March 5, 2026 21:02

salma-elshafey temporarily deployed to Testing March 5, 2026 21:03 — with GitHub Actions Inactive

salma-elshafey temporarily deployed to Testing March 5, 2026 21:05 — with GitHub Actions Inactive

salma-elshafey approved these changes Mar 5, 2026

View reviewed changes

m7md7sien approved these changes Mar 9, 2026

View reviewed changes

Conversation

Copilot AI commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Core normalization (_normalize_function_call_types)

Validation (UNSUPPORTED_TOOLS)

Tool Call Success evaluator fails with OpenAPI responses — openapi_call type not normalized to tool_call

Work Item Details

Repro Steps

Acceptance Criteria

Comments

Uh oh!

m7md7sien Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

m7md7sien Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

m7md7sien Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

salma-elshafey Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

salma-elshafey Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

salma-elshafey Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Test Results for assets-test

Uh oh!

salma-elshafey commented Mar 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Mar 5, 2026 •

edited

Loading

Core normalization (`_normalize_function_call_types`)

Validation (`UNSUPPORTED_TOOLS`)

github-actions bot commented Mar 5, 2026 •

edited

Loading