
【BugFix】Fix model configuration compatibility in datasets and postprocessors #190

Open
GaoHuaZhang wants to merge 2 commits into AISBench:master from GaoHuaZhang:bugfix

Conversation

@GaoHuaZhang
Collaborator

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry; just make the pull request and seek help from the maintainers.

PR Type

  • Feature
  • Bugfix
  • Docs
  • CI/CD
  • Refactor
  • Perf
  • Dependency
  • Test-Cases
  • Other

Related Issue
Fixes #(issue ID) / Relates to #(issue ID)

🔍 Motivation

The current model configuration handling causes compatibility issues between dataset loading and model postprocessing, which leads to incorrect or failed runs under some configurations.

📝 Modification

  • Align dataset configuration usage in ais_bench/benchmark/datasets/utils/datasets.py with the latest model configuration schema to avoid field mismatch.
  • Simplify and clean up logic in ais_bench/benchmark/utils/model_postprocessors.py, removing deprecated or unused branches that depended on old configuration formats.
  • Ensure postprocessors read model-related options in a consistent way so that different models/configs can share the same code path when possible.

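The consistent option-reading described above could look like the following minimal sketch. The helper name and the config keys (`type`, `generation_kwargs`, and the legacy `gen_config` fallback) are illustrative assumptions, not taken from the AISBench codebase:

```python
# Hypothetical sketch: normalize model-related options once, so datasets
# and postprocessors can share one code path regardless of model type.

def read_model_options(model_cfg: dict) -> dict:
    """Flatten a model config dict into a uniform options mapping."""
    opts = {
        "model_type": model_cfg.get("type", ""),
        "gen_kwargs": model_cfg.get("generation_kwargs", {}) or {},
    }
    # Fall back to a legacy key only when the current one is absent,
    # so old configs keep working without a separate code branch.
    if not opts["gen_kwargs"] and "gen_config" in model_cfg:
        opts["gen_kwargs"] = model_cfg["gen_config"]
    return opts

cfg = {
    "type": "ais_bench.benchmark.models.VLLMCustomAPIChat",
    "gen_config": {"temperature": 0.0},  # legacy key, still honored
}
print(read_model_options(cfg)["gen_kwargs"])  # {'temperature': 0.0}
```

With a single entry point like this, adding a new model class only requires that its config expose the same keys, rather than a new branch in every postprocessor.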

📐 Associated Test Results

  • Manually ran benchmark tasks that previously failed due to configuration mismatch, and confirmed they now run successfully.
  • Verified typical model configurations (e.g., general chat models and API-based models) produce expected outputs after the change.

(If there is a CI link or specific commands, add them here: CI pipeline link, test commands, etc.)

⚠️ BC-breaking (Optional)

This change is not expected to introduce backward-incompatible behavior for normal users, since it mainly cleans up deprecated paths and aligns with the current configuration schema.

If a downstream project directly depends on the removed legacy fields or legacy post-processing branches, it may need to:

  • update its model configuration to the currently recommended format; or
  • adjust its calls to model_postprocessors to use the new compatible interface.

⚠️ Performance degradation (Optional)

No known performance regressions. The removal of redundant logic in model_postprocessors.py may slightly simplify the runtime path.

🌟 Use cases (Optional)

  • Benchmarks run with different model configurations (local, API-based, or chat models) now share a unified, compatible postprocessing path.
  • Users can switch model configs without needing to adjust dataset or postprocessor code manually.

✅ Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix potential lint issues.
  • Bug fixes are fully covered by unit tests, and the case that caused the bug is added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • All relevant documentation (API docs, docstrings, example tutorials) has been updated to reflect these changes.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
  • The CLA has been signed by all committers in this PR.

👥 Collaboration Info

  • Suggested Reviewers: @xxx
  • Relevant Module Owners: @xxx
  • Other Collaboration Notes:

🌟 Useful CI Commands

Command             Introduction
/gemini review      Performs a code review of the current pull request in its current state.
/gemini summary     Provides a summary of the current pull request in its current state.
/gemini help        Displays a list of available Gemini commands.
/readthedocs build  Triggers a Read the Docs documentation build for the current pull request.

@GaoHuaZhang temporarily deployed to smoke-test-approval March 13, 2026 09:10 — with GitHub Actions
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses compatibility issues between dataset loading and model postprocessing by updating the list of supported multimodal APIs and performing a significant cleanup of deprecated post-processing code. The changes ensure that model configurations are handled consistently, preventing failures and incorrect outputs under various settings, while also improving code maintainability.

Highlights

  • Multimodal API Compatibility: Updated the list of supported multimodal APIs to include VLLMCustomAPIChatStream, enhancing compatibility for dataset loading.
  • Post-processing Logic Refactoring: Performed a significant cleanup in model_postprocessors.py by removing extensive deprecated post-processing logic, including naive and xfinder extractors, to streamline the codebase and align with current configuration schemas.
  • Improved Debugging: Enhanced debugging capabilities in model_postprocessors.py by adding detailed logging to the list_decorator and extract_non_reasoning_content functions.
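A decorator that lets a single-string postprocessor also handle lists, with the kind of debug logging the highlight above describes, might look like the following sketch. This is an assumption about the shape of `list_decorator`; the real implementation in ais_bench may differ:

```python
import functools
import logging

logger = logging.getLogger(__name__)

def list_decorator(func):
    """Let a str -> str postprocessor transparently handle lists of strings."""
    @functools.wraps(func)
    def wrapper(text, **kwargs):
        if isinstance(text, list):
            # Debug logging makes it visible which path each input takes.
            logger.debug("list_decorator: processing %d items", len(text))
            return [func(item, **kwargs) for item in text]
        logger.debug("list_decorator: processing a single string")
        return func(text, **kwargs)
    return wrapper

@list_decorator
def strip_prefix(text: str) -> str:
    # Hypothetical postprocessor: drop a fixed answer prefix.
    return text.removeprefix("Answer: ")

print(strip_prefix(["Answer: A", "Answer: B"]))  # ['A', 'B']
```

Because the wrapper preserves the single-string path unchanged, callers that pass one prediction at a time see no behavioral difference; only batched callers hit the new branch.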


Changelog
  • ais_bench/benchmark/datasets/utils/datasets.py
    • Added VLLMCustomAPIChatStream to the MM_APIS list.
  • ais_bench/benchmark/utils/model_postprocessors.py
    • Removed several deprecated post-processing functions (naive_model_postprocess, xfinder_postprocess) and their associated imports and helper functions.
    • Introduced debug logging within the list_decorator and extract_non_reasoning_content functions.
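For reference, extracting the non-reasoning part of a model reply often amounts to dropping a delimited thinking block. A minimal sketch, assuming `<think>...</think>` delimiters (which may not match the delimiters the actual `extract_non_reasoning_content` handles):

```python
import re

# Matches a thinking block, including multi-line content.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def extract_non_reasoning_content(text: str) -> str:
    """Remove <think>...</think> blocks and return the remaining answer."""
    return THINK_RE.sub("", text).strip()

reply = "<think>Let me compare the options.</think>The answer is B."
print(extract_non_reasoning_content(reply))  # The answer is B.
```

The non-greedy `.*?` keeps each block self-contained when a reply holds several, and `re.DOTALL` lets the pattern span newlines inside the block.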

@GaoHuaZhang changed the title from Fix compatibility issues with model configuration to 【BugFix】Fix model configuration compatibility in datasets and postprocessors on Mar 13, 2026

@gemini-code-assist (bot) left a comment


Code Review

This pull request aims to fix compatibility issues with model configurations by updating the list of multimodal APIs and refactoring the model post-processing logic. The changes include adding a new stream API to MM_APIS and removing deprecated naive and xfinder post-processing functions. While the code removal simplifies the module, it appears to break existing unit tests which have not been updated. I've also suggested a minor formatting improvement for better code style consistency.

I am having trouble creating individual review comments; my feedback is included below.

ais_bench/benchmark/utils/model_postprocessors.py (31-140)

critical

The removal of naive_model_postprocess and xfinder_postprocess functions will cause existing unit tests in tests/UT/utils/test_model_postprocessors.py to fail (specifically test_naive_model_postprocess and test_xfinder_postprocess). The PR checklist indicates that tests have been updated, but this seems to have been overlooked. Please update or remove the corresponding tests to reflect these changes and ensure the test suite passes.

ais_bench/benchmark/datasets/utils/datasets.py (26-28)

medium

The formatting of this list is inconsistent and does not follow common Python style guides like PEP 8. For better readability and consistency, when a list is split across multiple lines, it's recommended to place each item on a new line, indented.

MM_APIS = [
    "ais_bench.benchmark.models.VLLMCustomAPIChat",
    "ais_bench.benchmark.models.VLLMCustomAPIChatStream",
]
References
  1. According to PEP 8, multi-line constructs can use hanging indents: place the first element on a new line and indent subsequent lines to mark them as continuation lines, which improves readability.

@GaoHuaZhang temporarily deployed to smoke-test-approval March 13, 2026 09:28 — with GitHub Actions
