fix nextstep loading issue #1640

Open

xin3he wants to merge 2 commits into main from xinhe/3-30a

Conversation

Contributor

@xin3he xin3he commented Mar 30, 2026

Description

fix nextstep loading issue

example_prompt = "A REALISTIC PHOTOGRAPH OF A WALL WITH \"TOWARD AUTOREGRESSIVE IMAGE GENERATION WITH CONTINUOUS TOKENS AT SCALE\" PROMINENTLY DISPLAYED"

Raw model output:

[image]

W4A16 model output with torch backend on CPU:

[image]

W4A16 model output with gptqmodel:marlin backend on CUDA:

[image]

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

Signed-off-by: Xin He <xin3.he@intel.com>

Copilot AI left a comment


Pull request overview

Fixes model loading for the “nextstep” model type by selecting an appropriate AutoModel loader, and adjusts multimodal key detection to recognize “image”-named components.

Changes:

  • Force AutoModel for model_type == "nextstep" during MLLM model loading.
  • Add "image" to MM_KEYS to broaden multimodal component detection.
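The two changes above can be sketched roughly as follows. Only `MM_KEYS`, the added `"image"` key, and the `"nextstep"` special case come from this PR; the helper names (`choose_loader`, `is_multimodal_key`) are hypothetical stand-ins, not the actual functions in `auto_round/utils`:

```python
# Illustrative sketch of the fix; helper names are hypothetical.
MM_KEYS = ("vision", "visual", "image")  # "image" is the newly added key

def choose_loader(model_type: str) -> str:
    # NextStep checkpoints fail to load via AutoModelForCausalLM,
    # so the fix forces the generic AutoModel loader for this model type.
    if model_type == "nextstep":
        return "AutoModel"
    return "AutoModelForCausalLM"

def is_multimodal_key(module_name: str) -> bool:
    # Broadened detection: any component whose name contains one of the
    # MM keys (e.g. "image_encoder") is treated as multimodal.
    return any(key in module_name for key in MM_KEYS)
```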

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| auto_round/utils/model.py | Adds a NextStep-specific loader class override to resolve loading failures. |
| auto_round/utils/common.py | Extends multimodal key matching to include "image" for downstream detection/mapping. |

Contributor Author

xin3he commented Mar 30, 2026

The exllama backend has an accuracy issue with nextstep generation.
The marlin backend requires the gptqmodel main branch, so that is fixed in this PR as well.
cc @wenhuach21

@xin3he xin3he requested a review from wenhuach21 March 30, 2026 13:57
```python
)
backend = kwargs.pop("backend", BACKEND.MARLIN)
if NEW_VERSION_6_0:
    # gptqmodel >= 6.0.0: BaseQuantLinear no longer accepts group_size/sym/desc_act/pack_dtype
```
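A hedged sketch of the version gate in the context above: the ">= 6.0.0 drops these kwargs" behavior is taken from the inline comment, while the parsing helper and the kwargs construction below are illustrative, not the actual auto-round code:

```python
# Illustrative version gate around the gptqmodel 6.0.0 API change.
def is_gptqmodel_6_or_newer(ver: str) -> bool:
    # crude major-version check, ignoring pre-release suffixes
    return int(ver.split(".")[0]) >= 6

def quant_linear_kwargs(ver: str, bits: int, group_size: int,
                        sym: bool, desc_act: bool) -> dict:
    kwargs = {"bits": bits}
    if not is_gptqmodel_6_or_newer(ver):
        # older gptqmodel: BaseQuantLinear still accepts these arguments
        kwargs.update(group_size=group_size, sym=sym, desc_act=desc_act)
    return kwargs
```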
Contributor

Could you help fix the other gptqmodel backend issues as well, such as gptqmodel:exllamav2, gptqmodel:awq, etc.?

Since the GPTQModel API is changing frequently, should we consider other repos, such as using vLLM directly?

Contributor Author

Only this backend uses BaseQuantLinear; the other backends should work as expected.
The vLLM kernels look promising, but installing vLLM pulls in a large number of dependencies, which could result in a poor user experience.
vllm/csrc/quantization

Contributor

What if we bypass the GPTQModel API and use our own interface to directly leverage its kernels? I assume the kernel implementations change far less frequently.
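One way to sketch that idea: pin a small internal adapter interface and keep the fast-changing GPTQModel plumbing behind it, so API churn is absorbed in one place. Everything below (the `QuantKernel` class, the registry, the backend name) is hypothetical, not existing auto-round or GPTQModel code:

```python
# Hypothetical adapter layer: caller code talks only to QuantKernel,
# so upstream API changes are confined to the concrete implementations.
from abc import ABC, abstractmethod

class QuantKernel(ABC):
    @abstractmethod
    def pack(self, weight, scales, zeros): ...
    @abstractmethod
    def forward(self, x): ...

_KERNELS: dict = {}

def register_kernel(name: str):
    def deco(cls):
        _KERNELS[name] = cls
        return cls
    return deco

@register_kernel("marlin")
class MarlinKernel(QuantKernel):
    def pack(self, weight, scales, zeros):
        # would call the underlying marlin repack routine here
        self.packed = (weight, scales, zeros)
    def forward(self, x):
        # would call the marlin GEMM kernel here; identity as a placeholder
        return x

def get_kernel(name: str) -> QuantKernel:
    return _KERNELS[name]()
```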

@wenhuach21
Contributor

It would be better to add next_step to the mllm support matrix.

@xin3he
Contributor Author

xin3he commented Mar 31, 2026

I need to upstream a model before updating the support matrix (requires model link).

@wenhuach21
Contributor

> I need to upstream a model before updating the support matrix (requires model link).

If the model’s license allows upstreaming, we can upload it. Otherwise, we can leave the link blank.
