Conversation
Signed-off-by: Xin He <xin3.he@intel.com>
Pull request overview
Fixes model loading for the “nextstep” model type by selecting an appropriate AutoModel loader, and adjusts multimodal key detection to recognize “image”-named components.
Changes:
- Force `AutoModel` for `model_type == "nextstep"` during MLLM model loading.
- Add `"image"` to `MM_KEYS` to broaden multimodal component detection.
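The loader override described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual code in `auto_round/utils/model.py`: the helper name `select_loader_name` is hypothetical, and the real code returns transformers `Auto*` classes rather than strings.

```python
def select_loader_name(model_type: str) -> str:
    """Pick a loader for MLLM loading (hypothetical sketch).

    NextStep's architecture is not usable through the causal-LM auto
    class, so the PR falls back to the generic AutoModel loader for it.
    """
    if model_type == "nextstep":
        return "AutoModel"
    return "AutoModelForCausalLM"
```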
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `auto_round/utils/model.py` | Adds a NextStep-specific loader class override to resolve loading failures. |
| `auto_round/utils/common.py` | Extends multimodal key matching to include `"image"` for downstream detection/mapping. |
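The `MM_KEYS` change can be illustrated with a minimal sketch. The exact contents of `MM_KEYS` and the matching helper below are assumptions for illustration; only the addition of `"image"` comes from the PR.

```python
# Hypothetical shape of MM_KEYS in auto_round/utils/common.py.
# The PR adds "image" so that components such as "image_encoder"
# are recognized as multimodal.
MM_KEYS = ["vision", "visual", "audio", "image"]

def is_mm_component(module_name: str) -> bool:
    """Return True if a module name looks like a multimodal component."""
    return any(key in module_name.lower() for key in MM_KEYS)
```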
```python
backend = kwargs.pop("backend", BACKEND.MARLIN)
if NEW_VERSION_6_0:
    # gptqmodel >= 6.0.0: BaseQuantLinear no longer accepts group_size/sym/desc_act/pack_dtype
```
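One way the version gate above could work is to filter the removed kwargs before constructing the layer. This is a hedged sketch: `filter_quant_linear_kwargs` is a hypothetical helper, and `NEW_VERSION_6_0` is hard-coded here where the real code would derive it from the installed gptqmodel version.

```python
# Kwargs that gptqmodel >= 6.0.0 no longer accepts in BaseQuantLinear,
# per the comment in the diff above.
REMOVED_KWARGS = ("group_size", "sym", "desc_act", "pack_dtype")

NEW_VERSION_6_0 = True  # would normally come from a version check

def filter_quant_linear_kwargs(kwargs: dict) -> dict:
    """Drop kwargs rejected by newer BaseQuantLinear (illustrative)."""
    if NEW_VERSION_6_0:
        return {k: v for k, v in kwargs.items() if k not in REMOVED_KWARGS}
    return kwargs
```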
Could you help fix the other gptqmodel backend issues as well, such as `gptqmodel:exllamav2`, `gptqmodel:awq`, etc.?
Since the GPTQModel API is changing frequently, should we consider other repos, such as using vLLM directly?
Only this backend uses BaseQuantLinear; the other backends should work as expected.
The vLLM kernel looks promising, but installing vLLM involves a large number of dependencies, which could result in a poor user experience.
vllm/csrc/quantization
What if we bypass the GPTQModel API and use our own interface to directly leverage its kernels? I assume the kernel implementations change far less frequently.
Better to add `next_step` to the MLLM support matrix.
I need to upstream a model before updating the support matrix (it requires a model link).
If the model’s license allows upstreaming, we can upload it. Otherwise, we can leave the link blank.
Description
Fix the nextstep model loading issue.
```python
example_prompt = "A REALISTIC PHOTOGRAPH OF A WALL WITH \"TOWARD AUTOREGRESSIVE IMAGE GENERATION WITH CONTINUOUS TOKENS AT SCALE\" PROMINENTLY DISPLAYED"
```

Raw model output:
W4A16 model output with torch backend on CPU:
W4A16 model output with `gptqmodel:marlin` backend on CUDA:

Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting