Conversation
Signed-off-by: Xin He <xin3.he@intel.com>
Pull request overview
Fixes model loading for the “nextstep” model type by selecting an appropriate AutoModel loader, and adjusts multimodal key detection to recognize “image”-named components.
Changes:
- Force `AutoModel` for `model_type == "nextstep"` during MLLM model loading.
- Add `"image"` to `MM_KEYS` to broaden multimodal component detection.
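The loader override described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual code in `auto_round/utils/model.py`: the helper name `select_loader_name` is hypothetical, and the real code returns transformers `Auto*` classes rather than strings.

```python
def select_loader_name(model_type: str) -> str:
    """Pick a loader for MLLM loading (hypothetical sketch).

    NextStep's architecture is not usable through the causal-LM auto
    class, so the PR falls back to the generic AutoModel loader for it.
    """
    if model_type == "nextstep":
        return "AutoModel"
    return "AutoModelForCausalLM"
```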
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `auto_round/utils/model.py` | Adds a NextStep-specific loader class override to resolve loading failures. |
| `auto_round/utils/common.py` | Extends multimodal key matching to include `"image"` for downstream detection/mapping. |
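The `MM_KEYS` change can be illustrated with a minimal sketch. The exact contents of `MM_KEYS` and the matching helper below are assumptions for illustration; only the addition of `"image"` comes from the PR.

```python
# Hypothetical shape of MM_KEYS in auto_round/utils/common.py.
# The PR adds "image" so that components such as "image_encoder"
# are recognized as multimodal.
MM_KEYS = ["vision", "visual", "audio", "image"]

def is_mm_component(module_name: str) -> bool:
    """Return True if a module name looks like a multimodal component."""
    return any(key in module_name.lower() for key in MM_KEYS)
```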
```python
backend = kwargs.pop("backend", BACKEND.MARLIN)
if NEW_VERSION_6_0:
    # gptqmodel >= 6.0.0: BaseQuantLinear no longer accepts group_size/sym/desc_act/pack_dtype
```
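One way the version gate above could work is to filter the removed kwargs before constructing the layer. This is a hedged sketch: `filter_quant_linear_kwargs` is a hypothetical helper, and `NEW_VERSION_6_0` is hard-coded here where the real code would derive it from the installed gptqmodel version.

```python
# Kwargs that gptqmodel >= 6.0.0 no longer accepts in BaseQuantLinear,
# per the comment in the diff above.
REMOVED_KWARGS = ("group_size", "sym", "desc_act", "pack_dtype")

NEW_VERSION_6_0 = True  # would normally come from a version check

def filter_quant_linear_kwargs(kwargs: dict) -> dict:
    """Drop kwargs rejected by newer BaseQuantLinear (illustrative)."""
    if NEW_VERSION_6_0:
        return {k: v for k, v in kwargs.items() if k not in REMOVED_KWARGS}
    return kwargs
```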
Could you help fix the other gptqmodel backend issues as well, such as `gptqmodel:exllamav2`, `gptqmodel:awq`, etc.?
Since the GPTQModel API is changing frequently, should we consider other repos, such as using vLLM directly?
Only this backend uses BaseQuantLinear; the other backends should work as expected.
The vLLM kernel looks promising, but installing vLLM involves a large number of dependencies, which could result in a poor user experience.
vllm/csrc/quantization
What if we bypass the GPTQModel API and use our own interface to directly leverage its kernels? I assume the kernel implementations change far less frequently.
Better to add `next_step` to the MLLM support matrix.
I need to upstream a model before updating the support matrix (it requires a model link).
If the model’s license allows upstreaming, we can upload it. Otherwise, we can leave the link blank.
Description
Fix the nextstep model loading issue.
```python
example_prompt = "A REALISTIC PHOTOGRAPH OF A WALL WITH \"TOWARD AUTOREGRESSIVE IMAGE GENERATION WITH CONTINUOUS TOKENS AT SCALE\" PROMINENTLY DISPLAYED"
```

Raw model output:
W4A16 model output with torch backend on CPU:
W4A16 model output with `gptqmodel:marlin` backend on CUDA:

Type of Change
Related Issues
Fixes or relates to #
Checklist Before Submitting