Support ByteDance-Seed/BAGEL-7B-MoT quantization in w4a16 format #1633

lvliang-intel wants to merge 9 commits into main from lvl/support_bagel_mot
Conversation
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
…upport_bagel_mot
for more information, see https://pre-commit.ci
Pull request overview
Adds BAGEL-7B-MoT (ByteDance-Seed/BAGEL-7B-MoT) support to AutoRound’s quantization flow, including custom model loading and metadata/ignore-layer handling needed for downstream runtimes (e.g., vLLM-Omni).
Changes:
- Introduces a custom BAGEL loader and routes BAGEL through the LLM compressor flow.
- Adds BAGEL-specific block selection/ignore-layer policies and extends “extra files” copying for BAGEL sub-configs.
- Improves robustness by handling `AutoConfig.from_pretrained(...)` failures for unsupported model types.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| auto_round/utils/model.py | Routes BAGEL loading, adjusts MLLM detection, and adds extra model files + quant-block hinting. |
| auto_round/utils/bagel_loader.py | New BAGEL custom loader/wrapper and save logic for vLLM-Omni compatibility. |
| auto_round/special_model_handler.py | Registers BAGEL multimodal blocks + BAGEL ignore-layer policy. |
| auto_round/modeling/unfused_moe/__init__.py | Makes config pre-check resilient to unsupported/unknown model types. |
| auto_round/compressors/base.py | Makes config loading resilient and adds support for model-provided quant-block hints. |
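For orientation, the BAGEL ignore-layer policy registered in `special_model_handler.py` amounts to excluding certain module names from quantization. A rough sketch follows; the regex strings and the `should_ignore` helper are illustrative assumptions, not the PR's exact entries:

```python
import re

# Hypothetical sketch of the BAGEL ignore policy: modules whose names match
# any pattern stay unquantized (FP16). Actual pattern strings may differ.
BAGEL_IGNORE_PATTERNS = [
    r".*moe_gen.*",                          # all generation-expert (moe_gen) modules
    r".*\.(q_proj|k_proj|v_proj|o_proj)$",   # shared attention projections
]

def should_ignore(module_name: str) -> bool:
    """Return True if the module should be excluded from quantization."""
    return any(re.fullmatch(p, module_name) for p in BAGEL_IGNORE_PATTERNS)
```

Downstream, these entries would typically be emitted into the exported config's ignore list so runtimes such as vLLM-Omni skip the same modules.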
auto_round/utils/bagel_loader.py (outdated)

```python
def load_bagel_model(model_path, torch_dtype="auto", device_map=None):
    """Load a BAGEL model for quantization.

    Args:
        model_path: Path to the BAGEL model directory.
        torch_dtype: Data type for model weights.
        device_map: Device map for model placement.

    Returns:
        Tuple of (model, tokenizer).
    """
    # Load configs
    config_path = os.path.join(model_path, "config.json")
    with open(config_path, "r", encoding="utf-8") as f:
        bagel_config_dict = json.load(f)
```
load_bagel_model() assumes model_path is a local directory and immediately opens os.path.join(model_path, "config.json"). However callers (e.g., mllm_load_model) may pass a HF repo id. Add a guard at the start to resolve repo ids to a local snapshot (e.g., if not os.path.isdir(model_path): model_path = download_or_get_path(model_path, ...)) so BAGEL loading works for both local and remote models.
@copilot apply changes based on this feedback
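A minimal sketch of the suggested guard (assuming `huggingface_hub` is available; in-repo, AutoRound's existing `download_or_get_path` helper would be the natural choice instead):

```python
import os

def resolve_model_path(model_path: str) -> str:
    """Return a local directory for model_path, downloading a snapshot for repo ids."""
    if os.path.isdir(model_path):
        return model_path  # already a local checkout
    # Remote HF repo id: fetch (or reuse a cached) snapshot.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=model_path)
```

`load_bagel_model()` could call this at the top so the subsequent `open(os.path.join(model_path, "config.json"))` always sees a local directory.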
```python
# BAGEL requires a custom loader (Qwen2 + custom extensions, not in transformers)
_config_path = (
    os.path.join(pretrained_model_name_or_path, "config.json")
    if os.path.isdir(pretrained_model_name_or_path)
    else None
)
if _config_path and os.path.exists(_config_path):
    with open(_config_path) as _f:
        _mt = json.load(_f).get("model_type")
    if _mt == "bagel":
        from auto_round.utils.bagel_loader import load_bagel_model

        model, tokenizer = load_bagel_model(
            pretrained_model_name_or_path,
            torch_dtype=torch_dtype,
        )
        model = _to_model_dtype(model, model_dtype)
```
This PR adds new BAGEL-specific branching in llm_load_model / mllm_load_model and new multimodal-detection behavior, but there are no corresponding unit tests under test/ to cover (a) BAGEL being treated as LLM-only by is_mllm_model for both local and remote paths, and (b) BAGEL routing to the custom loader. The test suite already covers similar branching for other model types (e.g., GLM image), so adding focused tests here would prevent regressions.
@copilot apply changes based on this feedback
```python
# BAGEL requires a custom loader (Qwen2 + custom extensions, not in transformers)
_config_path = (
    os.path.join(pretrained_model_name_or_path, "config.json")
    if os.path.isdir(pretrained_model_name_or_path)
    else None
)
if _config_path and os.path.exists(_config_path):
    with open(_config_path) as _f:
        _mt = json.load(_f).get("model_type")
    if _mt == "bagel":
        from auto_round.utils.bagel_loader import load_bagel_model

        model, tokenizer = load_bagel_model(
            pretrained_model_name_or_path,
            torch_dtype=torch_dtype,
        )
        model = _to_model_dtype(model, model_dtype)
```
BAGEL routing here only triggers when pretrained_model_name_or_path is a local directory (checks os.path.isdir + reads local config.json). If the user passes a HF repo id (the common AutoRound flow), this branch is skipped and AutoModelForCausalLM.from_pretrained() will still be attempted, which is expected to fail for model_type=bagel. Consider detecting BAGEL for remote repos too (e.g., hf_hub_download config.json or download_or_get_path + read config.json) and then call load_bagel_model with the resolved local snapshot path.
@copilot apply changes based on this feedback
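One way to sketch that detection so it covers both local directories and remote repo ids (using `huggingface_hub` directly here; the repo's own `download_or_get_path` would be the in-tree equivalent):

```python
import json
import os

def read_model_type(name_or_path: str):
    """Fetch model_type from config.json for a local dir or a HF repo id."""
    if os.path.isdir(name_or_path):
        cfg = os.path.join(name_or_path, "config.json")
        if not os.path.exists(cfg):
            return None
    else:
        # Remote repo id: download only the config file, not the full snapshot.
        from huggingface_hub import hf_hub_download
        cfg = hf_hub_download(name_or_path, "config.json")
    with open(cfg) as f:
        return json.load(f).get("model_type")
```

With this, the BAGEL branch can trigger on `read_model_type(...) == "bagel"` before `AutoModelForCausalLM.from_pretrained()` is attempted.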
```python
model_path = model_or_path if isinstance(model_or_path, str) else model_or_path.name_or_path

# Check model_type exclusion: some models have multimodal components
# but should be quantized as LLM-only (e.g., BAGEL).
_model_type = None
if isinstance(model_or_path, torch.nn.Module) and hasattr(model_or_path, "config"):
    _model_type = getattr(model_or_path.config, "model_type", None)
elif isinstance(model_path, str) and os.path.isdir(model_path):
    _cfg_path = os.path.join(model_path, "config.json")
    if os.path.exists(_cfg_path):
        with open(_cfg_path) as _f:
            _model_type = json.load(_f).get("model_type")
if _model_type in _LLM_ONLY_MODEL_TYPES:
    return False
# For dummy model, model_path could be "".
if model_path and not os.path.isdir(model_path):
```
is_mllm_model() checks _LLM_ONLY_MODEL_TYPES (e.g., bagel) only before download_or_get_path() runs. For a remote HF repo id, _model_type stays None at that point, the model is downloaded, and the function then proceeds to detect multimodal artifacts (e.g., preprocessor_config.json) and will incorrectly return True for BAGEL. Move the model_type check to after the potential download (or re-check once model_path is resolved) so BAGEL is consistently treated as LLM-only for both local and remote inputs.
@copilot apply changes based on this feedback
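The fix can be sketched as resolving the path first and only then applying the LLM-only exclusion; this simplified stand-in assumes the path is already local (in the real code a remote repo id would be downloaded before this point):

```python
import json
import os

_LLM_ONLY_MODEL_TYPES = {"bagel"}

def _local_model_type(local_dir: str):
    """Read model_type from a resolved local checkout's config.json."""
    cfg = os.path.join(local_dir, "config.json")
    if not os.path.exists(cfg):
        return None
    with open(cfg) as f:
        return json.load(f).get("model_type")

def is_mllm_model_sketch(local_dir: str) -> bool:
    """Simplified is_mllm_model: the exclusion check runs on the resolved path."""
    if _local_model_type(local_dir) in _LLM_ONLY_MODEL_TYPES:
        return False
    # Only then fall through to multimodal-artifact detection.
    return os.path.exists(os.path.join(local_dir, "preprocessor_config.json"))
```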
Quantize Script

Run inference with vLLM-Omni (with patch for the BAGEL MoT model):

```shell
CUDA_VISIBLE_DEVICES=0 python run_bagel.py --model /mnt/disk4/lvl/BAGEL-7B-MoT/ --prompt "A cute cat sitting on a windowsill" --output orginal_bagel_model_output.png
CUDA_VISIBLE_DEVICES=0 python run_bagel.py --model /mnt/disk4/lvl/BAGEL-7B-MoT-W4A16/ --prompt "A cute cat sitting on a windowsill" --output quantized_bagel_model_output.png
```
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
…-round into lvl/support_bagel_mot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Agent-Logs-Url: https://github.com/intel/auto-round/sessions/57e2e340-88e0-42e8-9528-a24ad1bc7d61
Co-authored-by: lvliang-intel <104267837+lvliang-intel@users.noreply.github.com>
How about upstreaming the model once it’s supported (assuming the license allows it)? There’s no need to wait for the PR to be merged.


Description
This PR adds proper BAGEL model quantization support to the standard AutoRound LLM quantization flow and fixes the exported quantization metadata required by downstream vLLM-Omni loading.
Main changes:
(1) Route BAGEL through the LLM compressor.
(2) Load BAGEL with a dedicated custom loader because transformers does not natively recognize the bagel architecture.
(3) Gracefully handle AutoConfig.from_pretrained failures for unsupported model types such as bagel.
(4) Export the correct block_name_to_quantize metadata so downstream runtimes only quantize BAGEL LLM blocks instead of non-LLM modules like connector or vision components.
(5) Add a BAGEL-specific ignore policy to preserve image-generation-sensitive modules in FP16:
a. all moe_gen modules
b. shared attention projections (q_proj, k_proj, v_proj, o_proj)
(6) Fix `save_pretrained()` in the BAGEL custom loader to use `state_dict()` instead of `named_parameters()`, ensuring registered buffers (e.g., rotary embedding caches) are included in the saved `model.safetensors` for correct reload and inference.

Type of Change
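The rationale for point (6): registered buffers appear in `state_dict()` but not in `named_parameters()`, so saving from the latter silently drops them. A minimal torch demonstration with a toy module (not BAGEL's actual classes):

```python
import torch
from torch import nn

class TinyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        # Registered buffer, analogous to rotary embedding caches in BAGEL.
        self.register_buffer("cos_cached", torch.ones(4))

m = TinyBlock()
param_keys = {name for name, _ in m.named_parameters()}
state_keys = set(m.state_dict().keys())
assert "cos_cached" in state_keys       # buffers are captured by state_dict()
assert "cos_cached" not in param_keys   # ...but missed by named_parameters()
```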
Related Issues
#1608
Checklist Before Submitting