[Bug]: ByteDance-Seed/BAGEL-7B-MoT int4 quantization: only partial layer quantization achievable without significant accuracy degradation #1645

@lvliang-intel

Description

Problem Description

Intel/BAGEL-7B-MoT-int4-AutoRound is an int4-quantized version of ByteDance-Seed/BAGEL-7B-MoT, produced with AutoRound. During quantization experiments, we found that applying int4 to all transformer layers leads to unacceptable accuracy loss. Only a subset of layers can be safely quantized at int4, while quantization-sensitive layers must be kept at higher precision.
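To illustrate why some layers tolerate int4 while others do not, here is a minimal NumPy sketch of per-layer sensitivity measurement. It uses a toy round-to-nearest symmetric group-wise int4 scheme (not AutoRound's actual sign-gradient rounding), and the outlier pattern is synthetic; the point is only that a few large-magnitude weights inflate the per-group scale and wipe out the precision of the remaining weights in that group:

```python
import numpy as np

def int4_dequant(w, group_size=128):
    # Round-to-nearest symmetric int4 (range [-8, 7]) per group of
    # `group_size` weights; returns the dequantized approximation of `w`
    # so the quantization error can be measured directly.
    flat = w.reshape(-1, group_size)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(flat / scale), -8, 7)
    return (q * scale).reshape(w.shape)

def bulk_relative_error(w, outlier_mask, group_size=128):
    # Relative error on the non-outlier ("bulk") weights only: the
    # outliers themselves quantize fine relative to their magnitude,
    # but they set the group scale for everything else.
    err = w - int4_dequant(w, group_size)
    bulk = ~outlier_mask
    return np.linalg.norm(err[bulk]) / np.linalg.norm(w[bulk])

rng = np.random.default_rng(0)
# A "well-behaved" layer: roughly Gaussian weights quantize cleanly.
w_easy = rng.normal(0.0, 0.02, size=(256, 256))
no_outliers = np.zeros_like(w_easy, dtype=bool)

# A "sensitive" layer: ~1% of weights are 100x outliers; they inflate
# the per-group scale so the other 99% round to (near) zero.
outliers = rng.random(w_easy.shape) < 0.01
w_hard = np.where(outliers, w_easy * 100.0, w_easy)

e_easy = bulk_relative_error(w_easy, no_outliers)
e_hard = bulk_relative_error(w_hard, outliers)
print(f"well-behaved layer bulk error: {e_easy:.2f}")
print(f"outlier-heavy layer bulk error: {e_hard:.2f}")
```

Running a check like this per layer (on the real weights, or better, on calibration-data activations) gives a ranking of which layers to exclude from int4, which matches the behavior reported above: a subset of layers quantizes with small error, while outlier-heavy layers lose most of their bulk precision.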

Reproduction Steps

CUDA_VISIBLE_DEVICES=0 python quantize_bagel.py --model ByteDance-Seed/BAGEL-7B-MoT --output ./BAGEL-7B-MoT-W4A16

quantize_bagel.py

Environment Information

No response

Error Logs

Additional Context

No response
