Problem Description
Intel/BAGEL-7B-MoT-int4-AutoRound is an int4-quantized version of ByteDance-Seed/BAGEL-7B-MoT produced with AutoRound. During quantization experiments, we found that quantizing all transformer layers to int4 causes unacceptable accuracy loss: only a subset of layers can be safely quantized to int4, while sensitive layers must be kept at higher precision.
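A minimal sketch of how such a mixed-precision setup could be expressed as a per-layer bit-width map (the kind of mapping AutoRound accepts via its layer-config mechanism). The layer names, the sensitive-layer set, and the helper `build_layer_config` are illustrative assumptions, not the actual BAGEL-7B-MoT recipe:

```python
# Hypothetical helper: build a per-layer precision map where most layers
# are quantized to int4 and known-sensitive layers stay at 16-bit.
# Layer names and the sensitive set below are placeholders for illustration.

def build_layer_config(layer_names, sensitive_layers,
                       default_bits=4, fallback_bits=16):
    """Map each layer name to a bit width: int4 by default,
    higher precision for layers that degrade accuracy at int4."""
    return {
        name: {"bits": fallback_bits if name in sensitive_layers
               else default_bits}
        for name in layer_names
    }

layers = [f"model.layers.{i}.mlp.down_proj" for i in range(4)]
cfg = build_layer_config(
    layers,
    sensitive_layers={"model.layers.0.mlp.down_proj"},  # assumed sensitive
)
print(cfg["model.layers.0.mlp.down_proj"])  # {'bits': 16}
print(cfg["model.layers.1.mlp.down_proj"])  # {'bits': 4}
```

The resulting dict would then be passed to the quantizer so that only the non-sensitive layers are rounded to int4.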
Reproduction Steps
CUDA_VISIBLE_DEVICES=0 python quantize_bagel.py --model ByteDance-Seed/BAGEL-7B-MoT --output ./BAGEL-7B-MoT-W4A16
quantize_bagel.py
Environment Information
No response
Error Logs
Additional Context
No response