Problem Description
Intel/BAGEL-7B-MoT-int4-AutoRound is an int4-quantized version of ByteDance-Seed/BAGEL-7B-MoT produced with AutoRound. During quantization experiments, we found that quantizing all transformer layers to int4 causes unacceptable accuracy loss: only a subset of layers can be safely quantized to int4, while sensitive layers must be kept at higher precision.
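A minimal sketch of how such a mixed-precision setup could be expressed as a per-layer bit-width map (the kind of mapping AutoRound accepts via its layer-config mechanism). The layer names, the sensitive-layer set, and the helper `build_layer_config` are illustrative assumptions, not the actual BAGEL-7B-MoT recipe:

```python
# Hypothetical helper: build a per-layer precision map where most layers
# are quantized to int4 and known-sensitive layers stay at 16-bit.
# Layer names and the sensitive set below are placeholders for illustration.

def build_layer_config(layer_names, sensitive_layers,
                       default_bits=4, fallback_bits=16):
    """Map each layer name to a bit width: int4 by default,
    higher precision for layers that degrade accuracy at int4."""
    return {
        name: {"bits": fallback_bits if name in sensitive_layers
               else default_bits}
        for name in layer_names
    }

layers = [f"model.layers.{i}.mlp.down_proj" for i in range(4)]
cfg = build_layer_config(
    layers,
    sensitive_layers={"model.layers.0.mlp.down_proj"},  # assumed sensitive
)
print(cfg["model.layers.0.mlp.down_proj"])  # {'bits': 16}
print(cfg["model.layers.1.mlp.down_proj"])  # {'bits': 4}
```

The resulting dict would then be passed to the quantizer so that only the non-sensitive layers are rounded to int4.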
Reproduction Steps
CUDA_VISIBLE_DEVICES=0 python quantize_bagel.py --model ByteDance-Seed/BAGEL-7B-MoT --output ./BAGEL-7B-MoT-W4A16
quantize_bagel.py
Environment Information
No response
Error Logs
Additional Context
No response