
Add Qwen3-VL model support + multi-image input support in Qwen VL family #2345

Open
hanbitmyths wants to merge 13 commits into microsoft:main from hanbitmyths:sunghcho/qwen3-vl

Conversation


hanbitmyths commented Mar 4, 2026

This PR adds support for exporting and optimizing Qwen3-VL (and Qwen2.5-VL) vision-language models through Olive, including new ONNX graph surgery passes, 8-bit quantization enhancements, and a cast chain elimination pass.

  • Add Qwen3-VL / Qwen2.5-VL model export support via Model Builder and torch export
  • New pass: CastChainElimination removes redundant Cast→Cast chains (e.g., fp32→fp16→fp32) by collapsing them into a single Cast or eliminating them entirely when source and target types match.
  • GemmToMatMulAdd graph surgery converts Gemm nodes to MatMul+Add for broader runtime compatibility.
  • ReciprocalMulToDiv graph surgery fuses Reciprocal→Mul patterns into a single Div node.
  • DeduplicateSubgraphInitializers graph surgery merges duplicate initializers that share identical tensor data.
  • DeduplicateNodes graph surgery removes duplicate nodes that have identical op_type, attributes, and inputs.
  • Add 8-bit integer Gather quantization to the RTN quantization pass.
  • Skip quantization of unused initializers.
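The ReciprocalMulToDiv fusion above can be sketched on a toy node representation (plain dicts standing in for ONNX `NodeProto`; this is an illustrative sketch, not the pass's actual implementation). `Reciprocal(x)` followed by `Mul(recip, y)` computes `y / x`, so the pair can be replaced by a single `Div`, provided nothing else consumes the reciprocal:

```python
def fuse_reciprocal_mul(nodes):
    """Fuse Reciprocal(x) -> Mul(recip, y) into Div(y, x).

    Nodes are dicts with "op_type", "inputs", "outputs" keys; a toy
    stand-in for ONNX NodeProto, not Olive's real data structures.
    """
    producers = {o: n for n in nodes for o in n["outputs"]}
    use_count = {}
    for n in nodes:
        for i in n["inputs"]:
            use_count[i] = use_count.get(i, 0) + 1

    out, fused_away = [], set()
    for n in nodes:
        if n["op_type"] == "Mul":
            # The reciprocal may feed either Mul operand.
            for idx, other in ((0, 1), (1, 0)):
                up = producers.get(n["inputs"][idx])
                if (up is not None and up["op_type"] == "Reciprocal"
                        and use_count[up["outputs"][0]] == 1):
                    # Mul(1/x, y) => Div(y, x); reuse the Mul's output name
                    # so downstream consumers need no re-wiring.
                    out.append({"op_type": "Div",
                                "inputs": [n["inputs"][other], up["inputs"][0]],
                                "outputs": n["outputs"]})
                    fused_away.add(id(up))
                    break
            else:
                out.append(n)
        else:
            out.append(n)
    # Drop the Reciprocal nodes that were fused away.
    return [n for n in out if id(n) not in fused_away]
```

The single-consumer check (`use_count == 1`) matters: if the reciprocal's output feeds other nodes too, removing it would break them.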

- graph_surgeries.py: add QwenVL-specific graph surgery passes for
  vision embedding merge and positional encoding fixup
- rtn_quantization.py: extend RTN quantization for multimodal models,
  handle vision encoder exclusion patterns
- cast_chain_elimination.py: new pass to eliminate redundant Cast chains
  in Dynamo-exported models (fp32->fp16->fp32 patterns)
- olive_config.json: register new passes
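The cast-chain elimination described above can be sketched as follows (a minimal sketch on a toy `Node` class standing in for ONNX `NodeProto`, not the pass's actual code; the additional case of deleting an identity Cast outright when source and target dtypes match requires re-wiring its consumers and is omitted here):

```python
from dataclasses import dataclass

@dataclass
class Node:
    # Minimal stand-in for an ONNX NodeProto, for illustration only.
    op_type: str
    inputs: list
    outputs: list

def eliminate_cast_chains(nodes, graph_outputs):
    """Collapse Cast->Cast chains into a single Cast, then drop the
    Cast nodes left without consumers (e.g. fp32->fp16->fp32)."""
    producers = {o: n for n in nodes for o in n.outputs}
    for n in nodes:
        if n.op_type != "Cast":
            continue
        up = producers.get(n.inputs[0])
        while up is not None and up.op_type == "Cast":
            # Re-wire past the upstream Cast: Cast(Cast(x)) -> Cast(x).
            n.inputs[0] = up.inputs[0]
            up = producers.get(n.inputs[0])
    # Keep only Casts whose outputs are still consumed somewhere.
    used = {i for n in nodes for i in n.inputs} | set(graph_outputs)
    return [n for n in nodes
            if n.op_type != "Cast" or any(o in used for o in n.outputs)]
```

After re-wiring, the intermediate fp16 Cast has no remaining consumers and is swept away; the surviving Cast reads the original fp32 tensor directly.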
…surgery passes

- rtn_quantization.py: Parameterize bits through quantization methods to support 8-bit Gather
- common.py: Fix ByteSize() crash for >2GB models, fix FOLDED_FROM_KEY import
- graph_surgeries.py: Add ReciprocalMulToDiv, DeduplicateSubgraphInitializers, DeduplicateNodes
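The node-deduplication idea can be sketched like this (a hypothetical sketch on plain dict nodes, not the pass's real implementation): two nodes with identical op_type, attributes, and inputs compute the same value, so all but the first can be dropped and later consumers re-wired to the survivor's outputs.

```python
def deduplicate_nodes(nodes):
    """Drop nodes identical in (op_type, attrs, inputs); nodes are dicts
    with "op_type", "attrs", "inputs", "outputs" keys (toy stand-ins for
    ONNX NodeProto)."""
    seen = {}     # dedup key -> first node seen with that key
    rename = {}   # duplicate's output name -> representative's output name
    kept = []
    for n in nodes:
        # Apply renames from earlier deduplications first, so chains of
        # duplicates collapse in one pass over a topologically sorted list.
        n["inputs"] = [rename.get(i, i) for i in n["inputs"]]
        key = (n["op_type"],
               tuple(sorted(n["attrs"].items())),
               tuple(n["inputs"]))
        if key in seen:
            for dup_out, rep_out in zip(n["outputs"], seen[key]["outputs"]):
                rename[dup_out] = rep_out
        else:
            seen[key] = n
            kept.append(n)
    return kept
```

Initializer deduplication follows the same pattern, except the key is a hash of the raw tensor bytes rather than (op_type, attrs, inputs).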
hanbitmyths and others added 4 commits March 3, 2026 22:55
- Apply ruff format to 4 files (cast_chain_elimination.py,
  rtn_quantization.py, test_graph_surgeries.py, test_rtn_quantization.py)
- Fix _pack_int8_to_int4 reshape bug: replace global flatten+pack with
  axis-aware _pack_int4_along_axis that correctly packs zero_point when
  k_blocks is small (e.g. 1), avoiding ValueError on reshape
- Fix test_rtn_quantization_pass_gather assertion: GatherBlockQuantized
  always uses quantize_axis=data_rank-1, not pass_config['axis']
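The axis-aware packing fix can be sketched as below (an illustrative sketch with a hypothetical function name, not the PR's actual `_pack_int4_along_axis`): pairs of 4-bit values are packed into one uint8 along the chosen axis, padding that axis to even length, instead of flattening the whole tensor and reshaping, which raises `ValueError` when the packed axis is short (e.g. `k_blocks == 1`).

```python
import numpy as np

def pack_int4_along_axis(values: np.ndarray, axis: int = -1) -> np.ndarray:
    """Pack pairs of 4-bit values into uint8 bytes along `axis`.

    Illustrative sketch only; element at even index goes in the low
    nibble, its odd-index neighbor in the high nibble.
    """
    vals = values.astype(np.uint8) & 0x0F          # keep low nibbles only
    if vals.shape[axis] % 2:                       # pad axis to even length
        pad = [(0, 0)] * vals.ndim
        pad[axis] = (0, 1)
        vals = np.pad(vals, pad)
    lo = np.take(vals, np.arange(0, vals.shape[axis], 2), axis=axis)
    hi = np.take(vals, np.arange(1, vals.shape[axis], 2), axis=axis)
    return (lo | (hi << 4)).astype(np.uint8)
```

Because padding happens only on the packed axis, a zero_point tensor with a tiny `k_blocks` dimension packs cleanly rather than tripping over a global reshape.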
The upstream tuning_strategies.md page no longer exists, causing the
Sphinx linkcheck to fail with -W (warnings-as-errors).
