
Add Qwen3-VL model support + multi-image input support in Qwen VL family #2345

Open
hanbitmyths wants to merge 13 commits into microsoft:main from hanbitmyths:sunghcho/qwen3-vl

Conversation


hanbitmyths commented Mar 4, 2026

This PR adds support for exporting and optimizing Qwen3-VL (and Qwen2.5-VL) vision-language models through Olive, including new ONNX graph surgery passes, 8-bit quantization enhancements, and a cast chain elimination pass.

  • Add Qwen3-VL / Qwen2.5-VL model export support via Model Builder and torch export
  • New pass: CastChainElimination removes redundant Cast→Cast chains (e.g., fp32→fp16→fp32) by collapsing them into a single Cast or eliminating them entirely when source and target types match.
  • GemmToMatMulAdd graph surgery converts Gemm nodes to MatMul+Add for broader runtime compatibility.
  • ReciprocalMulToDiv graph surgery fuses Reciprocal→Mul patterns into a single Div node.
  • DeduplicateSubgraphInitializers graph surgery merges duplicate initializers that share identical tensor data.
  • DeduplicateNodes graph surgery removes duplicate nodes that have identical op_type, attributes, and inputs.
  • Add 8-bit integer Gather quantization to the RTN quantization pass.
  • Skip quantization of unused initializers.
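The ReciprocalMulToDiv fusion above can be sketched on a toy node representation (plain dicts standing in for ONNX `NodeProto`; this is an illustrative sketch, not the pass's actual implementation). `Reciprocal(x)` followed by `Mul(recip, y)` computes `y / x`, so the pair can be replaced by a single `Div`, provided nothing else consumes the reciprocal:

```python
def fuse_reciprocal_mul(nodes):
    """Fuse Reciprocal(x) -> Mul(recip, y) into Div(y, x).

    Nodes are dicts with "op_type", "inputs", "outputs" keys; a toy
    stand-in for ONNX NodeProto, not Olive's real data structures.
    """
    producers = {o: n for n in nodes for o in n["outputs"]}
    use_count = {}
    for n in nodes:
        for i in n["inputs"]:
            use_count[i] = use_count.get(i, 0) + 1

    out, fused_away = [], set()
    for n in nodes:
        if n["op_type"] == "Mul":
            # The reciprocal may feed either Mul operand.
            for idx, other in ((0, 1), (1, 0)):
                up = producers.get(n["inputs"][idx])
                if (up is not None and up["op_type"] == "Reciprocal"
                        and use_count[up["outputs"][0]] == 1):
                    # Mul(1/x, y) => Div(y, x); reuse the Mul's output name
                    # so downstream consumers need no re-wiring.
                    out.append({"op_type": "Div",
                                "inputs": [n["inputs"][other], up["inputs"][0]],
                                "outputs": n["outputs"]})
                    fused_away.add(id(up))
                    break
            else:
                out.append(n)
        else:
            out.append(n)
    # Drop the Reciprocal nodes that were fused away.
    return [n for n in out if id(n) not in fused_away]
```

The single-consumer check (`use_count == 1`) matters: if the reciprocal's output feeds other nodes too, removing it would break them.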

- graph_surgeries.py: add QwenVL-specific graph surgery passes for
  vision embedding merge and positional encoding fixup
- rtn_quantization.py: extend RTN quantization for multimodal models,
  handle vision encoder exclusion patterns
- cast_chain_elimination.py: new pass to eliminate redundant Cast chains
  in Dynamo-exported models (fp32->fp16->fp32 patterns)
- olive_config.json: register new passes
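The cast-chain elimination described above can be sketched as follows (a minimal sketch on a toy `Node` class standing in for ONNX `NodeProto`, not the pass's actual code; the additional case of deleting an identity Cast outright when source and target dtypes match requires re-wiring its consumers and is omitted here):

```python
from dataclasses import dataclass

@dataclass
class Node:
    # Minimal stand-in for an ONNX NodeProto, for illustration only.
    op_type: str
    inputs: list
    outputs: list

def eliminate_cast_chains(nodes, graph_outputs):
    """Collapse Cast->Cast chains into a single Cast, then drop the
    Cast nodes left without consumers (e.g. fp32->fp16->fp32)."""
    producers = {o: n for n in nodes for o in n.outputs}
    for n in nodes:
        if n.op_type != "Cast":
            continue
        up = producers.get(n.inputs[0])
        while up is not None and up.op_type == "Cast":
            # Re-wire past the upstream Cast: Cast(Cast(x)) -> Cast(x).
            n.inputs[0] = up.inputs[0]
            up = producers.get(n.inputs[0])
    # Keep only Casts whose outputs are still consumed somewhere.
    used = {i for n in nodes for i in n.inputs} | set(graph_outputs)
    return [n for n in nodes
            if n.op_type != "Cast" or any(o in used for o in n.outputs)]
```

After re-wiring, the intermediate fp16 Cast has no remaining consumers and is swept away; the surviving Cast reads the original fp32 tensor directly.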
…surgery passes

- rtn_quantization.py: Parameterize bits through quantization methods to support 8-bit Gather
- common.py: Fix ByteSize() crash for >2GB models, fix FOLDED_FROM_KEY import
- graph_surgeries.py: Add ReciprocalMulToDiv, DeduplicateSubgraphInitializers, DeduplicateNodes
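The node-deduplication idea can be sketched like this (a hypothetical sketch on plain dict nodes, not the pass's real implementation): two nodes with identical op_type, attributes, and inputs compute the same value, so all but the first can be dropped and later consumers re-wired to the survivor's outputs.

```python
def deduplicate_nodes(nodes):
    """Drop nodes identical in (op_type, attrs, inputs); nodes are dicts
    with "op_type", "attrs", "inputs", "outputs" keys (toy stand-ins for
    ONNX NodeProto)."""
    seen = {}     # dedup key -> first node seen with that key
    rename = {}   # duplicate's output name -> representative's output name
    kept = []
    for n in nodes:
        # Apply renames from earlier deduplications first, so chains of
        # duplicates collapse in one pass over a topologically sorted list.
        n["inputs"] = [rename.get(i, i) for i in n["inputs"]]
        key = (n["op_type"],
               tuple(sorted(n["attrs"].items())),
               tuple(n["inputs"]))
        if key in seen:
            for dup_out, rep_out in zip(n["outputs"], seen[key]["outputs"]):
                rename[dup_out] = rep_out
        else:
            seen[key] = n
            kept.append(n)
    return kept
```

Initializer deduplication follows the same pattern, except the key is a hash of the raw tensor bytes rather than (op_type, attrs, inputs).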
hanbitmyths and others added 4 commits March 3, 2026 22:55
- Apply ruff format to 4 files (cast_chain_elimination.py,
  rtn_quantization.py, test_graph_surgeries.py, test_rtn_quantization.py)
- Fix _pack_int8_to_int4 reshape bug: replace global flatten+pack with
  axis-aware _pack_int4_along_axis that correctly packs zero_point when
  k_blocks is small (e.g. 1), avoiding ValueError on reshape
- Fix test_rtn_quantization_pass_gather assertion: GatherBlockQuantized
  always uses quantize_axis=data_rank-1, not pass_config['axis']
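The axis-aware packing fix can be sketched as below (an illustrative sketch with a hypothetical function name, not the PR's actual `_pack_int4_along_axis`): pairs of 4-bit values are packed into one uint8 along the chosen axis, padding that axis to even length, instead of flattening the whole tensor and reshaping, which raises `ValueError` when the packed axis is short (e.g. `k_blocks == 1`).

```python
import numpy as np

def pack_int4_along_axis(values: np.ndarray, axis: int = -1) -> np.ndarray:
    """Pack pairs of 4-bit values into uint8 bytes along `axis`.

    Illustrative sketch only; element at even index goes in the low
    nibble, its odd-index neighbor in the high nibble.
    """
    vals = values.astype(np.uint8) & 0x0F          # keep low nibbles only
    if vals.shape[axis] % 2:                       # pad axis to even length
        pad = [(0, 0)] * vals.ndim
        pad[axis] = (0, 1)
        vals = np.pad(vals, pad)
    lo = np.take(vals, np.arange(0, vals.shape[axis], 2), axis=axis)
    hi = np.take(vals, np.arange(1, vals.shape[axis], 2), axis=axis)
    return (lo | (hi << 4)).astype(np.uint8)
```

Because padding happens only on the packed axis, a zero_point tensor with a tiny `k_blocks` dimension packs cleanly rather than tripping over a global reshape.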
The upstream tuning_strategies.md page no longer exists, causing the
Sphinx linkcheck to fail with -W (warnings-as-errors).
