diff --git a/docs/execution-providers/OpenVINO-ExecutionProvider.md b/docs/execution-providers/OpenVINO-ExecutionProvider.md index 04b37aa2c516d..eaf3a5c81b3ac 100644 --- a/docs/execution-providers/OpenVINO-ExecutionProvider.md +++ b/docs/execution-providers/OpenVINO-ExecutionProvider.md @@ -30,9 +30,9 @@ ONNX Runtime OpenVINO™ Execution Provider is compatible with three latest rele |ONNX Runtime|OpenVINO™|Notes| |---|---|---| +|1.24.1|2025.4.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.9)| |1.23.0|2025.3|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.8)| |1.22.0|2025.1|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.7)| -|1.21.0|2025.0|[Details](https://github.com/intel/onnxruntime/releases/tag/v5.6)| ## Build @@ -79,7 +79,7 @@ Runtime parameters set during OpenVINO Execution Provider initialization to cont | [**num_of_threads**](#num_of_threads--num_streams) | string | Any positive integer > 0 | size_t | Control number of inference threads | | [**num_streams**](#num_of_threads--num_streams) | string | Any positive integer > 0 | size_t | Set parallel execution streams for throughput | | [**cache_dir**](#cache_dir) | string | Valid filesystem path | string | Enable openvino model caching for improved latency | -| [**load_config**](#load_config) | string | JSON file path | string | Load and set custom/HW specific OpenVINO properties from JSON | +| [**load_config**](#load_config) | string | JSON string | string | Load and set custom/HW specific OpenVINO properties from JSON | | [**enable_qdq_optimizer**](#enable_qdq_optimizer) | string | True/False | boolean | Enable QDQ optimization for NPU | | [**disable_dynamic_shapes**](#disable_dynamic_shapes) | string | True/False | boolean | Convert dynamic models to static shapes | | [**reshape_input**](#reshape_input) | string | input_name[shape_bounds] | string | Specify upper and lower bound for dynamic shaped inputs for improved performance with NPU | @@ -87,7 +87,7 @@ 
Runtime parameters set during OpenVINO Execution Provider initialization to cont **Deprecation Notice** -The following provider options are **deprecated** and should be migrated to `load_config` for better compatibility with future releases. +The following provider options are **deprecated since ORT 1.23** and should be migrated to `load_config` for better compatibility with future releases. | Deprecated Provider Option | `load_config` Equivalent | Recommended Migration | |---------------------------|------------------------|----------------------| @@ -147,7 +147,7 @@ Runs the same model on multiple devices in parallel to improve device utilizatio --- ### `precision` -**DEPRECATED:** This option is deprecated and can be set via `load_config` using the `INFERENCE_PRECISION_HINT` property. +**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `INFERENCE_PRECISION_HINT` property. - Controls numerical precision during inference, balancing **performance** and **accuracy**. **Precision Support on Devices:** @@ -167,7 +167,7 @@ Runs the same model on multiple devices in parallel to improve device utilizatio --- ### `num_of_threads` & `num_streams` -**DEPRECATED:** These options are deprecated and can be set via `load_config` using the `INFERENCE_NUM_THREADS` and `NUM_STREAMS` properties respectively. +**DEPRECATED:** These options are deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `INFERENCE_NUM_THREADS` and `NUM_STREAMS` properties respectively. **Multi-Threading** @@ -185,31 +185,33 @@ Manages parallel inference streams for throughput optimization (default: `1` for ### `cache_dir` -**DEPRECATED:** This option is deprecated and can be set via `load_config` using the `CACHE_DIR` property. +**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `CACHE_DIR` property. 
-Enables model caching to significantly reduce subsequent load times. Supports CPU, NPU, and GPU devices with kernel caching on iGPU/dGPU.
+
+Enables model caching to significantly reduce subsequent load times. Supports CPU, NPU, and GPU devices with kernel caching on iGPU/dGPU.
 
 **Benefits**
-- Saves compiled models and `cl_cache` files for dynamic shapes
+- Saves compiled models for faster subsequent loading
 - Eliminates recompilation overhead on subsequent runs
-- Particularly useful for complex models and frequent application restarts
-
+- Particularly useful for optimizing application startup latencies, especially for complex models
 ---
 
 ### `load_config`
 
-**Recommended Configuration Method** for setting OpenVINO runtime properties. Provides direct access to OpenVINO properties through a JSON configuration file during runtime.
+**Recommended Configuration Method** for setting OpenVINO runtime properties. Provides direct access to OpenVINO properties through a JSON string at runtime.
 
 #### Overview
 
-`load_config` enables fine-grained control over OpenVINO inference behavior by loading properties from a JSON file. This is the **preferred method** for configuring advanced OpenVINO features, offering:
+`load_config` enables fine-grained control over OpenVINO inference behavior by loading properties from a JSON string. This is the **preferred method** for configuring advanced OpenVINO features, offering:
 
 - Direct access to OpenVINO runtime properties
 - Device-specific configuration
 - Better compatibility with future OpenVINO releases
 - No property name translation required
+
+
 #### JSON Configuration Format
 ```json
 {
@@ -219,6 +221,33 @@ Enables model caching to significantly reduce subsequent load times. Supports CP
 }
 ```
 
+`load_config` now supports nested JSON objects up to **8 levels deep** for complex device configurations.
+
+
+
+**Example: Multi-Level Nested Configuration**
+```python
+import json
+
+# Complex nested configuration for AUTO device
+config = {
+    "AUTO": {
+        "PERFORMANCE_HINT": "THROUGHPUT",
+        "DEVICE_PROPERTIES": {
+            "CPU": {
+                "PERFORMANCE_HINT": "LATENCY",
+                "NUM_STREAMS": "3"
+            },
+            "GPU": {
+                "EXECUTION_MODE_HINT": "ACCURACY",
+                "PERFORMANCE_HINT": "LATENCY"
+            }
+        }
+    }
+}
+options = {"device_type": "AUTO", "load_config": json.dumps(config)}  # serialize the nested dict
+```
+
 **Supported Device Names:**
 - `"CPU"` - Intel CPU
 - `"GPU"` - Intel integrated/discrete GPU
@@ -327,7 +356,7 @@ Property keys used in `load_config` JSON must match the string literal defined i
 
 ### `enable_qdq_optimizer`
 
-**DEPRECATED:** This option is deprecated and can be set via `load_config` using the `NPU_QDQ_OPTIMIZATION` property.
+**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `NPU_QDQ_OPTIMIZATION` property.
 
 NPU-specific optimization for Quantize-Dequantize (QDQ) operations in the inference graph. This optimizer enhances ORT quantized models by:
@@ -362,7 +391,7 @@ This configuration is required for optimal NPU memory allocation and management.
 
 ### `model_priority`
 
-**DEPRECATED:** This option is deprecated and can be set via `load_config` using the `MODEL_PRIORITY` property.
+**DEPRECATED:** This option is deprecated since OpenVINO 2025.3/ORT 1.23 and can be set via `load_config` using the `MODEL_PRIORITY` property.
 
 Configures resource allocation priority for multi-model deployment scenarios.
 
@@ -401,31 +430,25 @@
`input_image[NCHW],output_tensor[NC]` + --- ## Examples - ### Python - -#### Using load_config with JSON file +#### Using load_config with JSON string ```python import onnxruntime as ort import json -# Create config file +# Create config config = { "AUTO": { "PERFORMANCE_HINT": "THROUGHPUT", - "PERF_COUNT": "NO", - "DEVICE_PROPERTIES": "{CPU:{INFERENCE_PRECISION_HINT:f32,NUM_STREAMS:3},GPU:{INFERENCE_PRECISION_HINT:f32,NUM_STREAMS:5}}" + "DEVICE_PROPERTIES": "{GPU:{EXECUTION_MODE_HINT:ACCURACY,PERFORMANCE_HINT:LATENCY}}" } } - -with open("ov_config.json", "w") as f: - json.dump(config, f) - # Use config with session -options = {"device_type": "AUTO", "load_config": "ov_config.json"} +options = {"device_type": "AUTO", "load_config": json.dumps(config)} session = ort.InferenceSession("model.onnx", providers=[("OpenVINOExecutionProvider", options)]) ``` @@ -438,20 +461,14 @@ import json # Create CPU config config = { "CPU": { - "INFERENCE_PRECISION_HINT": "f32", - "NUM_STREAMS": "3", - "INFERENCE_NUM_THREADS": "8" + "PERFORMANCE_HINT": "LATENCY", + "NUM_STREAMS": "1" } } - -with open("cpu_config.json", "w") as f: - json.dump(config, f) - -options = {"device_type": "CPU", "load_config": "cpu_config.json"} +options = {"device_type": "CPU", "load_config": json.dumps(config)} session = ort.InferenceSession("model.onnx", providers=[("OpenVINOExecutionProvider", options)]) ``` - #### Using load_config for GPU ```python import onnxruntime as ort @@ -460,20 +477,16 @@ import json # Create GPU config with caching config = { "GPU": { - "INFERENCE_PRECISION_HINT": "f16", + "EXECUTION_MODE_HINT": "ACCURACY", "CACHE_DIR": "./model_cache", "PERFORMANCE_HINT": "LATENCY" } } - -with open("gpu_config.json", "w") as f: - json.dump(config, f) - -options = {"device_type": "GPU", "load_config": "gpu_config.json"} +options = {"device_type": "GPU", "load_config": json.dumps(config)} session = ort.InferenceSession("model.onnx", providers=[("OpenVINOExecutionProvider", options)]) -``` +``` 
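The deprecation table earlier in this page maps each deprecated provider option to a `load_config` property. A minimal sketch of that migration, folding `precision`, `num_of_threads`, `num_streams`, and `cache_dir` into a single `load_config` string (the property values here are illustrative, not recommendations):

```python
import json

# load_config equivalents of the deprecated provider options
config = {
    "CPU": {
        "INFERENCE_PRECISION_HINT": "f32",  # was: precision
        "INFERENCE_NUM_THREADS": "8",       # was: num_of_threads
        "NUM_STREAMS": "3",                 # was: num_streams
        "CACHE_DIR": "./model_cache"        # was: cache_dir
    }
}

options = {"device_type": "CPU", "load_config": json.dumps(config)}
print(options["load_config"])
```

The resulting `options` dictionary is passed to `ort.InferenceSession` exactly as in the CPU example above.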
--- ### Python API @@ -819,4 +832,4 @@ In order to showcase what you can do with the OpenVINO™ Execution Provider for [Tutorial: Using OpenVINO™ Execution Provider for ONNX Runtime Python Wheel Packages](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-execution-provider-for-onnx-runtime.html) ---- \ No newline at end of file +---