Failure to load HETERO models across multi GPU setup. #3812

@HumerousGorgon

Description

Describe the bug
When loading a larger model (Qwen3-32B) with a HETERO:GPU.0,GPU.1 target device, compilation fails with a CL_INVALID_KERNEL_ARGS error from the GPU plugin.

To Reproduce
Steps to reproduce the behavior:

  1. Download the Qwen3-32B model and convert using the latest export_model.py script. Command to convert as follows:

python export_model.py text_generation --source_model Qwen/Qwen3-32B --config_file_path /media/models/config.json --weight-format int4 --overwrite_models --target_device HETERO:GPU.0,GPU.1 --extra_quantization_params "--awq --group-size 128" --kv_cache_precision u8 --cache_size 6 --model_repository_path /media/models --reasoning_parser qwen3

  2. Compile OVMS from source for Ubuntu 24.04 and run using:

export LD_LIBRARY_PATH=/devtools/openvino/ovms/lib
export PATH=$PATH:/devtools/openvino/ovms/bin
export PYTHONPATH=~/devtools/openvino/ovms/lib/python
ovms --config_path /media/models/config.json --rest_port 9033 --log_level DEBUG

  3. Kernel load error as seen below.

Expected behavior
The model loads and serves correctly, as it did with previous OVMS versions (and previous GPU drivers). To be clear: this exact model with this exact config DID PREVIOUSLY load on older versions of OVMS and no longer does.

Logs
[2025-11-24 07:07:08.665][33451][serving][error][servable_initializer.cpp:214] Error during llm node initialization for models_path: /media/models/Qwen/Qwen3-32B/./ exception: Exception from src/inference/src/cpp/core.cpp:114:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/hetero/src/compiled_model.cpp:36:
Standard exception from compilation library: Exception from src/inference/src/dev/plugin.cpp:53:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS

[2025-11-24 07:07:08.665][33451][modelmanager][error][servable_initializer.cpp:437] Error during LLM node resources initialization: The LLM Node resource initialization failed

Configuration

  1. OVMS version: built from source.
  2. OVMS config.json file:

{
  "model_config_list": [
    {
      "config": {
        "name": "Qwen/Qwen3-32B",
        "base_path": "Qwen/Qwen3-32B"
      }
    }
  ]
}

  3. CPU, accelerator's versions if applicable: Intel Core i5-11600KF, 3 x Intel Arc A770 cards with the latest drivers, installed per the Client GPU guide in Intel's official documentation (linked from the OpenVINO documentation).
  4. /media/models/Qwen/Qwen3-32B/
  5. The converted Qwen3-32B model fails to load.

Additional context
I have tried smaller cache sizes, but this makes no difference. The model loaded perfectly on OVMS versions from before August, but that is no longer the case. I have done a completely fresh install of Ubuntu Server 24.04.3 LTS with the latest Intel drivers for the GPUs.

Metadata

Labels: bug (Something isn't working)