Failure to load HETERO models across multi GPU setup. #3812

@HumerousGorgon

Description

Describe the bug
When loading a larger model (Qwen3-32B) with a HETERO:GPU.0,GPU.1 target device, compilation fails with a CL_INVALID_KERNEL_ARGS error from the GPU plugin.

To Reproduce
Steps to reproduce the behavior:

  1. Download the Qwen3-32B model and convert using the latest export_model.py script. Command to convert as follows:

python export_model.py text_generation --source_model Qwen/Qwen3-32B --config_file_path /media/models/config.json --weight-format int4 --overwrite_models --target_device HETERO:GPU.0,GPU.1 --extra_quantization_params "--awq --group-size 128" --kv_cache_precision u8 --cache_size 6 --model_repository_path /media/models --reasoning_parser qwen3

  2. Compile OVMS from source for Ubuntu 24.04 and run using:

export LD_LIBRARY_PATH=/devtools/openvino/ovms/lib
export PATH=$PATH:/devtools/openvino/ovms/bin
export PYTHONPATH=~/devtools/openvino/ovms/lib/python
ovms --config_path /media/models/config.json --rest_port 9033 --log_level DEBUG

  3. Kernel load error as seen below.

Expected behavior
The model loads and serves correctly, as it did with previous OVMS versions (and previous GPU drivers). To be clear: this exact model with this exact config DID PREVIOUSLY load on older versions of OVMS and no longer does.

Logs
[2025-11-24 07:07:08.665][33451][serving][error][servable_initializer.cpp:214] Error during llm node initialization for models_path: /media/models/Qwen/Qwen3-32B/./ exception: Exception from src/inference/src/cpp/core.cpp:114:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/hetero/src/compiled_model.cpp:36:
Standard exception from compilation library: Exception from src/inference/src/dev/plugin.cpp:53:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS

[2025-11-24 07:07:08.665][33451][modelmanager][error][servable_initializer.cpp:437] Error during LLM node resources initialization: The LLM Node resource initialization failed

Configuration

  1. OVMS version: built from source.
  2. OVMS config.json file:

{
  "model_config_list": [
    {
      "config": {
        "name": "Qwen/Qwen3-32B",
        "base_path": "Qwen/Qwen3-32B"
      }
    }
  ]
}

  3. CPU, accelerator's versions if applicable: Intel Core i5-11600KF, 3 x Intel Arc A770 cards with the latest drivers, installed per the Client GPU guide in Intel's official documentation (linked from the OpenVINO documentation).
  4. /media/models/Qwen/Qwen3-32B/
  5. The converted Qwen3-32B model fails to load.

Additional context
I have tried smaller cache sizes, but this makes no difference. The model loaded perfectly on OVMS versions from before August, but that is no longer the case. I have done a completely fresh install of Ubuntu Server 24.04.3 LTS with the latest Intel drivers for the GPUs.

Metadata

Labels: bug (Something isn't working)