Describe the bug
When loading a larger model (Qwen3-32B) with a HETERO:GPU.0,GPU.1 config, the error CL_INVALID_KERNEL_ARGS ("Invalid Kernel Args") appears.
To Reproduce
Steps to reproduce the behavior:
- Download the Qwen3-32B model and convert using the latest export_model.py script. Command to convert as follows:
python export_model.py text_generation --source_model Qwen/Qwen3-32B --config_file_path /media/models/config.json --weight-format int4 --overwrite_models --target_device HETERO:GPU.0,GPU.1 --extra_quantization_params "--awq --group-size 128" --kv_cache_precision u8 --cache_size 6 --model_repository_path /media/models --reasoning_parser qwen3
- Compile OVMS from source for Ubuntu24 and run using:
export LD_LIBRARY_PATH=~/devtools/openvino/ovms/lib
export PATH=$PATH:~/devtools/openvino/ovms/bin
export PYTHONPATH=~/devtools/openvino/ovms/lib/python
ovms --config_path /media/models/config.json --rest_port 9033 --log_level DEBUG
- The kernel load error shown below appears.
Expected behavior
The model should load and work correctly, as it did with previous OVMS versions and previous GPU drivers. To be clear, this exact model with this exact config DID PREVIOUSLY load on older versions of OVMS but no longer works.
Logs
[2025-11-24 07:07:08.665][33451][serving][error][servable_initializer.cpp:214] Error during llm node initialization for models_path: /media/models/Qwen/Qwen3-32B/./ exception: Exception from src/inference/src/cpp/core.cpp:114:
Exception from src/inference/src/dev/plugin.cpp:53:
Exception from src/plugins/hetero/src/compiled_model.cpp:36:
Standard exception from compilation library: Exception from src/inference/src/dev/plugin.cpp:53:
Check 'false' failed at src/plugins/intel_gpu/src/plugin/program_builder.cpp:163:
[GPU] ProgramBuilder build failed!
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_common.hpp:40:
[GPU] clEnqueueNDRangeKernel, error code: -52 CL_INVALID_KERNEL_ARGS
[2025-11-24 07:07:08.665][33451][modelmanager][error][servable_initializer.cpp:437] Error during LLM node resources initialization: The LLM Node resource initialization failed
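For reference, error code -52 in the log above is the standard OpenCL runtime error CL_INVALID_KERNEL_ARGS. A minimal decode of this and a few nearby codes (values taken from the OpenCL header CL/cl.h):

```python
# Selected OpenCL runtime error codes from CL/cl.h
OPENCL_ERRORS = {
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -52: "CL_INVALID_KERNEL_ARGS",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
}

print(OPENCL_ERRORS[-52])  # CL_INVALID_KERNEL_ARGS
```

CL_INVALID_KERNEL_ARGS is returned by clEnqueueNDRangeKernel when a kernel is enqueued before all of its arguments have been set, which points at an internal plugin bug rather than a user-side configuration error.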
Configuration
- OVMS version: built from source.
- OVMS config.json file:
{
  "model_config_list": [
    {
      "config": {
        "name": "Qwen/Qwen3-32B",
        "base_path": "Qwen/Qwen3-32B"
      }
    }
  ]
}
- CPU, accelerator versions if applicable: Intel Core i5-11600KF, 3× Intel Arc A770 with the latest drivers, installed per the Client GPU guide in Intel's official documentation (linked from the OpenVINO documentation).
- Model repository path: /media/models/Qwen/Qwen3-32B/
- The converted Qwen3-32B model fails to load.
Additional context
I have tried smaller cache sizes, but this makes no difference. The model loaded perfectly on OVMS versions from before August; this no longer seems to be the case. I have done a completely fresh install of Ubuntu Server 24.04.3 LTS with the latest Intel GPU drivers.