
[Bug]: Qwen3-moe FULL_DECODE_ONLY inference crash #4326

@mitseng

Description
Your current environment

The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.7.1+cpu
Is debug build: False

OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version: Could not collect
CMake version: version 4.1.2
Libc version: glibc-2.35

Python version: 3.11.13 (main, Nov  2 2025, 10:27:27) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.10.0-200.0.0.131.30.oe2203sp3.bclinux.aarch64-aarch64-with-glibc2.35

CPU:
Architecture:                       aarch64
CPU op-mode(s):                     64-bit
Byte Order:                         Little Endian
CPU(s):                             640
On-line CPU(s) list:                0-639
Vendor ID:                          HiSilicon
BIOS Vendor ID:                     HiSilicon
BIOS Model name:                    Kunpeng 920 7285Z
Model:                              0
Thread(s) per core:                 2
Core(s) per socket:                 80
Socket(s):                          4
Stepping:                           0x0
Frequency boost:                    disabled
CPU max MHz:                        3000.0000
CPU min MHz:                        400.0000
BogoMIPS:                           200.00
Flags:                              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint svei8mm svef32mm svef64mm svebf16 i8mm bf16 dgh rng ecv
L1d cache:                          20 MiB (320 instances)
L1i cache:                          20 MiB (320 instances)
L2 cache:                           400 MiB (320 instances)
L3 cache:                           560 MiB (8 instances)
NUMA node(s):                       8
NUMA node0 CPU(s):                  0-79
NUMA node1 CPU(s):                  80-159
NUMA node2 CPU(s):                  160-239
NUMA node3 CPU(s):                  240-319
NUMA node4 CPU(s):                  320-399
NUMA node5 CPU(s):                  400-479
NUMA node6 CPU(s):                  480-559
NUMA node7 CPU(s):                  560-639
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Not affected
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization
Vulnerability Spectre v2:           Not affected
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==27.1.0
[pip3] torch==2.7.1+cpu
[pip3] torch_npu==2.7.1
[pip3] torchvision==0.22.1
[pip3] transformers==4.57.1
[conda] Could not collect
vLLM Version: 0.11.0
vLLM Ascend Version: 0.11.0rc1

ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=1
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1


NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 25.3.rc1                 Version: 25.3.rc1                                             |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip  Phy-ID              | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     Ascend910           | OK            | 155.0       37                0    / 0             |
| 0     0                   | 0000:9D:00.0  | 0           0    / 0          3144 / 65536         |
+------------------------------------------------------------------------------------------------+
| 0     Ascend910           | OK            | -           36                0    / 0             |
| 1     1                   | 0000:9F:00.0  | 0           0    / 0          2876 / 65536         |
+===========================+===============+====================================================+
| 1     Ascend910           | OK            | 164.0       37                0    / 0             |
| 0     2                   | 0000:99:00.0  | 0           0    / 0          3133 / 65536         |
+------------------------------------------------------------------------------------------------+
| 1     Ascend910           | OK            | -           36                0    / 0             |
| 1     3                   | 0000:9B:00.0  | 0           0    / 0          2888 / 65536         |
+===========================+===============+====================================================+
| 2     Ascend910           | OK            | 158.4       37                0    / 0             |
| 0     4                   | 0000:95:00.0  | 0           0    / 0          3144 / 65536         |
+------------------------------------------------------------------------------------------------+
| 2     Ascend910           | OK            | -           37                0    / 0             |
| 1     5                   | 0000:97:00.0  | 0           0    / 0          2876 / 65536         |
+===========================+===============+====================================================+
| 3     Ascend910           | OK            | 176.4       37                0    / 0             |
| 0     6                   | 0000:91:00.0  | 0           0    / 0          3143 / 65536         |
+------------------------------------------------------------------------------------------------+
| 3     Ascend910           | OK            | -           38                0    / 0             |
| 1     7                   | 0000:93:00.0  | 0           0    / 0          2877 / 65536         |
+===========================+===============+====================================================+
| 4     Ascend910           | OK            | 162.7       37                0    / 0             |
| 0     8                   | 0000:8D:00.0  | 0           0    / 0          2907 / 65536         |
+------------------------------------------------------------------------------------------------+
| 4     Ascend910           | OK            | -           37                0    / 0             |
| 1     9                   | 0000:8F:00.0  | 0           0    / 0          2870 / 65536         |
+===========================+===============+====================================================+
| 5     Ascend910           | OK            | 165.7       37                0    / 0             |
| 0     10                  | 0000:89:00.0  | 0           0    / 0          2908 / 65536         |
+------------------------------------------------------------------------------------------------+
| 5     Ascend910           | OK            | -           37                0    / 0             |
| 1     11                  | 0000:8B:00.0  | 0           0    / 0          2870 / 65536         |
+===========================+===============+====================================================+
| 6     Ascend910           | OK            | 162.7       37                0    / 0             |
| 0     12                  | 0000:85:00.0  | 0           0    / 0          2907 / 65536         |
+------------------------------------------------------------------------------------------------+
| 6     Ascend910           | OK            | -           36                0    / 0             |
| 1     13                  | 0000:87:00.0  | 0           0    / 0          2870 / 65536         |
+===========================+===============+====================================================+
| 7     Ascend910           | OK            | 172.3       37                0    / 0             |
| 0     14                  | 0000:81:00.0  | 0           0    / 0          2895 / 65536         |
+------------------------------------------------------------------------------------------------+
| 7     Ascend910           | OK            | -           37                0    / 0             |
| 1     15                  | 0000:83:00.0  | 0           0    / 0          2882 / 65536         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 1                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 2                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 3                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 4                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 5                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 6                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 7                                                            |
+===========================+===============+====================================================+

CANN:
package_name=Ascend-cann-toolkit
version=8.3.RC1
innerversion=V100R001C23SPC001B235
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21],[V100R001C23]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.3.RC1/aarch64-linux

🐛 Describe the bug

Start script:

export VLLM_USE_V1=1
#export ASCEND_RT_VISIBLE_DEVICES=4
export HCCL_OP_EXPANSION_MODE="AIV"
export VLLM_ASCEND_ENABLE_FLASHCOMM=1
export HCCL_BUFFSIZE=1024
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=10
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export VLLM_VERSION=0.11.0
#export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD


python -m vllm.entrypoints.openai.api_server  \
       --model /data/Qwen3-30B-A3B-Instruct-2507-w8a8 \
       --served-model-name Qwen3 \
       --trust-remote-code \
       --max-num-seqs 128 \
       --max-model-len 32768 \
       --max-num-batched-tokens 16384 \
       --data-parallel-size 8 \
       --enable-expert-parallel \
       --port 9082 \
       --distributed_executor_backend "mp" \
       --no-enable-prefix-caching \
       --async-scheduling True \
       --quantization ascend \
       --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
       --gpu-memory-utilization 0.9
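As a quick sanity check (not part of the original report), the string passed to `--compilation-config` must be valid JSON; vLLM parses it into its compilation config before startup. The snippet below only validates the string, it does not reproduce the crash:

```python
import json

# The exact value passed to --compilation-config in the start script above.
raw = '{"cudagraph_mode": "FULL_DECODE_ONLY"}'

# json.loads raises if the shell quoting mangled the string; here it parses
# cleanly, so the crash is not a config-parsing issue.
cfg = json.loads(raw)
print(cfg["cudagraph_mode"])  # FULL_DECODE_ONLY
```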

Server log:

(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]     raise output
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]   File "/vllm-workspace/vllm/vllm/entrypoints/openai/serving_chat.py", line 574, in chat_completion_stream_generator
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]     async for res in result_generator:
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 387, in generate
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]     out = q.get_nowait() or await q.get()
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]                             ^^^^^^^^^^^^^
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]   File "/vllm-workspace/vllm/vllm/v1/engine/output_processor.py", line 59, in get
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]     raise output
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]     [the three frames above repeat several more times as the exception is re-raised]
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 439, in output_handler
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]     outputs = await engine_core.get_output_async()
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 846, in get_output_async
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145]     raise self._format_exception(outputs) from None
(APIServer pid=11033) ERROR 11-21 02:54:31 [serving_chat.py:1145] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(APIServer pid=11033) INFO:     Shutting down

One possibly relevant plog entry:

[ERROR] TUNE(847,):2025-11-21-02:10:21.883.184 [pywrapper.cpp:95][CANNKB][Tid:847]"Traceback (most recent call last):
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.193 [pywrapper.cpp:95][CANNKB][Tid:847]"  File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/managers.py", line 814, in _callmethod
    conn = self._tls.connection
           ^^^^^^^^^^^^^^^^^^^^
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.195 [pywrapper.cpp:95][CANNKB][Tid:847]"AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.196 [pywrapper.cpp:95][CANNKB][Tid:847]"
During handling of the above exception, another exception occurred:

"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.198 [pywrapper.cpp:95][CANNKB][Tid:847]"Traceback (most recent call last):
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.199 [pywrapper.cpp:95][CANNKB][Tid:847]"  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/interface.py", line 42, in cann_kb_finalize
    RouteServer.finalize()
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.201 [pywrapper.cpp:95][CANNKB][Tid:847]"  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 54, in wrapper
    return func(cls, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.202 [pywrapper.cpp:95][CANNKB][Tid:847]"  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/route.py", line 208, in finalize
    cls.cann_kb_mgr.finalize()
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.203 [pywrapper.cpp:95][CANNKB][Tid:847]"  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/cann_kb_manager/knowledge_bank_manager.py", line 42, in finalize
    self.__static_mgr.close()
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.205 [pywrapper.cpp:95][CANNKB][Tid:847]"  File "/usr/local/Ascend/ascend-toolkit/latest/python/site-packages/tbe/common/repository_manager/cann_kb_manager/base_manager.py", line 339, in close
    for name in copy.deepcopy(self.write_kb.keys()):
                              ^^^^^^^^^^^^^^^^^^^^
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.206 [pywrapper.cpp:95][CANNKB][Tid:847]"  File "<string>", line 2, in keys
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.207 [pywrapper.cpp:95][CANNKB][Tid:847]"  File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/managers.py", line 818, in _callmethod
    self._connect()
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.209 [pywrapper.cpp:95][CANNKB][Tid:847]"  File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/managers.py", line 805, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.210 [pywrapper.cpp:95][CANNKB][Tid:847]"  File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/connection.py", line 519, in Client
    c = SocketClient(address)
        ^^^^^^^^^^^^^^^^^^^^^
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.211 [pywrapper.cpp:95][CANNKB][Tid:847]"  File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/connection.py", line 647, in SocketClient
    s.connect(address)
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.883.213 [pywrapper.cpp:95][CANNKB][Tid:847]"ConnectionRefusedError: [Errno 111] Connection refused
"
[ERROR] TUNE(847,):2025-11-21-02:10:21.884.165 [py_interface.cpp:200][CANNKB][Tid:847]"Failed to call cann_kb_finalize."
[ERROR] TUNE(847,):2025-11-21-02:10:21.884.170 [cann_kb_api.cpp:40][CANNKB][Tid:847]"Run CannKbFinalize Error!"
[ERROR] TEFUSION(847,):2025-11-21-02:10:21.884.176 [python_adapter_manager.cc:235]847 Finalize call CannKbFinalize failed. res = [6].

After removing `--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}'`, inference works as expected.
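A sketch of the workaround launch, assuming the same model path and flags as the start script above; besides dropping the option entirely, graph capture can also be disabled explicitly with `"cudagraph_mode": "NONE"` (this is a hedged suggestion, not a verified fix for the root cause):

```shell
# Workaround sketch: replace FULL_DECODE_ONLY with NONE to disable graph
# capture explicitly and fall back to eager decode.
python -m vllm.entrypoints.openai.api_server \
       --model /data/Qwen3-30B-A3B-Instruct-2507-w8a8 \
       --compilation-config '{"cudagraph_mode": "NONE"}'
       # ...remaining flags unchanged from the start script
```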

Labels: bug