Your current environment
🐛 Describe the bug
In certain NPU deployments, calling `os.sched_yield()` makes inference severely host-bound on the CPU (e.g., Qwen2.5-VL-72B W8A8 with TP4 on 800IA2). For now, the problem can be avoided with the following manual change.
In `vllm/distributed/utils.py` in the vLLM project, replace the implementation of `sched_yield()` so that it always calls `time.sleep(0)`:
# Original implementation (USE_SCHED_YIELD is defined elsewhere in vllm/distributed/utils.py):
def sched_yield():
    if USE_SCHED_YIELD:
        os.sched_yield()
    else:
        time.sleep(0)

# Modified implementation (never calls os.sched_yield()):
def sched_yield():
    time.sleep(0)
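To compare the two yield calls on a given host, a rough micro-benchmark can call each one in a tight loop, the way a polling loop would call `sched_yield()`. This is a sketch under stated assumptions: `time_yield` and `sleep_zero` are hypothetical helpers, not part of vLLM, and the loop measures only single-threaded per-call wall-clock cost, so it will not reproduce the NPU-specific host-bound behavior described above.

```python
import os
import time
from typing import Callable


def time_yield(yield_fn: Callable[[], None], iterations: int = 100_000) -> float:
    """Hypothetical helper: average wall-clock cost of one yield_fn() call, in microseconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        yield_fn()
    elapsed = time.perf_counter() - start
    return elapsed / iterations * 1e6


def sleep_zero() -> None:
    # Mirrors the modified sched_yield(): always time.sleep(0).
    time.sleep(0)


if __name__ == "__main__":
    candidates = {"time.sleep(0)": sleep_zero}
    # os.sched_yield() only exists on Unix-like platforms.
    if hasattr(os, "sched_yield"):
        candidates["os.sched_yield()"] = os.sched_yield
    for name, fn in candidates.items():
        print(f"{name}: {time_yield(fn):.3f} us/call")
```

Absolute numbers depend on the kernel, the Python version and how busy the host is, so results are only meaningful when compared on the same machine.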