[Doc] Add single NPU tutorial for Qwen2.5-Omni-7B #4446
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
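For reference, local checks usually look like the commands below. This is a sketch assuming the repository's standard pre-commit and pytest setup; see the Contributing and Testing guides for the project's exact entry points.

```bash
# Assumed commands; consult the Contributing and Testing guides for the
# project's authoritative linting and test instructions.
pip install pre-commit
pre-commit run --all-files   # lint/format checks
pytest tests/ -x             # run the test suite locally
```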
Code Review
This pull request adds a new tutorial for running Qwen2.5-Omni-7B on a single NPU. The documentation is well-structured, covering both offline inference and online serving. I've identified a missing dependency installation step that would prevent the offline inference example from running and have provided a suggestion to fix it.
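For orientation, online serving in vLLM is typically started with the `vllm serve` CLI and queried through the OpenAI-compatible API. The command and request below are an illustrative sketch, not the exact commands from the tutorial under review; the model name is the public Hugging Face checkpoint and the flags are generic vLLM options.

```bash
# Illustrative sketch: flags and payload are generic vLLM options,
# not copied from the PR under review.
vllm serve Qwen/Qwen2.5-Omni-7B --max-model-len 8192 --port 8000

# In another shell, query the OpenAI-compatible endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-Omni-7B",
        "messages": [{"role": "user", "content": "Give me a short introduction to Qwen2.5-Omni."}]
      }'
```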
> Run the following script to execute offline inference on a single NPU:
The Python script for offline inference uses qwen_vl_utils.process_vision_info, but the qwen_vl_utils package is not installed in the Docker container by default. This will cause an ImportError when running the script. Please add a step to install this package.
```bash
pip install qwen_vl_utils --extra-index-url https://download.pytorch.org/whl/cpu/
```
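For context, here is a minimal offline-inference sketch of the kind the review refers to, assuming the public Qwen/Qwen2.5-Omni-7B checkpoint and the `qwen_vl_utils` helper discussed above. The image URL, prompt, and sampling settings are placeholders; the actual script in the PR may differ.

```python
# Minimal offline-inference sketch (illustrative only; the image URL,
# prompt, and sampling settings are assumptions, not taken from the PR).
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-Omni-7B"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/demo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Build the chat prompt and extract the vision inputs the model expects.
processor = AutoProcessor.from_pretrained(MODEL)
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)

llm = LLM(model=MODEL, max_model_len=8192, limit_mm_per_prompt={"image": 1})
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image_inputs}},
    SamplingParams(temperature=0.0, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```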
Force-pushed from 598628d to d56cd51
> Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.
> This document shows the main verification steps for the model, including supported features, feature configuration, environment preparation, single-node and multi-node deployment, and accuracy and performance evaluation.
It's better to add this model's first supported version, like "The DeepSeek-V3.1 model is first supported in vllm-ascend:v0.9.1rc3"
Added
> You can use our official docker image; vllm-ascend v0.11.0 and later support Qwen2.5-Omni.
> :::{note}
Please check this note; is only aarch64 supported?
Checked, sorry for the wrong info.
> In addition, if you don't want to use the docker image above, you can also build everything from source:
>
> - Install `vllm-ascend` from source; refer to [installation](../installation.md).
Modified.
> ::::{tab-item} A3&A2 series
> :sync: A3&A2
>
> Start the docker image on your node; refer to [using docker](../installation.md#set-up-using-docker).
Provide the docker run command directly, like #4399.
Sure, added.
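For readers following along, a vllm-ascend container is usually started with the standard Ascend device and driver mounts. The image tag and host paths below are assumptions that illustrate the shape of such a command; see the installation guide linked above and the command added in the PR for the authoritative version.

```bash
# Assumed image tag and mount paths; adjust to your environment.
export IMAGE=quay.io/ascend/vllm-ascend:v0.11.0
docker run --rm -it --name vllm-ascend \
  --device /dev/davinci0 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/.cache:/root/.cache \
  -p 8000:8000 \
  $IMAGE bash
```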
Signed-off-by: Ting FU <[email protected]>
Force-pushed from d56cd51 to 3d51f63
/lgtm
### What this PR does / why we need it?
Add single NPU tutorial for Qwen2.5-Omni-7B
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: Ting FU <[email protected]>
Signed-off-by: Che Ruan <[email protected]>

### Does this PR introduce any user-facing change?
No

### How was this patch tested?