[Doc] Add single NPU tutorial for Qwen2.5-Omni-7B #4446
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
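For reference, local checks usually look like the commands below. This is a sketch assuming the repository's standard pre-commit and pytest setup; see the Contributing and Testing guides for the project's exact entry points.

```bash
# Assumed commands; consult the Contributing and Testing guides for the
# project's authoritative linting and test instructions.
pip install pre-commit
pre-commit run --all-files   # lint/format checks
pytest tests/ -x             # run the test suite locally
```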
Code Review
This pull request adds a new tutorial for running Qwen2.5-Omni-7B on a single NPU. The documentation is well-structured, covering both offline inference and online serving. I've identified a missing dependency installation step that would prevent the offline inference example from running and have provided a suggestion to fix it.
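For orientation, online serving in vLLM is typically started with the `vllm serve` CLI and queried through the OpenAI-compatible API. The command and request below are an illustrative sketch, not the exact commands from the tutorial under review; the model name is the public Hugging Face checkpoint and the flags are generic vLLM options.

```bash
# Illustrative sketch: flags and payload are generic vLLM options,
# not copied from the PR under review.
vllm serve Qwen/Qwen2.5-Omni-7B --max-model-len 8192 --port 8000

# In another shell, query the OpenAI-compatible endpoint:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-Omni-7B",
        "messages": [{"role": "user", "content": "Give me a short introduction to Qwen2.5-Omni."}]
      }'
```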
> Run the following script to execute offline inference on a single NPU:
The Python script for offline inference uses qwen_vl_utils.process_vision_info, but the qwen_vl_utils package is not installed in the Docker container by default. This will cause an ImportError when running the script. Please add a step to install this package.
```bash
pip install qwen_vl_utils --extra-index-url https://download.pytorch.org/whl/cpu/
```
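For context, here is a minimal offline-inference sketch of the kind the review refers to, assuming the public Qwen/Qwen2.5-Omni-7B checkpoint and the `qwen_vl_utils` helper discussed above. The image URL, prompt, and sampling settings are placeholders; the actual script in the PR may differ.

```python
# Minimal offline-inference sketch (illustrative only; the image URL,
# prompt, and sampling settings are assumptions, not taken from the PR).
from qwen_vl_utils import process_vision_info
from transformers import AutoProcessor
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-Omni-7B"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://example.com/demo.jpg"},  # placeholder URL
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Build the chat prompt and extract the vision inputs the model expects.
processor = AutoProcessor.from_pretrained(MODEL)
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)

llm = LLM(model=MODEL, max_model_len=8192, limit_mm_per_prompt={"image": 1})
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image_inputs}},
    SamplingParams(temperature=0.0, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```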
Force-pushed from 598628d to d56cd51
> Qwen2.5-Omni is an end-to-end multimodal model designed to perceive diverse modalities, including text, images, audio, and video, while simultaneously generating text and natural speech responses in a streaming manner.
> This document shows the main verification steps for the model, including supported features, feature configuration, environment preparation, single-node and multi-node deployment, and accuracy and performance evaluation.
It's better to add this model's first supported version, like "The DeepSeek-V3.1 model is first supported in vllm-ascend:v0.9.1rc3"
Added
> You can use our official docker image; vllm-ascend v0.11.0 and later support Qwen2.5-Omni.
> :::{note}
Please check this note; is only aarch64 supported?
Checked, sorry for the wrong info.
> In addition, if you don't want to use the docker image above, you can also build everything from source:
>
> - Install `vllm-ascend` from source; refer to [installation](../installation.md).
Modified.
> ::::{tab-item} A3&A2 series
> :sync: A3&A2
>
> Start the docker image on your node; refer to [using docker](../installation.md#set-up-using-docker).
Provide the docker run command directly, like #4399.
Sure, added.
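For readers following along, a vllm-ascend container is usually started with the standard Ascend device and driver mounts. The image tag and host paths below are assumptions that illustrate the shape of such a command; see the installation guide linked above and the command added in the PR for the authoritative version.

```bash
# Assumed image tag and mount paths; adjust to your environment.
export IMAGE=quay.io/ascend/vllm-ascend:v0.11.0
docker run --rm -it --name vllm-ascend \
  --device /dev/davinci0 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/.cache:/root/.cache \
  -p 8000:8000 \
  $IMAGE bash
```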
Signed-off-by: Ting FU <[email protected]>
Force-pushed from d56cd51 to 3d51f63
/lgtm
### What this PR does / why we need it?
Add single NPU tutorial for Qwen2.5-Omni-7B
- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: Ting FU <[email protected]>
Signed-off-by: Che Ruan <[email protected]>

### Does this PR introduce any user-facing change?
No

### How was this patch tested?