[Doc] Refactor the DeepSeek-V3.1 tutorial. #4399
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

- If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides, as sketched below.
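For instance, assuming the repository wires its linters into pre-commit the way upstream vLLM does (an assumption — check the Contributing guide for the authoritative commands), a local lint pass might look like:

```bash
# Hypothetical local lint run; pre-commit is borrowed from upstream vLLM's
# workflow and is not confirmed by this repository's docs.
pip install pre-commit
pre-commit run --all-files
```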
Code Review
This pull request adds a comprehensive tutorial for deploying the DeepSeek-V3.1 model. While the document covers various deployment scenarios, I've found several critical errors in the provided code snippets and configurations, particularly for multi-node and prefill-decode disaggregation setups. These issues, including Python syntax errors, incorrect data parallel configurations, and inconsistent model naming, would likely prevent users from successfully following the instructions. My review provides specific corrections to address these critical problems and improve the tutorial's accuracy and usability.
```bash
local_ip="xxxx"

# [Optional] jemalloc
# If `libjemalloc.so` is installed on your machine, you can turn it on.
```
jemalloc is for better performance; please add a short description, otherwise readers may be a little confused. Thanks.
I have added a description: “jemalloc is for better performance; if `libjemalloc.so` is installed on your machine, you can turn it on.”
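For readers who want to act on that note, a minimal sketch of turning jemalloc on via `LD_PRELOAD` — the library path is an assumption, so locate your own copy first (e.g. `ldconfig -p | grep jemalloc`):

```bash
# Preloading jemalloc replaces the default allocator, which can reduce
# allocator contention and improve serving performance.
JEMALLOC_PATH=/usr/lib/aarch64-linux-gnu/libjemalloc.so  # assumed path
if [ -e "$JEMALLOC_PATH" ]; then
    export LD_PRELOAD="$JEMALLOC_PATH${LD_PRELOAD:+:$LD_PRELOAD}"
fi
```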
```markdown
### Model Weight
- `DeepSeek-V3.1` (BF16 version): [Download model weight](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-V3.1)
- `DeepSeek-V3.1-w8a8` (quantized version): [Download model weight](https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1-w8a8). Note: change `torch_dtype` from `float16` to `bfloat16` in `config.json`.
- Quantization method: [DeepSeek-V3.1 W8A8+MTP](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v31-w8a8-%E6%B7%B7%E5%90%88%E9%87%8F%E5%8C%96-mtp-%E9%87%8F%E5%8C%96)
```
DeepSeek-V3.1 W8A8+MTP does not seem to have an available download URL. It would be better to upload it to ModelScope or another platform, since you mention DeepSeek-V3.1 W8A8+MTP below.
OK, we don't have MTP weights on ModelScope, so I put the quantization method here; maybe I should add more details.
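As a side note on the `config.json` change called out above, a one-line sketch of flipping the dtype — the weights directory is a placeholder:

```bash
# Change torch_dtype from float16 to bfloat16 in the downloaded weights'
# config.json; replace the path with your local DeepSeek-V3.1-w8a8 directory.
sed -i 's/"torch_dtype": "float16"/"torch_dtype": "bfloat16"/' \
    /path/to/DeepSeek-V3.1-w8a8/config.json
```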
```bash
export VLLM_ASCEND_ENABLE_FLASHCOMM1=0
export DISABLE_L2_CACHE=1

vllm serve vllm-ascend/DeepSeek-V3.1_w8a8mix_mtp \
```
In fact, if you use `xxx/xxx` as a model name, vLLM will search for it on Hugging Face (or on ModelScope if you set `VLLM_USE_MODELSCOPE`). The `vllm-ascend/xxx` prefix usually indicates a model we publish on ModelScope under `vllm-ascend`, so it's better to change this to a local path.
Changed.
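For illustration, a sketch of what the corrected invocation could look like — the local path and parallelism value below are placeholders, not values confirmed by this PR:

```bash
# Serving from a local directory stops vLLM from trying to resolve the name
# on Hugging Face / ModelScope.
vllm serve /data/models/DeepSeek-V3.1_w8a8mix_mtp \
    --served-model-name DeepSeek-V3.1 \
    --tensor-parallel-size 16
```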
```bash
export VLLM_USE_V1=1
export HCCL_BUFFSIZE=200
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export VLLM_ASCEND_ENABLE_MLAPO=1
```
@wangxiyuan Is `VLLM_ASCEND_ENABLE_MLAPO=1` also needed for DeepSeek-V3.1? I'm not sure it's safe here, since I remember it caused some issues with DeepSeek-V3.2-Exp on 0.11.0rc1.
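Until that is settled, one way to keep the flag easy to flip is to route it through a shell variable — `ENABLE_MLAPO` below is a hypothetical local variable for this sketch, not a vLLM setting:

```bash
# Default MLAPO on, but allow `ENABLE_MLAPO=0 ./run.sh` to disable it if the
# 0.11.0rc1 DeepSeek-V3.2-Exp issue resurfaces.
export VLLM_ASCEND_ENABLE_MLAPO=${ENABLE_MLAPO:-1}
```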
```bash
--gpu-memory-utilization 0.92 \
--speculative-config '{"num_speculative_tokens": 1, "method": "deepseek_mtp"}' \
--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
--additional-config '{"ascend_scheduler_config":{"enabled":false},"torchair_graph_config":{"enabled":false}}'
```
The Ascend scheduler is about to be dropped from main; refer to #4498.
Done.
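For reference, a sketch of the tail of the serve command from the diff above with `ascend_scheduler_config` removed per #4498 — the model path is a placeholder:

```bash
vllm serve /data/models/DeepSeek-V3.1_w8a8mix_mtp \
    --gpu-memory-utilization 0.92 \
    --speculative-config '{"num_speculative_tokens": 1, "method": "deepseek_mtp"}' \
    --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
    --additional-config '{"torchair_graph_config":{"enabled":false}}'
```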
LGTM, thanks for your contribution!
### What this PR does / why we need it?

Refactor the DeepSeek-V3.1 tutorial.

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

### Does this PR introduce any user-facing change?

### How was this patch tested?

Signed-off-by: 1092626063 <[email protected]>
Signed-off-by: Che Ruan <[email protected]>