
Conversation

@1092626063 (Contributor) commented Nov 24, 2025

What this PR does / why we need it?

Refactor the DeepSeek-V3.1 tutorial.

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

github-actions bot added the `documentation` label (Improvements or additions to documentation) Nov 24, 2025
@gemini-code-assist (bot) left a comment

Code Review

This pull request adds a comprehensive tutorial for deploying the DeepSeek-V3.1 model. While the document covers various deployment scenarios, I've found several critical errors in the provided code snippets and configurations, particularly for multi-node and prefill-decode disaggregation setups. These issues, including Python syntax errors, incorrect data parallel configurations, and inconsistent model naming, would likely prevent users from successfully following the instructions. My review provides specific corrections to address these critical problems and improve the tutorial's accuracy and usability.

@1092626063 force-pushed the DeepSeek3.1 branch 4 times, most recently from 74cdeb3 to 16d672f on November 27, 2025 10:15
local_ip="xxxx"

# [Optional] jemalloc
# if `libjemalloc.so` is installed on your machine, you can turn it on.
Contributor

jemalloc is for better performance; please add some description, otherwise it may be a little confusing. Thanks.

Contributor Author

I have added a description: "jemalloc is for better performance; if libjemalloc.so is installed on your machine, you can turn it on."
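
For reference, turning jemalloc on is usually just a matter of preloading the library before launching the server. A minimal sketch, assuming a typical aarch64 Debian/Ubuntu install path (check yours with `ldconfig -p | grep jemalloc`):

```bash
# Preload jemalloc so the server's allocations go through it.
# The path below is an assumption; adjust it to your machine.
export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libjemalloc.so.2:$LD_PRELOAD
```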

### Model Weight
- `DeepSeek-V3.1`(BF16 version): [Download model weight](https://www.modelscope.cn/models/deepseek-ai/DeepSeek-V3.1)
- `DeepSeek-V3.1-w8a8`(Quantized version): [Download model weight](https://www.modelscope.cn/models/Eco-Tech/DeepSeek-V3.1-w8a8). Note: modify `torch_dtype` from `float16` to `bfloat16` in `config.json`.
- Quantization method: [DeepSeek-V3.1 W8A8+MTP](https://gitcode.com/Ascend/msit/blob/master/msmodelslim/example/DeepSeek/README.md#deepseek-v31-w8a8-%E6%B7%B7%E5%90%88%E9%87%8F%E5%8C%96-mtp-%E9%87%8F%E5%8C%96)
Contributor

DeepSeek-V3.1 W8A8+MTP seems not to have an available download URL. It would be better to upload it to ModelScope or another platform, since you mention DeepSeek-V3.1 W8A8+MTP below.

Contributor Author

OK, we don't have MTP weights on ModelScope, so I put a quantization method here; maybe I should add more details.
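
As an aside, the `torch_dtype` edit called out in the weight list above is a one-line change to `config.json`. A sketch, assuming a hypothetical local download path:

```bash
# Flip torch_dtype from float16 to bfloat16 in the quantized weights' config.
# /path/to/DeepSeek-V3.1-w8a8 is a placeholder for your download location.
sed -i 's/"torch_dtype": "float16"/"torch_dtype": "bfloat16"/' \
  /path/to/DeepSeek-V3.1-w8a8/config.json
```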

export VLLM_ASCEND_ENABLE_FLASHCOMM1=0
export DISABLE_L2_CACHE=1

vllm serve vllm-ascend/DeepSeek-V3.1_w8a8mix_mtp \
Contributor

In fact, if you use xxx/xxx as a model name, vLLM will search for it on Hugging Face (or, if you set VLLM_USE_MODELSCOPE, on ModelScope). The vllm-ascend/xxx prefix usually indicates one of our vllm-ascend models published on ModelScope, so it's better to change it to a local path.

Contributor Author

changed
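
For illustration, the two alternatives the reviewer describes would look roughly like this; the local path is a placeholder, not a published location:

```bash
# Option 1: serve from a local copy of the weights (hypothetical path).
vllm serve /path/to/DeepSeek-V3.1_w8a8mix_mtp

# Option 2: keep a model ID but resolve it via ModelScope instead of Hugging Face.
export VLLM_USE_MODELSCOPE=true
```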

export VLLM_USE_V1=1
export HCCL_BUFFSIZE=200
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export VLLM_ASCEND_ENABLE_MLAPO=1
Contributor

@wangxiyuan Is VLLM_ASCEND_ENABLE_MLAPO=1 also needed for DeepSeek-V3.1? I'm not sure it's OK here, since I remember it caused some issues with DeepSeek-V3.2-Exp in 0.11.0rc1.

--gpu-memory-utilization 0.92 \
--speculative-config '{"num_speculative_tokens": 1, "method": "deepseek_mtp"}' \
--compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
--additional-config '{"ascend_scheduler_config":{"enabled":false},"torchair_graph_config":{"enabled":false}}'
Contributor

The Ascend scheduler is ready to be dropped in main. Refer to #4498.

Contributor Author

done
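
With that change, the tail of the serve command quoted above would presumably drop the `ascend_scheduler_config` entry, roughly:

```bash
# Sketch of the command without ascend_scheduler_config; the model path is a
# placeholder and the remaining flags mirror the snippet quoted above.
vllm serve /path/to/DeepSeek-V3.1_w8a8mix_mtp \
  --gpu-memory-utilization 0.92 \
  --speculative-config '{"num_speculative_tokens": 1, "method": "deepseek_mtp"}' \
  --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}' \
  --additional-config '{"torchair_graph_config":{"enabled":false}}'
```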

Signed-off-by: 1092626063 <[email protected]>
@menogrey (Contributor) commented Dec 1, 2025

LGTM, thanks for your contribution!

@MengqingCao MengqingCao merged commit eabedf4 into vllm-project:main Dec 2, 2025
17 checks passed
ChenCangtao pushed a commit to ChenCangtao/vllm-ascend that referenced this pull request Dec 3, 2025
### What this PR does / why we need it?
Refactor the DeepSeek-V3.1 tutorial. 

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: 1092626063 <[email protected]>
Mercykid-bash pushed a commit to Mercykid-bash/vllm-ascend that referenced this pull request Dec 4, 2025
### What this PR does / why we need it?
Refactor the DeepSeek-V3.1 tutorial.

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: 1092626063 <[email protected]>
Signed-off-by: Che Ruan <[email protected]>
Meihan-chen pushed a commit to Meihan-chen/vllm-ascend that referenced this pull request Dec 5, 2025
### What this PR does / why we need it?
Refactor the DeepSeek-V3.1 tutorial. 

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: 1092626063 <[email protected]>

Labels

`documentation` (Improvements or additions to documentation)

3 participants