[Doc] Add Qwen3-235B tutorial #4358
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds a new tutorial for running the Qwen3-235B model. The documentation is well-structured and provides good detail. I've found a couple of critical typos in model names within commands that would cause them to fail, and a potentially confusing or incorrect configuration for cudagraph_capture_sizes. I've left specific comments with suggestions to fix these issues.
Signed-off-by: xuyexiong <[email protected]>
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Signed-off-by: xuyexiong <[email protected]>
--gpu-memory-utilization 0.95 \
--rope-scaling '{"rope_type":"yarn","factor":4,"original_max_position_embeddings":32768}' \
--additional-config '{"ascend_scheduler_config":{"enabled":false}}' \
--compilation-config '{"cudagraph_capture_sizes":[1,4],"cudagraph_mode":"FULL_DECODE_ONLY"}' \
The example we provide represents best practice under normal circumstances: optimal performance under stable operating conditions. Is this capture size value a bit too small?
The cudagraph_capture_sizes value is set according to --max-num-seqs 4. This is an optimal example for 128K-sequence inference.
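For context, a minimal sketch of how those two flags pair up; the model path is a placeholder and all other flags from the tutorial are omitted:

```bash
# Hedged sketch: MODEL_PATH is a placeholder for the local Qwen3-235B weights.
MODEL_PATH=/path/to/Qwen3-235B-A22B

# With --max-num-seqs 4 the decode batch never exceeds 4, so capturing graphs
# for batch sizes 1 and 4 covers decode; intermediate batch sizes are padded
# up to the nearest captured size.
vllm serve "$MODEL_PATH" \
  --max-num-seqs 4 \
  --compilation-config '{"cudagraph_capture_sizes":[1,4],"cudagraph_mode":"FULL_DECODE_ONLY"}'
```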
--quantization ascend \
--served-model-name qwen3 \
--max-num-seqs 4 \
--max-model-len 133000 \
The value of max-model-len should be 131072. I tried to run this command but got the following error:
(APIServer pid=598) File "/vllm-workspace/vllm/vllm/engine/arg_utils.py", line 994, in create_model_config
(APIServer pid=598) return ModelConfig(
(APIServer pid=598) ^^^^^^^^^^^^
(APIServer pid=598) File "/usr/local/python3.11.13/lib/python3.11/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=598) s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=598) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=598) Value error, User-specified max_model_len (133000) is greater than the derived max_model_len (max_position_embeddings=131072 or model_max_length=None in model's config.json). To allow overriding this maximum, set the env var VLLM_ALLOW_LONG_MAX_MODEL_LEN=1. VLLM_ALLOW_LONG_MAX_MODEL_LEN must be used with extreme caution. If the model uses relative position encoding (RoPE), positions exceeding derived_max_model_len lead to nan. If the model uses absolute position encoding, positions exceeding derived_max_model_len will cause a CUDA array out-of-bounds error. [type=value_error, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]
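For completeness, the error text itself names an escape hatch; a hedged sketch below (the model path is a placeholder, and vLLM's own warning about NaNs and out-of-bounds positions applies):

```bash
# Override named in the ValidationError above; use with extreme caution.
MODEL_PATH=/path/to/Qwen3-235B-A22B
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve "$MODEL_PATH" --max-model-len 133000
```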
Did you add this parameter: --rope-scaling '{"rope_type":"yarn","factor":4,"original_max_position_embeddings":32768}'?
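A minimal sketch of the two flags together, assuming a placeholder model path; per this thread, the yarn rope-scaling entry is the piece the reporter was missing:

```bash
# Hedged sketch: MODEL_PATH is a placeholder for the local Qwen3-235B weights.
# The tutorial pairs yarn scaling (factor 4 over 32768 original positions) with
# the extended --max-model-len; without it, the ValidationError above fires.
MODEL_PATH=/path/to/Qwen3-235B-A22B
vllm serve "$MODEL_PATH" \
  --max-model-len 133000 \
  --rope-scaling '{"rope_type":"yarn","factor":4,"original_max_position_embeddings":32768}'
```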
Signed-off-by: xuyexiong <[email protected]>
What this PR does / why we need it?
Add a Qwen3-235B tutorial, including the following examples.
Does this PR introduce any user-facing change?
How was this patch tested?