opti profiler default param #4609
Conversation
Code Review
This pull request updates the default aic_metrics for the torch NPU profiler to PipeUtilization from AiCoreNone. While enabling more detailed metrics by default is useful for profiling, hardcoding this value could cause issues on hardware where it's not supported or for users who wish to minimize profiling overhead. I have added a comment suggesting to make this parameter configurable via an environment variable to increase flexibility and robustness.
vllm_ascend/worker/worker_v1.py
Outdated
```diff
  profiler_level=torch_npu.profiler.ProfilerLevel.Level1,
  msprof_tx=False,
- aic_metrics=torch_npu.profiler.AiCMetrics.AiCoreNone,
+ aic_metrics=torch_npu.profiler.AiCMetrics.PipeUtilization,
```
Hardcoding aic_metrics to PipeUtilization could be problematic. This metric might not be supported on all hardware versions, or some users might prefer to disable it to minimize profiling overhead. It would be more robust to make this configurable via an environment variable (e.g., VLLM_ASCEND_PROFILER_AIC_METRICS defined in vllm_ascend/envs.py). This would allow users to easily switch between different metrics like PipeUtilization and AiCoreNone based on their needs.
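A minimal sketch of what the suggested override might look like. The variable name `VLLM_ASCEND_PROFILER_AIC_METRICS` comes from the review comment above; the helper itself is hypothetical, and the allowed-values set lists only the metric names visible in this thread (the real `AiCMetrics` enum has more members). The `torch_npu` lookup is deferred to a comment so the sketch stays importable without Ascend hardware:

```python
import os

# Hypothetical helper based on the review suggestion; only metric names
# that appear in this PR thread are listed here -- the real enum is larger.
_VALID_AIC_METRICS = {
    "AiCoreNone", "PipeUtilization", "MemoryUB", "L2Cache", "MemoryAccess",
}

def resolve_aic_metrics(default: str = "PipeUtilization") -> str:
    """Return the AiCMetrics attribute name to use, honoring the env override."""
    value = os.environ.get("VLLM_ASCEND_PROFILER_AIC_METRICS", default)
    if value not in _VALID_AIC_METRICS:
        raise ValueError(f"unsupported aic_metrics value: {value!r}")
    return value

# The worker could then resolve the enum member lazily, e.g.:
#   aic_metrics = getattr(torch_npu.profiler.AiCMetrics, resolve_aic_metrics())
```

This keeps `PipeUtilization` as the new default while letting users who hit unsupported hardware or want minimal overhead switch back to `AiCoreNone` without a code change.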
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
```python
# 6: torch_npu.profiler.AiCMetrics.MemoryUB;
# 7: torch_npu.profiler.AiCMetrics.L2Cache;
# 8: torch_npu.profiler.AiCMetrics.MemoryAccess;
# If not set, it will be torch_npu.profiler.AiCMetrics.PipeUtilization by default.
```
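The comment above maps integer codes to `AiCMetrics` members. Based only on the lines visible in this excerpt (codes 0–5 exist in the full comment but are not shown here), the lookup could be sketched as follows; the helper name and the table are reconstructions, not the PR's actual code:

```python
from typing import Optional

# Partial code-to-metric table reconstructed from the visible comment lines
# only; entries 0-5 exist in the full comment but are not shown in this excerpt.
_AIC_METRICS_BY_CODE = {
    6: "MemoryUB",
    7: "L2Cache",
    8: "MemoryAccess",
}

def aic_metrics_name(code: Optional[int]) -> str:
    # Per the comment: "If not set, it will be ... PipeUtilization by default."
    if code is None:
        return "PipeUtilization"
    return _AIC_METRICS_BY_CODE[code]  # unknown codes raise KeyError
```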
Please provide example profiling runs with AiCMetrics.PipeUtilization enabled and disabled to show the performance degradation introduced by this metric.
Signed-off-by: zzzzwwjj <[email protected]>
What this PR does / why we need it?
Set the profiler's param `aic_metrics` to `torch_npu.profiler.AiCMetrics.PipeUtilization` by default. This parameter can obtain more information for ops.
Does this PR introduce any user-facing change?
How was this patch tested?