[Model] Support pooling models #3122
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request adds support for pooling models in vllm-ascend by adapting changes from the upstream vllm project. The changes include modifying attention mask generation for non-causal masks, adding a new attention path for encoder-only models, and updating the model runner to handle pooling models and encoder-only attention specifications. The implementation is largely correct, but I've identified a critical issue in vllm_ascend/worker/model_runner_v1.py where a method call is missing a required argument, which would cause a runtime error.
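For illustration, here is a minimal sketch of the causal vs. non-causal mask split described above. It uses plain PyTorch with illustrative names; it is not the PR's actual mask helper or the NPU-specific kernels.

```python
import torch

def build_attn_mask(max_seq_len: int, causal: bool, dtype=torch.float16) -> torch.Tensor:
    """Illustrative mask builder: causal for decoder models, fully visible
    for encoder-only (pooling) models. Not the PR's actual helper."""
    if causal:
        # Decoder path: token i may only attend to tokens 0..i.
        masked = torch.triu(
            torch.ones(max_seq_len, max_seq_len, dtype=torch.bool), diagonal=1)
    else:
        # Encoder-only / pooling path: no positions are masked out.
        masked = torch.zeros(max_seq_len, max_seq_len, dtype=torch.bool)
    # Return an additive bias: 0 where attention is allowed, -inf where masked.
    bias = torch.zeros(max_seq_len, max_seq_len, dtype=dtype)
    bias.masked_fill_(masked, torch.finfo(dtype).min)
    return bias
```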
Can we add some e2e tests and an example for the model?
OK. Since I have no prior experience with this, I would like to ask which test scenarios this feature needs to cover. At present, I think it is necessary to write test cases in the
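For illustration, one possible e2e case could simply check that the reranker orders documents sensibly through vLLM's offline score API. The sketch below assumes pytest and the `LLM.score` interface; the test location and exact output attribute names should be checked against the vllm-ascend test layout and the installed vLLM version.

```python
# Sketch of a possible e2e case (assumes pytest and vLLM's offline score API;
# output attribute names may differ across vLLM versions).
import pytest
from vllm import LLM

MODEL = "BAAI/bge-reranker-v2-m3"

@pytest.mark.parametrize(
    "query,docs",
    [("What is the capital of China?",
      ["The capital of China is Beijing.", "Pandas mostly eat bamboo."])],
)
def test_reranker_relative_scores(query, docs):
    llm = LLM(model=MODEL, task="score")
    outputs = llm.score(query, docs)
    scores = [o.outputs.score for o in outputs]
    # The on-topic document should be ranked above the unrelated one.
    assert scores[0] > scores[1]
```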
Signed-off-by: lianyibo <[email protected]>
Signed-off-by: lianyibo <[email protected]>
* move pooling model in the first if branch in forward
* refactor the pooling-params and pooling-attnmetadata to align with vllm
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: lianyibo <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: lianyibo <[email protected]>
What this PR does / why we need it?
Support pooling models (like bge-reranker-v2-m3) in vllm-ascend. This PR covers the three embedding pooling types: cls_token, mean_token, and last_token. After this commit, vLLM provides support for pooling models on the v1 engine; this PR adds the corresponding adaptations on the vllm-ascend side.
Fixes #1960
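For reference, a reranker-style pooling model can be exercised offline roughly as follows. This is a sketch assuming vLLM's `task="score"` pooling path, not the scripts used for the perf test below; output attribute names may differ across vLLM versions.

```python
from vllm import LLM

# Load the cross-encoder reranker as a pooling (score) model.
llm = LLM(model="BAAI/bge-reranker-v2-m3", task="score")

query = "What is the capital of China?"
documents = [
    "The capital of China is Beijing.",
    "Pandas mostly eat bamboo.",
]

# Score each (query, document) pair; higher means more relevant.
outputs = llm.score(query, documents)
for doc, out in zip(documents, outputs):
    print(f"{out.outputs.score:.4f}  {doc}")
```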
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Taking BAAI/bge-reranker-v2-m3 as an example.
Perf test
Scripts
Results
Before this PR
After this PR