[cherry-pick]Upgrade CANN to 8.3.rc1 (#3945) #3962
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Force-pushed 452b4a0 to 922c8b9
Code Review
This pull request upgrades the CANN dependency from 8.2.rc1 to 8.3.rc1. The changes primarily involve updating version strings in Dockerfiles, documentation, and configuration files. Additionally, conditional logic based on the CANN version has been removed from the Python source code, making the 8.3-specific logic the default. While most changes are straightforward, I've identified an issue in one of the unit tests that appears to have been modified incorrectly, which could lead to it not properly testing the intended functionality.
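The removal of version-conditional logic mentioned above typically follows the pattern sketched below. This is a hypothetical illustration, not the actual vllm-ascend code: `parse_cann_version` and `use_new_kernel` are made-up names standing in for whatever gate the PR deleted.

```python
# Hypothetical sketch of the kind of CANN version gate this PR removes.
# parse_cann_version and use_new_kernel are illustrative names, not
# actual vllm-ascend identifiers.

def parse_cann_version(version: str) -> tuple:
    """Turn a CANN version string like '8.3.rc1' into a comparable
    tuple. Only intended for comparing versions of the same shape."""
    parts = []
    for token in version.split("."):
        if token.startswith("rc"):
            parts.append(-1)          # mark the rc segment
            parts.append(int(token[2:]))
        else:
            parts.append(int(token))
    return tuple(parts)


# Before the upgrade, 8.3-specific code paths were gated roughly like this:
if parse_cann_version("8.3.rc1") >= parse_cann_version("8.3.rc1"):
    use_new_kernel = True    # 8.3-specific logic
else:
    use_new_kernel = False   # legacy fallback for older CANN

# After the upgrade, the 8.3 branch simply becomes the default and the
# version check is deleted.
```

Making the 8.3 path unconditional, as this PR does, trades backward compatibility with older CANN releases for simpler code, which is why the version strings in the Dockerfiles and docs must move in lockstep.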
  self.assertEqual(attn_mask[0][-1], torch.tensor(True))
  self.assertEqual(attention_mask_builder._seq_len_cached, 1024)
  self.assertEqual(attention_mask_builder.attn_mask_cache.shape,
-                  (2048, 2048))
+                  (1024, 1024))
The assertions in this part of the test appear to be incorrect. The get_attn_mask method calls _update_attn_cache, which should update _seq_len_cached to 2048 when called with max_seq_len=2048 (since it was initialized with 1024). However, the test asserts that _seq_len_cached remains 1024. This contradicts the implementation of _update_attn_cache and the comment on line 73.
Additionally, attn_mask[0][-1] is asserted to be torch.tensor(True). For dtype=torch.float16, _generate_attn_mask creates a mask with float("-inf"). This assertion also seems incorrect.
If _update_attn_cache does not update as the test suggests, get_attn_mask would try to slice [:2048, :2048] from a (1024, 1024) tensor, which would raise an error. The test seems to be asserting an impossible state. It appears the original test logic was correct and should be restored.
self.assertEqual(attn_mask[0][-1],
torch.tensor(float("-inf"), dtype=torch.float16))
self.assertEqual(attention_mask_builder._seq_len_cached, 2048)
self.assertEqual(attention_mask_builder.attn_mask_cache.shape,
                 (2048, 2048))

Force-pushed 94f8ff1 to d21af27
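For readers without the source at hand, the caching contract the reviewer is describing can be sketched in pure Python. This is a hedged reconstruction: the real AttentionMaskBuilder in vllm-ascend operates on torch tensors and its internals may differ, but the grow-on-demand behavior of `_update_attn_cache` and the slicing in `get_attn_mask` follow the reviewer's argument.

```python
# Hedged reconstruction of the mask-cache pattern discussed in the review.
# The actual vllm-ascend implementation uses torch tensors; this sketch
# only illustrates why _seq_len_cached must grow to 2048 before a
# [:2048, :2048] slice can succeed.

NEG_INF = float("-inf")


class AttentionMaskBuilder:
    def __init__(self, max_seq_len: int):
        self._seq_len_cached = max_seq_len
        # Causal mask: positions j > i (future tokens) are masked with -inf.
        self.attn_mask_cache = self._generate_attn_mask(max_seq_len)

    @staticmethod
    def _generate_attn_mask(n: int):
        return [[NEG_INF if j > i else 0.0 for j in range(n)]
                for i in range(n)]

    def _update_attn_cache(self, max_seq_len: int):
        # Regenerate the cache only when a longer sequence is requested;
        # this is why _seq_len_cached cannot stay at 1024 after a
        # get_attn_mask(2048) call.
        if max_seq_len > self._seq_len_cached:
            self._seq_len_cached = max_seq_len
            self.attn_mask_cache = self._generate_attn_mask(max_seq_len)

    def get_attn_mask(self, max_seq_len: int):
        self._update_attn_cache(max_seq_len)
        # Slice the (possibly larger) cache down to the requested size.
        # If the cache had stayed at (1024, 1024), this slice could not
        # yield a (2048, 2048) mask.
        return [row[:max_seq_len]
                for row in self.attn_mask_cache[:max_seq_len]]
```

Under this model, initializing with 1024 and then calling `get_attn_mask(2048)` leaves `_seq_len_cached` at 2048 and returns a mask whose `[0][-1]` entry is `-inf`, matching the assertions the reviewer wants restored.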
Signed-off-by: wangxiyuan <[email protected]>
Force-pushed d21af27 to 338f3b4
This PR upgrades CANN from 8.2.rc1 to 8.3.rc1 and removes the CANN version check logic.
TODO: we noticed that UT runs fail with the CANN 8.3 image, so the base image for UT is still 8.2. We'll fix it later.