
Conversation

@MaoZiming
Member

Description

Please include a summary of the changes and the related issue.

Fixes # (issue)

Type of Change

  • Bug fix
  • New feature
  • Documentation update

How Has This Been Tested?

Include any tests here.

  • Unit tests
  • Integration tests
  • Manual testing

Checklist

  • My code follows the style guidelines, e.g. format.sh.
  • I have run build_and_install.sh to verify compilation.
  • I have removed redundant variables and comments.
  • I have updated the documentation.
  • I have added tests.

@MaoZiming
Member Author

MaoZiming commented Nov 13, 2025

@YangZhou1997
With EP=32, dispatch takes around 350 us and combine takes around 580 us.
I don't think there is much room left to optimize combine, because the recv_rdma_buffer holds token_slot entries from different sender experts on different ranks. The combine latency seems a bit better.

@MaoZiming
Member Author

MaoZiming commented Nov 13, 2025

EP=32
image

@MaoZiming
Member Author

EP=16:
image



2 participants