fix(granitemoe*): Only create block_sparse_moe if num_local_experts > 0 #42036
base: main
Conversation
Looks like I need to regenerate the generated modeling files from the modular files.
Interestingly, when I run the modular regeneration, the output still differs from this branch. Looking a little deeper, this seems to be caused by the inheritance from the GraniteMoe base classes. The part that I'm still confused about is why regenerating the modeling file produces these extra changes.
It appears that putting the creation of `block_sparse_moe` behind the `num_local_experts > 0` check in the modular file resolves this.
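For readers following along, here is a minimal, self-contained sketch of the kind of guarded construction being discussed. The class name, the plain `nn.Linear` stand-ins for the experts, and the `shared_mlp` dense path are illustrative assumptions, not the actual `transformers` modules:

```python
import torch.nn as nn


class MoeLayerInitSketch(nn.Module):
    """Illustrative only: create block_sparse_moe only when experts exist."""

    def __init__(self, hidden_size: int, intermediate_size: int, num_local_experts: int):
        super().__init__()
        self.has_experts = num_local_experts > 0
        if self.has_experts:
            # Stand-in for the real sparse mixture-of-experts block.
            self.block_sparse_moe = nn.ModuleList(
                nn.Linear(hidden_size, hidden_size) for _ in range(num_local_experts)
            )
        # The dense FFN path exists either way; with zero experts it becomes the
        # only feed-forward computation in the layer.
        self.shared_mlp = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size),
            nn.SiLU(),
            nn.Linear(intermediate_size, hidden_size),
        )
```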
src/transformers/models/granitemoehybrid/modeling_granitemoehybrid.py
Force-pushed from e1e87db to be876f8
🤦 Ok, all that was because of a bad copy-paste somewhere that had me creating a completely incorrect block in the modular file.
Force-pushed from be876f8 to 1257ced
fix(granitemoe*): Only create block_sparse_moe if num_local_experts > 0
Branch: GraniteMoeAsDenseFix
Signed-off-by: Gabe Goodhart <[email protected]>

Branch: GraniteMoeAsDenseFix
Signed-off-by: Gabe Goodhart <[email protected]>
Force-pushed from 1257ced to f0cebbb
One more redo: based on advice from @ArthurZucker, since there are no models using either `GraniteMoe` or `GraniteMoeShared` without experts, the conditional is now only applied where it is actually needed, in `GraniteMoeHybrid`.
```python
    return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)


class GraniteFlashAttentionKwargs(TypedDict, total=False):
```
It looks like these just got moved during the regeneration. I'm not sure if they should be included (to enforce consistency with the generation script) or excluded (to minimize the size of the change).
Hi @gabe-l-hart, thanks for the PR! You can get the code style tests to pass by running the repo's style fixup tooling locally. Overall it looks good to me, and it does seem like the zero-experts case was accidentally deleted. Will wait for @ArthurZucker to confirm before merging!
@Rocketknight1 Thanks! I'll get it cleaned up and hopefully green today
Branch: GraniteMoeAsDenseFix
Signed-off-by: Gabe Goodhart <[email protected]>
[For maintainers] Suggested jobs to run (before merge): run-slow: granitemoehybrid
Branch: GraniteMoeAsDenseFix
What does this PR do?
With the introduction of `modular_granitemoe.py` in #40132, the conditional that allowed `GraniteMoe` to also encapsulate dense models as a degenerate case was accidentally removed. This is never actually needed for the `GraniteMoe` architecture directly, but `GraniteMoe` is reused in `GraniteMoeShared` and then `GraniteMoeHybrid`, which do need this ability to also encapsulate dense FFN blocks in place of the MoE block.
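To make the dense degenerate case concrete, the sketch below shows how a layer's feed-forward step can fall back to a dense FFN when `num_local_experts == 0`. The attribute names mirror the discussion above, but the module bodies and the token-averaging routing are simplified assumptions rather than the real GraniteMoe implementation:

```python
import torch
import torch.nn as nn


class FeedForwardSketch(nn.Module):
    """Simplified stand-in for the MoE-vs-dense feed-forward choice."""

    def __init__(self, hidden_size: int, intermediate_size: int, num_local_experts: int):
        super().__init__()
        self.has_experts = num_local_experts > 0
        if self.has_experts:
            # Toy "experts": real code would use a router plus grouped expert MLPs.
            self.block_sparse_moe = nn.ModuleList(
                nn.Linear(hidden_size, hidden_size) for _ in range(num_local_experts)
            )
        self.shared_mlp = nn.Sequential(
            nn.Linear(hidden_size, intermediate_size),
            nn.SiLU(),
            nn.Linear(intermediate_size, hidden_size),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        ffn_out = self.shared_mlp(hidden_states)
        if self.has_experts:
            # Toy routing: average the expert outputs instead of top-k dispatch.
            expert_out = torch.stack(
                [expert(hidden_states) for expert in self.block_sparse_moe]
            ).mean(dim=0)
            ffn_out = ffn_out + expert_out
        return hidden_states + ffn_out


# Quick check that both configurations run:
x = torch.randn(2, 4, 8)
dense_layer = FeedForwardSketch(hidden_size=8, intermediate_size=16, num_local_experts=0)
moe_layer = FeedForwardSketch(hidden_size=8, intermediate_size=16, num_local_experts=4)
assert dense_layer(x).shape == moe_layer(x).shape == x.shape
```

With zero experts the layer behaves like a plain dense transformer block, which is roughly the behavior the description above says needs to be restored.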
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@ArthurZucker I believe this came in with your PR for MoE in vLLM, so I'd love your sanity check on this fix.