Adding ccl_enabled flag during model loading and passing CCL lists during compilation process #623
Conversation
@vjanfaza, can you please resolve the conflicts on the PR and run the lint/format checks?
I resolved the conflicts and pushed the changes.
…ring compilation process Signed-off-by: Vahid Janfaza <[email protected]>
…27b.yaml Signed-off-by: vjanfaza <[email protected]>
…4b.yaml Signed-off-by: vjanfaza <[email protected]>
```python
comp_ctx_lengths_prefill = [256, 512, ctx_len]
comp_ctx_lengths_decode = [256, 512, ctx_len]
# In moe models when compiling with prefill_seq_len=1 and non-continuous-batching mode, prefill and decode will share the same ccl specializations.
comp_ctx_lengths_prefill = [256, 512, ctx_len]  # None #
```
nit: please remove the `# None #` at the end of this line, and from the other places/files as well.
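In other words, the cleaned-up lines would read as follows (a sketch; `ctx_len` is assumed to be defined earlier in the example file):

```python
# In MoE models, when compiling with prefill_seq_len=1 in non-continuous-batching
# mode, prefill and decode share the same CCL specializations, so the same
# compute-context-length lists can be used for both.
comp_ctx_lengths_prefill = [256, 512, ctx_len]
comp_ctx_lengths_decode = [256, 512, ctx_len]
```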
```python
model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"
"""
# For CB inference, set continuous_batching to True and add full_batch_size,mxfp6,mint8 argument in compile function
```
nit: this should be mxint8, not mint8.
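For reference, a minimal sketch of the CB flow that comment line describes, with the corrected `mxint8` spelling. The argument names (`continuous_batching`, `full_batch_size`, `mxfp6`, `mxint8`) are taken from the comment itself and assumed to match the QEfficient `compile` API; the values are illustrative:

```python
from QEfficient import QEFFAutoModelForCausalLM

model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"

# For CB inference, continuous batching is requested at load time ...
qeff_model = QEFFAutoModelForCausalLM.from_pretrained(model_name, continuous_batching=True)

# ... and full_batch_size plus the quantization flags are passed to compile
# (note the corrected spelling: mxint8, not mint8).
qeff_model.compile(
    prefill_seq_len=128,
    ctx_len=4096,
    full_batch_size=4,
    mxfp6=True,
    mxint8=True,
)
```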
```python
    comp_ctx_lengths_prefill=comp_ctx_lengths_prefill,
    comp_ctx_lengths_decode=comp_ctx_lengths_decode,
)
# mos=1,
```
please remove this line.
```python
    processor=processor,
    images=image_urls,
    generation_len=100,
    device_ids=[28, 29, 30, 31],
```
make these [0, 1, 2, 3]
```diff
  inputs["pixel_values"] = inputs["pixel_values"].to(torch.float32)
  streamer = TextStreamer(tokenizer)
- output = qeff_model.generate(inputs=inputs, device_ids=[0, 1, 2, 3], generation_len=100)
+ output = qeff_model.generate(inputs=inputs, device_ids=[8, 9, 10, 11], generation_len=100)
```
this should be kept as in the original (device_ids=[0, 1, 2, 3]).
In these changes, instead of passing the CCL lists during model loading, I pass a flag called ccl_enabled that specifies whether the CCL feature is enabled, and moved passing the CCL lists to the compilation step.
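A minimal sketch of the resulting flow, assuming the QEfficient auto-class API and the parameter names introduced in this PR (`ccl_enabled` at load time, `comp_ctx_lengths_prefill`/`comp_ctx_lengths_decode` at compile time); the model name and the remaining arguments are illustrative:

```python
from transformers import AutoTokenizer

from QEfficient import QEFFAutoModelForCausalLM

model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"
ctx_len = 1024

# Load time: only the flag saying that the CCL feature is enabled is passed.
qeff_model = QEFFAutoModelForCausalLM.from_pretrained(model_name, ccl_enabled=True)

# Compile time: the actual CCL (compute-context-length) lists are passed here.
comp_ctx_lengths_prefill = [256, 512, ctx_len]
comp_ctx_lengths_decode = [256, 512, ctx_len]
qeff_model.compile(
    prefill_seq_len=128,
    ctx_len=ctx_len,
    num_devices=4,
    comp_ctx_lengths_prefill=comp_ctx_lengths_prefill,
    comp_ctx_lengths_decode=comp_ctx_lengths_decode,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
qeff_model.generate(prompts=["Hello, world"], tokenizer=tokenizer, generation_len=100)
```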