
Conversation

@vjanfaza
Contributor

In these changes, instead of passing the CCL lists during model loading, I added a flag called ccl_enabled that specifies whether the CCL feature is enabled, and moved passing the CCL lists to the compilation step.
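
A minimal sketch of the resulting API split, assuming QEfficient's QEFFAutoModelForCausalLM interface; ccl_enabled, comp_ctx_lengths_prefill, and comp_ctx_lengths_decode are the names this PR introduces (they appear in the reviewed hunks below), while the remaining arguments and values are illustrative:

from QEfficient import QEFFAutoModelForCausalLM

ctx_len = 1024

# Load time: only a flag saying whether the CCL feature is on (this PR).
model = QEFFAutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B-Instruct-2507",
    ccl_enabled=True,
)

# Compile time: the actual CCL lists now live here (moved by this PR).
model.compile(
    prefill_seq_len=128,
    ctx_len=ctx_len,
    comp_ctx_lengths_prefill=[256, 512, ctx_len],
    comp_ctx_lengths_decode=[256, 512, ctx_len],
)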

@quic-mamta
Contributor

@vjanfaza , Can you please resolve the conflicts on the PR and run lint/format checks?

@vjanfaza
Contributor Author

vjanfaza commented Nov 20, 2025

> @vjanfaza , Can you please resolve the conflicts on the PR and run lint/format checks?

I resolved the conflicts and pushed the changes.

vjanfaza reopened this Nov 24, 2025
comp_ctx_lengths_prefill = [256, 512, ctx_len]
comp_ctx_lengths_decode = [256, 512, ctx_len]
# In MoE models, when compiling with prefill_seq_len=1 in non-continuous-batching mode, prefill and decode will share the same CCL specializations.
comp_ctx_lengths_prefill = [256, 512, ctx_len] # None #
Contributor

nit: please remove the # None # at the end of this line, and from the other places/files as well.
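
For reference, a minimal sketch of the cleaned-up snippet using the variable names from the hunk above (ctx_len is illustrative); as the code comment notes, with prefill_seq_len=1 and continuous batching off, MoE models can reuse one list for both phases:

ctx_len = 1024  # illustrative value

# MoE, prefill_seq_len=1, non-continuous-batching: prefill and decode
# share the same CCL specializations, so one list can serve both.
comp_ctx_lengths_prefill = [256, 512, ctx_len]
comp_ctx_lengths_decode = comp_ctx_lengths_prefill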


model_name = "Qwen/Qwen3-30B-A3B-Instruct-2507"
"""
# For CB inference, set continuous_batching to True and add full_batch_size,mxfp6,mint8 argument in compile function
Contributor

nit: should be mxint8, not mint8
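
A hedged sketch of the continuous-batching setup the corrected comment describes, assuming QEfficient's QEFFAutoModelForCausalLM API; full_batch_size, mxfp6, and mxint8 come from the example's comment, and the remaining values are placeholders:

from QEfficient import QEFFAutoModelForCausalLM

model = QEFFAutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B-Instruct-2507",
    continuous_batching=True,  # enable CB at load time
    ccl_enabled=True,
)
model.compile(
    prefill_seq_len=128,
    ctx_len=1024,
    full_batch_size=4,  # required in CB mode
    mxfp6=True,         # MXFP6 weight compression
    mxint8=True,        # MXINT8 KV cache ("mint8" in the original comment)
    comp_ctx_lengths_prefill=[256, 512, 1024],
    comp_ctx_lengths_decode=[256, 512, 1024],
)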

comp_ctx_lengths_prefill=comp_ctx_lengths_prefill,
comp_ctx_lengths_decode=comp_ctx_lengths_decode,
)
# mos=1,
Contributor

please remove this line.

processor=processor,
images=image_urls,
generation_len=100,
device_ids=[28, 29, 30, 31],
Contributor

make these [0, 1, 2, 3]

inputs["pixel_values"] = inputs["pixel_values"].to(torch.float32)
streamer = TextStreamer(tokenizer)
- output = qeff_model.generate(inputs=inputs, device_ids=[0, 1, 2, 3], generation_len=100)
+ output = qeff_model.generate(inputs=inputs, device_ids=[8, 9, 10, 11], generation_len=100)
Contributor

this should be kept as the original.
