add Q6 support #94

lihaoyang-amd · 2025-06-13T16:10:29Z

Add Q6 support.
Optimize the type conversions in `pack_add` and similar helpers: taking the address of the value and reinterpreting it through pointer casts was redundant, and using a union is more concise and safer.
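Before:

```cpp
// Reinterpret the addresses of the int arguments as packed bf16 pairs.
nv_bfloat162* tA = reinterpret_cast<nv_bfloat162*>(&a);
nv_bfloat162* tB = reinterpret_cast<nv_bfloat162*>(&b);
nv_bfloat162 tR = __hmax2(*tA, *tB);
return *(reinterpret_cast<int*>(&tR));
```

After:

```cpp
// Load the ints into a union and operate on the packed bf16 view directly.
bf162_int_union A, B, R;
A.i = a;
B.i = b;
R.bf2 = __hmax2(A.bf2, B.bf2);
return R.i;
```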
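The union pattern can be exercised off-device. Below is a minimal host-side C++ sketch, not the kernel code: `bf16x2_bits`, `bf162_int_union_host`, and `pack_max_host` are hypothetical stand-ins for `nv_bfloat162`, `bf162_int_union`, and the packed-max helper, and the lane-wise max is done in software by widening each bf16 bit pattern to `float` (bf16 is the upper 16 bits of an IEEE float32). It assumes a little-endian target, so the `lo` lane maps to the low 16 bits of the `int`, and it does not model the NaN behavior of `__hmax2`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical host stand-in for nv_bfloat162: two raw bf16 bit patterns.
// Assumes little-endian layout: `lo` occupies the low 16 bits of the int.
struct bf16x2_bits {
  std::uint16_t lo;
  std::uint16_t hi;
};

// Hypothetical host analog of bf162_int_union.
union bf162_int_union_host {
  std::int32_t i;
  bf16x2_bits bf2;
};

static_assert(sizeof(bf16x2_bits) == sizeof(std::int32_t),
              "the two bf16 lanes must exactly fill the int");

// bf16 is the upper half of an IEEE-754 float32, so widening is a shift.
inline float bf16_bits_to_float(std::uint16_t bits) {
  std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Lane-wise max on raw bits (NaN handling is not modeled).
inline std::uint16_t bf16_max_bits(std::uint16_t a, std::uint16_t b) {
  return bf16_bits_to_float(a) >= bf16_bits_to_float(b) ? a : b;
}

// Same shape as the "After" snippet: write the int view, operate on the
// packed view, return the int view. No pointer casts are involved.
inline std::int32_t pack_max_host(std::int32_t a, std::int32_t b) {
  bf162_int_union_host A, B, R;
  A.i = a;
  B.i = b;
  R.bf2.lo = bf16_max_bits(A.bf2.lo, B.bf2.lo);
  R.bf2.hi = bf16_max_bits(A.bf2.hi, B.bf2.hi);
  return R.i;
}
```

For example, `pack_max_host(0x4000C040, 0x3F803F00)` compares the lanes (2.0, -3.0) and (1.0, 0.5) and yields `0x40003F00`, i.e. (2.0, 0.5). Note that reading a union member other than the one last written is a documented GCC/Clang extension in C++ (ISO C++ leaves it undefined); `std::memcpy` into a local is the strictly portable alternative.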

Replace the templated float-cast helper with plain function overloads. Only `half` and `nv_bfloat16` need to be supported and no further types are planned, so overloading is more concise, avoids template machinery, and should compile faster.
Code:

```cpp
__quickreduce_device_inline__ float T2float_cast(half a) {
  return __half2float(a);
}

__quickreduce_device_inline__ float T2float_cast(nv_bfloat16 a) {
  return __bfloat162float(a);
}
```
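To illustrate why overloads suffice, the host-only C++ sketch below mirrors the two `T2float_cast` overloads using hypothetical bit-pattern wrappers `half_bits` and `bf16_bits` in place of `half` and `nv_bfloat16`, and performs the conversions in software instead of calling `__half2float`/`__bfloat162float`. A generic caller (the hypothetical `sum_as_float`) can still be a template: ordinary overload resolution on the argument type picks the right conversion, so the cast itself needs no primary template or specializations.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical host stand-ins for half and nv_bfloat16 (raw 16-bit patterns).
struct half_bits { std::uint16_t bits; };
struct bf16_bits { std::uint16_t bits; };

inline float bits_to_float(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// IEEE binary16 -> binary32: signed zero, subnormals, normals, inf/NaN.
inline float T2float_cast(half_bits a) {
  std::uint32_t sign = static_cast<std::uint32_t>(a.bits & 0x8000u) << 16;
  std::uint32_t exp = (a.bits >> 10) & 0x1Fu;
  std::uint32_t mant = a.bits & 0x3FFu;
  if (exp == 0) {
    if (mant == 0) return bits_to_float(sign);  // signed zero
    // Subnormal: start at the biased float exponent of 2^-14 (half's
    // smallest normal exponent) and normalize until the leading bit appears.
    std::uint32_t e = 127 - 14;
    while ((mant & 0x400u) == 0) { mant <<= 1; --e; }
    mant &= 0x3FFu;
    return bits_to_float(sign | (e << 23) | (mant << 13));
  }
  if (exp == 0x1Fu) {
    return bits_to_float(sign | 0x7F800000u | (mant << 13));  // inf / NaN
  }
  return bits_to_float(sign | ((exp + 127 - 15) << 23) | (mant << 13));
}

// bf16 is the upper half of a binary32, so widening is a shift.
inline float T2float_cast(bf16_bits a) {
  return bits_to_float(static_cast<std::uint32_t>(a.bits) << 16);
}

// Generic caller: overload resolution selects the conversion per element type.
template <typename T>
float sum_as_float(const T* xs, int n) {
  float acc = 0.0f;
  for (int k = 0; k < n; ++k) acc += T2float_cast(xs[k]);
  return acc;
}
```

Supporting a third element type later would mean adding one more overload, which is the trade-off accepted here given that no further types are planned.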

Removed some unneeded functions:

- `T2uchar_cast`
- `T2int_cast`
- `group_max_min`

Signed-off-by: Haoyang Li <[email protected]>

github-actions · 2025-06-13T16:10:51Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the `ready` label to the PR or enable auto-merge.

🚀

Signed-off-by: Haoyang Li <[email protected]>

csrc/quickreduce/quick_reduce_impl.cuh

Signed-off-by: Haoyang Li <[email protected]>

add Q6 support

a95c86d

Signed-off-by: Haoyang Li <[email protected]>

Adjusted to static constexpr int

e98dd47

Signed-off-by: Haoyang Li <[email protected]>

lihaoyang-amd marked this pull request as ready for review June 13, 2025 16:22

lihaoyang-amd added 2 commits June 13, 2025 17:21

Remove useless functions

0043ffe

Signed-off-by: Haoyang Li <[email protected]>

fix max size err

87f42ec

Signed-off-by: Haoyang Li <[email protected]>

ilmarkov reviewed Jun 16, 2025


csrc/quickreduce/quick_reduce_impl.cuh (outdated, resolved)

csrc/quickreduce/quick_reduce_impl.cuh (resolved)

adjust for comments

1bb757f

Signed-off-by: Haoyang Li <[email protected]>

ilmarkov merged commit 06af4d3 into neuralmagic:experimental/quick_reduce Jun 16, 2025
2 checks passed

lihaoyang-amd deleted the lhy/add_Q6 branch June 30, 2025 09:41
