[Quantization] per tensor quantization kernel #42560

MekkCyber · 2025-12-02T15:05:54Z

What does this PR do?

Adds a simple kernel for per-tensor quantization: the matmul is computed in 128x128 blocks, and the weight scale and activation scale are each expected to be scalars.
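For readers unfamiliar with the idea, here is a minimal pure-Python sketch of a tiled matmul with per-tensor scales. It is not the Triton kernel from this PR. The function name `per_tensor_scaled_matmul`, the `tile` parameter, and the N x K weight layout are assumptions made for illustration, and the "quantized" values are stored as plain floats.

```python
def per_tensor_scaled_matmul(a_q, b_q, a_scale, b_scale, tile=128):
    """Compute (a_q @ b_q^T) * a_scale * b_scale, one tile at a time.

    a_q: M x K nested list (quantized activations, plain floats here)
    b_q: N x K nested list (quantized weights, one row per output feature)
    a_scale, b_scale: scalar scales, one per tensor
    tile: edge length of the square tiles (128 in the PR description)
    """
    m, k = len(a_q), len(a_q[0])
    n = len(b_q)
    out = [[0.0] * n for _ in range(m)]

    # Walk over output tiles (i0, j0) and reduction tiles (k0).
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = 0.0
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a_q[i][kk] * b_q[j][kk]
                        out[i][j] += acc

    # Both scales are scalars, so they can be applied once after accumulation.
    scale = a_scale * b_scale
    return [[v * scale for v in row] for row in out]


a_q = [[1, 2, 3], [4, 5, 6]]
b_q = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
result = per_tensor_scaled_matmul(a_q, b_q, a_scale=0.5, b_scale=0.25, tile=2)
```

Because the scales do not vary per block, the tile size only affects how the work is split up, not the result: calling the function with `tile=2` or `tile=128` gives the same output.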


HuggingFaceDocBuilderDev · 2025-12-02T15:14:48Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

SunMarc

Thanks a lot for adding this! Just a minor comment.

SunMarc · 2025-12-02T15:33:01Z

src/transformers/integrations/finegrained_fp8.py

    assert len(block_size) == 2
    block_n, block_k = block_size[0], block_size[1]

+    # if we have per-tensor quantization, we use 128x128 block size for tiled matmul multiplication
+    if block_n == B.shape[-2] and block_k == B.shape[-1]:
+        block_n = 128
+        block_k = 128
+


Previously, it didn't make sense to set the block size to anything other than None when doing per-tensor quantization in FP8Linear. Can we change that so it is fixed here as well?
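A rough sketch of how the two conventions discussed here could coexist, assuming a hypothetical helper `resolve_block_size` (not part of the PR or of `finegrained_fp8.py`). It treats `block_size=None` as per-tensor, as suggested in the comment, and also keeps the shape-based check from the diff:

```python
DEFAULT_TILE = 128  # tile size the diff uses for the per-tensor path


def resolve_block_size(block_size, weight_shape):
    """Return (block_n, block_k, per_tensor) for a weight of the given shape.

    Hypothetical helper for illustration only.
    block_size: None for per-tensor quantization, or a pair (block_n, block_k)
    weight_shape: shape of the weight tensor; the last two dims are (N, K)
    """
    n, k = weight_shape[-2], weight_shape[-1]

    # Convention suggested in the review: None means per-tensor.
    if block_size is None:
        return DEFAULT_TILE, DEFAULT_TILE, True

    assert len(block_size) == 2
    block_n, block_k = block_size[0], block_size[1]

    # Check from the diff: a block covering the whole weight is per-tensor.
    if block_n == n and block_k == k:
        return DEFAULT_TILE, DEFAULT_TILE, True

    return block_n, block_k, False


whole_weight = resolve_block_size((4096, 11008), (4096, 11008))
explicit_none = resolve_block_size(None, (4096, 11008))
block_wise = resolve_block_size((128, 128), (4096, 11008))
```

One caveat of the shape-based check in this sketch: a weight whose shape is exactly 128x128 with `block_size=(128, 128)` is reported as per-tensor, which is indistinguishable from block-wise quantization with a single block. Using `None` as the explicit marker avoids that ambiguity.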

SunMarc · 2025-12-02T15:34:19Z

src/transformers/integrations/finegrained_fp8.py

+    """Triton-accelerated function used to perform linear operations (dot
+    product) on input tensors `A` and `B` with block-wise quantization, and
+    store the result in output tensor `C`.
+    """


MekkCyber and others added 9 commits December 2, 2025 05:47

fix

425b7ec

style

924de16

Merge branch 'main' into fix-deqant-fp8

8a69f40

initial

e052da3

Merge remote-tracking branch 'upstream/fix-deqant-fp8' into use-kerne…

62c8601

…l-fp8

fix

fe3359d

comment

75f7e6f

style

1738ca0

Merge remote-tracking branch 'upstream/HEAD' into use-kernel-fp8

78c5459

MekkCyber requested review from ArthurZucker and SunMarc December 2, 2025 15:07

SunMarc approved these changes Dec 2, 2025

ArthurZucker approved these changes Dec 2, 2025

MekkCyber added 2 commits December 2, 2025 17:17

Merge remote-tracking branch 'upstream/HEAD' into use-kernel-fp8

2144e7c

fix

033e535

SunMarc merged commit 51c5a7a into main Dec 2, 2025
24 checks passed

SunMarc deleted the use-kernel-fp8 branch December 2, 2025 17:57

5 participants
