Commit a5025a2
Fix BMM style MoE export in fp8_pc_pt recipe (#515)
## What does this PR do?
**Type of change:** Bug fix
**Overview:** The Llama-4-Scout-17B-16E-Instruct model uses
`Llama4TextExperts`, which stores all expert weights in a single tensor in
BMM (batch matrix multiply) layout: (num_experts, input_dim, output_dim).
Standard MoE models typically keep one linear layer per expert instead, with
weights of shape (output_dim, input_dim). The FP8_PC_PT (FP8 per-channel
per-token) quantization code did not handle the BMM layout, which caused
shape mismatches during export.
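To illustrate why the layout matters, here is a minimal, dependency-free sketch of per-output-channel scale computation. It is not the code from this commit. The helper names (`scales_linear_layout`, `scales_bmm_layout`) are hypothetical, nested lists stand in for tensors, and the FP8 E4M3 maximum of 448 is assumed as the scaling target.

```python
# Illustrative sketch only; not the implementation changed in this commit.
FP8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3 (assumed target format)


def scales_linear_layout(weight):
    """Per-output-channel scales for a (output_dim, input_dim) weight."""
    # Each row is one output channel, so reduce across the row.
    return [max(abs(v) for v in row) / FP8_E4M3_MAX for row in weight]


def scales_bmm_layout(weight):
    """Per-output-channel scales for a (num_experts, input_dim, output_dim) weight."""
    scales = []
    for expert in weight:
        output_dim = len(expert[0])
        # Each column is one output channel, so reduce across input_dim.
        scales.append([
            max(abs(row[j]) for row in expert) / FP8_E4M3_MAX
            for j in range(output_dim)
        ])
    return scales


# Toy BMM-layout weight: 2 experts, input_dim=3, output_dim=2.
bmm_weight = [
    [[1.0, -8.0], [2.0, 4.0], [-3.0, 0.5]],
    [[0.5, 0.25], [-0.5, 1.0], [0.125, -2.0]],
]

bmm_scales = scales_bmm_layout(bmm_weight)
# Treating each expert as (output_dim, input_dim) reduces over the wrong axis.
naive_scales = [scales_linear_layout(expert) for expert in bmm_weight]

print(len(bmm_scales), len(bmm_scales[0]))      # scale shape: (num_experts, output_dim)
print(len(naive_scales), len(naive_scales[0]))  # scale shape: (num_experts, input_dim)
```

In this toy case the naive reduction yields one scale per input row rather than per output channel, which is the kind of shape mismatch described above.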
## Usage
```bash
python3 hf_ptq.py \
    --pyt_ckpt_path /home/scratch.omniml_data_2/models/Llama-4-Scout-17B-16E-Instruct \
    --qformat fp8_pc_pt \
    --export_path /home/scratch.omniml_data_2/zhiyuc/checkpoints/llama4-scout-fp8_pc_pt \
    --trust_remote_code
```
## Testing
## Before your PR is "*Ready for review*"
- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: No
- **Did you update [Changelog](https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/CHANGELOG.rst)?**: No
## Additional Information
---------
Signed-off-by: Zhiyu Cheng <[email protected]>

1 parent 93f5bbf commit a5025a2
2 files changed, +55 -5 lines changed

| File | Original lines | New lines | Change |
|---|---|---|---|
| First file | 782 | 782-811 | 1 line removed, 30 lines added |
| Second file | | 53 | 1 line added |
| Second file | | 331-336 | 6 lines added |
| Second file | 333-336 | | 4 lines removed |
| Second file | | 360-377 | 18 lines added |