Commit 6b705eb
Fix memory consumption issue with quantized Gemini Nano2 models on CPU (#32149)
### Details:
Problem:
Quantized models (i8/fp16 weights) were consuming excessive memory (up to 90 GB) because the ConstantFolding transformation converted the compressed weights to fp32.
Root cause:
1. EinsumDecomposition was not called in the CPU pipeline before MarkDequantization.
2. MarkDequantization could not recognize decompression patterns that feed Einsum operations.
3. DisableDecompressionConvertConstantFolding was disabled, allowing unwanted conversions.
Solution:
1. Add EinsumDecomposition to decompression_handling_manager before MarkDequantization. This allows proper pattern recognition for Einsum operations.
2. Keep DisableDecompressionConvertConstantFolding enabled (comment out the line that disabled it). This preserves the protection against unwanted constant folding.
Transformation pipeline flow:
Before fix: MarkDequantization -> [Einsum blocks pattern] -> ConstantFolding converts to fp32
After fix: EinsumDecomposition -> MarkDequantization -> [Pattern recognized] -> Constants preserved
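The ordering constraint above can be illustrated with a toy pass pipeline. This is plain Python, not the OpenVINO API; the op and pass names only mirror the real transformations, and the "pattern" is reduced to adjacent list entries for brevity:

```python
# Toy model: a graph is a list of op names; a compressed weight is the
# chain Const_i8 -> Convert -> Multiply feeding a consumer op.
# mark_dequantization only recognizes the chain when the consumer is a
# known op like MatMul, not Einsum (this mirrors the reported bug).

def einsum_decomposition(graph):
    # Rewrite Einsum into MatMul so downstream pattern matching works.
    return ["MatMul" if op == "Einsum" else op for op in graph]

def mark_dequantization(graph):
    # Mark the compressed-weight chain only if its consumer is MatMul.
    marked = set()
    for i, op in enumerate(graph):
        if op == "Multiply" and i + 1 < len(graph) and graph[i + 1] == "MatMul":
            marked.update({i - 2, i - 1, i})  # Const, Convert, Multiply
    return marked

def constant_folding(graph, marked):
    # Fold (widen to fp32) every Convert that is NOT part of a marked
    # dequantization subgraph; marked constants stay compressed.
    return ["fp32_const" if op == "Convert" and i not in marked else op
            for i, op in enumerate(graph)]

graph = ["Const_i8", "Convert", "Multiply", "Einsum"]

# Before the fix: marking runs on the raw graph, misses the pattern,
# and constant folding widens the weight to fp32.
before = constant_folding(graph, mark_dequantization(graph))

# After the fix: decomposition runs first, the pattern is found,
# and the i8 constant survives.
decomposed = einsum_decomposition(graph)
after = constant_folding(decomposed, mark_dequantization(decomposed))

print(before)  # the Convert node was folded into an fp32 constant
print(after)   # the Convert node is preserved; weights stay i8
```

The sketch shows why both orderings matter: the marking pass can only protect what it can match, so the decomposition must run first.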
Test results on einsum_model_with_fp16_i8:
- Before: constants converted to fp32 (4x memory increase for i8)
- After: constants remain in i8 format (1057 MB memory usage)
Both changes are required; applying only one results in incorrect behavior.
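The 4x factor in the results above is simply element width: i8 weights take 1 byte per element while fp32 takes 4. A quick sketch (the element count is illustrative, not this model's actual size):

```python
def weight_bytes(num_elems: int, dtype: str) -> int:
    # Bytes per element for common weight dtypes.
    width = {"i8": 1, "fp16": 2, "fp32": 4}[dtype]
    return num_elems * width

# Illustrative: one billion weight elements.
n = 1_000_000_000
i8_mb = weight_bytes(n, "i8") / 2**20
fp32_mb = weight_bytes(n, "fp32") / 2**20
ratio = fp32_mb / i8_mb

print(f"i8: {i8_mb:.0f} MB, fp32: {fp32_mb:.0f} MB, blow-up: {ratio:.0f}x")
```

The same arithmetic gives a 2x blow-up for fp16 weights folded to fp32, which is why the fix matters for both compressed formats mentioned above.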
### Tickets:
- 165827
---------
Co-authored-by: Mikhail Ryzhov <[email protected]>
Parent: bbf3f96
File tree:
1 file changed (+5, -2 lines): src/plugins/intel_cpu/src/transformations
Diff contents were not captured by the page extraction; the remaining line-number residue indicates three hunks: +1 line near line 116, +4 lines near lines 466-469, and -2 lines at 833-834.