[NPUW] integration of LPT-fp8 passes for optimizing fp8 kv-cache behavior #32448

esmirno · 2025-10-16T13:13:36Z

Details:

Applied specific LPT passes to decompose FakeConvert layers while leaving the kv-cache in fp8 precision.
Performance regressions will be addressed in upcoming work.

Tickets:

E-186663

esmirno · 2025-11-13T14:47:26Z

src/plugins/intel_npu/src/plugin/npuw/llm_compiled_model.cpp

+        if (fcTypesInput.empty() || !fcTypesRemained.empty()) {
+            LOG_WARN("FakeConvert layers not decomposed - leaving kv-cache in " << kv_kache_storage_type
+                                                                                << " precision");
+        } else if (fcTypesInput.size() > 1) {


What if the LPT passes are applied and a different precision is detected? Consider changing the message `leaving kv-cache in " << kv_kache_storage_type`.

esmirno

tbd
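One way to act on the comment above is to separate the reasons for the warning so that each gets its own message. The following is a minimal sketch, not the PR's code: the `FcOutcome` enum, the `classify` and `describe` helpers, and the use of `std::set<std::string>` for `fcTypesInput` / `fcTypesRemained` are all assumptions made for illustration. What the real `fcTypesInput.size() > 1` branch does is not visible in the excerpt, so it is only named here.

```cpp
#include <set>
#include <string>

// Hypothetical outcome categories; these names do not come from the PR.
enum class FcOutcome {
    NoFakeConvert,   // no FakeConvert types were found before the passes
    NotDecomposed,   // passes ran, but some FakeConvert layers remained
    MultipleTypes,   // more than one FakeConvert type was found in the input
    Decomposed       // exactly one type found and nothing remained
};

// Mirrors the branch order of the excerpt, but splits its first condition
// (input empty OR something remained) into two distinguishable cases.
// The container type is an assumption; the PR may use a different one.
FcOutcome classify(const std::set<std::string>& fcTypesInput,
                   const std::set<std::string>& fcTypesRemained) {
    if (fcTypesInput.empty()) {
        return FcOutcome::NoFakeConvert;
    }
    if (!fcTypesRemained.empty()) {
        return FcOutcome::NotDecomposed;
    }
    if (fcTypesInput.size() > 1) {
        return FcOutcome::MultipleTypes;
    }
    return FcOutcome::Decomposed;
}

// Builds a message per outcome; the wording is illustrative only.
std::string describe(FcOutcome outcome, const std::string& storageType) {
    switch (outcome) {
    case FcOutcome::NoFakeConvert:
        return "No FakeConvert layers found - leaving kv-cache in " + storageType + " precision";
    case FcOutcome::NotDecomposed:
        return "FakeConvert layers not decomposed - leaving kv-cache in " + storageType + " precision";
    case FcOutcome::MultipleTypes:
        return "Multiple FakeConvert types detected";
    case FcOutcome::Decomposed:
        return "FakeConvert layers decomposed";
    }
    return "";
}
```

With this split, a log reader can tell "nothing to decompose" apart from "decomposition attempted but incomplete", which the single combined message in the excerpt does not distinguish.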

dmatveev · 2025-11-18T15:07:54Z

@AsyaPronina please have a look

initial iteration of lpt8 integration of NPUW

e0f6e00

github-actions bot added the category: NPU and category: NPUW labels Oct 16, 2025

esmirno changed the title ~~initial iteration of lpt8 integration of NPUW~~ initial iteration of lpt8 integration into NPUW Oct 16, 2025

small fixes

0903513

dmatveev added this to the 2026.0 milestone Oct 31, 2025

esmirno added 2 commits November 5, 2025 15:55

Merge branch 'master' into es/fp8-lpt-apply

03da682

clang-format fixes

2a40e10

esmirno marked this pull request as ready for review November 13, 2025 13:46

esmirno requested review from a team as code owners November 13, 2025 13:46

Merge branch 'master' into es/fp8-lpt-apply

c2e0d65


esmirno changed the title ~~initial iteration of lpt8 integration into NPUW~~ [NPUW] integration of LPT-fp8 passes for optimizing fp8 kv-cache behavior Nov 13, 2025

dmatveev assigned AsyaPronina Nov 18, 2025
