@esmirno esmirno commented Oct 16, 2025

Details:

  • Applies specific LPT passes to decompose the FakeConvert layers while leaving the kv-cache in fp8 precision.
  • Performance regressions will be addressed in follow-up work.

Tickets:

  • E-186663

@github-actions github-actions bot added the "category: NPU" (OpenVINO NPU plugin) and "category: NPUW" (NPUW plugin) labels Oct 16, 2025
@esmirno esmirno changed the title from "initial iteration of lpt8 integration of NPUW" to "initial iteration of lpt8 integration into NPUW" Oct 16, 2025
@dmatveev dmatveev added this to the 2026.0 milestone Oct 31, 2025
@esmirno esmirno marked this pull request as ready for review November 13, 2025 13:46
@esmirno esmirno requested review from a team as code owners November 13, 2025 13:46
if (fcTypesInput.empty() || !fcTypesRemained.empty()) {
    LOG_WARN("FakeConvert layers not decomposed - leaving kv-cache in " << kv_kache_storage_type
                                                                        << " precision");
} else if (fcTypesInput.size() > 1) {
@esmirno esmirno Nov 13, 2025

What if the LPT passes were applied and different precisions were detected? Consider changing the message so that it also ends with: leaving kv-cache in " << kv_kache_storage_type
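
One possible reading of this suggestion, as a standalone sketch rather than code from the PR: both warning branches share the "leaving kv-cache in ..." suffix, and the multi-precision branch also lists what was detected. The helper name `kv_cache_warning`, the `std::set<std::string>` container types, and the exact wording of the second message are assumptions made for illustration; the real code logs through `LOG_WARN` and may use different types.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the PR): builds the warning text for the
// kv-cache precision decision. Returns an empty string when no warning is
// needed. The std::set<std::string> parameter types are an assumption.
std::string kv_cache_warning(const std::set<std::string>& fcTypesInput,
                             const std::set<std::string>& fcTypesRemained,
                             const std::string& kv_kache_storage_type) {
    std::ostringstream msg;
    if (fcTypesInput.empty() || !fcTypesRemained.empty()) {
        // Same condition as in the diff: nothing to decompose, or some
        // FakeConvert layers survived the passes.
        msg << "FakeConvert layers not decomposed";
    } else if (fcTypesInput.size() > 1) {
        // Assumed wording for the case raised in the review comment.
        msg << "FakeConvert layers decomposed, but different precisions detected (";
        const char* sep = "";
        for (const auto& type : fcTypesInput) {
            msg << sep << type;
            sep = ", ";
        }
        msg << ")";
    } else {
        // Exactly one precision and nothing left over: no warning.
        return std::string();
    }
    // Shared suffix, so both warnings report the storage precision.
    msg << " - leaving kv-cache in " << kv_kache_storage_type << " precision";
    return msg.str();
}
```

Keeping the suffix in one place means the two warnings cannot drift apart when the message is edited later.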

@esmirno esmirno left a comment

tbd

@esmirno esmirno changed the title from "initial iteration of lpt8 integration into NPUW" to "[NPUW] integration of LPT-fp8 passes for optimizing fp8 kv-cache behavior" Nov 13, 2025
@dmatveev
@AsyaPronina please have a look
