## What's Changed
- Bump product version 2025.4 by @akladiev in #2620
- Fix CLEANUP_CACHE by @Wovchena in #2617
- [wwb] Add tests for mac/win to ci by @sbalandi in #2603
- xfail embed by @Wovchena in #2618
- Bump py-build-cmake from 0.4.3 to 0.5.0 by @dependabot[bot] in #2624
- Bump actions/upload-pages-artifact from 3.0.1 to 4.0.0 by @dependabot[bot] in #2625
- Bump optimum-intel[nncf] from 1.25.1 to 1.25.2 by @dependabot[bot] in #2613
- Bump optimum-intel from 1.25.1 to 1.25.2 in /tests/python_tests by @dependabot[bot] in #2614
- [GGUF] Fix Q4_1 accuracy by @wine99 in #2563
- [llm bench] Add support of arcee model by @sbalandi in #2636
- Warn about older transformers by @Wovchena in #2634
- Limited max GPU KV-cache considering max allocatable GPU memory size by @popovaan in #2633
- Bump actions/dependency-review-action from 4.7.1 to 4.7.2 by @dependabot[bot] in #2639
- [llm_bench] Override max_length to preserve max_new_tokens by @Wovchena in #2641
- Cache images by @Wovchena in #2629
- Test logging by @Wovchena in #2621
- [CMAKE] Fix samples installation by @mryzhov in #2649
- [JS] Add prettier and align eslint by @almilosz in #2631
- [WWB] Remove use_flash_attention_2 argument for phi4mm by @nikita-savelyevv in #2653
- Implement text embedding pipeline shape fix by @as-suvorov in #2449
- [CMAKE] Solve pybind targets conflict by @mryzhov in #2655
- [GHA] Disable Cacheopt tests on mac by @mryzhov in #2663
- Fix Coverity by @Wovchena in #2601
- Bump peft from 0.17.0 to 0.17.1 in /samples by @dependabot[bot] in #2658
- Bump peft from 0.17.0 to 0.17.1 in /tests/python_tests by @dependabot[bot] in #2660
- downgrade xgrammar version in master by @pavel-esir in #2668
- Extend chat template test models by @yatarkan in #2648
- [NPU] Enable chunk prefill for VLM. by @intelgaoxiong in #2657
- [JS] Add build NodeJS bindings into Manylinux 2_28 by @Retribution98 in #2537
- WWB empty_adapters mode by @likholat in #2671
- Bump actions/dependency-review-action from 4.7.2 to 4.7.3 by @dependabot[bot] in #2674
- Bump aquasecurity/trivy-action from 0.32.0 to 0.33.0 by @dependabot[bot] in #2677
- Bump langchain-core from 0.3.74 to 0.3.75 in /tests/python_tests by @dependabot[bot] in #2673
- Bump actions/download-artifact from 4.3.0 to 5.0.0 by @dependabot[bot] in #2676
- [OV JS] Add perfMetrics grammar getters & update docstrings by @almilosz in #2681
- Bump langchain-community from 0.3.27 to 0.3.29 in /tests/python_tests by @dependabot[bot] in #2680
- Bump actions/checkout from 4.2.2 to 5.0.0 by @dependabot[bot] in #2685
- Fix StructuredOutputConfig pybind11-subgen signatures generation by @pavel-esir in #2669
- [llm_bench] Add start memory info by @sbalandi in #2686
- Updating KVCrush hyperparameters by @gopikrishnajha in #2678
- [Docs] Convert whisper as stateless in the quantization example by @nikita-savelyevv in #2690
- print genai version by @wgzintel in #2684
- Fix initializer for the sparse attention mode by @vshampor in #2689
- Add docs entry about building GenAI with free threaded Python by @p-wysocki in #2679
- Reduce structured output controller mutex locking scope by @mzegla in #2687
- [speculative decoding] Move from ManualTimer to pure metrics by @sbalandi in #2695
- [CI] [GHA] Use custom `actions/download-artifact` action with the fixed retries logic by @akashchi in #2692
- Enable VLM generation on NPU without image input by @AlexanderKalistratov in #2694
- [llm bench] Add possibility to setup cache eviction config for LLM by @sbalandi in #2693
- Remove not supported rerank models from docs by @as-suvorov in #2702
- Tune automatic memory allocation by @popovaan in #2697
- Bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in #2705
- Bump pytest from 8.4.1 to 8.4.2 in /tests/python_tests by @dependabot[bot] in #2710
- [WWB] friendly error message for wrong model type by @isanghao in #2672
- [CI] [GHA] Use smaller runners for image generation samples by @akashchi in #2682
- Add image generation pipeline reuse into README by @JohnLeFeng in #2701
- Align benchmark_vlm.py and cpp by @Wovchena in #2711
- [CI] Fix NodeJS tests for manylinux by @Retribution98 in #2715
- fix checking tokenizers version by @pavel-esir in #2667
- Optimize qwen2vl encoder by @WeldonWangwang in #2630
- Check available memory before allocating KV-cache. by @popovaan in #2683
- Fixed clearing of kv-cache for GPU by @popovaan in #2717
- [GHA] w/a to build ov samples by @mryzhov in #2734
- [OV JS] Initial support for SchedulerConfig by @almilosz in #2696
- Test LLM samples with GGUF models by @Retribution98 in #2464
- [llm_bench] LLMPipeline fix negative time by @sbalandi in #2742
- Bump pydantic from 2.11.7 to 2.11.9 in /samples by @dependabot[bot] in #2732
- [llm bench] Add mem info on initial/compilation phase to json/csv by @sbalandi in #2741
- Increase timeouts by @Wovchena in #2743
- Allow additional_params for tokenizer decode in TextStreamer by @dkalinowski in #2729
- Fix attention mask pass for whisper (static) by @eshiryae in #2665
- Support from-onnx parameter by @sstrehlk in #2441
- Use model path property for caching by @praasz in #2720
- Increase GGUF timeouts by @Wovchena in #2756
- [VLM] Fixed measuring of embeddings preparation. by @popovaan in #2752
- Bump timm from 1.0.19 to 1.0.20 by @dependabot[bot] in #2754
- [llm_bench] Fix OpenVINO config not being passed for speech-to-text and Whisper models by @aobolensk in #2763
- OPT & Clean code of openvino_vision_embeddings_merger_model inputs processing by @zhaixuejun1993 in #2726
- Add .github/pull_request_template.md by @Wovchena in #2765
- Update transformers to 4.53.3 by @as-suvorov in #2760
- Set xfail for gguf reader tests on windows. by @popovaan in #2766
- [llm_bench] Add reranking pipeline by @sbalandi in #2728
- remove extra cpu->gpu tensor copies in genai VLM pipeline by @liangali in #2507
- torchcodec for linux to use the latest datasets by @Wovchena in #2424
- [VLM] Add nanoLLaVA by @popovaan in #2733
- Simplify update_config_from_kwargs by @Wovchena in #2769
- Remove unused get_model_kv_cache_precision() by @Wovchena in #2750
- Update Tokenizers Submodule by @apaniukov in #2767
- Bump minja with call blocks support, remove chat template fallback for MiniCPM3-4B by @yatarkan in #2718
- Bump aquasecurity/trivy-action from 0.33.0 to 0.33.1 by @dependabot[bot] in #2706
- Bump langchain-core from 0.3.75 to 0.3.76 in /tests/python_tests by @dependabot[bot] in #2721
- hook supports transformers 4.55.0 by @wgzintel in #2776
- [Embeddings] Add last token pooling by @as-suvorov in #2757
- Remove hardcoded fields and pass special variables to `apply_chat_template()` by @yatarkan in #2768
- [CI] [GHA] Migrate `mac` workflow to mac-14 arm64 by @akashchi in #2533
- [CI] [GHA] Use older version of the Smart CI action by @akashchi in #2797
- Safe VLM JSON config parsing with `read_json_param()` by @yatarkan in #2785
- [NPUW] Disable chunking and F16IC for gemma3 as they are not supported currently by @AlexanderKalistratov in #2800
- Skip qwen3 embeddings test for mac by @as-suvorov in #2802
- [wwb] Add text embeddings pipeline by @sbalandi in #2787
- [llm_bench] Refactor class mapping and add qwen3 to text_embeds/rerank supported list by @sbalandi in #2782
- Support One Element Vector in Chat Scenario by @apaniukov in #2775
- Bump pydantic from 2.11.9 to 2.12.0 by @dependabot[bot] in #2806
- [WWB] Fix to support miniCPM-o. by @popovaan in #2808
- [wwb] Add text reranking pipeline by @sbalandi in #2786
- Introduce JsonContainer for chat history by @yatarkan in #2799
- [llm_bench] Add possibility to run tool with models in gguf format by @sbalandi in #2771
- Bump tar-fs from 3.0.9 to 3.1.1 in /samples/js in the npm_and_yarn group across 1 directory by @dependabot[bot] in #2774
- [llm_bench] Add minicpmo to allowed model types by @sbalandi in #2810
- Text2Image pipeline export/import by @as-suvorov in #2716
- [JS] Bump version openvino-genai-node up to 2025.4.0 by @Retribution98 in #2803
- [WWB] Fix to support nanoLLaVA. by @popovaan in #2801
- [TextRerankPipeline] Support Qwen3 reranker model by @as-suvorov in #2809
- feat: Add GGUF file support for GenAI and HF text pipelines by @sswierze in #2798
- Bump langchain-core from 0.3.76 to 0.3.78 in /tests/python_tests by @dependabot[bot] in #2794
- [VLM] Add LLaVa-NeXT-Video. by @popovaan in #2793
- Fix cache eviction header - warning as error. by @rasapala in #2784
- Add minicpm4 in supported model list by @openvino-dev-samples in #2827
- Remove redundant WWB nanoLLaVA test runs from GA workflows by @as-suvorov in #2822
- Fix output wrong model name for CSV and json by @wgzintel in #2821
- Adding non-CB pipeline for Speculative Decoding by @AsyaPronina in #2544
- Bump pydantic from 2.12.0 to 2.12.1 in /samples by @dependabot[bot] in #2830
- Add GPT OSS to Supported Models List by @apaniukov in #2833
- Bump langchain-core from 0.3.78 to 0.3.79 in /tests/python_tests by @dependabot[bot] in #2823
- Bump diffusers from 0.34.0 to 0.35.2 in /samples by @dependabot[bot] in #2842
- add_request() to support token_type_ids with prompt by @zhaohb in #2738
- Bump pydantic from 2.12.1 to 2.12.2 by @dependabot[bot] in #2837
- Split cache eviction tests by @as-suvorov in #2844
- [OV JS] Fix TextEmbeddingPipeline config param by @almilosz in #2832
- Fix missing result_dir parameter in evaluator constructors by @sswierze in #2825
- Bump diffusers from 0.34.0 to 0.35.2 in /tests/python_tests by @dependabot[bot] in #2836
- Coverity issues fix by @sgonorov in #2834
- Bump pillow from 11.3.0 to 12.0.0 in /samples by @dependabot[bot] in #2848
- Bump actions/dependency-review-action from 4.7.3 to 4.8.1 by @dependabot[bot] in #2824
- Sparse attention documentation by @vshampor in #2698
- Increase cacheopt timeout by @Wovchena in #2852
- Bump actions/cache from 4.2.4 to 4.3.0 by @dependabot[bot] in #2770
- Bump langchain-community from 0.3.29 to 0.3.31 in /tests/python_tests by @dependabot[bot] in #2805
- Bump timm from 1.0.19 to 1.0.20 in /tests/python_tests by @dependabot[bot] in #2753
- [StructuredOutput] Update XGrammar by @apaniukov in #2817
- Expose get_original_chat_template method in Tokenizer by @mzegla in #2722
- [DOCS] Fix ENABLE_PYTHON cmake option by @as-suvorov in #2849
- Encourage requirements txt usage by @Wovchena in #2851
- Fix MiniCPM-o-2_6 by @Wovchena in #2847
- Enable model caching for Whisper pipeline on GPU and NPU by @luke-lin-vmc in #2759
- Increase wwb timeout by @Wovchena in #2854
- Show how to run from PR by @Wovchena in #2261
- Introduce chat history class by @yatarkan in #2816
- [CI] [GHA] Remove redundant args from the coverity command by @akashchi in #2761
- Reduce logging for windows by @Wovchena in #2853
- Bump actions/setup-node from 4.4.0 to 6.0.0 by @dependabot[bot] in #2856
- [llm_bench] Model type resolution fix by @sgonorov in #2877
- Fix whisper hook generate decorator arguments passing by @as-suvorov in #2873
- [llm_bench] Fix text use_case in case of complex_model_types by @sbalandi in #2874
- [JS API] Add text_generation/benchmark_genai.js sample by @almilosz in #2826
- Fixture-based VLM models reusing by @sgonorov in #2719
- Update Tokenizers Submodule by @apaniukov in #2872
- Support `ChatHistory` in `.generate()` for LLMs by @yatarkan in #2871
- Print the scheduler_config info by @wgzintel in #2467
- [llm_bench] Fix gpt-oss LLM bench support by @sgonorov in #2889
- C API: implemented VlmPipeline by @zhaohb in #2735
- [JS] Implement StructuredOutputConfig by @Retribution98 in #2876
- Fix unclassified issues by @sgonorov in #2881
- [llm_bench] Fix help message by @isanghao in #2887
- Limit torch version by @sgonorov in #2858
- Relax DecodedResults.perfMetrics test by @Wovchena in #2900
- [wwb] Fix reranker tests by @sbalandi in #2890
- Bump pydantic from 2.12.2 to 2.12.3 in /samples by @dependabot[bot] in #2868
- [JS] Disable the randomly failed tests by @Retribution98 in #2910
- Enable WhisperStatefulImpl for NPU, fix Whisper pipelines for transformers 4.53.3 & 4.55 by @eshiryae in #2126
- Extend using of full chat history mode for stateful pipeline for VLM and LLM with encoded inputs by @AlexanderKalistratov in #2835
- Upgrade optimum-intel and transformers by @Wovchena in #2611
- [wwb] Similarity for reranking models become based on relevance scores by @sbalandi in #2883
- Bump langchain-core from 0.3.79 to 1.0.0 in /tests/python_tests by @dependabot[bot] in #2867
- Bump langchain-community from 0.3.31 to 0.4 in /tests/python_tests by @dependabot[bot] in #2866
- Add parsing by @pavel-esir in #2772
- Bump timm from 1.0.20 to 1.0.21 by @dependabot[bot] in #2915
- WWB Text Generation with LoRA by @likholat in #2723
- [RAG] Add qwen3 models to docs by @as-suvorov in #2920
- Restore 4 missing VLM test cases by @sgonorov in #2914
- Bump actions/upload-artifact from 4.6.2 to 5.0.0 by @dependabot[bot] in #2917
- Bump timm from 1.0.20 to 1.0.21 in /tests/python_tests by @dependabot[bot] in #2918
- update gpu block size based on xattn by @rnwang04 in #2764
- [JS] Create ChatHistory bindings by @Retribution98 in #2922
- Update C++ and Python chat samples to use the `ChatHistory` class by @yatarkan in #2931
- Update llm bench by @rnwang04 in #2924
- configure_sparse_attention for wwb model loader by @rnwang04 in #2927
- Align whisper log with other GenAI pipelines by @eshiryae in #2939
- add gil_scoped_acquire to fix segfault for streaming parser by @pavel-esir in #2938
- [llm_bench] Limit the MAX Input Text In log by @peterchen-intel in #2911
- Revert torch freeze by @sgonorov in #2935
- Fix spelling errors & Grammar issues by @as-suvorov in #2882
- [GHA] Add missing jobs to workflow overall status by @as-suvorov in #2893
- Drop Python 3.9 by @Wovchena in #2948
- Add gemma-3-270m to supported models by @yatarkan in #2949
- Clarify llm_bench help by @Wovchena in #2944
- Fix for another batch of coverity issues by @sgonorov in #2897
- [RAG] Add text rerank models task by @as-suvorov in #2941
- [Rag] Add new embedding parameters by @as-suvorov in #2932
- [llm_bench] Add support smolvlm for optimum only by @sbalandi in #2957
- Add threaded callback for image generation pipeline by @JohnLeFeng in #2864
- Fix `KVCrushAnchorPointMode::ALTERNATE` conflicting with Windows headers by @yatarkan in #2958
- Fixed LLM bench readme to use CPU torch by @sgonorov in #2936
- Modify sample to demonstrate parsing by @pavel-esir in #2940
- Readme improvements by @MaximProshin in #2961
- [VLM] Use position_ids from inputs embedder in CB. by @popovaan in #2703
- Update README for threaded callback of image generation pipeline by @JohnLeFeng in #2962
- Support QWen VL video inputs (#2514) by @peterchen-intel in #2985
- [Port] Update condition for enum value conflicting with windows headers by @yatarkan in #2997
- Release fix for callback handling by @sgonorov in #3056
## New Contributors
- @wine99 made their first contribution in #2563
- @JohnLeFeng made their first contribution in #2701
- @aobolensk made their first contribution in #2763
- @zhaixuejun1993 made their first contribution in #2726
- @liangali made their first contribution in #2507
- @rasapala made their first contribution in #2784
- @openvino-dev-samples made their first contribution in #2827
- @rnwang04 made their first contribution in #2764
- @MaximProshin made their first contribution in #2961
**Full Changelog**: 2025.3.0.0...2025.4.0.0