## What's Changed
- Bump product version 2025.4 by @akladiev in #2620
- Fix CLEANUP_CACHE by @Wovchena in #2617
- [wwb] Add tests for mac/win to ci by @sbalandi in #2603
- xfail embed by @Wovchena in #2618
- Bump py-build-cmake from 0.4.3 to 0.5.0 by @dependabot[bot] in #2624
- Bump actions/upload-pages-artifact from 3.0.1 to 4.0.0 by @dependabot[bot] in #2625
- Bump optimum-intel[nncf] from 1.25.1 to 1.25.2 by @dependabot[bot] in #2613
- Bump optimum-intel from 1.25.1 to 1.25.2 in /tests/python_tests by @dependabot[bot] in #2614
- [GGUF] Fix Q4_1 accuracy by @wine99 in #2563
- [llm bench] Add support of arcee model by @sbalandi in #2636
- Warn about older transformers by @Wovchena in #2634
- Limited max GPU KV-cache considering max allocatable GPU memory size by @popovaan in #2633
- Bump actions/dependency-review-action from 4.7.1 to 4.7.2 by @dependabot[bot] in #2639
- [llm_bench] Override max_length to preserve max_new_tokens by @Wovchena in #2641
- Cache images by @Wovchena in #2629
- Test logging by @Wovchena in #2621
- [CMAKE] Fix samples installation by @mryzhov in #2649
- [JS] Add prettier and align eslint by @almilosz in #2631
- [WWB] Remove use_flash_attention_2 argument for phi4mm by @nikita-savelyevv in #2653
- Implement text embedding pipeline shape fix by @as-suvorov in #2449
- [CMAKE] Solve pybind targets conflict by @mryzhov in #2655
- [GHA] Disable Cacheopt tests on mac by @mryzhov in #2663
- Fix Coverity by @Wovchena in #2601
- Bump peft from 0.17.0 to 0.17.1 in /samples by @dependabot[bot] in #2658
- Bump peft from 0.17.0 to 0.17.1 in /tests/python_tests by @dependabot[bot] in #2660
- downgrade xgrammar version in master by @pavel-esir in #2668
- Extend chat template test models by @yatarkan in #2648
- [NPU] Enable chunk prefill for VLM. by @intelgaoxiong in #2657
- [JS] Add build NodeJS bindings into Manylinux 2_28 by @Retribution98 in #2537
- WWB empty_adapters mode by @likholat in #2671
- Bump actions/dependency-review-action from 4.7.2 to 4.7.3 by @dependabot[bot] in #2674
- Bump aquasecurity/trivy-action from 0.32.0 to 0.33.0 by @dependabot[bot] in #2677
- Bump langchain-core from 0.3.74 to 0.3.75 in /tests/python_tests by @dependabot[bot] in #2673
- Bump actions/download-artifact from 4.3.0 to 5.0.0 by @dependabot[bot] in #2676
- [OV JS] Add perfMetrics grammar getters & update docstrings by @almilosz in #2681
- Bump langchain-community from 0.3.27 to 0.3.29 in /tests/python_tests by @dependabot[bot] in #2680
- Bump actions/checkout from 4.2.2 to 5.0.0 by @dependabot[bot] in #2685
- Fix StructuredOutputConfig pybind11-subgen signatures generation by @pavel-esir in #2669
- [llm_bench] Add start memory info by @sbalandi in #2686
- Updating KVCrush hyperparameters by @gopikrishnajha in #2678
- [Docs] Convert whisper as stateless in the quantization example by @nikita-savelyevv in #2690
- print genai version by @wgzintel in #2684
- Fix initializer for the sparse attention mode by @vshampor in #2689
- Add docs entry about building GenAI with free threaded Python by @p-wysocki in #2679
- Reduce structured output controller mutex locking scope by @mzegla in #2687
- [speculative decoding] Move from ManualTimer to pure metrics by @sbalandi in #2695
- [CI] [GHA] Use custom `actions/download-artifact` action with the fixed retries logic by @akashchi in #2692
- Enable VLM generation on NPU without image input by @AlexanderKalistratov in #2694
- [llm bench] Add possibility to setup cache eviction config for LLM by @sbalandi in #2693
- Remove not supported rerank models from docs by @as-suvorov in #2702
- Tune automatic memory allocation by @popovaan in #2697
- Bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in #2705
- Bump pytest from 8.4.1 to 8.4.2 in /tests/python_tests by @dependabot[bot] in #2710
- [WWB] friendly error message for wrong model type by @isanghao in #2672
- [CI] [GHA] Use smaller runners for image generation samples by @akashchi in #2682
- Add image generation pipeline reuse into README by @JohnLeFeng in #2701
- Align benchmark_vlm.py and cpp by @Wovchena in #2711
- [CI] Fix NodeJS tests for manylinux by @Retribution98 in #2715
- fix checking tokenizers version by @pavel-esir in #2667
- Optimize qwen2vl encoder by @WeldonWangwang in #2630
- Check available memory before allocating KV-cache. by @popovaan in #2683
- Fixed clearing of kv-cache for GPU by @popovaan in #2717
- [GHA] w/a to build ov samples by @mryzhov in #2734
- [OV JS] Initial support for SchedulerConfig by @almilosz in #2696
- Test LLM samples with GGUF models by @Retribution98 in #2464
- [llm_bench] LLMPipeline fix negative time by @sbalandi in #2742
- Bump pydantic from 2.11.7 to 2.11.9 in /samples by @dependabot[bot] in #2732
- [llm bench] Add mem info on initial/compilation phase to json/csv by @sbalandi in #2741
- Increase timeouts by @Wovchena in #2743
- Allow additional_params for tokenizer decode in TextStreamer by @dkalinowski in #2729
- Fix attention mask pass for whisper (static) by @eshiryae in #2665
- Support from-onnx parameter by @sstrehlk in #2441
- Use model path property for caching by @praasz in #2720
- Increase GGUF timeouts by @Wovchena in #2756
- [VLM] Fixed measuring of embeddings preparation. by @popovaan in #2752
- Bump timm from 1.0.19 to 1.0.20 by @dependabot[bot] in #2754
- [llm_bench] Fix OpenVINO config not being passed for speech-to-text and Whisper models by @aobolensk in #2763
- OPT & Clean code of openvino_vision_embeddings_merger_model inputs processing by @zhaixuejun1993 in #2726
- Add .github/pull_request_template.md by @Wovchena in #2765
- Update transformers to 4.53.3 by @as-suvorov in #2760
- Set xfail for gguf reader tests on windows. by @popovaan in #2766
- [llm_bench] Add reranking pipeline by @sbalandi in #2728
- remove extra cpu->gpu tensor copies in genai VLM pipeline by @liangali in #2507
- torchcodec for linux to use the latest datasets by @Wovchena in #2424
- [VLM] Add nanoLLaVA by @popovaan in #2733
- Simplify update_config_from_kwargs by @Wovchena in #2769
- Remove unused get_model_kv_cache_precision() by @Wovchena in #2750
- Update Tokenizers Submodule by @apaniukov in #2767
- Bump minja with call blocks support, remove chat template fallback for MiniCPM3-4B by @yatarkan in #2718
- Bump aquasecurity/trivy-action from 0.33.0 to 0.33.1 by @dependabot[bot] in #2706
- Bump langchain-core from 0.3.75 to 0.3.76 in /tests/python_tests by @dependabot[bot] in #2721
- hook supports transformers 4.55.0 by @wgzintel in #2776
- [Embeddings] Add last token pooling by @as-suvorov in #2757
- Remove hardcoded fields and pass special variables to `apply_chat_template()` by @yatarkan in #2768
- [CI] [GHA] Migrate `mac` workflow to mac-14 arm64 by @akashchi in #2533
- [CI] [GHA] Use older version of the Smart CI action by @akashchi in #2797
- Safe VLM JSON config parsing with `read_json_param()` by @yatarkan in #2785
- [NPUW] Disable chunking and F16IC for gemma3 as they are not supported currently by @AlexanderKalistratov in #2800
- Skip qwen3 embeddings test for mac by @as-suvorov in #2802
- [wwb] Add text embeddings pipeline by @sbalandi in #2787
- [llm_bench] Refactor class mapping and add qwen3 to text_embeds/rerank supported list by @sbalandi in #2782
- Support One Element Vector in Chat Scenario by @apaniukov in #2775
- Bump pydantic from 2.11.9 to 2.12.0 by @dependabot[bot] in #2806
- [WWB] Fix to support miniCPM-o. by @popovaan in #2808
- [wwb] Add text reranking pipeline by @sbalandi in #2786
- Introduce JsonContainer for chat history by @yatarkan in #2799
- [llm_bench] Add possibility to run tool with models in gguf format by @sbalandi in #2771
- Bump tar-fs from 3.0.9 to 3.1.1 in /samples/js in the npm_and_yarn group across 1 directory by @dependabot[bot] in #2774
- [llm_bench] Add minicpmo to allowed model types by @sbalandi in #2810
- Text2Image pipeline export/import by @as-suvorov in #2716
- [JS] Bump version openvino-genai-node up to 2025.4.0 by @Retribution98 in #2803
- [WWB] Fix to support nanoLLaVA. by @popovaan in #2801
- [TextRerankPipeline] Support Qwen3 reranker model by @as-suvorov in #2809
- feat: Add GGUF file support for GenAI and HF text pipelines by @sswierze in #2798
- Bump langchain-core from 0.3.76 to 0.3.78 in /tests/python_tests by @dependabot[bot] in #2794
- [VLM] Add LLaVa-NeXT-Video. by @popovaan in #2793
- Fix cache eviction header - warning as error. by @rasapala in #2784
- Add minicpm4 in supported model list by @openvino-dev-samples in #2827
- Remove redundant WWB nanoLLaVA test runs from GA workflows by @as-suvorov in #2822
- Fix output wrong model name for CSV and json by @wgzintel in #2821
- Adding non-CB pipeline for Speculative Decoding by @AsyaPronina in #2544
- Bump pydantic from 2.12.0 to 2.12.1 in /samples by @dependabot[bot] in #2830
- Add GPT OSS to Supported Models List by @apaniukov in #2833
- Bump langchain-core from 0.3.78 to 0.3.79 in /tests/python_tests by @dependabot[bot] in #2823
- Bump diffusers from 0.34.0 to 0.35.2 in /samples by @dependabot[bot] in #2842
- add_request() to support token_type_ids with prompt by @zhaohb in #2738
- Bump pydantic from 2.12.1 to 2.12.2 by @dependabot[bot] in #2837
- Split cache eviction tests by @as-suvorov in #2844
- [OV JS] Fix TextEmbeddingPipeline config param by @almilosz in #2832
- Fix missing result_dir parameter in evaluator constructors by @sswierze in #2825
- Bump diffusers from 0.34.0 to 0.35.2 in /tests/python_tests by @dependabot[bot] in #2836
- Coverity issues fix by @sgonorov in #2834
- Bump pillow from 11.3.0 to 12.0.0 in /samples by @dependabot[bot] in #2848
- Bump actions/dependency-review-action from 4.7.3 to 4.8.1 by @dependabot[bot] in #2824
- Sparse attention documentation by @vshampor in #2698
- Increase cacheopt timeout by @Wovchena in #2852
- Bump actions/cache from 4.2.4 to 4.3.0 by @dependabot[bot] in #2770
- Bump langchain-community from 0.3.29 to 0.3.31 in /tests/python_tests by @dependabot[bot] in #2805
- Bump timm from 1.0.19 to 1.0.20 in /tests/python_tests by @dependabot[bot] in #2753
- [StructuredOutput] Update XGrammar by @apaniukov in #2817
- Expose get_original_chat_template method in Tokenizer by @mzegla in #2722
- [DOCS] Fix ENABLE_PYTHON cmake option by @as-suvorov in #2849
- Encourage requirements txt usage by @Wovchena in #2851
- Fix MiniCPM-o-2_6 by @Wovchena in #2847
- Enable model caching for Whisper pipeline on GPU and NPU by @luke-lin-vmc in #2759
- Increase wwb timeout by @Wovchena in #2854
- Show how to run from PR by @Wovchena in #2261
- Introduce chat history class by @yatarkan in #2816
- [CI] [GHA] Remove redundant args from the coverity command by @akashchi in #2761
- Reduce logging for windows by @Wovchena in #2853
- Bump actions/setup-node from 4.4.0 to 6.0.0 by @dependabot[bot] in #2856
- [llm_bench] Model type resolution fix by @sgonorov in #2877
- Fix whisper hook generate decorator arguments passing by @as-suvorov in #2873
- [llm_bench] Fix text use_case in case of complex_model_types by @sbalandi in #2874
- [JS API] Add text_generation/benchmark_genai.js sample by @almilosz in #2826
- Fixture-based VLM models reusing by @sgonorov in #2719
- Update Tokenizers Submodule by @apaniukov in #2872
- Support `ChatHistory` in `.generate()` for LLMs by @yatarkan in #2871
- Print the scheduler_config info by @wgzintel in #2467
- [llm_bench] Fix gpt-oss LLM bench support by @sgonorov in #2889
- C API: implemented VlmPipeline by @zhaohb in #2735
- [JS] Implement StructuredOutputConfig by @Retribution98 in #2876
- Fix unclassified issues by @sgonorov in #2881
- [llm_bench] Fix help message by @isanghao in #2887
- Limit torch version by @sgonorov in #2858
- Relax DecodedResults.perfMetrics test by @Wovchena in #2900
- [wwb] Fix reranker tests by @sbalandi in #2890
- Bump pydantic from 2.12.2 to 2.12.3 in /samples by @dependabot[bot] in #2868
- [JS] Disable the randomly failed tests by @Retribution98 in #2910
- Enable WhisperStatefulImpl for NPU, fix Whisper pipelines for transformers 4.53.3 & 4.55 by @eshiryae in #2126
- Extend using of full chat history mode for stateful pipeline for VLM and LLM with encoded inputs by @AlexanderKalistratov in #2835
- Upgrade optimum-intel and transformers by @Wovchena in #2611
- [wwb] Similarity for reranking models become based on relevance scores by @sbalandi in #2883
- Bump langchain-core from 0.3.79 to 1.0.0 in /tests/python_tests by @dependabot[bot] in #2867
- Bump langchain-community from 0.3.31 to 0.4 in /tests/python_tests by @dependabot[bot] in #2866
- Add parsing by @pavel-esir in #2772
- Bump timm from 1.0.20 to 1.0.21 by @dependabot[bot] in #2915
- WWB Text Generation with LoRA by @likholat in #2723
- [RAG] Add qwen3 models to docs by @as-suvorov in #2920
- Restore 4 missing VLM test cases by @sgonorov in #2914
- Bump actions/upload-artifact from 4.6.2 to 5.0.0 by @dependabot[bot] in #2917
- Bump timm from 1.0.20 to 1.0.21 in /tests/python_tests by @dependabot[bot] in #2918
- update gpu block size based on xattn by @rnwang04 in #2764
- [JS] Create ChatHistory bindings by @Retribution98 in #2922
- Update C++ and Python chat samples to use the `ChatHistory` class by @yatarkan in #2931
- Update llm bench by @rnwang04 in #2924
- configure_sparse_attention for wwb model loader by @rnwang04 in #2927
- Align whisper log with other GenAI pipelines by @eshiryae in #2939
- add gil_scoped_acquire to fix segfault for streaming parser by @pavel-esir in #2938
- [llm_bench] Limit the MAX Input Text In log by @peterchen-intel in #2911
- Revert torch freeze by @sgonorov in #2935
- Fix spelling errors & Grammar issues by @as-suvorov in #2882
- [GHA] Add missing jobs to workflow overall status by @as-suvorov in #2893
- Drop Python 3.9 by @Wovchena in #2948
- Add gemma-3-270m to supported models by @yatarkan in #2949
- Clarify llm_bench help by @Wovchena in #2944
- Fix for another batch of coverity issues by @sgonorov in #2897
- [RAG] Add text rerank models task by @as-suvorov in #2941
- [Rag] Add new embedding parameters by @as-suvorov in #2932
- [llm_bench] Add support smolvlm for optimum only by @sbalandi in #2957
- Add threaded callback for image generation pipeline by @JohnLeFeng in #2864
- Fix `KVCrushAnchorPointMode::ALTERNATE` conflicting with Windows headers by @yatarkan in #2958
- Fixed LLM bench readme to use CPU torch by @sgonorov in #2936
- Modify sample to demonstrate parsing by @pavel-esir in #2940
- Readme improvements by @MaximProshin in #2961
- [VLM] Use position_ids from inputs embedder in CB. by @popovaan in #2703
- Update README for threaded callback of image generation pipeline by @JohnLeFeng in #2962
- Support QWen VL video inputs (#2514) by @peterchen-intel in #2985
- [Port] Update condition for enum value conflicting with windows headers by @yatarkan in #2997
- Release fix for callback handling by @sgonorov in #3056
## New Contributors
- @wine99 made their first contribution in #2563
- @JohnLeFeng made their first contribution in #2701
- @aobolensk made their first contribution in #2763
- @zhaixuejun1993 made their first contribution in #2726
- @liangali made their first contribution in #2507
- @rasapala made their first contribution in #2784
- @openvino-dev-samples made their first contribution in #2827
- @rnwang04 made their first contribution in #2764
- @MaximProshin made their first contribution in #2961
**Full Changelog**: 2025.3.0.0...2025.4.0.0