Releases · openvinotoolkit/openvino.genai
2025.4.0.0
What's Changed
- Bump product version 2025.4 by @akladiev in #2620
- Fix CLEANUP_CACHE by @Wovchena in #2617
- [wwb] Add tests for mac/win to ci by @sbalandi in #2603
- xfail embed by @Wovchena in #2618
- Bump py-build-cmake from 0.4.3 to 0.5.0 by @dependabot[bot] in #2624
- Bump actions/upload-pages-artifact from 3.0.1 to 4.0.0 by @dependabot[bot] in #2625
- Bump optimum-intel[nncf] from 1.25.1 to 1.25.2 by @dependabot[bot] in #2613
- Bump optimum-intel from 1.25.1 to 1.25.2 in /tests/python_tests by @dependabot[bot] in #2614
- [GGUF] Fix Q4_1 accuracy by @wine99 in #2563
- [llm bench] Add support of arcee model by @sbalandi in #2636
- Warn about older transformers by @Wovchena in #2634
- Limited max GPU KV-cache considering max allocatable GPU memory size by @popovaan in #2633
- Bump actions/dependency-review-action from 4.7.1 to 4.7.2 by @dependabot[bot] in #2639
- [llm_bench] Override max_length to preserve max_new_tokens by @Wovchena in #2641
- Cache images by @Wovchena in #2629
- Test logging by @Wovchena in #2621
- [CMAKE] Fix samples installation by @mryzhov in #2649
- [JS] Add prettier and align eslint by @almilosz in #2631
- [WWB] Remove use_flash_attention_2 argument for phi4mm by @nikita-savelyevv in #2653
- Implement text embedding pipeline shape fix by @as-suvorov in #2449
- [CMAKE] Solve pybind targets conflict by @mryzhov in #2655
- [GHA] Disable Cacheopt tests on mac by @mryzhov in #2663
- Fix Coverity by @Wovchena in #2601
- Bump peft from 0.17.0 to 0.17.1 in /samples by @dependabot[bot] in #2658
- Bump peft from 0.17.0 to 0.17.1 in /tests/python_tests by @dependabot[bot] in #2660
- downgrade xgrammar version in master by @pavel-esir in #2668
- Extend chat template test models by @yatarkan in #2648
- [NPU]Enable chunk prefill for VLM. by @intelgaoxiong in #2657
- [JS] Add build NodeJS bindings into Manylinux 2_28 by @Retribution98 in #2537
- WWB empty_adapters mode by @likholat in #2671
- Bump actions/dependency-review-action from 4.7.2 to 4.7.3 by @dependabot[bot] in #2674
- Bump aquasecurity/trivy-action from 0.32.0 to 0.33.0 by @dependabot[bot] in #2677
- Bump langchain-core from 0.3.74 to 0.3.75 in /tests/python_tests by @dependabot[bot] in #2673
- Bump actions/download-artifact from 4.3.0 to 5.0.0 by @dependabot[bot] in #2676
- [OV JS] Add perfMetrics grammar getters & update docstrings by @almilosz in #2681
- Bump langchain-community from 0.3.27 to 0.3.29 in /tests/python_tests by @dependabot[bot] in #2680
- Bump actions/checkout from 4.2.2 to 5.0.0 by @dependabot[bot] in #2685
- Fix StructuredOutputConfig pybind11-subgen signatures generation by @pavel-esir in #2669
- [llm_bench] Add start memory info by @sbalandi in #2686
- Updating KVCrush hyperparameters by @gopikrishnajha in #2678
- [Docs] Convert whisper as stateless in the quantization example by @nikita-savelyevv in #2690
- print genai version by @wgzintel in #2684
- Fix initializer for the sparse attention mode by @vshampor in #2689
- Add docs entry about building GenAI with free threaded Python by @p-wysocki in #2679
- Reduce structured output controller mutex locking scope by @mzegla in #2687
- [speculative decoding] Move from ManualTimer to pure metrics by @sbalandi in #2695
- [CI] [GHA] Use custom `actions/download-artifact` action with the fixed retries logic by @akashchi in #2692
- Enable VLM generation on NPU without image input by @AlexanderKalistratov in #2694
- [llm bench] Add possibility to setup cache eviction config for LLM by @sbalandi in #2693
- Remove not supported rerank models from docs by @as-suvorov in #2702
- Tune automatic memory allocation by @popovaan in #2697
- Bump actions/setup-python from 5.6.0 to 6.0.0 by @dependabot[bot] in #2705
- Bump pytest from 8.4.1 to 8.4.2 in /tests/python_tests by @dependabot[bot] in #2710
- [WWB] friendly error message for wrong model type by @isanghao in #2672
- [CI] [GHA] Use smaller runners for image generation samples by @akashchi in #2682
- Add image generation pipeline reuse into README by @JohnLeFeng in #2701
- Align benchmark_vlm.py and cpp by @Wovchena in #2711
- [CI] Fix NodeJS tests for manylinux by @Retribution98 in #2715
- fix checking tokenizers version by @pavel-esir in #2667
- Optimize qwen2vl encoder by @WeldonWangwang in #2630
- Check available memory before allocating KV-cache. by @popovaan in #2683
- Fixed clearing of kv-cache for GPU by @popovaan in #2717
- [GHA] w/a to build ov samples by @mryzhov in #2734
- [OV JS] Initial support for SchedulerConfig by @almilosz in #2696
- Test LLM samples with GGUF models by @Retribution98 in #2464 (a GGUF usage sketch follows this list)
- [llm_bench] LLMPipeline fix negative time by @sbalandi in #2742
- Bump pydantic from 2.11.7 to 2.11.9 in /samples by @dependabot[bot] in #2732
- [llm bench] Add mem info on initial/compilation phase to json/csv by @sbalandi in #2741
- Increase timeouts by @Wovchena in #2743
- Allow additional_params for tokenizer decode in TextStreamer by @dkalinowski in #2729
- Fix attention mask pass for whisper (static) by @eshiryae in #2665
- Support from-onnx parameter by @sstrehlk in #2441
- Use model path property for caching by @praasz in #2720
- Increase GGUF timeouts by @Wovchena in #2756
- [VLM] Fixed measuring of embeddings preparation. by @popovaan in #2752
- Bump timm from 1.0.19 to 1.0.20 by @dependabot[bot] in #2754
- [llm_bench] Fix OpenVINO config not being passed for speech-to-text and Whisper models by @aobolensk in #2763
- OPT & Clean code of openvino_vision_embeddings_merger_model inputs processing by @zhaixuejun1993 in #2726
- Add .github/pull_request_template.md by @Wovchena in #2765
- Update transformers to 4.53.3 by @as-suvorov in https://gi...
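Several entries in the list above extend GGUF support (the Q4_1 accuracy fix, GGUF timeout increases, and the GGUF sample tests). As a rough illustration, here is a minimal Python sketch of loading a GGUF checkpoint straight into `LLMPipeline`; the assumption that a `.gguf` file path can be passed where a converted model directory normally goes follows the GGUF samples, and the file name is hypothetical.

```python
# Minimal sketch: running a GGUF checkpoint through openvino_genai.LLMPipeline.
# Assumption: a .gguf file path is accepted in place of a converted model directory,
# as the GGUF sample tests referenced above suggest. The file name is hypothetical.
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("qwen2.5-0.5b-instruct-q4_0.gguf", "CPU")

config = ov_genai.GenerationConfig()
config.max_new_tokens = 64

print(pipe.generate("What is OpenVINO GenAI?", config))
```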
2025.3.0.0
What's Changed
- Bump product version 2025.3 by @akladiev in #2255
- Implement SnapKV by @vshampor in #2067
- [WWB] Additional processing of native phi4mm by @nikita-savelyevv in #2276
- Update ov genai version in samples by @as-suvorov in #2275
- use chat templates in vlm by @eaidova in #2279
- Fix 'Unsupported property' failures when prompt_lookup is set to False by @sbalandi in #2240
- Force the PA implementation in the llm-bench by default by @sbalandi in #2271
- Update Whisper README.md as "--disable-stateful" is no longer required to export models for NPU by @luke-lin-vmc in #2249
- Removed 'slices' from EncodedImage by @popovaan in #2258
- support text embeddings in llm_bench by @eaidova in #2269
- [wwb]: load transformers model first, then only trust_remote_code by @eaidova in #2270
- [GHA] Coverity pipeline fixes by @mryzhov in #2283
- [GHA][DEV] Fixed coverity path creation by @mryzhov in #2285
- [GHA][DEV] Save Coverity tool to cache by @mryzhov in #2286
- [GHA][DEV] Set cache key for coverity tool by @mryzhov in #2288
- Image generation multiconcurrency (#2190) by @dkalinowski in #2284
- [GGUF] Support GGUF format for tokenizers and detokenizers by @rkazants in #2263
- Unskip whisper tests & update optimum-intel by @as-suvorov in #2247
- Update README.md with text-to-speech by @rkazants in #2294
- add new chat template for qwen3 by @eaidova in #2297
- [DOCS] Correct cmd-line for TTS conversion by @rkazants in #2303
- [GHA] Enabled product manifest.yml by @mryzhov in #2281
- Bump the npm_and_yarn group across 3 directories with 2 updates by @dependabot[bot] in #2309
- [GHA] Save artifacts to cloud share by @akladiev in #1943
- [GHA][COVERITY] Added manual trigger by @mryzhov in #2289
- [GHA] Fix missing condition for Extract Artifacts step by @akladiev in #2313
- [llm bench] Turn off PA backend for VLM by @sbalandi in #2312
- [llm_bench] Add setting of max_num_batched_tokens for SchedulerConfig by @sbalandi in #2316
- [GHA] Fix missing condition for LLM & VLM test by @sammysun0711 in #2326
- [Test] Skip gguf test on MacOS due to sporadic failure by @sammysun0711 in #2328
- [GGUF] support Qwen3 architecture by @TianmengChen in #2273
- [llm_bench] Increase max_num_batched_tokens to the largest positive integer by @sbalandi in #2327
- Bump aquasecurity/trivy-action from 0.30.0 to 0.31.0 by @dependabot[bot] in #2310
- Bump actions/download-artifact from 4.1.9 to 4.3.0 by @dependabot[bot] in #2315
- Fix system_message forwarding by @Wovchena in #2325
- Disabled crop of the prompt for minicpmv. by @andreyanufr in #2320
- [llm bench] Fix setting ATTENTION_BACKEND to plugin config in case of fallback to Optimum by @sbalandi in #2332
- Bump brace-expansion from 2.0.1 to 2.0.2 in /.github/actions/install_wheel in the npm_and_yarn group across 1 directory by @dependabot[bot] in #2338
- Fix multinomial sampling for PromptLookupDecoding by @sbalandi in #2331
- [llm bench] Avoid using parameters not supported by beam_search for the beam_search case by @sbalandi in #2336
- Update Export Requirements by @apaniukov in #2342
- [GGUF] Serialize Generated OV Model for Faster LLMPipeline Init by @sammysun0711 in #2218
- Fixed system message in chat mode. by @popovaan in #2343
- Bump librosa from 0.10.2.post1 to 0.11.0 in /samples by @dependabot[bot] in #2346
- [Test][GGUF] Add DeepSeek-R1-Distill-Qwen GGUF in CI by @sammysun0711 in #2329
- [llm_bench] Remove default scheduler config by @sbalandi in #2341
- master: add Phi-4-multimodal-instruct by @Wovchena in #2264
- Fix paths with unicode for tokenizers by @yatarkan in #2337
- [WWB] Add try-except block for processor loading by @nikita-savelyevv in #2352
- [WWB] Bring back eager attention implementation by default by @nikita-savelyevv in #2353
- fix supported models link in TTS samples by @eaidova in #2300
- StaticLLMPipeline: Add tests on caching by @smirnov-alexey in #1905
- [WWB] Fix loading the tokenizer for VLMs by @l-bat in #2351
- Pass Scheduler Config for VLM Pipeline in WhoWhatBenchmark. by @popovaan in #2318
- Remove misspelled CMAKE_CURRENT_SOUCE_DIR by @Wovchena in #2362
- Increase timeout for LLM & VLM by @Wovchena in #2359
- [llm bench] Fix setting ATTENTION_BACKEND to plugin config in case of fallback to Optimum for VLM by @sbalandi in #2361
- Support multi images for vlm benchmarking in samples and llm_bench by @wgzintel in #2197
- CB: Hetero pipeline parallel support by @WeldonWangwang in #2227
- Update conversion instructions by @adrianboguszewski in #2287
- Merge stderr from failed samples by @Wovchena in #2156
- Revert cache folder by @Wovchena in #2372
- Update README in Node.js API by @almilosz in #2374
- [Docs] Rework home page by @yatarkan in #2368
- Align PromptLookupDecoding with greedy when dynamic_split_fuse works by @sbalandi in #2360
- Support collecting latency for transformers v4.52.0 by @wgzintel in #2373
- Bump diffusers from 0.33.1 to 0.34.0 in /samples by @dependabot[bot] in #2381
- Bump diffusers from 0.33.1 to 0.34.0 in /tests/python_tests by @dependabot[bot] in #2380
- Structured Output generation with `XGrammar` by @pavel-esir in #2295 (a structured output sketch follows this list)
- Disable XGrammar on Android by @apaniukov in #2389
- [wwb] Take prompts from different categories for def dataset for VLM by @sbalandi in #2349
- Fix for cloning NPU Image Generation pipelines (#2376) by @dkalinowski in #2393
- Set add_special_tokens=false for image tags in MiniCPM. by @popovaan in #2404
- Fix missing use cases for inpainting models and defining use case with relative path by @sbalandi in #2387
- temporary skip failing whisper tests by @pavel-esir in #2396
- Fix test_vlm_npu_no_exception by @AlexanderKalistratov in #2388
- Bump timm from 1.0.15 to 1.0.16 by @dependabot[bot] in #2390
- Optimize VisionEncoderQwen2VL::encode by @usst...
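The structured output entry above (#2295) adds grammar-constrained generation through XGrammar, and the `StructuredOutputConfig` class name also appears in the 2025.4 list (#2669). A minimal Python sketch of constraining generation to a JSON schema follows; the `json_schema` field and its placement on `GenerationConfig.structured_output_config` are assumptions taken from the structured output samples, not guaranteed by these notes.

```python
# Minimal sketch: JSON-schema constrained generation with the XGrammar backend.
# Assumptions: StructuredOutputConfig takes a json_schema string and is attached via
# GenerationConfig.structured_output_config, as in the structured output samples.
import json
import openvino_genai as ov_genai

schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "population": {"type": "integer"}},
    "required": ["city", "population"],
}

pipe = ov_genai.LLMPipeline("./llama-3.2-1b-instruct-ov", "CPU")  # hypothetical model dir

config = ov_genai.GenerationConfig()
config.max_new_tokens = 128
config.structured_output_config = ov_genai.StructuredOutputConfig(json_schema=json.dumps(schema))

print(pipe.generate("Return a JSON record for Paris.", config))
```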
2025.2.0.0
What's Changed
- [GHA] Replaced visual_language_chat_sample-ubuntu-minicpm_v2_6 job by @mryzhov in #1909
- [GHA] Replaced cpp-chat_sample-ubuntu pipeline by @mryzhov in #1913
- Add support of Prompt Lookup decoding to llm bench by @sbalandi in #1917
- [GHA] Introduce SDL pipeline by @mryzhov in #1924
- Switch Download OpenVINO step to aks-medium-runner by @ababushk in #1889
- Bump product version 2025.2 by @akladiev in #1920
- [GHA] Replaced cpp-continuous-batching by @mryzhov in #1910
- Update dependencies in samples by @ilya-lavrenov in #1925
- phi3_v: add universal tag by @Wovchena in #1921
- Fix image_id unary error by @rkazants in #1927
- [Docs] Image generation use case by @yatarkan in #1877
- Add perf metrics for CB VLM by @pavel-esir in #1897
- Enhance the flexibility of the c streamer by @apinge in #1941
- add Gemma3 LLM to supported models by @eaidova in #1942
- Added GPTQ/AWQ support with HF Transformers by @AlexKoff88 in #1933
- Add --static_reshape option to llm_bench, to force static reshape + compilation at pipeline creation by @RyanMetcalfeInt8 in #1851
- benchmark_image_gen: Add --reshape option, and ability to specify multiple devices by @RyanMetcalfeInt8 in #1878
- Revert perf regression changes by @dkalinowski in #1949
- Add running greedy_causal_lm for JS to the sample tests by @Retribution98 in #1930
- [Docs] Add VLM use case by @yatarkan in #1907
- Added possibility to generate base text on GPU for text evaluation. by @andreyanufr in #1945
- VLM: change infer to start_async/wait by @dkalinowski in #1948
- [WWB]: Addressed issues with validation on Windows by @AlexKoff88 in #1953
- [GHA] Remove bandit pipeline by @mryzhov in #1956
- Disable MSVC debug assertions, addressing false positives in iterator checking by @apinge in #1952
- [GHA] Replaced genai-tools pipeline by @mryzhov in #1954
- configurable delay by @eaidova in #1963
- Update cast of tensor data pointer for const tensors by @praasz in #1966
- Remove tokens after EOS for draft model for speculative decoding by @sbalandi in #1951
- Add testcase for chat_sample_c by @apinge in #1934
- Skip warm-up iteration during llm_bench results averaging by @nikita-savelyevv in #1972
- Reset pipeline cache usage statistics on each generate call by @vshampor in #1961
- [Docs] Update models, rebuild on push by @yatarkan in #1922
- Updated logic for whether PA backend is explicitly required by @ilya-lavrenov in #1976
- [GHA] [MAC] Use latest_available_commit OV artifacts by @mryzhov in #1977
- [GHA] Set HF_TOKEN by @mryzhov in #1986
- [GHA] Setup ov_cache by @mryzhov in #1962
- [GHA] Changed cleanup runner by @mryzhov in #1995
- Added mutex to methods which use blocks map. by @popovaan in #1975
- Add documentation and sample on KV cache eviction by @vshampor in #1960
- StaticLLMPipeline: Simplify compile_model call logic by @smirnov-alexey in #1915
- Fix reshape in heterogeneous SD samples by @helena-intel in #1994
- Update tokenizers by @mryzhov in #2002
- docs: fix max_new_tokens option description by @tpragasa in #1987
- [Docs] Add speech recognition with whisper use case by @yatarkan in #1971
- Revert "VLM: change infer to start_async/wait " by @ilya-lavrenov in #2004
- Revert "Revert perf regression changes" by @ilya-lavrenov in #2003
- Set xfail to failing tests. by @popovaan in #2006
- [GHA] Use cpack bindings in the samples tests by @mryzhov in #1979
- [Docs]: add Phi3.5MoE to supported models by @eaidova in #2012
- add TensorArt SD3.5 models to supported list by @eaidova in #2013
- Move MiniCPM resampler to vision encoder by @popovaan in #1997
- [GHA] Fix ccache on Win/Mac by @mryzhov in #2008
- samples/python/text_generation/lora.py -> samples/python/text_generation/lora_greedy_causal_lm.py by @Wovchena in #2007
- Whisper timestamp fix by @RyanMetcalfeInt8 in #1918
- Unskip Qwen2-VL-2B-Instruct sample test by @as-suvorov in #1970
- [GHA] Use developer openvino packages by @mryzhov in #2000
- Added NNCF to export-requirements.txt by @ilya-lavrenov in #1974
- Bump py-build-cmake from 0.4.2 to 0.4.3 by @dependabot in #2016
- Use OV_CACHE for python tests by @as-suvorov in #2020
- [GHA] Disable HTTP calls to the Hugging Face Hub by @mryzhov in #2021
- Add python bindings to VLMPipeline for encrypted models by @olpipi in #1916 (a basic VLMPipeline sketch follows this list)
- Bump the npm_and_yarn group across 1 directory with 2 updates by @dependabot in #2017
- CB: auto plugin support by @ilya-lavrenov in #2034
- timeout-minutes: 90 by @Wovchena in #2039
- Bump diffusers from 0.32.2 to 0.33.1 by @dependabot in #2031
- Bump diffusers from 0.32.2 to 0.33.1 in /samples by @dependabot in #2032
- Enable cache and add cache encryption to samples by @olpipi in #1990
- Fix VLM concurrency by @mzegla in #2022
- Move Phi3 vision projection model to vision encoder by @popovaan in #2009
- Fix spelling by @Wovchena in #2025
- [Docs] Enable autogenerated samples docs by @yatarkan in #2029
- Synchronize entire embeddings calculation phase (#1967) by @mzegla in #1993
- Add missing finish reason set when finishing the sequence by @mzegla in #2036
- Bump image-size from 1.2.0 to 1.2.1 in /site in the npm_and_yarn group across 1 directory by @dependabot in #1998
- Add README for C Samples by @apinge in #2040
- Use ov_cache for test_vlm_pipeline by @as-suvorov in #2042
- increase timeouts by @Wovchena in #2041
- [GHA] Use azure runners for python tests by @mryzhov in #1991
- [WWB]: move diffusers imports closer to usage by @eaidova in #2046
- [llm bench] Move calculation of memory consumption to memory_monitor tool by @sbalandi in #1...
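#1916 above adds Python bindings to `VLMPipeline` for encrypted models; the plain (non-encrypted) flow it builds on looks roughly like the sketch below. The model directory, image path, and the `images=` keyword follow the visual-language samples and should be treated as assumptions.

```python
# Minimal sketch: one visual-language generation call with openvino_genai.VLMPipeline.
# Paths are hypothetical; the uint8 RGB tensor layout and the images= keyword mirror
# the visual-language samples rather than anything stated in these release notes.
import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

pipe = ov_genai.VLMPipeline("./minicpm-v-2_6-ov", "CPU")  # hypothetical model dir

image = ov.Tensor(np.array(Image.open("cat.png").convert("RGB"), dtype=np.uint8))

print(pipe.generate("Describe the image.", images=[image], max_new_tokens=100))
```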
2025.1.0.0
What's Changed
- skip failing Chinese prompt on Win by @pavel-esir in #1573
- Bump product version 2025.1 by @akladiev in #1571
- Bump tokenizers submodule by @akladiev in #1575
- [LLM_BENCH] relax md5 checks and allow passing cb config without use_cb by @eaidova in #1570
- [VLM] Add Qwen2VL by @yatarkan in #1553
- Fix links, remind about ABI by @Wovchena in #1585
- Add nightly to instructions similar to requirements by @Wovchena in #1582
- GHA: use nightly from 2025.1.0 by @ilya-lavrenov in #1577
- NPU LLM Pipeline: Switch to STATEFUL by default by @dmatveev in #1561
- Verify not empty rendered chat template by @yatarkan in #1574
- [RTTI] Fix passes rtti definitions by @t-jankowski in #1588
- Test `add_special_tokens` properly by @pavel-esir in #1586
- Add indentation for llm_bench json report dumping by @nikita-savelyevv in #1584
- prioritize config model type under path-based task determination by @eaidova in #1587
- Replace openvino.runtime imports with openvino by @helena-intel in #1579
- Add tests for Whisper static pipeline by @eshiryae in #1250
- CB: removed handle_dropped() misuse by @ilya-lavrenov in #1594
- Bump timm from 1.0.13 to 1.0.14 by @dependabot in #1595
- Update samples readme by @olpipi in #1545
- [ Speculative decoding ][ Prompt lookup ] Enable Perf Metrics for assisting pipelines by @iefode in #1599
- [LLM] [NPU] StaticLLMPipeline: Export blob by @smirnov-alexey in #1601
- [llm_bench] enable prompt permutations to prevent prefix caching and fix vlm image load by @eaidova in #1607
- LLM: use set_output_seq_len instead of WA by @ilya-lavrenov in #1611
- CB: support different number of K and V heads per layer by @ilya-lavrenov in #1610
- LLM: fixed Slice / Gather of last MatMul by @ilya-lavrenov in #1616
- Switch to VS 2022 by @mryzhov in #1598
- Add Phi-3.5-vision-instruct and Phi-3-vision-128k-instruct by @Wovchena in #1609
- Whisper pipeline: apply slice matmul by @as-suvorov in #1623
- GHA: use OV master in mac.yml by @ilya-lavrenov in #1622
- [Image Generation] Image2Image for FLUX by @likholat in #1621
- add missed ignore_eos in generation config by @eaidova in #1625
- Master: increase priority for rt info to fix Phi-3.5-vision-instruct and Phi-3-vision-128k-instruct by @Wovchena in #1626
- Correct model name by @wgzintel in #1624
- Token rotation by @vshampor in #987
- Whisper pipeline: use Sampler by @as-suvorov in #1615
- Fix setting eos_token_id with kwarg by @Wovchena in #1629
- Extract cacheopt E2E tests into separate test matrix field by @vshampor in #1630
- [CB] Split token streaming and generation to different threads for all CB based pipelines by @iefode in #1544
- Don't silence an error if a file can't be opened by @Wovchena in #1620
- [CMAKE]: use different version for macOS arm64 by @ilya-lavrenov in #1632
- Test invalid fields assignment raises in GenerationConfig by @Wovchena in #1633
- do_sample=False for NPU in chat_sample, add NPU to README by @helena-intel in #1637
- [JS] Add GenAI Node.js bindings by @vishniakov-nikolai in #1193
- CB: preparation for relying on KV cache precisions from plugins by @ilya-lavrenov in #1634
- [LLM bench] support providing adapter config mode by @eaidova in #1644
- Automatically apply chat template in non-chat scenarios by @sbalandi in #1533 (a chat-mode sketch follows this list)
- beam_search_causal_lm.cpp: delete wrong comment by @Wovchena in #1639
- [WWB]: Fixed chat template usage in VLM GenAI pipeline by @AlexKoff88 in #1643
- [WWB]: Fixed nano-Llava preprocessor selection by @AlexKoff88 in #1646
- [WWB]: Added config to preprocessor call in VLMs by @AlexKoff88 in #1638
- CB: remove DeviceConfig class by @ilya-lavrenov in #1640
- [WWB]: Added initialization of nano-llava in case of Transformers model by @AlexKoff88 in #1649
- WWB: simplify code around start_chat / use_template by @ilya-lavrenov in #1650
- Tokenizers update by @ilya-lavrenov in #1653
- DOCS: reorganized support models for image generation by @ilya-lavrenov in #1655
- Fix using llm_bench/wwb with a version w/o apply_chat_template by @sbalandi in #1651
- Fix Qwen2VL generation without images by @yatarkan in #1645
- Parallel sampling with threadpool by @mzegla in #1252
- [Coverity] Enabling coverity scan by @akazakov-github in #1657
- [ CB ] Fix streaming in case of empty outputs by @iefode in #1647
- Allow overriding eos_token_id by @Wovchena in #1654
- CB: remove GenerationHandle:back by @ilya-lavrenov in #1662
- Fix tiny-random-llava-next in VLM Pipeline by @yatarkan in #1660
- [CB] Add KVHeadConfig parameters to PagedAttention's rt_info by @sshlyapn in #1666
- Bump py-build-cmake from 0.3.4 to 0.4.0 by @dependabot in #1668
- pin optimum version by @pavel-esir in #1675
- [LLM] Enabled CB by default by @ilya-lavrenov in #1455
- SAMPLER: fixed hang during destruction of ThreadPool by @ilya-lavrenov in #1681
- CB: use optimized scheduler config for cases when user explicitly asked CB backend by @ilya-lavrenov in #1679
- [CB] Return Block manager asserts to destructors by @iefode in #1569
- phi3_v: allow images, remove unused var by @Wovchena in #1670
- [Image Generation] Inpainting for FLUX by @likholat in #1685
- [WWB]: Added support for SchedulerConfig in LLMPipeline by @AlexKoff88 in #1671
- Add LongBench validation by @l-bat in #1220
- Fix Tokenizer for several added special tokens by @pavel-esir in #1659
- Unpin optimum-intel version by @ilya-lavrenov in #1680
- Image generation: proper error message when encode() is used w/o encoder passed to ctor by @ilya-lavrenov in #1683
- Fix excluding stop str from output for some tokenizers by @sbalandi in #1676
- [VLM] Fix chat template fallback in chat mode with defined system message by @yatarkan in https://github.com/openvinotoolkit/openvino.genai/pull/...
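#1533 above makes the pipeline apply the model's chat template automatically even outside chat mode; the explicit chat loop it complements is sketched below. The model path is hypothetical and the API follows the chat samples.

```python
# Minimal sketch: explicit chat mode with openvino_genai.LLMPipeline.
# start_chat()/finish_chat() bracket a multi-turn exchange so the chat template is
# applied and conversation state is kept between generate() calls.
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("./tiny-llama-chat-ov", "CPU")  # hypothetical model dir

pipe.start_chat()
print(pipe.generate("What is the capital of France?", max_new_tokens=64))
print(pipe.generate("And roughly how many people live there?", max_new_tokens=64))
pipe.finish_chat()
```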
2025.0.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
2024.6.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
2024.5.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
2024.4.1.0
Please check out the latest documentation pages related to the new openvino_genai package!
What's Changed
- Bump OV version to 2024.4.1 by @akladiev in #894
- Update requirements.txt and add requirements_2024.4.txt by @wgzintel in #893
Full Changelog: 2024.4.0.0...2024.4.1.0
2024.4.0.0
Please check out the latest documentation pages related to the new openvino_genai package!
What's Changed
- Support chat conversation for StaticLLMPipeline by @TolyaTalamanov in #580
- Prefix caching. by @popovaan in #639
- Allow building GenAI with OpenVINO via extra modules by @ilya-lavrenov in #726
- Simplified partial preemption algorithm. by @popovaan in #730
- Add set_chat_template by @Wovchena in #734
- Detect KV cache sequence length axis by @as-suvorov in #744
- Enable u8 KV cache precision for CB by @ilya-lavrenov in #759
- Add test case for native pytorch model by @wgzintel in #722
- Prefix caching improvements by @popovaan in #758
- Add USS metric by @wgzintel in #762
- Prefix caching optimization by @popovaan in #785
- Transition to default int4 compression configs from optimum-intel by @nikita-savelyevv in #689
- Control KV-cache size for StaticLLMPipeline by @TolyaTalamanov in #795
- [2024.4] update optimum intel commit to include mxfp4 conversion by @eaidova in #828
- [2024.4] use perf metrics for genai in llm bench by @eaidova in #830
- Update Pybind to version 13 by @mryzhov in #836
- Introduce stop_strings and stop_token_ids sampling params [2024.4 base] by @mzegla in #817 (a stopping-criteria sketch follows this list)
- StaticLLMPipeline: Handle single element list of prompts by @TolyaTalamanov in #848
- Fix Meta-Llama-3.1-8B-Instruct chat template by @pavel-esir in #846
- Add GPU support for continuous batching [2024.4] by @sshlyapn in #858
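#817 above introduces the stop_strings and stop_token_ids sampling parameters; a minimal sketch of how they are typically set on `GenerationConfig` follows. The set-typed fields and `include_stop_str_in_output` match the current generation-config docs, but treat them as assumptions for this 2024.4-era release.

```python
# Minimal sketch: stopping criteria via stop_strings / stop_token_ids.
# Assumptions: both fields are Python sets on GenerationConfig, and
# include_stop_str_in_output controls whether the matched string is returned.
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("./phi-2-ov", "CPU")  # hypothetical model dir

config = ov_genai.GenerationConfig()
config.max_new_tokens = 128
config.stop_strings = {"\nObservation:"}
config.include_stop_str_in_output = False
config.stop_token_ids = {pipe.get_tokenizer().get_eos_token_id()}

print(pipe.generate("List three prime numbers:", config))
```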
Full Changelog: 2024.3.0.0...2024.4.0.0
2024.3.0.0
Please check out the latest documentation pages related to the new openvino_genai package!