Commit 5d986b7


41 files changed: +502 additions, −323 deletions

site/docs/guides/chat-scenario.mdx

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
---
sidebar_position: 2
title: Chat Scenario
---

# Using OpenVINO GenAI in Chat Scenario

For chat applications, OpenVINO GenAI provides special optimizations to maintain conversation context and improve performance using KV-cache.

Refer to the [Stateful Models vs Stateless Models](/docs/concepts/stateful-vs-stateless-models) for more information about KV-cache.

:::tip
Use `start_chat()` and `finish_chat()` to properly manage the chat session's KV-cache. This improves performance by reusing context between messages.
:::

:::info
Chat mode is supported for both `LLMPipeline` and `VLMPipeline`.
:::

A simple chat example (with grouped beam search decoding):

<LanguageTabs>
<TabItemPython>
```python showLineNumbers
import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(model_path, 'CPU')

config = {'max_new_tokens': 100, 'num_beam_groups': 3, 'num_beams': 15, 'diversity_penalty': 1.5}
pipe.set_generation_config(config)

# highlight-next-line
pipe.start_chat()
while True:
    try:
        prompt = input('question:\n')
    except EOFError:
        break
    answer = pipe.generate(prompt)
    print('answer:\n')
    print(answer)
    print('\n----------\n')
# highlight-next-line
pipe.finish_chat()
```
</TabItemPython>
<TabItemCpp>
```cpp showLineNumbers
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string prompt;

    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;
    config.num_beam_groups = 3;
    config.num_beams = 15;
    config.diversity_penalty = 1.0f;

    // highlight-next-line
    pipe.start_chat();
    std::cout << "question:\n";
    while (std::getline(std::cin, prompt)) {
        std::cout << "answer:\n";
        auto answer = pipe.generate(prompt, config);
        std::cout << answer << std::endl;
        std::cout << "\n----------\n"
            "question:\n";
    }
    // highlight-next-line
    pipe.finish_chat();
}
```
</TabItemCpp>
</LanguageTabs>

:::info
For more information, refer to the [Python](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/python/text_generation/chat_sample.py) and [C++](https://github.com/openvinotoolkit/openvino.genai/blob/master/samples/cpp/text_generation/chat_sample.cpp) chat samples.
:::
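To illustrate why the `start_chat()`/`finish_chat()` tip above matters, here is a minimal sketch (not part of this commit's diff) contrasting manual history management with chat mode. It assumes `generate()` accepts generation parameters as keyword arguments, as in the samples above; the prompt formatting is purely illustrative.

```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline(model_path, 'CPU')
questions = ['What is OpenVINO?', 'How does it relate to GenAI?']

# Without chat mode: the growing history has to be resent and re-encoded on every turn.
history = ''
for question in questions:
    history += f'question: {question}\nanswer: '
    answer = pipe.generate(history, max_new_tokens=100)
    history += f'{answer}\n'

# With chat mode: the KV-cache keeps the conversation context between calls,
# so only the new message needs to be passed to generate().
pipe.start_chat()
for question in questions:
    print(pipe.generate(question, max_new_tokens=100))
pipe.finish_chat()
```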

site/docs/guides/model-preparation/convert-to-openvino.mdx

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,8 @@ import UseCasesNote from './_use_cases_note.mdx';
 
 # Convert Models to OpenVINO Format
 
-This page explains how to convert various generative AI models from Hugging Face and ModelScope to OpenVINO IR format. Refer to the [Supported Models](../../supported-models/index.mdx) for a list of available models.
+This page explains how to convert various generative AI models from [Hugging Face](https://huggingface.co/) and [ModelScope](https://modelscope.cn/) to OpenVINO IR format.
+Refer to the [Supported Models](../../supported-models/index.mdx) for a list of available models.
 
 For downloading pre-converted models, see [Download Pre-Converted OpenVINO Models](./download-openvino-models.mdx).
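As a rough illustration of the conversion step this page describes (not part of this commit's diff), models can be exported to OpenVINO IR through the `optimum-intel` integration. A minimal sketch, assuming `optimum-intel` with OpenVINO support is installed and using an illustrative model id; depending on the release, the tokenizer may additionally need converting with `openvino-tokenizers` for GenAI pipelines:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'  # illustrative model id
export_dir = 'TinyLlama-1.1B-Chat-v1.0-ov'

# export=True converts the original PyTorch weights to OpenVINO IR on the fly.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
model.save_pretrained(export_dir)

# Save the tokenizer next to the IR files so the pipeline can find it.
AutoTokenizer.from_pretrained(model_id).save_pretrained(export_dir)
```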

site/docs/guides/model-preparation/download-openvino-models.mdx

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ import UseCasesNote from './_use_cases_note.mdx';
 # Download Pre-Converted OpenVINO Models
 
 OpenVINO GenAI allows to run different generative AI models (see [Supported Models](../../supported-models/index.mdx)).
-While you can convert models from other frameworks (see [Convert Models to OpenVINO Format](./convert-to-openvino.mdx)), using pre-converted models can save time and effort.
+While you can convert models from other frameworks (see [Convert Models to OpenVINO Format](./convert-to-openvino.mdx)), using pre-converted models from [Hugging Face](https://huggingface.co/) and [ModelScope](https://modelscope.cn/) can save time and effort.
 
 ## Download from Hugging Face
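For the download step this page describes, here is a minimal sketch (not part of this commit's diff) using `huggingface_hub`; the model id is illustrative, and any pre-converted model from the OpenVINO organization can be substituted:

```python
from huggingface_hub import snapshot_download

# Illustrative pre-converted model; replace with any model from the OpenVINO org on Hugging Face.
model_dir = snapshot_download(
    repo_id='OpenVINO/TinyLlama-1.1B-Chat-v1.0-int4-ov',
    local_dir='TinyLlama-1.1B-Chat-v1.0-int4-ov',
)
print(f'Model downloaded to: {model_dir}')
```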

site/docs/use-cases/1-LLM-pipeline/_sections/_usage_options/_chat_scenario.mdx renamed to site/docs/guides/streaming.mdx

Lines changed: 12 additions & 76 deletions
@@ -1,76 +1,16 @@
-### Using GenAI in Chat Scenario
+---
+sidebar_position: 3
+---
 
-For chat applications, OpenVINO GenAI provides special optimizations to maintain conversation context and improve performance using KV-cache.
-
-:::tip
-Use `start_chat()` and `finish_chat()` to properly manage the chat session's KV-cache. This improves performance by reusing context between messages.
-:::
-
-A simple chat example (with grouped beam search decoding):
-
-<LanguageTabs>
-<TabItemPython>
-```python showLineNumbers
-import openvino_genai as ov_genai
-pipe = ov_genai.LLMPipeline(model_path, 'CPU')
-
-config = {'max_new_tokens': 100, 'num_beam_groups': 3, 'num_beams': 15, 'diversity_penalty': 1.5}
-pipe.set_generation_config(config)
-
-# highlight-next-line
-pipe.start_chat()
-while True:
-    try:
-        prompt = input('question:\n')
-    except EOFError:
-        break
-    answer = pipe.generate(prompt)
-    print('answer:\n')
-    print(answer)
-    print('\n----------\n')
-# highlight-next-line
-pipe.finish_chat()
-```
-</TabItemPython>
-<TabItemCpp>
-```cpp showLineNumbers
-#include "openvino/genai/llm_pipeline.hpp"
-#include <iostream>
-
-int main(int argc, char* argv[]) {
-    std::string prompt;
-
-    std::string model_path = argv[1];
-    ov::genai::LLMPipeline pipe(model_path, "CPU");
-
-    ov::genai::GenerationConfig config;
-    config.max_new_tokens = 100;
-    config.num_beam_groups = 3;
-    config.num_beams = 15;
-    config.diversity_penalty = 1.0f;
-
-    // highlight-next-line
-    pipe.start_chat();
-    std::cout << "question:\n";
-    while (std::getline(std::cin, prompt)) {
-        std::cout << "answer:\n";
-        auto answer = pipe.generate(prompt, config);
-        std::cout << answer << std::endl;
-        std::cout << "\n----------\n"
-            "question:\n";
-    }
-    // highlight-next-line
-    pipe.finish_chat();
-}
-```
-</TabItemCpp>
-</LanguageTabs>
-
-#### Streaming the Output
+# Streaming the Output
 
 For more interactive UIs during generation, you can stream output tokens.
 
-##### Streaming Function
+:::info
+Streaming is supported for both `LLMPipeline` and `VLMPipeline`.
+:::
+
+## Streaming Function
 
 In this example, a function outputs words to the console immediately upon generation:
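The streaming function itself sits in lines this hunk does not show. A minimal, hedged sketch of such a function (not part of this commit's diff), assuming `generate()` accepts a `streamer` callable and generation keyword arguments; note that the callback's return-value convention (a bool or a streaming status enum) varies between releases:

```python
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline(model_path, 'CPU')

def streamer(subword: str) -> bool:
    # Print each decoded chunk as soon as it arrives, without a trailing newline.
    print(subword, end='', flush=True)
    # Returning False tells the pipeline to keep generating (True would stop early).
    return False

pipe.generate('What is OpenVINO?', streamer=streamer, max_new_tokens=100)
```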

@@ -138,11 +78,7 @@ In this example, a function outputs words to the console immediately upon generation:
 </TabItemCpp>
 </LanguageTabs>
 
-:::info
-For more information, refer to the [chat sample](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/chat_sample/).
-:::
-
-##### Custom Streamer Class
+## Custom Streamer Class
 
 You can also create your custom streamer for more sophisticated processing:
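Below is a rough, hedged sketch (not part of this commit's diff) of what a custom streamer can look like in Python, assuming the `StreamerBase` interface with `put()` and `end()` callbacks. Decoding token ids one at a time is only illustrative; a robust streamer buffers tokens until they decode to complete text, as the referenced multinomial_causal_lm sample does.

```python
import openvino_genai as ov_genai

class CustomStreamer(ov_genai.StreamerBase):
    def __init__(self, tokenizer):
        super().__init__()
        self.tokenizer = tokenizer
        self.token_ids = []

    def put(self, token_id: int) -> bool:
        # Called for every generated token id; do any custom processing here.
        self.token_ids.append(token_id)
        print(self.tokenizer.decode([token_id]), end='', flush=True)
        return False  # False keeps generation running; True stops it early

    def end(self):
        # Called once generation is finished.
        print()

pipe = ov_genai.LLMPipeline(model_path, 'CPU')
pipe.generate('What is OpenVINO?', streamer=CustomStreamer(pipe.get_tokenizer()), max_new_tokens=100)
```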

@@ -210,7 +146,7 @@ You can also create your custom streamer for more sophisticated processing:
 int main(int argc, char* argv[]) {
     std::string prompt;
     // highlight-next-line
-    CustomStreamer custom_streamer;
+    std::shared_ptr<CustomStreamer> custom_streamer;
 
     std::string model_path = argv[1];
     ov::genai::LLMPipeline pipe(model_path, "CPU");
@@ -232,5 +168,5 @@ You can also create your custom streamer for more sophisticated processing:
 </LanguageTabs>
 
 :::info
-For fully implemented iterable CustomStreamer refer to [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/blob/releases/2025/0/samples/python/text_generation/multinomial_causal_lm.py) sample.
+For fully implemented iterable `CustomStreamer` refer to [multinomial_causal_lm](https://github.com/openvinotoolkit/openvino.genai/blob/releases/2025/0/samples/python/text_generation/multinomial_causal_lm.py) sample.
 :::

site/docs/use-cases/1-LLM-pipeline/_sections/_usage_options/_generation_parameters.mdx

Lines changed: 0 additions & 127 deletions
This file was deleted.

site/docs/use-cases/1-LLM-pipeline/_sections/_usage_options/index.mdx

Lines changed: 0 additions & 18 deletions
This file was deleted.

site/docs/use-cases/1-LLM-pipeline/index.mdx

Lines changed: 0 additions & 14 deletions
This file was deleted.

site/docs/use-cases/2-Image-Generation/index.mdx

Lines changed: 0 additions & 14 deletions
This file was deleted.

site/docs/use-cases/3-Processing-speech-whisper.md

Lines changed: 0 additions & 5 deletions
This file was deleted.
