Commit a13ee3b

Image Gen Calculator doc + Image Gen demo fixes (#3415) (#3418)
CVS-166472
1 parent 16e7fa4 commit a13ee3b

9 files changed (+195 / -41 lines changed)


demos/common/export_models/export_model.py

Lines changed: 1 addition & 1 deletion
@@ -601,7 +601,7 @@ def export_image_generation_model(model_repository_path, source_model, model_nam
     else:
         optimum_command = "optimum-cli export openvino --model {} --weight-format {} {}".format(source_model, precision, target_path)
         if os.system(optimum_command):
-            raise ValueError("Failed to export image generation model model", source_model)
+            raise ValueError("Failed to export image generation model", source_model)
 
     plugin_config = {}
     assert num_streams >= 0, "num_streams should be a non-negative integer"
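The failure path above relies on `os.system` returning a non-zero value when the `optimum-cli` command fails; a minimal, hypothetical sketch of the same pattern (values are illustrative, the real script derives them from its command-line arguments):

```python
import os

# Illustrative values; export_model.py builds these from CLI arguments.
source_model = "black-forest-labs/FLUX.1-schnell"
precision = "int4"
target_path = "models/black-forest-labs/FLUX.1-schnell"

optimum_command = "optimum-cli export openvino --model {} --weight-format {} {}".format(
    source_model, precision, target_path)

# os.system returns a non-zero status when the executed command fails.
if os.system(optimum_command):
    raise ValueError("Failed to export image generation model", source_model)
```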

demos/image_generation/README.md

Lines changed: 21 additions & 30 deletions
@@ -3,11 +3,11 @@
 This demo shows how to deploy image generation models (Stable Diffusion/Stable Diffusion 3/Stable Diffusion XL/FLUX) in the OpenVINO Model Server.
 Image generation pipeline is exposed via [OpenAI API](https://platform.openai.com/docs/api-reference/images/create) `images/generations` endpoints.
 
-> **Note:** This demo was tested on Intel® Xeon®, Intel® Core®, Intel® Arc™ A770 on Ubuntu 22/24, RedHat 9 and Windows 11.
+> **Note:** This demo was tested on Intel® Xeon®, Intel® Core®, Intel® Arc™ A770, Intel® Arc™ B580 on Ubuntu 22/24, RedHat 9 and Windows 11.
 
 ## Prerequisites
 
-**RAM/vRAM** Model used in this demo takes up to 7GB of RAM/vRAM. Please consider lower precision to decrease it, or better/bigger model to get better image results.
+**RAM/vRAM** Select model size and precision according to your hardware capabilities (RAM/vRAM). Request resolution plays a significant role in memory consumption, so the higher the resolution you request, the more RAM/vRAM is required.
 
 **Model preparation** (one of the below):
 - preconfigured models from HuggingFaces directly in OpenVINO IR format, list of Intel uploaded models available [here](https://huggingface.co/collections/OpenVINO/image-generation-67697d9952fb1eee4a252aa8))
@@ -56,8 +56,8 @@ Assuming you have unpacked model server package, make sure to:
 as mentioned in [deployment guide](../../docs/deploying_server_baremetal.md), in every new shell that will start OpenVINO Model Server.
 
 
-```console
-mkdir -p models
+```bat
+mkdir models
 
 ovms --rest_port 8000 ^
 --model_repository_path ./models/ ^
@@ -95,10 +95,10 @@ docker run -d --rm -p 8000:8000 -v $(pwd)/models:/models/:rw \
 
 Depending on how you prepared models in the first step of this demo, they are deployed to either CPU or GPU (it's defined in `config.json`). If you run on GPU make sure to have appropriate drivers installed, so the device is accessible for the model server.
 
-```console
-mkdir -p models
+```bat
+mkdir models
 
-ovms.exe --rest_port 8000 ^
+ovms --rest_port 8000 ^
 --model_repository_path ./models/ ^
 --task image_generation ^
 --source_model OpenVINO/FLUX.1-schnell-int4-ov ^
@@ -131,7 +131,7 @@ Run `export_model.py` script to download and quantize the model:
 ```console
 python export_model.py image_generation \
 --source_model black-forest-labs/FLUX.1-schnell \
---weight-format int8 \
+--weight-format int4 \
 --config_file_path models/config.json \
 --model_repository_path models \
 --overwrite_models
@@ -141,7 +141,7 @@ python export_model.py image_generation \
 ```console
 python export_model.py image_generation \
 --source_model black-forest-labs/FLUX.1-schnell \
---weight-format int8 \
+--weight-format int4 \
 --target_device GPU \
 --config_file_path models/config.json \
 --model_repository_path models \
@@ -190,7 +190,7 @@ Assuming you have unpacked model server package, make sure to:
 
 as mentioned in [deployment guide](../../docs/deploying_server_baremetal.md), in every new shell that will start OpenVINO Model Server.
 
-```console
+```bat
 ovms --rest_port 8000 ^
 --model_name OpenVINO/FLUX.1-schnell-int4-ov ^
 --model_path ./models/black-forest-labs/FLUX.1-schnell
@@ -224,9 +224,9 @@ docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro \
 
 Depending on how you prepared models in the first step of this demo, they are deployed to either CPU or GPU (it's defined in `config.json`). If you run on GPU make sure to have appropriate drivers installed, so the device is accessible for the model server.
 
-```console
-ovms.exe --rest_port 8000 ^
---model_name OpenVINO/FLUX.1-schnell-int4-ov \
+```bat
+ovms --rest_port 8000 ^
+--model_name OpenVINO/FLUX.1-schnell-int4-ov ^
 --model_path ./models/black-forest-labs/FLUX.1-schnell
 ```
 :::
@@ -277,7 +277,8 @@ curl http://localhost:8000/v3/images/generations \
 -H "Content-Type: application/json" \
 -d '{
 "model": "OpenVINO/FLUX.1-schnell-int4-ov",
-"prompt": "three happy cats",
+"prompt": "three cute cats sitting on a bench",
+"rng_seed": 45,
 "num_inference_steps": 3,
 "size": "512x512"
 }'| jq -r '.data[0].b64_json' | base64 --decode > output.png
@@ -288,7 +289,7 @@ Windows Powershell
 $response = Invoke-WebRequest -Uri "http://localhost:8000/v3/images/generations" `
 -Method POST `
 -Headers @{ "Content-Type" = "application/json" } `
--Body '{"model": "OpenVINO/FLUX.1-schnell-int4-ov", "prompt": "three happy cats", "num_inference_steps": 3}'
+-Body '{"model": "OpenVINO/FLUX.1-schnell-int4-ov", "prompt": "three cute cats sitting on a bench", "rng_seed": 45, "num_inference_steps": 3, "size": "512x512"}'
 
 $base64 = ($response.Content | ConvertFrom-Json).data[0].b64_json
@@ -299,7 +300,7 @@ Windows Command Prompt
 ```bat
 curl http://localhost:8000/v3/images/generations ^
 -H "Content-Type: application/json" ^
--d "{\"model\": \"OpenVINO/FLUX.1-schnell-int4-ov\", \"prompt\": \"three happy cats\", \"num_inference_steps\": 3, \"size\": \"512x512\"}"
+-d "{\"model\": \"OpenVINO/FLUX.1-schnell-int4-ov\", \"prompt\": \"three cute cats sitting on a bench\", \"rng_seed\": 45, \"num_inference_steps\": 3, \"size\": \"512x512\"}"
 ```
 
 
@@ -328,32 +329,26 @@ Install the client library:
 pip3 install openai pillow
 ```
 
-```console
-pip3 install openai
-```
 ```python
 from openai import OpenAI
 import base64
 from io import BytesIO
 from PIL import Image
-import time
-
 
 client = OpenAI(
-    base_url="http://ov-spr-36.sclab.intel.com:7774/v3",
+    base_url="http://localhost:8000/v3",
     api_key="unused"
 )
 
-now = time.time()
 response = client.images.generate(
     model="OpenVINO/FLUX.1-schnell-int4-ov",
-    prompt="three happy cats",
+    prompt="three cute cats sitting on a bench",
     extra_body={
-        "rng_seed": 43,
+        "rng_seed": 60,
+        "size": "512x512",
         "num_inference_steps": 3
     }
 )
-print("Time elapsed: ", time.time()-now, "seconds")
 base64_image = response.data[0].b64_json
 
 image_data = base64.b64decode(base64_image)
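Pieced together, the updated client snippet from this README reads roughly as follows; this is a sketch assuming the server from this demo is running on localhost:8000, and context lines not shown in the hunks may differ slightly:

```python
from openai import OpenAI
import base64
from io import BytesIO
from PIL import Image

# Point the client at the locally deployed OpenVINO Model Server (OpenAI-compatible /v3 API).
client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

response = client.images.generate(
    model="OpenVINO/FLUX.1-schnell-int4-ov",
    prompt="three cute cats sitting on a bench",
    extra_body={
        "rng_seed": 60,            # fixed seed for reproducible output
        "size": "512x512",
        "num_inference_steps": 3,
    },
)

# The endpoint returns the image as base64; decode it and save it to disk.
image_data = base64.b64decode(response.data[0].b64_json)
Image.open(BytesIO(image_data)).save("output2.png")
```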
@@ -365,11 +360,7 @@ image.save('output2.png')
 Output file (`output2.png`):
 ![output2](./output2.png)
 
-Client side logs confirm image generation latency on Intel® Xeon®:
 
-```
-Time elapsed: 18.89774751663208 seconds
-```
 
 
 ## References

demos/image_generation/output.png

-1.46 MB

demos/image_generation/output2.png

-1.61 MB

docs/clients_genai.md

Lines changed: 6 additions & 5 deletions
@@ -317,9 +317,10 @@ client = OpenAI(
 )
 response = client.images.generate(
     model="OpenVINO/FLUX.1-schnell-int4-ov",
-    prompt="three happy cats",
+    prompt="three cute cats sitting on a bench",
+    size="512x512",
     extra_body={
-        "rng_seed": 42,
+        "rng_seed": 45,
         "num_inference_steps": 3
     }
 )
@@ -336,19 +337,19 @@ curl http://localhost:8000/v3/images/generations \
 -H "Content-Type: application/json" \
 -d '{
 "model": "black-forest-labs/FLUX.1-schnell",
-"prompt": "three happy cats",
+"prompt": "three cute cats sitting on a bench",
 "num_inference_steps": 3,
 "size": "512x512"
 }'| jq -r '.data[0].b64_json' | base64 --decode > output.png
 ```
 :::
 :::{tab-item} Windows PowerShell
 :sync: power-shell
-```{code} bash
+```{code} powershell
 $response = Invoke-WebRequest -Uri "http://localhost:8000/v3/images/generations" `
 -Method POST `
 -Headers @{ "Content-Type" = "application/json" } `
--Body '{"model": "OpenVINO/FLUX.1-schnell-int4-ov", "prompt": "three happy cats", "num_inference_steps": 3}'
+-Body '{"model": "OpenVINO/FLUX.1-schnell-int4-ov", "prompt": "three cute cats sitting on a bench", "num_inference_steps": 3, "size": "512x512"}'
 $base64 = ($response.Content | ConvertFrom-Json).data[0].b64_json
 [IO.File]::WriteAllBytes('output.png', [Convert]::FromBase64String($base64))
 ```
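For readers who prefer Python over curl or PowerShell, the same REST call can be sketched with the `requests` library; the endpoint, payload fields and the `b64_json` response field come from the examples above, everything else is illustrative:

```python
import base64
import requests

# Same payload as in the curl example above; adjust the model name to whatever is deployed.
payload = {
    "model": "OpenVINO/FLUX.1-schnell-int4-ov",
    "prompt": "three cute cats sitting on a bench",
    "num_inference_steps": 3,
    "size": "512x512",
}

resp = requests.post(
    "http://localhost:8000/v3/images/generations",
    json=payload,
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()

# The response carries the generated image as base64 in data[0].b64_json.
with open("output.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["data"][0]["b64_json"]))
```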

docs/image_generation/reference.md

Lines changed: 164 additions & 1 deletion
@@ -1 +1,164 @@
-TBD

# Efficient Image Generation Serving {#ovms_docs_image_generation_reference}

## Image Generation Calculator
The Image Generation pipeline consists of one MediaPipe node - the Image Generation Calculator. To serve an image generation model, it is required to create a MediaPipe graph configuration file that defines the node and its parameters. The graph configuration file is typically named `graph.pbtxt` and is placed in the model directory.
The `graph.pbtxt` file may be created automatically by the Model Server when [using HuggingFaces pulling](../pull_hf_models.md) on start-up, automatically via the [export models script](../../demos/common/export_models/), or manually by an administrator.

The calculator has access to the HTTP request and parses it to extract the generation parameters:
```cpp
struct HttpPayload {
    std::string uri;
    std::unordered_map<std::string, std::string> headers;
    std::string body;
    std::shared_ptr<rapidjson::Document> parsedJson;
    std::shared_ptr<ClientConnection> client;
    std::shared_ptr<MultiPartParser> multipartParser;
};
```

The input JSON content should be compatible with the [Image Generation API](../model_server_rest_api_image_generation.md).

The input also includes a side packet with a reference to `IMAGE_GEN_NODE_RESOURCES`, which is a shared object representing multiple OpenVINO GenAI pipelines built from OpenVINO models loaded into memory just once.

**Every node based on the Image Generation Calculator MUST have exactly this side packet specification:**

`input_side_packet: "IMAGE_GEN_NODE_RESOURCES:pipes"`

**If it is missing or modified, the model server will fail to serve the graph with the model.**

The calculator produces a `std::string` MediaPipe packet with JSON content in the OpenAI response format, [described in a separate document](../model_server_rest_api_image_generation.md). The Image Generation calculator does not support streaming or partial responses.

Let's have a look at an example graph definition:
```protobuf
input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"

node: {
  name: "ImageGenExecutor"
  calculator: "ImageGenCalculator"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "IMAGE_GEN_NODE_RESOURCES:pipes"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  node_options: {
    [type.googleapis.com / mediapipe.ImageGenCalculatorOptions]: {
      models_path: "./"
      device: "CPU"
    }
  }
}
```

The node configuration above should be used as a template, since the user is not expected to change most of its content. Only `node_options` requires attention, as it specifies the OpenVINO GenAI pipeline parameters. The rest of the configuration can remain unchanged.

The calculator supports the following `node_options` for tuning the pipeline configuration:
- `required string models_path` - location of the models and scheduler directory (can be relative);
- `optional string device` - device to load models to. Supported values: "CPU", "GPU", "NPU" [default = "CPU"]
- `optional string plugin_config` - [OpenVINO device plugin configuration](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes.html) and additional pipeline options. Should be provided in the same format as for regular [models configuration](../parameters.md#model-configuration-options). The config is used for all models in the pipeline (text encoders/decoders, unet, vae) except for tokenizers [default = "{}"]
- `optional string max_resolution` - maximum resolution allowed for generation. Requests exceeding this value will be rejected. [default = "4096x4096"];
- `optional string default_resolution` - default resolution used for generation. If not specified, the underlying model shape will determine the final resolution.
- `optional uint64 max_num_images_per_prompt` - maximum number of images generated per prompt. Requests exceeding this value will be rejected. [default = 10];
- `optional uint64 default_num_inference_steps` - default number of inference steps used for generation, if not specified by the request [default = 50];
- `optional uint64 max_num_inference_steps` - maximum number of inference steps allowed for generation. Requests exceeding this value will be rejected. [default = 100];

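For illustration, a `node_options` block that tunes several of these parameters might look like the sketch below; the values are hypothetical and only demonstrate the syntax, defaults apply to anything omitted:

```protobuf
node_options: {
  [type.googleapis.com / mediapipe.ImageGenCalculatorOptions]: {
    models_path: "./"                          # directory containing model_index.json, scheduler/ and the model subfolders
    device: "GPU"                              # CPU, GPU or NPU
    plugin_config: '{"CACHE_DIR": "cache"}'    # forwarded to the OpenVINO plugin for non-tokenizer models
    max_resolution: "2048x2048"                # requests above this resolution are rejected
    default_resolution: "512x512"              # used when the request omits "size"
    max_num_images_per_prompt: 4
    default_num_inference_steps: 4
    max_num_inference_steps: 50
  }
}
```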
## Models Directory

In the node configuration we set `models_path`, indicating the location of the directory with files loaded by the GenAI pipeline. It loads the following files:

```
models/OpenVINO/
├── FLUX.1-schnell-int4-ov
│   ├── graph.pbtxt <----------------- OVMS MediaPipe graph configuration file
│   ├── model_index.json <------------ GenAI configuration file including pipeline type SD/SDXL/SD3/FLUX
│   ├── README.md
│   ├── scheduler
│   │   └── scheduler_config.json
│   ├── text_encoder
│   │   ├── config.json
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   ├── text_encoder_2
│   │   ├── config.json
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   ├── tokenizer
│   │   ├── merges.txt
│   │   ├── openvino_detokenizer.bin
│   │   ├── openvino_detokenizer.xml
│   │   ├── openvino_tokenizer.bin
│   │   ├── openvino_tokenizer.xml
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer_config.json
│   │   └── vocab.json
│   ├── tokenizer_2
│   │   ├── openvino_detokenizer.bin
│   │   ├── openvino_detokenizer.xml
│   │   ├── openvino_tokenizer.bin
│   │   ├── openvino_tokenizer.xml
│   │   ├── special_tokens_map.json
│   │   ├── spiece.model
│   │   ├── tokenizer_config.json
│   │   └── tokenizer.json
│   ├── transformer
│   │   ├── config.json
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   ├── vae_decoder
│   │   ├── config.json
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   └── vae_encoder
│       ├── config.json
│       ├── openvino_model.bin
│       └── openvino_model.xml
└── stable-diffusion-v1-5-int8-ov
    ├── feature_extractor
    │   └── preprocessor_config.json
    ├── graph.pbtxt <----------------- OVMS MediaPipe graph configuration file
    ├── model_index.json <------------ GenAI configuration file including pipeline type SD/SDXL/SD3/FLUX
    ├── README.md
    ├── safety_checker
    │   ├── config.json
    │   └── model.safetensors
    ├── scheduler
    │   └── scheduler_config.json
    ├── text_encoder
    │   ├── config.json
    │   ├── openvino_model.bin
    │   └── openvino_model.xml
    ├── tokenizer
    │   ├── merges.txt
    │   ├── openvino_detokenizer.bin
    │   ├── openvino_detokenizer.xml
    │   ├── openvino_tokenizer.bin
    │   ├── openvino_tokenizer.xml
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── unet
    │   ├── config.json
    │   ├── openvino_model.bin
    │   └── openvino_model.xml
    ├── vae_decoder
    │   ├── config.json
    │   ├── openvino_model.bin
    │   └── openvino_model.xml
    └── vae_encoder
        ├── config.json
        ├── openvino_model.bin
        └── openvino_model.xml
```

- `graph.pbtxt` - MediaPipe graph configuration file defining the Image Generation Calculator node and its parameters.
- `model_index.json` - GenAI configuration file that describes the pipeline type (SD/SDXL/SD3/FLUX) and the models used in the pipeline.
- `scheduler/scheduler_config.json` - configuration file for the scheduler that manages the execution of the models in the pipeline.
- `text_encoder`, `tokenizer`, `unet`, `vae_encoder`, `vae_decoder` - directories containing the OpenVINO models and their configurations for the respective components of the image generation pipeline.

We recommend using the [export script](../../demos/common/export_models/README.md) to prepare the models directory structure for serving, or simply using [HuggingFace pulling](../pull_hf_models.md) to automatically download and convert models from the Hugging Face Hub.

Check [tested models](https://github.com/openvinotoolkit/openvino.genai/blob/master/tests/python_tests/models/real_models).

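As a quick sanity check before starting the server, one can verify that a served model directory contains the files the calculator expects (`graph.pbtxt`, `model_index.json` and the scheduler config from the layout above); a small, hypothetical helper:

```python
from pathlib import Path

def check_image_gen_model_dir(model_dir: str) -> None:
    """Warn about files the Image Generation graph expects but which are missing."""
    required = ["graph.pbtxt", "model_index.json", "scheduler/scheduler_config.json"]
    root = Path(model_dir)
    for rel in required:
        if not (root / rel).exists():
            print(f"WARNING: {root / rel} is missing")

# Example usage with the layout shown above (paths are illustrative):
check_image_gen_model_dir("models/OpenVINO/FLUX.1-schnell-int4-ov")
```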
## References
- [Image Generation API](../model_server_rest_api_image_generation.md)
- Demos on [CPU/GPU](../../demos/image_generation/README.md)

docs/model_server_rest_api_image_generation.md

Lines changed: 2 additions & 2 deletions
@@ -40,7 +40,7 @@ curl http://localhost:8000/v3/images/generations \
 | model ||| string (required) | Name of the model to use. Name assigned to a MediaPipe graph configured to schedule generation using desired embedding model. **Note**: This can also be omitted to fall back to URI based routing. Read more on routing topic **TODO** |
 | prompt ||| string (required) | A text description of the desired image(s). **TODO**: Length restrictions? Too short/too large? |
 | size ||| string or null (default: auto) | The size of the generated images. Must be in WxH format, example: `1024x768`. Default model W/H will be used when using `auto`. |
-| n ||| integer or null (default: `1`) | A number of images to generate. If you want to generate multiple images for the same combination of generation parameters and text prompts, you can use this parameter for better performance as internally compuations will be performed with batch for Unet / Transformer models and text embeddings tensors will also be computed only once. **Not supported for now.** |
+| n ||| integer or null (default: `1`) | A number of images to generate. If you want to generate multiple images for the same combination of generation parameters and text prompts, you can use this parameter for better performance as internally computations will be performed with batch for Unet / Transformer models and text embeddings tensors will also be computed only once. **Not supported for now.** |
 | background ||| string or null (default: auto) | Allows to set transparency for the background of the generated image(s). Not supported for now. |
 | style ||| string or null (default: vivid) | The style of the generated images. Recognized OpenAI settings, but not supported: vivid, natural. |
 | moderation ||| string (default: auto) | Control the content-moderation level for images generated by endpoint. Either `low` or `auto`. Not supported for now. |
@@ -83,7 +83,7 @@ curl http://localhost:8000/v3/images/generations \
 ## Error handling
 Endpoint can raise an error related to incorrect request in the following conditions:
 - Incorrect format of any of the fields based on the schema
-- Tokenized prompt exceeds the maximum length of the model context. **TODO** Verify
+- Tokenized prompt exceeds the maximum length of the model context.
 - Model does not support requested width and height
 - Administrator defined min/max parameter value requirements are not met
 
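On the client side, the rejections listed above surface as non-2xx HTTP responses; a minimal sketch of handling them with `requests` (the exact error body format is not asserted here, it is simply printed):

```python
import requests

# Intentionally request a resolution above a typical administrator-defined max_resolution
# to show how a rejected request can be handled (values are illustrative).
payload = {
    "model": "OpenVINO/FLUX.1-schnell-int4-ov",
    "prompt": "three cute cats sitting on a bench",
    "num_inference_steps": 3,
    "size": "8192x8192",
}

resp = requests.post("http://localhost:8000/v3/images/generations", json=payload)
if not resp.ok:
    # The server rejects requests that violate the schema or the configured limits.
    print(f"Request rejected with HTTP {resp.status_code}: {resp.text}")
else:
    print("Image generated, response size:", len(resp.content))
```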

docs/reference.md

Lines changed: 0 additions & 1 deletion
This file was deleted.
