Commit a13ee3b

Image Gen Calculator doc + Image Gen demo fixes (#3415) (#3418)
CVS-166472
1 parent 16e7fa4 commit a13ee3b

9 files changed (+195 / -41 lines changed)


demos/common/export_models/export_model.py

Lines changed: 1 addition & 1 deletion
@@ -601,7 +601,7 @@ def export_image_generation_model(model_repository_path, source_model, model_nam
     else:
         optimum_command = "optimum-cli export openvino --model {} --weight-format {} {}".format(source_model, precision, target_path)
         if os.system(optimum_command):
-            raise ValueError("Failed to export image generation model model", source_model)
+            raise ValueError("Failed to export image generation model", source_model)
 
     plugin_config = {}
     assert num_streams >= 0, "num_streams should be a non-negative integer"
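The failure path above relies on `os.system` returning a non-zero value when the `optimum-cli` command fails; a minimal, hypothetical sketch of the same pattern (values are illustrative, the real script derives them from its command-line arguments):

```python
import os

# Illustrative values; export_model.py builds these from CLI arguments.
source_model = "black-forest-labs/FLUX.1-schnell"
precision = "int4"
target_path = "models/black-forest-labs/FLUX.1-schnell"

optimum_command = "optimum-cli export openvino --model {} --weight-format {} {}".format(
    source_model, precision, target_path)

# os.system returns a non-zero status when the executed command fails.
if os.system(optimum_command):
    raise ValueError("Failed to export image generation model", source_model)
```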

demos/image_generation/README.md

Lines changed: 21 additions & 30 deletions
@@ -3,11 +3,11 @@
 This demo shows how to deploy image generation models (Stable Diffusion/Stable Diffusion 3/Stable Diffusion XL/FLUX) in the OpenVINO Model Server.
 Image generation pipeline is exposed via [OpenAI API](https://platform.openai.com/docs/api-reference/images/create) `images/generations` endpoints.
 
-> **Note:** This demo was tested on Intel® Xeon®, Intel® Core®, Intel® Arc™ A770 on Ubuntu 22/24, RedHat 9 and Windows 11.
+> **Note:** This demo was tested on Intel® Xeon®, Intel® Core®, Intel® Arc™ A770, Intel® Arc™ B580 on Ubuntu 22/24, RedHat 9 and Windows 11.
 
 ## Prerequisites
 
-**RAM/vRAM** Model used in this demo takes up to 7GB of RAM/vRAM. Please consider lower precision to decrease it, or better/bigger model to get better image results.
+**RAM/vRAM** Select model size and precision according to your hardware capabilities (RAM/vRAM). Request resolution plays a significant role in memory consumption, so the higher the resolution you request, the more RAM/vRAM is required.
 
 **Model preparation** (one of the below):
 - preconfigured models from HuggingFaces directly in OpenVINO IR format, list of Intel uploaded models available [here](https://huggingface.co/collections/OpenVINO/image-generation-67697d9952fb1eee4a252aa8))
@@ -56,8 +56,8 @@ Assuming you have unpacked model server package, make sure to:
 as mentioned in [deployment guide](../../docs/deploying_server_baremetal.md), in every new shell that will start OpenVINO Model Server.
 
 
-```console
-mkdir -p models
+```bat
+mkdir models
 
 ovms --rest_port 8000 ^
 --model_repository_path ./models/ ^
@@ -95,10 +95,10 @@ docker run -d --rm -p 8000:8000 -v $(pwd)/models:/models/:rw \
 
 Depending on how you prepared models in the first step of this demo, they are deployed to either CPU or GPU (it's defined in `config.json`). If you run on GPU make sure to have appropriate drivers installed, so the device is accessible for the model server.
 
-```console
-mkdir -p models
+```bat
+mkdir models
 
-ovms.exe --rest_port 8000 ^
+ovms --rest_port 8000 ^
 --model_repository_path ./models/ ^
 --task image_generation ^
 --source_model OpenVINO/FLUX.1-schnell-int4-ov ^
@@ -131,7 +131,7 @@ Run `export_model.py` script to download and quantize the model:
 ```console
 python export_model.py image_generation \
 --source_model black-forest-labs/FLUX.1-schnell \
---weight-format int8 \
+--weight-format int4 \
 --config_file_path models/config.json \
 --model_repository_path models \
 --overwrite_models
@@ -141,7 +141,7 @@ python export_model.py image_generation \
 ```console
 python export_model.py image_generation \
 --source_model black-forest-labs/FLUX.1-schnell \
---weight-format int8 \
+--weight-format int4 \
 --target_device GPU \
 --config_file_path models/config.json \
 --model_repository_path models \
@@ -190,7 +190,7 @@ Assuming you have unpacked model server package, make sure to:
 
 as mentioned in [deployment guide](../../docs/deploying_server_baremetal.md), in every new shell that will start OpenVINO Model Server.
 
-```console
+```bat
 ovms --rest_port 8000 ^
 --model_name OpenVINO/FLUX.1-schnell-int4-ov ^
 --model_path ./models/black-forest-labs/FLUX.1-schnell
@@ -224,9 +224,9 @@ docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro \
 
 Depending on how you prepared models in the first step of this demo, they are deployed to either CPU or GPU (it's defined in `config.json`). If you run on GPU make sure to have appropriate drivers installed, so the device is accessible for the model server.
 
-```console
-ovms.exe --rest_port 8000 ^
---model_name OpenVINO/FLUX.1-schnell-int4-ov \
+```bat
+ovms --rest_port 8000 ^
+--model_name OpenVINO/FLUX.1-schnell-int4-ov ^
 --model_path ./models/black-forest-labs/FLUX.1-schnell
 ```
 :::
@@ -277,7 +277,8 @@ curl http://localhost:8000/v3/images/generations \
 -H "Content-Type: application/json" \
 -d '{
 "model": "OpenVINO/FLUX.1-schnell-int4-ov",
-"prompt": "three happy cats",
+"prompt": "three cute cats sitting on a bench",
+"rng_seed": 45,
 "num_inference_steps": 3,
 "size": "512x512"
 }'| jq -r '.data[0].b64_json' | base64 --decode > output.png
@@ -288,7 +289,7 @@ Windows Powershell
 $response = Invoke-WebRequest -Uri "http://localhost:8000/v3/images/generations" `
 -Method POST `
 -Headers @{ "Content-Type" = "application/json" } `
--Body '{"model": "OpenVINO/FLUX.1-schnell-int4-ov", "prompt": "three happy cats", "num_inference_steps": 3}'
+-Body '{"model": "OpenVINO/FLUX.1-schnell-int4-ov", "prompt": "three cute cats sitting on a bench", "rng_seed": 45, "num_inference_steps": 3, "size": "512x512"}'
 
 $base64 = ($response.Content | ConvertFrom-Json).data[0].b64_json
@@ -299,7 +300,7 @@ Windows Command Prompt
 ```bat
 curl http://localhost:8000/v3/images/generations ^
 -H "Content-Type: application/json" ^
--d "{\"model\": \"OpenVINO/FLUX.1-schnell-int4-ov\", \"prompt\": \"three happy cats\", \"num_inference_steps\": 3, \"size\": \"512x512\"}"
+-d "{\"model\": \"OpenVINO/FLUX.1-schnell-int4-ov\", \"prompt\": \"three cute cats sitting on a bench\", \"rng_seed\": 45, \"num_inference_steps\": 3, \"size\": \"512x512\"}"
 ```
 
 
@@ -328,32 +329,26 @@ Install the client library:
 pip3 install openai pillow
 ```
 
-```console
-pip3 install openai
-```
 ```python
 from openai import OpenAI
 import base64
 from io import BytesIO
 from PIL import Image
-import time
-
 
 client = OpenAI(
-    base_url="http://ov-spr-36.sclab.intel.com:7774/v3",
+    base_url="http://localhost:8000/v3",
     api_key="unused"
 )
 
-now = time.time()
 response = client.images.generate(
     model="OpenVINO/FLUX.1-schnell-int4-ov",
-    prompt="three happy cats",
+    prompt="three cute cats sitting on a bench",
     extra_body={
-        "rng_seed": 43,
+        "rng_seed": 60,
+        "size": "512x512",
         "num_inference_steps": 3
     }
 )
-print("Time elapsed: ", time.time()-now, "seconds")
 base64_image = response.data[0].b64_json
 
 image_data = base64.b64decode(base64_image)
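Pieced together, the updated client snippet from this README reads roughly as follows; this is a sketch assuming the server from this demo is running on localhost:8000, and context lines not shown in the hunks may differ slightly:

```python
from openai import OpenAI
import base64
from io import BytesIO
from PIL import Image

# Point the client at the locally deployed OpenVINO Model Server (OpenAI-compatible /v3 API).
client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

response = client.images.generate(
    model="OpenVINO/FLUX.1-schnell-int4-ov",
    prompt="three cute cats sitting on a bench",
    extra_body={
        "rng_seed": 60,            # fixed seed for reproducible output
        "size": "512x512",
        "num_inference_steps": 3,
    },
)

# The endpoint returns the image as base64; decode it and save it to disk.
image_data = base64.b64decode(response.data[0].b64_json)
Image.open(BytesIO(image_data)).save("output2.png")
```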
@@ -365,11 +360,7 @@ image.save('output2.png')
 Output file (`output2.png`):
 ![output2](./output2.png)
 
-Client side logs confirm image generation latency on Intel® Xeon®:
 
-```
-Time elapsed: 18.89774751663208 seconds
-```
 
 
 ## References

demos/image_generation/output.png

-1.46 MB

demos/image_generation/output2.png

-1.61 MB

docs/clients_genai.md

Lines changed: 6 additions & 5 deletions
@@ -317,9 +317,10 @@ client = OpenAI(
 )
 response = client.images.generate(
     model="OpenVINO/FLUX.1-schnell-int4-ov",
-    prompt="three happy cats",
+    prompt="three cute cats sitting on a bench",
+    size="512x512",
     extra_body={
-        "rng_seed": 42,
+        "rng_seed": 45,
         "num_inference_steps": 3
     }
 )
@@ -336,19 +337,19 @@ curl http://localhost:8000/v3/images/generations \
 -H "Content-Type: application/json" \
 -d '{
 "model": "black-forest-labs/FLUX.1-schnell",
-"prompt": "three happy cats",
+"prompt": "three cute cats sitting on a bench",
 "num_inference_steps": 3,
 "size": "512x512"
 }'| jq -r '.data[0].b64_json' | base64 --decode > output.png
 ```
 :::
 :::{tab-item} Windows PowerShell
 :sync: power-shell
-```{code} bash
+```{code} powershell
 $response = Invoke-WebRequest -Uri "http://localhost:8000/v3/images/generations" `
 -Method POST `
 -Headers @{ "Content-Type" = "application/json" } `
--Body '{"model": "OpenVINO/FLUX.1-schnell-int4-ov", "prompt": "three happy cats", "num_inference_steps": 3}'
+-Body '{"model": "OpenVINO/FLUX.1-schnell-int4-ov", "prompt": "three cute cats sitting on a bench", "num_inference_steps": 3, "size": "512x512"}'
 $base64 = ($response.Content | ConvertFrom-Json).data[0].b64_json
 [IO.File]::WriteAllBytes('output.png', [Convert]::FromBase64String($base64))
 ```
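For readers who prefer Python over curl or PowerShell, the same REST call can be sketched with the `requests` library; the endpoint, payload fields and the `b64_json` response field come from the examples above, everything else is illustrative:

```python
import base64
import requests

# Same payload as in the curl example above; adjust the model name to whatever is deployed.
payload = {
    "model": "OpenVINO/FLUX.1-schnell-int4-ov",
    "prompt": "three cute cats sitting on a bench",
    "num_inference_steps": 3,
    "size": "512x512",
}

resp = requests.post(
    "http://localhost:8000/v3/images/generations",
    json=payload,
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()

# The response carries the generated image as base64 in data[0].b64_json.
with open("output.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["data"][0]["b64_json"]))
```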

docs/image_generation/reference.md

Lines changed: 164 additions & 1 deletion
@@ -1 +1,164 @@
-TBD

# Efficient Image Generation Serving {#ovms_docs_image_generation_reference}

## Image Generation Calculator
The Image Generation pipeline consists of one MediaPipe node - the Image Generation Calculator. To serve an image generation model, it is required to create a MediaPipe graph configuration file that defines the node and its parameters. The graph configuration file is typically named `graph.pbtxt` and is placed in the model directory.
The `graph.pbtxt` file may be created automatically by the Model Server when [using HuggingFaces pulling](../pull_hf_models.md) on start-up, automatically via the [export models script](../../demos/common/export_models/), or manually by an administrator.

The calculator has access to the HTTP request and parses it to extract the generation parameters:
```cpp
struct HttpPayload {
    std::string uri;
    std::unordered_map<std::string, std::string> headers;
    std::string body;
    std::shared_ptr<rapidjson::Document> parsedJson;
    std::shared_ptr<ClientConnection> client;
    std::shared_ptr<MultiPartParser> multipartParser;
};
```

The input JSON content should be compatible with the [Image Generation API](../model_server_rest_api_image_generation.md).

The input also includes a side packet with a reference to `IMAGE_GEN_NODE_RESOURCES`, which is a shared object representing multiple OpenVINO GenAI pipelines built from OpenVINO models loaded into memory just once.

**Every node based on the Image Generation Calculator MUST have exactly this side packet specification:**

`input_side_packet: "IMAGE_GEN_NODE_RESOURCES:pipes"`

**If it is missing or modified, the model server will fail to serve the graph with the model.**

The calculator produces a `std::string` MediaPipe packet with JSON content in the OpenAI response format, [described in a separate document](../model_server_rest_api_image_generation.md). The Image Generation calculator does not support streaming or partial responses.

Let's have a look at an example graph definition:
```protobuf
input_stream: "HTTP_REQUEST_PAYLOAD:input"
output_stream: "HTTP_RESPONSE_PAYLOAD:output"

node: {
  name: "ImageGenExecutor"
  calculator: "ImageGenCalculator"
  input_stream: "HTTP_REQUEST_PAYLOAD:input"
  input_side_packet: "IMAGE_GEN_NODE_RESOURCES:pipes"
  output_stream: "HTTP_RESPONSE_PAYLOAD:output"
  node_options: {
    [type.googleapis.com / mediapipe.ImageGenCalculatorOptions]: {
      models_path: "./"
      device: "CPU"
    }
  }
}
```

The node configuration above should be used as a template, since the user is not expected to change most of its content. Only `node_options` requires attention, as it specifies the OpenVINO GenAI pipeline parameters. The rest of the configuration can remain unchanged.

The calculator supports the following `node_options` for tuning the pipeline configuration:
- `required string models_path` - location of the models and scheduler directory (can be relative);
- `optional string device` - device to load models to. Supported values: "CPU", "GPU", "NPU" [default = "CPU"]
- `optional string plugin_config` - [OpenVINO device plugin configuration](https://docs.openvino.ai/2025/openvino-workflow/running-inference/inference-devices-and-modes.html) and additional pipeline options. Should be provided in the same format as for regular [models configuration](../parameters.md#model-configuration-options). The config is used for all models in the pipeline (text encoders/decoders, unet, vae) except for tokenizers [default = "{}"]
- `optional string max_resolution` - maximum resolution allowed for generation. Requests exceeding this value will be rejected. [default = "4096x4096"];
- `optional string default_resolution` - default resolution used for generation. If not specified, the underlying model shape will determine the final resolution.
- `optional uint64 max_num_images_per_prompt` - maximum number of images generated per prompt. Requests exceeding this value will be rejected. [default = 10];
- `optional uint64 default_num_inference_steps` - default number of inference steps used for generation, if not specified by the request [default = 50];
- `optional uint64 max_num_inference_steps` - maximum number of inference steps allowed for generation. Requests exceeding this value will be rejected. [default = 100];

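For illustration, a `node_options` block that tunes several of these parameters might look like the sketch below; the values are hypothetical and only demonstrate the syntax, defaults apply to anything omitted:

```protobuf
node_options: {
  [type.googleapis.com / mediapipe.ImageGenCalculatorOptions]: {
    models_path: "./"                          # directory containing model_index.json, scheduler/ and the model subfolders
    device: "GPU"                              # CPU, GPU or NPU
    plugin_config: '{"CACHE_DIR": "cache"}'    # forwarded to the OpenVINO plugin for non-tokenizer models
    max_resolution: "2048x2048"                # requests above this resolution are rejected
    default_resolution: "512x512"              # used when the request omits "size"
    max_num_images_per_prompt: 4
    default_num_inference_steps: 4
    max_num_inference_steps: 50
  }
}
```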
## Models Directory

In the node configuration we set `models_path`, indicating the location of the directory with files loaded by the GenAI pipeline. It loads the following files:

```
models/OpenVINO/
├── FLUX.1-schnell-int4-ov
│   ├── graph.pbtxt <----------------- OVMS MediaPipe graph configuration file
│   ├── model_index.json <------------ GenAI configuration file including pipeline type SD/SDXL/SD3/FLUX
│   ├── README.md
│   ├── scheduler
│   │   └── scheduler_config.json
│   ├── text_encoder
│   │   ├── config.json
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   ├── text_encoder_2
│   │   ├── config.json
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   ├── tokenizer
│   │   ├── merges.txt
│   │   ├── openvino_detokenizer.bin
│   │   ├── openvino_detokenizer.xml
│   │   ├── openvino_tokenizer.bin
│   │   ├── openvino_tokenizer.xml
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer_config.json
│   │   └── vocab.json
│   ├── tokenizer_2
│   │   ├── openvino_detokenizer.bin
│   │   ├── openvino_detokenizer.xml
│   │   ├── openvino_tokenizer.bin
│   │   ├── openvino_tokenizer.xml
│   │   ├── special_tokens_map.json
│   │   ├── spiece.model
│   │   ├── tokenizer_config.json
│   │   └── tokenizer.json
│   ├── transformer
│   │   ├── config.json
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   ├── vae_decoder
│   │   ├── config.json
│   │   ├── openvino_model.bin
│   │   └── openvino_model.xml
│   └── vae_encoder
│       ├── config.json
│       ├── openvino_model.bin
│       └── openvino_model.xml
└── stable-diffusion-v1-5-int8-ov
    ├── feature_extractor
    │   └── preprocessor_config.json
    ├── graph.pbtxt <----------------- OVMS MediaPipe graph configuration file
    ├── model_index.json <------------ GenAI configuration file including pipeline type SD/SDXL/SD3/FLUX
    ├── README.md
    ├── safety_checker
    │   ├── config.json
    │   └── model.safetensors
    ├── scheduler
    │   └── scheduler_config.json
    ├── text_encoder
    │   ├── config.json
    │   ├── openvino_model.bin
    │   └── openvino_model.xml
    ├── tokenizer
    │   ├── merges.txt
    │   ├── openvino_detokenizer.bin
    │   ├── openvino_detokenizer.xml
    │   ├── openvino_tokenizer.bin
    │   ├── openvino_tokenizer.xml
    │   ├── special_tokens_map.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    ├── unet
    │   ├── config.json
    │   ├── openvino_model.bin
    │   └── openvino_model.xml
    ├── vae_decoder
    │   ├── config.json
    │   ├── openvino_model.bin
    │   └── openvino_model.xml
    └── vae_encoder
        ├── config.json
        ├── openvino_model.bin
        └── openvino_model.xml
```

- `graph.pbtxt` - MediaPipe graph configuration file defining the Image Generation Calculator node and its parameters.
- `model_index.json` - GenAI configuration file that describes the pipeline type (SD/SDXL/SD3/FLUX) and the models used in the pipeline.
- `scheduler/scheduler_config.json` - configuration file for the scheduler that manages the execution of the models in the pipeline.
- `text_encoder`, `tokenizer`, `unet`, `vae_encoder`, `vae_decoder` - directories containing the OpenVINO models and their configurations for the respective components of the image generation pipeline.

We recommend using the [export script](../../demos/common/export_models/README.md) to prepare the models directory structure for serving, or simply using [HuggingFace pulling](../pull_hf_models.md) to automatically download and convert models from the Hugging Face Hub.

Check [tested models](https://github.com/openvinotoolkit/openvino.genai/blob/master/tests/python_tests/models/real_models).

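As a quick sanity check before starting the server, one can verify that a served model directory contains the files the calculator expects (`graph.pbtxt`, `model_index.json` and the scheduler config from the layout above); a small, hypothetical helper:

```python
from pathlib import Path

def check_image_gen_model_dir(model_dir: str) -> None:
    """Warn about files the Image Generation graph expects but which are missing."""
    required = ["graph.pbtxt", "model_index.json", "scheduler/scheduler_config.json"]
    root = Path(model_dir)
    for rel in required:
        if not (root / rel).exists():
            print(f"WARNING: {root / rel} is missing")

# Example usage with the layout shown above (paths are illustrative):
check_image_gen_model_dir("models/OpenVINO/FLUX.1-schnell-int4-ov")
```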
## References
- [Image Generation API](../model_server_rest_api_image_generation.md)
- Demos on [CPU/GPU](../../demos/image_generation/README.md)

docs/model_server_rest_api_image_generation.md

Lines changed: 2 additions & 2 deletions
@@ -40,7 +40,7 @@ curl http://localhost:8000/v3/images/generations \
 | model ||| string (required) | Name of the model to use. Name assigned to a MediaPipe graph configured to schedule generation using desired embedding model. **Note**: This can also be omitted to fall back to URI based routing. Read more on routing topic **TODO** |
 | prompt ||| string (required) | A text description of the desired image(s). **TODO**: Length restrictions? Too short/too large? |
 | size ||| string or null (default: auto) | The size of the generated images. Must be in WxH format, example: `1024x768`. Default model W/H will be used when using `auto`. |
-| n ||| integer or null (default: `1`) | A number of images to generate. If you want to generate multiple images for the same combination of generation parameters and text prompts, you can use this parameter for better performance as internally compuations will be performed with batch for Unet / Transformer models and text embeddings tensors will also be computed only once. **Not supported for now.** |
+| n ||| integer or null (default: `1`) | A number of images to generate. If you want to generate multiple images for the same combination of generation parameters and text prompts, you can use this parameter for better performance as internally computations will be performed with batch for Unet / Transformer models and text embeddings tensors will also be computed only once. **Not supported for now.** |
 | background ||| string or null (default: auto) | Allows to set transparency for the background of the generated image(s). Not supported for now. |
 | style ||| string or null (default: vivid) | The style of the generated images. Recognized OpenAI settings, but not supported: vivid, natural. |
 | moderation ||| string (default: auto) | Control the content-moderation level for images generated by endpoint. Either `low` or `auto`. Not supported for now. |
@@ -83,7 +83,7 @@ curl http://localhost:8000/v3/images/generations \
 ## Error handling
 Endpoint can raise an error related to incorrect request in the following conditions:
 - Incorrect format of any of the fields based on the schema
-- Tokenized prompt exceeds the maximum length of the model context. **TODO** Verify
+- Tokenized prompt exceeds the maximum length of the model context.
 - Model does not support requested width and height
 - Administrator defined min/max parameter value requirements are not met
 
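On the client side, the rejections listed above surface as non-2xx HTTP responses; a minimal sketch of handling them with `requests` (the exact error body format is not asserted here, it is simply printed):

```python
import requests

# Intentionally request a resolution above a typical administrator-defined max_resolution
# to show how a rejected request can be handled (values are illustrative).
payload = {
    "model": "OpenVINO/FLUX.1-schnell-int4-ov",
    "prompt": "three cute cats sitting on a bench",
    "num_inference_steps": 3,
    "size": "8192x8192",
}

resp = requests.post("http://localhost:8000/v3/images/generations", json=payload)
if not resp.ok:
    # The server rejects requests that violate the schema or the configured limits.
    print(f"Request rejected with HTTP {resp.status_code}: {resp.text}")
else:
    print("Image generated, response size:", len(resp.content))
```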

docs/reference.md

Lines changed: 0 additions & 1 deletion
This file was deleted.
