Commit e563b2b

Merge pull request #734 from roboflow/feature/video_metadata_deprecation

Video metadata deprecation

2 parents 948f938 + 2484992 commit e563b2b
60 files changed

+2967
-421
lines changed

Lines changed: 40 additions & 0 deletions

@@ -0,0 +1,40 @@
+# Execution Engine Changelog
+
+Below you can find the changelog for the Execution Engine.
+
+## Execution Engine `v1.2.0` | inference `v0.23.0`
+
+* The [`video_metadata` kind](/workflows/kinds/video_metadata/) has been deprecated, and we **strongly recommend
+  discontinuing its use when building blocks moving forward**. As an alternative, the [`image` kind](/workflows/kinds/image/)
+  has been extended to support the same metadata as the [`video_metadata` kind](/workflows/kinds/video_metadata/),
+  which can now be provided optionally. This update is **non-breaking** for existing blocks, but **some older
+  blocks** that produce images **may become incompatible** with **future** video processing blocks.
+
+??? warning "Potential blocks incompatibility"
+
+    As previously mentioned, adding `video_metadata` as an optional field to the internal representation of the
+    [`image` kind](/workflows/kinds/image/) (the `WorkflowImageData` class) may introduce some friction between
+    existing blocks that output the [`image` kind](/workflows/kinds/image/) and future video processing blocks
+    that rely on `video_metadata` being part of the `image` representation.
+
+    The issue arises because, while we can provide **default** values for `video_metadata` in `image` without
+    explicitly copying them from the input, any non-default metadata that was added upstream may be lost.
+    This can lead to downstream blocks that depend on the `video_metadata` not functioning as expected.
+
+    We've updated all existing `roboflow_core` blocks to account for this, but blocks created before this change
+    in external repositories may cause issues in workflows where their output images are used by video
+    processing blocks.
+
+* While the deprecated [`video_metadata` kind](/workflows/kinds/video_metadata/) is still available for use,
+  it will be fully removed in Execution Engine version `v2.0.0`.
+
+!!! warning "Breaking change planned - Execution Engine `v2.0.0`"
+
+    The [`video_metadata` kind](/workflows/kinds/video_metadata/) is deprecated and will be removed in `v2.0.0`.
+
+* As a result of the changes mentioned above, the internal representation of the [`image` kind](/workflows/kinds/image/)
+  has been updated to include a new `video_metadata` property. This property can optionally be set in the
+  constructor; if not provided, a value with reasonable defaults will be used. To simplify metadata manipulation
+  within blocks, we have introduced two new class methods: `WorkflowImageData.copy_and_replace(...)` and
+  `WorkflowImageData.create_crop(...)`. For more details, refer to the updated
+  [`WorkflowImageData` usage guide](/workflows/internal_data_types/#workflowimagedata).
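The compatibility risk described in the "Potential blocks incompatibility" warning can be illustrated with a minimal, self-contained sketch. `Metadata` and `Image` below are hypothetical stand-ins for `VideoMetadata` and `WorkflowImageData`, not the real classes:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Metadata:
    fps: Optional[float] = None  # a default carries no real video context


@dataclass
class Image:
    pixels: bytes
    # optional metadata: a default instance is created when none is supplied
    metadata: Metadata = field(default_factory=Metadata)


def legacy_transform(image: Image) -> Image:
    # an older-style block: builds its output image from scratch,
    # silently dropping any metadata that was attached upstream
    return Image(pixels=image.pixels[::-1])


def updated_transform(image: Image) -> Image:
    # an updated block: explicitly forwards the input metadata
    return Image(pixels=image.pixels[::-1], metadata=image.metadata)


source = Image(pixels=b"frame", metadata=Metadata(fps=30.0))
print(legacy_transform(source).metadata.fps)   # None - video context lost
print(updated_transform(source).metadata.fps)  # 30.0 - video context preserved
```

This is exactly why the `roboflow_core` blocks were updated to forward metadata explicitly, e.g. via `WorkflowImageData.copy_and_replace(...)`.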

docs/workflows/internal_data_types.md

Lines changed: 91 additions & 39 deletions
@@ -100,6 +100,12 @@ image's location within the original file (e.g., when working with cropped image
 HTTP, WorkflowImageData allows caching of different image representations, such as base64-encoded versions,
 improving efficiency.
 
+!!! Note "Video Metadata"
+
+    Since Execution Engine `v1.2.0`, we have added `video_metadata` to `WorkflowImageData`. This object is
+    supposed to hold the context of video processing and will only be relevant for video processing blocks.
+    Other blocks may ignore its existence when not creating an output image (covered in the next section).
+
 Operating on `WorkflowImageData` is fairly simple once you understand its interface. Here are some of the key
 methods and properties:

@@ -128,6 +134,11 @@ def operate_on_image(workflow_image: WorkflowImageData) -> None:
 
     # or the same for root metadata (the oldest ancestor of the image - the Workflow input image)
     root_metadata = workflow_image.workflow_root_ancestor_metadata
+
+    # retrieving the `VideoMetadata` object - see the usage guide section below;
+    # if `workflow_image` was not provided with `VideoMetadata`, a default metadata
+    # object will be created on accessing the property
+    video_metadata = workflow_image.video_metadata
 ```
 
 Below you can find an example showcasing how to preserve metadata while transforming an image
@@ -140,29 +151,36 @@ from inference.core.workflows.execution_engine.entities.base import WorkflowImageData
 
 def transform_image(image: WorkflowImageData) -> WorkflowImageData:
     transformed_image = some_transformation(image.numpy_image)
-    return WorkflowImageData(
-        parent_metadata=image.parent_metadata,
-        workflow_root_ancestor_metadata=image.workflow_root_ancestor_metadata,
+    # `WorkflowImageData` exposes a helper method that returns a new object with
+    # an updated image but preserved metadata. Metadata preservation should only
+    # be used when the output image is compatible regarding data lineage
+    # (the predecessor-successor relation for images). Lineage is not preserved
+    # when cropping, or when merging images without a common predecessor -
+    # below you may find implementation tips.
+    return WorkflowImageData.copy_and_replace(
+        origin_image_data=image,
         numpy_image=transformed_image,
     )
 
+
 def some_transformation(image: np.ndarray) -> np.ndarray:
     ...
 ```
 
 ??? tip "Images cropping"
 
     When your block increases dimensionality and provides output with the `image` kind - usually that means cropping the
-    image. Below you can find a sketch of an implementation for that operation:
-
+    image. In such cases the input image's `video_metadata` should be dropped (usually it does not make sense
+    to keep it, as video processing blocks will not work correctly on dynamically created crops).
+
+    Below you can find a sketch of an implementation for that operation:
+
     ```python
     from typing import List, Tuple
 
     from dataclasses import replace
-    from inference.core.workflows.execution_engine.entities.base import \
-        WorkflowImageData, ImageParentMetadata, OriginCoordinatesSystem
-
-
+    from inference.core.workflows.execution_engine.entities.base import WorkflowImageData
+
     def crop_images(
         image: WorkflowImageData,
         crops: List[Tuple[str, int, int, int, int]],
@@ -171,45 +189,79 @@ def some_transformation(image: np.ndarray) -> np.ndarray:
         original_image = image.numpy_image
         result_crops = []
         for crop_id, x_min, y_min, x_max, y_max in crops:
             cropped_image = original_image[y_min:y_max, x_min:x_max]
-            crop_parent_metadata = ImageParentMetadata(
-                parent_id=crop_id,
-                origin_coordinates=OriginCoordinatesSystem(
-                    left_top_x=x_min,
-                    left_top_y=y_min,
-                    origin_width=original_image.shape[1],
-                    origin_height=original_image.shape[0],
-                ),
-            )
-            # adding shift to root ancestor coordinates system
-            crop_root_ancestor_coordinates = replace(
-                image.workflow_root_ancestor_metadata.origin_coordinates,
-                left_top_x=image.workflow_root_ancestor_metadata.origin_coordinates.left_top_x + x_min,
-                left_top_y=image.workflow_root_ancestor_metadata.origin_coordinates.left_top_y + y_min,
-            )
-            workflow_root_ancestor_metadata = ImageParentMetadata(
-                parent_id=image.workflow_root_ancestor_metadata.parent_id,
-                origin_coordinates=crop_root_ancestor_coordinates,
-            )
-            result_crop = WorkflowImageData(
-                parent_metadata=crop_parent_metadata,
-                workflow_root_ancestor_metadata=workflow_root_ancestor_metadata,
-                numpy_image=cropped_image,
+            if not cropped_image.size:
+                # discarding empty crops
+                continue
+            result_crop = WorkflowImageData.create_crop(
+                origin_image_data=image,
+                crop_identifier=crop_id,
+                cropped_image=cropped_image,
+                offset_x=x_min,
+                offset_y=y_min,
             )
             result_crops.append(result_crop)
         return result_crops
     ```
 
+    In some cases you may want to preserve `video_metadata`. An example of such a situation is when your
+    block produces crops based on fixed coordinates (like a single video feed with multiple fixed Regions of
+    Interest, each fed to an individual tracker) - then you want the resulting crops to be processed in the
+    context of the video, as if they were produced by separate cameras. To adjust the behaviour of the
+    `create_crop(...)` method, simply add `preserve_video_metadata=True`:
 
-## `VideoMetadata`
+    ```{ .py linenums="1" hl_lines="12"}
+    def crop_images(
+        image: WorkflowImageData,
+        crops: List[Tuple[str, int, int, int, int]],
+    ) -> List[WorkflowImageData]:
+        # [...]
+        result_crop = WorkflowImageData.create_crop(
+            origin_image_data=image,
+            crop_identifier=crop_id,
+            cropped_image=cropped_image,
+            offset_x=x_min,
+            offset_y=y_min,
+            preserve_video_metadata=True,
+        )
+        # [...]
+    ```
+
+
+??? tip "Merging images without common predecessor"
+
+    If a common `parent_metadata` cannot be identified for the images you are trying to merge, you should
+    denote that "a new" image appears in the Workflow. To do that:
+
+    ```python
+    from inference.core.workflows.execution_engine.entities.base import \
+        WorkflowImageData, ImageParentMetadata
+
+    def merge_images(image_1: WorkflowImageData, image_2: WorkflowImageData) -> WorkflowImageData:
+        merged_image = some_merging_operation(
+            image_1=image_1.numpy_image,
+            image_2=image_2.numpy_image,
+        )
+        new_parent_metadata = ImageParentMetadata(
+            # this is just one option for creating the id, yet a sensible one
+            parent_id=f"{image_1.parent_metadata.parent_id} + {image_2.parent_metadata.parent_id}"
+        )
+        return WorkflowImageData(
+            parent_metadata=new_parent_metadata,
+            numpy_image=merged_image,
+        )
+    ```
 
-!!! warning "Early adoption"
 
-    The `video_metadata` kind and `VideoMetadata` data representation are in early adoption at the moment. They
-    represent a new batch-oriented data type added to the Workflows ecosystem that should provide an extended set
-    of metadata on top of a video frame, to make it possible to create stateful video processing blocks like ByteTracker.
+## `VideoMetadata`
+
+!!! warning "Deprecation"
 
-    Authors are still experimenting with different, potentially more handy ways of onboarding video processing.
-    Stay tuned and observe [video processing updates](/workflows/video_processing/overview/).
+    The [`video_metadata` kind](/workflows/kinds/video_metadata) is deprecated - we advise not using that kind in
+    new blocks. The `VideoMetadata` data representation became a member of `WorkflowImageData` in Execution Engine
+    `v1.2.0` (`inference` release `v0.23.0`).
 
 `VideoMetadata` is a dataclass that provides the following metadata about a video frame and video source:

docs/workflows/kinds.md

Lines changed: 23 additions & 23 deletions
@@ -37,36 +37,36 @@ for the presence of a mask in the input.
 
 ## Kinds declared in Roboflow plugins
 <!--- AUTOGENERATED_KINDS_LIST -->
-* [`list_of_values`](/workflows/kinds/list_of_values): List of values of any type
-* [`float`](/workflows/kinds/float): Float value
-* [`point`](/workflows/kinds/point): Single point in 2D
-* [`top_class`](/workflows/kinds/top_class): String value representing top class predicted by classification model
-* [`object_detection_prediction`](/workflows/kinds/object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object
 * [`bar_code_detection`](/workflows/kinds/bar_code_detection): Prediction with barcode detection
-* [`image_metadata`](/workflows/kinds/image_metadata): Dictionary with image metadata required by supervision
-* [`roboflow_project`](/workflows/kinds/roboflow_project): Roboflow project name
-* [`qr_code_detection`](/workflows/kinds/qr_code_detection): Prediction with QR code detection
-* [`*`](/workflows/kinds/*): Equivalent of any element
-* [`video_metadata`](/workflows/kinds/video_metadata): Video image metadata
-* [`rgb_color`](/workflows/kinds/rgb_color): RGB color
-* [`keypoint_detection_prediction`](/workflows/kinds/keypoint_detection_prediction): Prediction with detected bounding boxes and detected keypoints in form of sv.Detections(...) object
 * [`prediction_type`](/workflows/kinds/prediction_type): String value with type of prediction
-* [`image_keypoints`](/workflows/kinds/image_keypoints): Image keypoints detected by classical Computer Vision method
-* [`language_model_output`](/workflows/kinds/language_model_output): LLM / VLM output
-* [`detection`](/workflows/kinds/detection): Single element of detections-based prediction (like `object_detection_prediction`)
-* [`image`](/workflows/kinds/image): Image in workflows
-* [`float_zero_to_one`](/workflows/kinds/float_zero_to_one): `float` value in range `[0.0, 1.0]`
-* [`zone`](/workflows/kinds/zone): Definition of polygon zone
 * [`boolean`](/workflows/kinds/boolean): Boolean flag
-* [`instance_segmentation_prediction`](/workflows/kinds/instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks in form of sv.Detections(...) object
-* [`contours`](/workflows/kinds/contours): List of numpy arrays where each array represents contour points
-* [`integer`](/workflows/kinds/integer): Integer value
+* [`numpy_array`](/workflows/kinds/numpy_array): Numpy array
+* [`list_of_values`](/workflows/kinds/list_of_values): List of values of any type
+* [`keypoint_detection_prediction`](/workflows/kinds/keypoint_detection_prediction): Prediction with detected bounding boxes and detected keypoints in form of sv.Detections(...) object
+* [`point`](/workflows/kinds/point): Single point in 2D
 * [`roboflow_api_key`](/workflows/kinds/roboflow_api_key): Roboflow API key
+* [`dictionary`](/workflows/kinds/dictionary): Dictionary
+* [`image_keypoints`](/workflows/kinds/image_keypoints): Image keypoints detected by classical Computer Vision method
+* [`zone`](/workflows/kinds/zone): Definition of polygon zone
+* [`detection`](/workflows/kinds/detection): Single element of detections-based prediction (like `object_detection_prediction`)
+* [`qr_code_detection`](/workflows/kinds/qr_code_detection): Prediction with QR code detection
 * [`parent_id`](/workflows/kinds/parent_id): Identifier of parent for step output
 * [`classification_prediction`](/workflows/kinds/classification_prediction): Predictions from classifier
+* [`*`](/workflows/kinds/*): Equivalent of any element
 * [`roboflow_model_id`](/workflows/kinds/roboflow_model_id): Roboflow model id
-* [`dictionary`](/workflows/kinds/dictionary): Dictionary
+* [`rgb_color`](/workflows/kinds/rgb_color): RGB color
+* [`roboflow_project`](/workflows/kinds/roboflow_project): Roboflow project name
+* [`image_metadata`](/workflows/kinds/image_metadata): Dictionary with image metadata required by supervision
+* [`contours`](/workflows/kinds/contours): List of numpy arrays where each array represents contour points
+* [`video_metadata`](/workflows/kinds/video_metadata): Video image metadata
 * [`string`](/workflows/kinds/string): String value
+* [`instance_segmentation_prediction`](/workflows/kinds/instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks in form of sv.Detections(...) object
+* [`integer`](/workflows/kinds/integer): Integer value
+* [`top_class`](/workflows/kinds/top_class): String value representing top class predicted by classification model
+* [`float_zero_to_one`](/workflows/kinds/float_zero_to_one): `float` value in range `[0.0, 1.0]`
+* [`language_model_output`](/workflows/kinds/language_model_output): LLM / VLM output
+* [`object_detection_prediction`](/workflows/kinds/object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object
 * [`serialised_payloads`](/workflows/kinds/serialised_payloads): Serialised element that is usually accepted by sink
-* [`numpy_array`](/workflows/kinds/numpy_array): Numpy array
+* [`float`](/workflows/kinds/float): Float value
+* [`image`](/workflows/kinds/image): Image in workflows
 <!--- AUTOGENERATED_KINDS_LIST -->

inference/core/interfaces/stream/model_handlers/workflows.py

Lines changed: 12 additions & 4 deletions
@@ -22,10 +22,7 @@ def run_workflow(
     if fps is None:
         # for FPS reporting we expect 0 when FPS cannot be determined
         fps = 0
-    workflows_parameters[image_input_name] = [
-        video_frame.image for video_frame in video_frames
-    ]
-    workflows_parameters[video_metadata_input_name] = [
+    video_metadata_for_images = [
         VideoMetadata(
             video_identifier=(
                 str(video_frame.source_id)
@@ -39,6 +36,17 @@ def run_workflow(
         )
        for video_frame in video_frames
     ]
+    workflows_parameters[image_input_name] = [
+        {
+            "type": "numpy_object",
+            "value": video_frame.image,
+            "video_metadata": video_metadata,
+        }
+        for video_frame, video_metadata in zip(
+            video_frames, video_metadata_for_images
+        )
+    ]
+    workflows_parameters[video_metadata_input_name] = video_metadata_for_images
     return execution_engine.run(
         runtime_parameters=workflows_parameters,
        fps=fps,
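The pairing logic in the new code above can be sketched in isolation. Plain dicts and strings stand in for the `VideoFrame` and `VideoMetadata` objects here, purely for illustration:

```python
# Each image is wrapped in a dict carrying both the frame and its matching
# video metadata, so the two travel together through the workflow inputs;
# zip(...) pairs frames with metadata positionally.
video_frames = [
    {"source_id": 0, "image": "frame-a"},
    {"source_id": 1, "image": "frame-b"},
]
video_metadata_for_images = [
    {"video_identifier": str(frame["source_id"])} for frame in video_frames
]

images_param = [
    {
        "type": "numpy_object",
        "value": frame["image"],
        "video_metadata": metadata,
    }
    for frame, metadata in zip(video_frames, video_metadata_for_images)
]

print(images_param[0]["video_metadata"])  # {'video_identifier': '0'}
```

Because the same `video_metadata_for_images` list is also passed under `video_metadata_input_name`, each image and its metadata stay consistent across both workflow inputs.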

0 commit comments
