Commit e563b2b

Merge pull request #734 from roboflow/feature/video_metadata_deprecation

Video metadata deprecation

2 parents 948f938 + 2484992 commit e563b2b
60 files changed

+2967
-421
lines changed

Lines changed: 40 additions & 0 deletions

@@ -0,0 +1,40 @@
+# Execution Engine Changelog
+
+Below you can find the changelog for the Execution Engine.
+
+## Execution Engine `v1.2.0` | inference `v0.23.0`
+
+* The [`video_metadata` kind](/workflows/kinds/video_metadata/) has been deprecated, and we **strongly recommend
+  discontinuing its use when building blocks moving forward**. As an alternative, the [`image` kind](/workflows/kinds/image/)
+  has been extended to support the same metadata as the [`video_metadata` kind](/workflows/kinds/video_metadata/),
+  which can now be provided optionally. This update is **non-breaking** for existing blocks, but **some older
+  blocks** that produce images **may become incompatible** with **future** video processing blocks.
+
+??? warning "Potential blocks incompatibility"
+
+    As previously mentioned, adding `video_metadata` as an optional field to the internal representation of the
+    [`image` kind](/workflows/kinds/image/) (the `WorkflowImageData` class) may introduce some friction between
+    existing blocks that output the [`image` kind](/workflows/kinds/image/) and future video processing blocks
+    that rely on `video_metadata` being part of the `image` representation.
+
+    The issue arises because, while we can provide **default** values for `video_metadata` in `image` without
+    explicitly copying them from the input, any non-default metadata that was added upstream may be lost.
+    This can lead to downstream blocks that depend on the `video_metadata` not functioning as expected.
+
+    We've updated all existing `roboflow_core` blocks to account for this, but blocks created before this change
+    in external repositories may cause issues in workflows where their output images are used by video
+    processing blocks.
+
+* While the deprecated [`video_metadata` kind](/workflows/kinds/video_metadata/) is still available for use,
+  it will be fully removed in Execution Engine version `v2.0.0`.
+
+!!! warning "Breaking change planned - Execution Engine `v2.0.0`"
+
+    The [`video_metadata` kind](/workflows/kinds/video_metadata/) is deprecated and will be removed in `v2.0.0`.
+
+* As a result of the changes mentioned above, the internal representation of the [`image` kind](/workflows/kinds/image/)
+  has been updated to include a new `video_metadata` property. This property can optionally be set in the
+  constructor; if not provided, a value with reasonable defaults will be used. To simplify metadata manipulation
+  within blocks, we have introduced two new class methods: `WorkflowImageData.copy_and_replace(...)` and
+  `WorkflowImageData.create_crop(...)`. For more details, refer to the updated
+  [`WorkflowImageData` usage guide](/workflows/internal_data_types/#workflowimagedata).
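The compatibility risk described in the "Potential blocks incompatibility" warning can be illustrated with a minimal, self-contained sketch. `Metadata` and `Image` below are hypothetical stand-ins for `VideoMetadata` and `WorkflowImageData`, not the real classes:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Metadata:
    fps: Optional[float] = None  # a default carries no real video context


@dataclass
class Image:
    pixels: bytes
    # optional metadata: a default instance is created when none is supplied
    metadata: Metadata = field(default_factory=Metadata)


def legacy_transform(image: Image) -> Image:
    # an older-style block: builds its output image from scratch,
    # silently dropping any metadata that was attached upstream
    return Image(pixels=image.pixels[::-1])


def updated_transform(image: Image) -> Image:
    # an updated block: explicitly forwards the input metadata
    return Image(pixels=image.pixels[::-1], metadata=image.metadata)


source = Image(pixels=b"frame", metadata=Metadata(fps=30.0))
print(legacy_transform(source).metadata.fps)   # None - video context lost
print(updated_transform(source).metadata.fps)  # 30.0 - video context preserved
```

This is exactly why the `roboflow_core` blocks were updated to forward metadata explicitly, e.g. via `WorkflowImageData.copy_and_replace(...)`.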

docs/workflows/internal_data_types.md

Lines changed: 91 additions & 39 deletions
@@ -100,6 +100,12 @@ image's location within the original file (e.g., when working with cropped image
 HTTP, WorkflowImageData allows caching of different image representations, such as base64-encoded versions,
 improving efficiency.
 
+!!! Note "Video Metadata"
+
+    Since Execution Engine `v1.2.0`, we have added `video_metadata` to `WorkflowImageData`. This object is
+    supposed to hold the context of video processing and will only be relevant for video processing blocks.
+    Other blocks may ignore its existence when not creating an output image (covered in the next section).
+
 Operating on `WorkflowImageData` is fairly simple once you understand its interface. Here are some of the key
 methods and properties:

@@ -128,6 +134,11 @@ def operate_on_image(workflow_image: WorkflowImageData) -> None:
 
     # or the same for root metadata (the oldest ancestor of the image - the Workflow input image)
     root_metadata = workflow_image.workflow_root_ancestor_metadata
+
+    # retrieving the `VideoMetadata` object - see the usage guide section below;
+    # if `workflow_image` was not provided with `VideoMetadata`, a default metadata
+    # object will be created on accessing the property
+    video_metadata = workflow_image.video_metadata
 ```
 
 Below you can find an example showcasing how to preserve metadata while transforming an image
@@ -140,29 +151,36 @@ from inference.core.workflows.execution_engine.entities.base import WorkflowImageData
 
 def transform_image(image: WorkflowImageData) -> WorkflowImageData:
     transformed_image = some_transformation(image.numpy_image)
-    return WorkflowImageData(
-        parent_metadata=image.parent_metadata,
-        workflow_root_ancestor_metadata=image.workflow_root_ancestor_metadata,
+    # `WorkflowImageData` exposes a helper method that returns a new object with
+    # an updated image but preserved metadata. Metadata preservation should only
+    # be used when the output image is compatible regarding data lineage
+    # (the predecessor-successor relation for images). Lineage is not preserved
+    # when cropping, or when merging images without a common predecessor -
+    # below you may find implementation tips.
+    return WorkflowImageData.copy_and_replace(
+        origin_image_data=image,
         numpy_image=transformed_image,
     )
 
+
 def some_transformation(image: np.ndarray) -> np.ndarray:
     ...
 ```
 
 ??? tip "Images cropping"
 
     When your block increases dimensionality and provides output with the `image` kind - usually that means cropping the
-    image. Below you can find a sketch of an implementation for that operation:
-
+    image. In such cases the input image's `video_metadata` should be dropped (usually it does not make sense
+    to keep it, as video processing blocks will not work correctly on dynamically created crops).
+
+    Below you can find a sketch of an implementation for that operation:
+
     ```python
     from typing import List, Tuple
 
     from dataclasses import replace
-    from inference.core.workflows.execution_engine.entities.base import \
-        WorkflowImageData, ImageParentMetadata, OriginCoordinatesSystem
-
-
+    from inference.core.workflows.execution_engine.entities.base import WorkflowImageData
+
     def crop_images(
         image: WorkflowImageData,
         crops: List[Tuple[str, int, int, int, int]],
@@ -171,45 +189,79 @@ def some_transformation(image: np.ndarray) -> np.ndarray:
         original_image = image.numpy_image
         result_crops = []
         for crop_id, x_min, y_min, x_max, y_max in crops:
             cropped_image = original_image[y_min:y_max, x_min:x_max]
-            crop_parent_metadata = ImageParentMetadata(
-                parent_id=crop_id,
-                origin_coordinates=OriginCoordinatesSystem(
-                    left_top_x=x_min,
-                    left_top_y=y_min,
-                    origin_width=original_image.shape[1],
-                    origin_height=original_image.shape[0],
-                ),
-            )
-            # adding shift to root ancestor coordinates system
-            crop_root_ancestor_coordinates = replace(
-                image.workflow_root_ancestor_metadata.origin_coordinates,
-                left_top_x=image.workflow_root_ancestor_metadata.origin_coordinates.left_top_x + x_min,
-                left_top_y=image.workflow_root_ancestor_metadata.origin_coordinates.left_top_y + y_min,
-            )
-            workflow_root_ancestor_metadata = ImageParentMetadata(
-                parent_id=image.workflow_root_ancestor_metadata.parent_id,
-                origin_coordinates=crop_root_ancestor_coordinates,
-            )
-            result_crop = WorkflowImageData(
-                parent_metadata=crop_parent_metadata,
-                workflow_root_ancestor_metadata=workflow_root_ancestor_metadata,
-                numpy_image=cropped_image,
+            if not cropped_image.size:
+                # discarding empty crops
+                continue
+            result_crop = WorkflowImageData.create_crop(
+                origin_image_data=image,
+                crop_identifier=crop_id,
+                cropped_image=cropped_image,
+                offset_x=x_min,
+                offset_y=y_min,
             )
             result_crops.append(result_crop)
         return result_crops
     ```
 
+    In some cases you may want to preserve `video_metadata`. An example of such a situation is when your
+    block produces crops based on fixed coordinates (like a single video feed with multiple fixed Regions of
+    Interest, each fed to an individual tracker) - then you want the resulting crops to be processed in the
+    context of the video, as if they were produced by separate cameras. To adjust the behaviour of the
+    `create_crop(...)` method, simply add `preserve_video_metadata=True`:
 
-## `VideoMetadata`
+    ```{ .py linenums="1" hl_lines="12"}
+    def crop_images(
+        image: WorkflowImageData,
+        crops: List[Tuple[str, int, int, int, int]],
+    ) -> List[WorkflowImageData]:
+        # [...]
+        result_crop = WorkflowImageData.create_crop(
+            origin_image_data=image,
+            crop_identifier=crop_id,
+            cropped_image=cropped_image,
+            offset_x=x_min,
+            offset_y=y_min,
+            preserve_video_metadata=True,
+        )
+        # [...]
+    ```
+
+
+??? tip "Merging images without common predecessor"
+
+    If a common `parent_metadata` cannot be identified for the images you are trying to merge, you should
+    denote that "a new" image appears in the Workflow. To do that:
+
+    ```python
+    from inference.core.workflows.execution_engine.entities.base import \
+        WorkflowImageData, ImageParentMetadata
+
+    def merge_images(image_1: WorkflowImageData, image_2: WorkflowImageData) -> WorkflowImageData:
+        merged_image = some_merging_operation(
+            image_1=image_1.numpy_image,
+            image_2=image_2.numpy_image,
+        )
+        new_parent_metadata = ImageParentMetadata(
+            # this is just one option for creating the id, yet a sensible one
+            parent_id=f"{image_1.parent_metadata.parent_id} + {image_2.parent_metadata.parent_id}"
+        )
+        return WorkflowImageData(
+            parent_metadata=new_parent_metadata,
+            numpy_image=merged_image,
+        )
+    ```
 
-!!! warning "Early adoption"
 
-    The `video_metadata` kind and `VideoMetadata` data representation are in early adoption at the moment. They
-    represent a new batch-oriented data type added to the Workflows ecosystem that should provide an extended set
-    of metadata on top of a video frame, to make it possible to create stateful video processing blocks like ByteTracker.
+## `VideoMetadata`
+
+!!! warning "Deprecation"
 
-    Authors are still experimenting with different, potentially more handy ways of onboarding video processing.
-    Stay tuned and observe [video processing updates](/workflows/video_processing/overview/).
+    The [`video_metadata` kind](/workflows/kinds/video_metadata) is deprecated - we advise not using that kind in
+    new blocks. The `VideoMetadata` data representation became a member of `WorkflowImageData` in Execution Engine
+    `v1.2.0` (`inference` release `v0.23.0`).
 
 `VideoMetadata` is a dataclass that provides the following metadata about a video frame and video source:

docs/workflows/kinds.md

Lines changed: 23 additions & 23 deletions
@@ -37,36 +37,36 @@ for the presence of a mask in the input.
 
 ## Kinds declared in Roboflow plugins
 <!--- AUTOGENERATED_KINDS_LIST -->
-* [`list_of_values`](/workflows/kinds/list_of_values): List of values of any type
-* [`float`](/workflows/kinds/float): Float value
-* [`point`](/workflows/kinds/point): Single point in 2D
-* [`top_class`](/workflows/kinds/top_class): String value representing top class predicted by classification model
-* [`object_detection_prediction`](/workflows/kinds/object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object
 * [`bar_code_detection`](/workflows/kinds/bar_code_detection): Prediction with barcode detection
-* [`image_metadata`](/workflows/kinds/image_metadata): Dictionary with image metadata required by supervision
-* [`roboflow_project`](/workflows/kinds/roboflow_project): Roboflow project name
-* [`qr_code_detection`](/workflows/kinds/qr_code_detection): Prediction with QR code detection
-* [`*`](/workflows/kinds/*): Equivalent of any element
-* [`video_metadata`](/workflows/kinds/video_metadata): Video image metadata
-* [`rgb_color`](/workflows/kinds/rgb_color): RGB color
-* [`keypoint_detection_prediction`](/workflows/kinds/keypoint_detection_prediction): Prediction with detected bounding boxes and detected keypoints in form of sv.Detections(...) object
 * [`prediction_type`](/workflows/kinds/prediction_type): String value with type of prediction
-* [`image_keypoints`](/workflows/kinds/image_keypoints): Image keypoints detected by classical Computer Vision method
-* [`language_model_output`](/workflows/kinds/language_model_output): LLM / VLM output
-* [`detection`](/workflows/kinds/detection): Single element of detections-based prediction (like `object_detection_prediction`)
-* [`image`](/workflows/kinds/image): Image in workflows
-* [`float_zero_to_one`](/workflows/kinds/float_zero_to_one): `float` value in range `[0.0, 1.0]`
-* [`zone`](/workflows/kinds/zone): Definition of polygon zone
 * [`boolean`](/workflows/kinds/boolean): Boolean flag
-* [`instance_segmentation_prediction`](/workflows/kinds/instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks in form of sv.Detections(...) object
-* [`contours`](/workflows/kinds/contours): List of numpy arrays where each array represents contour points
-* [`integer`](/workflows/kinds/integer): Integer value
+* [`numpy_array`](/workflows/kinds/numpy_array): Numpy array
+* [`list_of_values`](/workflows/kinds/list_of_values): List of values of any type
+* [`keypoint_detection_prediction`](/workflows/kinds/keypoint_detection_prediction): Prediction with detected bounding boxes and detected keypoints in form of sv.Detections(...) object
+* [`point`](/workflows/kinds/point): Single point in 2D
 * [`roboflow_api_key`](/workflows/kinds/roboflow_api_key): Roboflow API key
+* [`dictionary`](/workflows/kinds/dictionary): Dictionary
+* [`image_keypoints`](/workflows/kinds/image_keypoints): Image keypoints detected by classical Computer Vision method
+* [`zone`](/workflows/kinds/zone): Definition of polygon zone
+* [`detection`](/workflows/kinds/detection): Single element of detections-based prediction (like `object_detection_prediction`)
+* [`qr_code_detection`](/workflows/kinds/qr_code_detection): Prediction with QR code detection
 * [`parent_id`](/workflows/kinds/parent_id): Identifier of parent for step output
 * [`classification_prediction`](/workflows/kinds/classification_prediction): Predictions from classifier
+* [`*`](/workflows/kinds/*): Equivalent of any element
 * [`roboflow_model_id`](/workflows/kinds/roboflow_model_id): Roboflow model id
-* [`dictionary`](/workflows/kinds/dictionary): Dictionary
+* [`rgb_color`](/workflows/kinds/rgb_color): RGB color
+* [`roboflow_project`](/workflows/kinds/roboflow_project): Roboflow project name
+* [`image_metadata`](/workflows/kinds/image_metadata): Dictionary with image metadata required by supervision
+* [`contours`](/workflows/kinds/contours): List of numpy arrays where each array represents contour points
+* [`video_metadata`](/workflows/kinds/video_metadata): Video image metadata
 * [`string`](/workflows/kinds/string): String value
+* [`instance_segmentation_prediction`](/workflows/kinds/instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks in form of sv.Detections(...) object
+* [`integer`](/workflows/kinds/integer): Integer value
+* [`top_class`](/workflows/kinds/top_class): String value representing top class predicted by classification model
+* [`float_zero_to_one`](/workflows/kinds/float_zero_to_one): `float` value in range `[0.0, 1.0]`
+* [`language_model_output`](/workflows/kinds/language_model_output): LLM / VLM output
+* [`object_detection_prediction`](/workflows/kinds/object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object
 * [`serialised_payloads`](/workflows/kinds/serialised_payloads): Serialised element that is usually accepted by sink
-* [`numpy_array`](/workflows/kinds/numpy_array): Numpy array
+* [`float`](/workflows/kinds/float): Float value
+* [`image`](/workflows/kinds/image): Image in workflows
 <!--- AUTOGENERATED_KINDS_LIST -->

inference/core/interfaces/stream/model_handlers/workflows.py

Lines changed: 12 additions & 4 deletions
@@ -22,10 +22,7 @@ def run_workflow(
     if fps is None:
         # for FPS reporting we expect 0 when FPS cannot be determined
         fps = 0
-    workflows_parameters[image_input_name] = [
-        video_frame.image for video_frame in video_frames
-    ]
-    workflows_parameters[video_metadata_input_name] = [
+    video_metadata_for_images = [
         VideoMetadata(
             video_identifier=(
                 str(video_frame.source_id)
@@ -39,6 +36,17 @@ def run_workflow(
         )
        for video_frame in video_frames
     ]
+    workflows_parameters[image_input_name] = [
+        {
+            "type": "numpy_object",
+            "value": video_frame.image,
+            "video_metadata": video_metadata,
+        }
+        for video_frame, video_metadata in zip(
+            video_frames, video_metadata_for_images
+        )
+    ]
+    workflows_parameters[video_metadata_input_name] = video_metadata_for_images
     return execution_engine.run(
         runtime_parameters=workflows_parameters,
        fps=fps,
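The pairing logic in the new code above can be sketched in isolation. Plain dicts and strings stand in for the `VideoFrame` and `VideoMetadata` objects here, purely for illustration:

```python
# Each image is wrapped in a dict carrying both the frame and its matching
# video metadata, so the two travel together through the workflow inputs;
# zip(...) pairs frames with metadata positionally.
video_frames = [
    {"source_id": 0, "image": "frame-a"},
    {"source_id": 1, "image": "frame-b"},
]
video_metadata_for_images = [
    {"video_identifier": str(frame["source_id"])} for frame in video_frames
]

images_param = [
    {
        "type": "numpy_object",
        "value": frame["image"],
        "video_metadata": metadata,
    }
    for frame, metadata in zip(video_frames, video_metadata_for_images)
]

print(images_param[0]["video_metadata"])  # {'video_identifier': '0'}
```

Because the same `video_metadata_for_images` list is also passed under `video_metadata_input_name`, each image and its metadata stay consistent across both workflow inputs.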

0 commit comments
