
Commit ec26537
Author: Jan Iwaszkiewicz
[PyOV][DOCS] Update inference documentation with shared memory flags (#18561)
1 parent: d21296b

2 files changed: +17 −11 lines

docs/OV_Runtime_UG/Python_API_inference.md

Lines changed: 10 additions & 8 deletions
@@ -26,16 +26,17 @@ The ``CompiledModel`` class provides the ``__call__`` method that runs a single
    :fragment: [direct_inference]
 
 
-Shared Memory on Inputs
-#######################
+Shared Memory on Inputs and Outputs
+###################################
 
 While using ``CompiledModel``, ``InferRequest`` and ``AsyncInferQueue``,
 OpenVINO™ Runtime Python API provides an additional mode - "Shared Memory".
-Specify the ``shared_memory`` flag to enable or disable this feature.
-The "Shared Memory" mode may be beneficial when inputs are large and copying
-data is considered an expensive operation. This feature creates shared ``Tensor``
+Specify the ``share_inputs`` and ``share_outputs`` flags to enable or disable this feature.
+The "Shared Memory" mode may be beneficial when inputs or outputs are large and copying data is considered an expensive operation.
+
+This feature creates shared ``Tensor``
 instances with the "zero-copy" approach, reducing overhead of setting inputs
-to minimum. Example usage:
+to minimum. For outputs, this feature creates numpy views on the data. Example usage:
 
 
 .. doxygensnippet:: docs/snippets/ov_python_inference.py
@@ -45,13 +46,14 @@ to minimum. Example usage:
 
 .. note::
 
-   "Shared Memory" is enabled by default in ``CompiledModel.__call__``.
+   "Shared Memory" on inputs is enabled by default in ``CompiledModel.__call__``.
    For other methods, like ``InferRequest.infer`` or ``InferRequest.start_async``,
    it is required to set the flag to ``True`` manually.
+   "Shared Memory" on outputs is disabled by default in all sequential inference methods (``CompiledModel.__call__`` and ``InferRequest.infer``). It is required to set the flag to ``True`` manually.
 
 .. warning::
 
-   When data is being shared, all modifications may affect inputs of the inference!
+   When data is being shared, all modifications (including subsequent inference calls) may affect inputs and outputs of the inference!
    Use this feature with caution, especially in multi-threaded/parallel code,
    where data can be modified outside of the function's control flow.

docs/snippets/ov_python_inference.py

Lines changed: 7 additions & 3 deletions
@@ -32,9 +32,13 @@
 request = compiled_model.create_infer_request()
 
 #! [shared_memory_inference]
-# Data can be shared
-_ = compiled_model({"input_0": data_0, "input_1": data_1}, shared_memory=True)
-_ = request.infer({"input_0": data_0, "input_1": data_1}, shared_memory=True)
+# Data can be shared only on inputs
+_ = compiled_model({"input_0": data_0, "input_1": data_1}, share_inputs=True)
+_ = request.infer({"input_0": data_0, "input_1": data_1}, share_inputs=True)
+# Data can be shared only on outputs
+_ = request.infer({"input_0": data_0, "input_1": data_1}, share_outputs=True)
+# Or both flags can be combined to achieve the desired behavior
+_ = compiled_model({"input_0": data_0, "input_1": data_1}, share_inputs=False, share_outputs=True)
 #! [shared_memory_inference]
 
 time_in_sec = 2.0
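The updated docs say that with ``share_outputs`` enabled, outputs are returned as numpy views on the runtime's data rather than copies, and the warning notes that subsequent modifications can affect what the caller already holds. A minimal plain-numpy sketch of that trade-off (it does not use OpenVINO itself; ``internal_buffer`` is a hypothetical stand-in for a runtime-owned output buffer):

```python
import numpy as np

# Stand-in for a buffer owned by the runtime (hypothetical, for illustration).
internal_buffer = np.arange(4, dtype=np.float32)

copied_output = internal_buffer.copy()  # analogous to share_outputs=False: independent copy
shared_output = internal_buffer.view()  # analogous to share_outputs=True: zero-copy view

# Simulate the buffer being overwritten, e.g. by a subsequent inference call.
internal_buffer[0] = 42.0

print(copied_output[0])  # 0.0  -- the copy is unaffected
print(shared_output[0])  # 42.0 -- the view reflects the overwrite
```

This is why the warning advises caution in multi-threaded or parallel code: a shared output stays coupled to the buffer it was taken from.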
