Commit 7bf73de

Issue #604/#644 move ProcessBasedJobCreator example to more extensive doc page
1 parent 8ffa2a6 commit 7bf73de

4 files changed: +131 -43 lines changed

CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -13,8 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1313
- `MultiBackendJobManager`: Added `initialize_from_df(df)` (to `CsvJobDatabase` and `ParquetJobDatabase`) to initialize (and persist) the job database from a given DataFrame.
1414
Also added `create_job_db()` factory to easily create a job database from a given dataframe and its type guessed from filename extension.
1515
([#635](https://github.com/Open-EO/openeo-python-client/issues/635))
16-
- `MultiBackendJobManager.run_jobs()` now returns a dictionary with counters/stats about various events during the job run ([#645](https://github.com/Open-EO/openeo-python-client/issues/645))
17-
- Added `ProcessBasedJobCreator` to be used as `start_job` callable with `MultiBackendJobManager` to create multiple jobs from a single parameterized process (e.g. a UDP or remote process definition) ([#604](https://github.com/Open-EO/openeo-python-client/issues/604))
16+
- `MultiBackendJobManager.run_jobs()` now returns a dictionary with counters/stats about various events during the full run of the job manager ([#645](https://github.com/Open-EO/openeo-python-client/issues/645))
17+
- Added (experimental) `ProcessBasedJobCreator` to be used as `start_job` callable with `MultiBackendJobManager` to create multiple jobs from a single parameterized process (e.g. a UDP or remote process definition) ([#604](https://github.com/Open-EO/openeo-python-client/issues/604))
1818

1919
### Changed
2020

docs/cookbook/job_manager.rst

Lines changed: 102 additions & 0 deletions
@@ -2,6 +2,9 @@
22
Multi Backend Job Manager
33
====================================
44

5+
API
6+
===
7+
58
.. warning::
69
This is a new experimental API, subject to change.
710

@@ -15,6 +18,105 @@ Multi Backend Job Manager
1518

1619
.. autoclass:: openeo.extra.job_management.ParquetJobDatabase
1720

21+
1822
.. autoclass:: openeo.extra.job_management.ProcessBasedJobCreator
1923
:members:
2024
:special-members: __call__
25+
26+
27+
.. _job-management-with-process-based-job-creator:
28+
29+
Job creation based on parameterized processes
30+
===============================================
31+
32+
The openEO API supports parameterized processes out of the box,
33+
which makes it possible to work with flexible, reusable openEO building blocks
34+
in the form of :ref:`user-defined processes <user-defined-processes>`
35+
or `remote openEO process definitions <https://github.com/Open-EO/openeo-api/tree/draft/extensions/remote-process-definition>`_.
36+
This can also be leveraged for job creation in the context of the
37+
:py:class:`~openeo.extra.job_management.MultiBackendJobManager`:
38+
define a "template" job as a parameterized process
39+
and let the job manager fill in the parameters
40+
from a given data frame.
41+
42+
The :py:class:`~openeo.extra.job_management.ProcessBasedJobCreator` helper class
43+
makes exactly that possible.
44+
Given a reference to a parameterized process,
45+
such as a user-defined process or remote process definition,
46+
it can be used directly as ``start_job`` callable to
47+
:py:meth:`~openeo.extra.job_management.MultiBackendJobManager.run_jobs`
48+
which will fill in the process parameters from the dataframe.
49+
50+
Basic :py:class:`~openeo.extra.job_management.ProcessBasedJobCreator` example
51+
-----------------------------------------------------------------------------
52+
53+
Basic usage example with a remote process definition:
54+
55+
.. code-block:: python
56+
:linenos:
57+
:caption: Basic :py:class:`~openeo.extra.job_management.ProcessBasedJobCreator` example snippet
58+
:emphasize-lines: 10-15, 28
59+
60+
from openeo.extra.job_management import (
61+
MultiBackendJobManager,
62+
create_job_db,
63+
ProcessBasedJobCreator,
64+
)
65+
66+
# Job creator, based on a parameterized openEO process
67+
# (specified by the remote process definition at given URL)
68+
# which has parameters "start_date" and "bands" for example.
69+
job_starter = ProcessBasedJobCreator(
70+
namespace="https://example.com/my_process.json",
71+
parameter_defaults={
72+
"bands": ["B02", "B03"],
73+
},
74+
)
75+
76+
# Initialize job database from a dataframe,
77+
# with desired parameter values to fill in.
78+
df = pd.DataFrame({
79+
"start_date": ["2021-01-01", "2021-02-01", "2021-03-01"],
80+
})
81+
job_db = create_job_db("jobs.csv").initialize_from_df(df)
82+
83+
# Create and run job manager,
84+
# which will start a job for each of the `start_date` values in the dataframe
85+
# and use the default band list ["B02", "B03"] for the "bands" parameter.
86+
job_manager = MultiBackendJobManager(...)
87+
job_manager.run_jobs(job_db=job_db, start_job=job_starter)
88+
89+
In this example, a :py:class:`ProcessBasedJobCreator` is instantiated
90+
based on a remote process definition,
91+
which has parameters ``start_date`` and ``bands``.
92+
When passed to :py:meth:`~openeo.extra.job_management.MultiBackendJobManager.run_jobs`,
93+
a job for each row in the dataframe will be created,
94+
with parameter values based on matching columns in the dataframe:
95+
96+
- the ``start_date`` parameter will be filled in
97+
with the values from the "start_date" column of the dataframe,
98+
- the ``bands`` parameter has no corresponding column in the dataframe,
99+
and will get its value from the default specified in the ``parameter_defaults`` argument.
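
The name-based filling described in the two bullets above can be illustrated with a small, library-free sketch. The helper ``fill_parameters`` is hypothetical (it is not part of the openeo API), and plain dictionaries stand in for dataframe rows. It only mirrors the behaviour described here: a matching column wins, otherwise the value from ``parameter_defaults`` is used.

```python
from typing import Any, Dict, List


def fill_parameters(
    parameter_names: List[str],
    row: Dict[str, Any],
    parameter_defaults: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical helper (not part of openeo): choose a value per parameter."""
    arguments = {}
    for name in parameter_names:
        if name in row:
            # A column with the same name as the parameter takes precedence.
            arguments[name] = row[name]
        elif name in parameter_defaults:
            # No matching column: fall back to the configured default.
            arguments[name] = parameter_defaults[name]
    return arguments


# Plain dicts stand in for the rows of the dataframe from the example.
rows = [{"start_date": d} for d in ["2021-01-01", "2021-02-01", "2021-03-01"]]
job_arguments = [
    fill_parameters(["start_date", "bands"], row, {"bands": ["B02", "B03"]})
    for row in rows
]
for arguments in job_arguments:
    print(arguments)
```

Each of the three resulting dictionaries carries its own ``start_date`` and the shared default ``bands`` list, which corresponds to one job per row.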
100+
101+
102+
:py:class:`~openeo.extra.job_management.ProcessBasedJobCreator` with geometry handling
103+
---------------------------------------------------------------------------------------------
104+
105+
Apart from the intuitive name-based parameter-column linking,
106+
:py:class:`~openeo.extra.job_management.ProcessBasedJobCreator`
107+
also automatically links:
108+
109+
- a process parameter that accepts inline GeoJSON geometries/features
110+
(which practically means it has a schema like ``{"type": "object", "subtype": "geojson"}``,
111+
as produced by :py:meth:`Parameter.geojson <openeo.api.process.Parameter.geojson>`).
112+
- with the geometry column in a `GeoPandas <https://geopandas.org/>`_ dataframe.
113+
114+
even if the name of the parameter does not exactly match
115+
the name of the GeoPandas geometry column (``geometry`` by default).
116+
This automatic linking is only done if there is only one
117+
GeoJSON parameter and one geometry column in the dataframe.
118+
119+
120+
.. admonition:: to do
121+
122+
Add example with geometry handling.
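
The linking condition described in this section can be sketched without any geospatial dependencies. The helper ``link_geometry`` below is hypothetical and is not the library's implementation. It assumes parameter schemas are available as plain dictionaries and that the names of the geometry columns are already known. A link is made only when exactly one GeoJSON parameter and exactly one geometry column exist.

```python
from typing import Dict, List, Optional, Tuple


def link_geometry(
    parameter_schemas: Dict[str, dict],
    geometry_columns: List[str],
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (parameter, column) if the link is unambiguous."""
    geojson_parameters = [
        name
        for name, schema in parameter_schemas.items()
        if schema.get("type") == "object" and schema.get("subtype") == "geojson"
    ]
    # Only link when there is exactly one candidate on each side.
    if len(geojson_parameters) == 1 and len(geometry_columns) == 1:
        return geojson_parameters[0], geometry_columns[0]
    return None


geojson_schema = {"type": "object", "subtype": "geojson"}

# Parameter "aoi" does not share its name with the "geometry" column,
# but it is the only GeoJSON parameter, so it is linked anyway.
link = link_geometry(
    {"aoi": geojson_schema, "bands": {"type": "array"}},
    ["geometry"],
)

# Two GeoJSON parameters: ambiguous, so no automatic link.
ambiguous = link_geometry(
    {"aoi": geojson_schema, "mask": geojson_schema},
    ["geometry"],
)
```

In the ambiguous case the sketch simply returns ``None``. How the library itself proceeds in that situation is not covered here.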

docs/rst-cheatsheet.rst

Lines changed: 14 additions & 1 deletion
@@ -50,6 +50,15 @@ More explicit code block with language hint (and no need for double colon)
5050
>>> 3 + 5
5151
8
5252
53+
Code block with additional features (line numbers, caption, highlighted lines,
54+
for more see https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-code-block)
55+
56+
.. code-block:: python
57+
:linenos:
58+
:caption: how to say hello
59+
:emphasize-lines: 1
60+
61+
print("hello world")
5362
5463
5564
References:
@@ -60,4 +69,8 @@ References:
6069

6170
- refer to the reference with::
6271

63-
:ref:`target`
72+
:ref:`target` or :ref:`custom text <target>`
73+
74+
- inline URL references::
75+
76+
`Python <https://www.python.org/>`_

openeo/extra/job_management.py

Lines changed: 13 additions & 40 deletions
@@ -948,47 +948,13 @@ class ProcessBasedJobCreator:
948948
for each row of the dataframe managed by the :py:class:`MultiBackendJobManager`
949949
by filling in the process parameters with corresponding row values.
950950
951-
Usage example with a remote process definition:
951+
.. seealso::
952+
See :ref:`job-management-with-process-based-job-creator`
953+
for more information and examples.
952954
953-
.. code-block:: python
954-
955-
from openeo.extra.job_management import (
956-
MultiBackendJobManager,
957-
create_job_db,
958-
ProcessBasedJobCreator,
959-
)
960-
961-
# Job creator, based on a parameterized openEO process
962-
# (specified by the remote process definition at given URL)
963-
# which has, say, parameters "start_date" and "bands" for example.
964-
job_starter = ProcessBasedJobCreator(
965-
namespace="https://example.com/my_process.json",
966-
parameter_defaults={
967-
# Default value for the "bands" parameter
968-
# (to be used when not available in the dataframe)
969-
"bands": ["B02", "B03"],
970-
},
971-
)
972-
973-
# Initialize job database from a dataframe,
974-
# with desired parameter values to fill in.
975-
df = pd.DataFrame({
976-
"start_date": ["2021-01-01", "2021-02-01", "2021-03-01"],
977-
...
978-
})
979-
job_db = create_job_db("jobs.csv").initialize_from_df(df)
980-
981-
# Create and run job manager
982-
job_manager = MultiBackendJobManager(...)
983-
job_manager.run_jobs(job_db=job_db, start_job=job_starter)
984-
985-
The factory will take care of filling in the process parameters
986-
based on matching column names in the dataframe from the job database
987-
(like "start_date" in the example above).
988-
989-
This intuitive name-based matching should cover most use cases,
990-
but for some more advanced use cases, there are additional options
991-
to provide overrides and fallbacks:
955+
Process parameters are linked to dataframe columns by name.
956+
While this intuitive name-based matching should cover most use cases,
957+
there are additional options for overrides or fallbacks:
992958
993959
- When provided, ``parameter_column_map`` will be consulted
994960
for resolving a process parameter name (key in the dictionary)
@@ -1010,6 +976,7 @@ class ProcessBasedJobCreator:
1010976
- Finally if no (default) value can be determined and the parameter
1011977
is not flagged as optional, an error will be raised.
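
A rough sketch of how the ``parameter_column_map`` override and the final error case could work, for illustration only. The function ``resolve_argument`` is hypothetical and is not the library's implementation. The lookup order shown and the use of ``ValueError`` are assumptions, and the handling of optional parameters is left out.

```python
from typing import Any, Dict, Optional


def resolve_argument(
    name: str,
    row: Dict[str, Any],
    parameter_column_map: Optional[Dict[str, str]] = None,
    parameter_defaults: Optional[Dict[str, Any]] = None,
) -> Any:
    """Hypothetical helper (not the library's implementation)."""
    # An explicit mapping overrides the default "column name == parameter name".
    column = (parameter_column_map or {}).get(name, name)
    if column in row:
        return row[column]
    if parameter_defaults and name in parameter_defaults:
        return parameter_defaults[name]
    # Assumption: ValueError stands in for whatever error the library raises.
    raise ValueError(f"No value found for required parameter {name!r}")


# A plain dict stands in for one dataframe row with a differently named column.
row = {"date": "2021-01-01"}
start_date = resolve_argument(
    "start_date", row, parameter_column_map={"start_date": "date"}
)
bands = resolve_argument("bands", row, parameter_defaults={"bands": ["B02", "B03"]})
```

Calling ``resolve_argument("resolution", row)`` with neither a matching column nor a default raises the error in this sketch.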
1012978
979+
1013980
:param process_id: (optional) openEO process identifier.
1014981
Can be omitted when working with a remote process definition
1015982
that is fully defined with a URL in the ``namespace`` parameter.
@@ -1024,6 +991,12 @@ class ProcessBasedJobCreator:
1024991
to dataframe column names as value.
1025992
1026993
.. versionadded:: 0.33.0
994+
995+
.. warning::
996+
This is an experimental API subject to change,
997+
and we greatly welcome
998+
`feedback and suggestions for improvement <https://github.com/Open-EO/openeo-python-client/issues>`_.
999+
10271000
"""
10281001
def __init__(
10291002
self,
