Commit dcc7119

Merge branch 'issue604-udp-based-job-manager'
2 parents 40af9cd + 7bf73de commit dcc7119

File tree

10 files changed

+1382
-62
lines changed

CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -13,8 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1313
- `MultiBackendJobManager`: Added `initialize_from_df(df)` (to `CsvJobDatabase` and `ParquetJobDatabase`) to initialize (and persist) the job database from a given DataFrame.
1414
Also added `create_job_db()` factory to easily create a job database from a given dataframe and its type guessed from filename extension.
1515
([#635](https://github.com/Open-EO/openeo-python-client/issues/635))
16-
17-
16+
- `MultiBackendJobManager.run_jobs()` now returns a dictionary with counters/stats about various events during the full run of the job manager ([#645](https://github.com/Open-EO/openeo-python-client/issues/645))
17+
- Added (experimental) `ProcessBasedJobCreator` to be used as `start_job` callable with `MultiBackendJobManager` to create multiple jobs from a single parameterized process (e.g. a UDP or remote process definition) ([#604](https://github.com/Open-EO/openeo-python-client/issues/604))
1818

1919
### Changed
2020

docs/cookbook/job_manager.rst

Lines changed: 106 additions & 0 deletions
@@ -2,6 +2,9 @@
22
Multi Backend Job Manager
33
====================================
44

5+
API
6+
===
7+
58
.. warning::
69
This is a new experimental API, subject to change.
710

@@ -14,3 +17,106 @@ Multi Backend Job Manager
1417
.. autoclass:: openeo.extra.job_management.CsvJobDatabase
1518

1619
.. autoclass:: openeo.extra.job_management.ParquetJobDatabase
20+
21+
22+
.. autoclass:: openeo.extra.job_management.ProcessBasedJobCreator
23+
:members:
24+
:special-members: __call__
25+
26+
27+
.. _job-management-with-process-based-job-creator:
28+
29+
Job creation based on parameterized processes
30+
===============================================
31+
32+
The openEO API supports parameterized processes out of the box,
33+
which makes it possible to work with flexible, reusable openEO building blocks
34+
in the form of :ref:`user-defined processes <user-defined-processes>`
35+
or `remote openEO process definitions <https://github.com/Open-EO/openeo-api/tree/draft/extensions/remote-process-definition>`_.
36+
This can also be leveraged for job creation in the context of the
37+
:py:class:`~openeo.extra.job_management.MultiBackendJobManager`:
38+
define a "template" job as a parameterized process
39+
and let the job manager fill in the parameters
40+
from a given data frame.
41+
42+
The :py:class:`~openeo.extra.job_management.ProcessBasedJobCreator` helper class
43+
does exactly that.
44+
Given a reference to a parameterized process,
45+
such as a user-defined process or remote process definition,
46+
it can be used directly as ``start_job`` callable to
47+
:py:meth:`~openeo.extra.job_management.MultiBackendJobManager.run_jobs`
48+
which will fill in the process parameters from the dataframe.
49+
50+
Basic :py:class:`~openeo.extra.job_management.ProcessBasedJobCreator` example
51+
-----------------------------------------------------------------------------
52+
53+
Basic usage example with a remote process definition:
54+
55+
.. code-block:: python
56+
:linenos:
57+
:caption: Basic :py:class:`~openeo.extra.job_management.ProcessBasedJobCreator` example snippet
58+
:emphasize-lines: 10-15, 28
59+
60+
from openeo.extra.job_management import (
61+
MultiBackendJobManager,
62+
create_job_db,
63+
ProcessBasedJobCreator,
64+
)
65+
66+
# Job creator, based on a parameterized openEO process
67+
# (specified by the remote process definition at given URL)
68+
# which has parameters "start_date" and "bands" for example.
69+
job_starter = ProcessBasedJobCreator(
70+
namespace="https://example.com/my_process.json",
71+
parameter_defaults={
72+
"bands": ["B02", "B03"],
73+
},
74+
)
75+
76+
# Initialize job database from a dataframe,
77+
# with desired parameter values to fill in.
78+
df = pd.DataFrame({
79+
"start_date": ["2021-01-01", "2021-02-01", "2021-03-01"],
80+
})
81+
job_db = create_job_db("jobs.csv").initialize_from_df(df)
82+
83+
# Create and run job manager,
84+
# which will start a job for each of the `start_date` values in the dataframe
85+
# and use the default band list ["B02", "B03"] for the "bands" parameter.
86+
job_manager = MultiBackendJobManager(...)
87+
job_manager.run_jobs(job_db=job_db, start_job=job_starter)
88+
89+
In this example, a :py:class:`ProcessBasedJobCreator` is instantiated
90+
based on a remote process definition,
91+
which has parameters ``start_date`` and ``bands``.
92+
When passed to :py:meth:`~openeo.extra.job_management.MultiBackendJobManager.run_jobs`,
93+
a job for each row in the dataframe will be created,
94+
with parameter values based on matching columns in the dataframe:
95+
96+
- the ``start_date`` parameter will be filled in
97+
with the values from the "start_date" column of the dataframe,
98+
- the ``bands`` parameter has no corresponding column in the dataframe,
99+
and will get its value from the default specified in the ``parameter_defaults`` argument.
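The name-based lookup described in the two bullets above can be illustrated with a small standalone sketch. It does not use or reproduce the ``ProcessBasedJobCreator`` implementation: the helper name ``resolve_parameters`` is hypothetical, rows are plain dicts standing in for dataframe rows, and the rule that a matching column wins over a default is an assumption made for illustration.

```python
from typing import Any, Dict, Iterable, Mapping


def resolve_parameters(
    row: Mapping[str, Any],
    parameter_names: Iterable[str],
    parameter_defaults: Mapping[str, Any],
) -> Dict[str, Any]:
    """Hypothetical helper: pick a value for each process parameter of one row."""
    arguments: Dict[str, Any] = {}
    for name in parameter_names:
        if name in row:
            # Assumption: a column with the same name as the parameter wins.
            arguments[name] = row[name]
        elif name in parameter_defaults:
            # No matching column: fall back to the configured default.
            arguments[name] = parameter_defaults[name]
        else:
            raise ValueError(f"No value available for parameter {name!r}")
    return arguments


# Rows as plain dicts (stand-ins for the rows of the job dataframe).
rows = [
    {"start_date": "2021-01-01"},
    {"start_date": "2021-02-01"},
    {"start_date": "2021-03-01"},
]

# One argument dict per row, for a process with parameters "start_date" and "bands".
all_arguments = [
    resolve_parameters(
        row,
        parameter_names=["start_date", "bands"],
        parameter_defaults={"bands": ["B02", "B03"]},
    )
    for row in rows
]
```

Each entry of ``all_arguments`` then carries the row's own ``start_date`` together with the shared default band list, which mirrors one job per dataframe row.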
100+
101+
102+
:py:class:`~openeo.extra.job_management.ProcessBasedJobCreator` with geometry handling
103+
---------------------------------------------------------------------------------------------
104+
105+
Apart from the intuitive name-based parameter-column linking,
106+
:py:class:`~openeo.extra.job_management.ProcessBasedJobCreator`
107+
also automatically links:
108+
109+
- a process parameter that accepts inline GeoJSON geometries/features
110+
(which practically means it has a schema like ``{"type": "object", "subtype": "geojson"}``,
111+
as produced by :py:meth:`Parameter.geojson <openeo.api.process.Parameter.geojson>`).
112+
- with the geometry column in a `GeoPandas <https://geopandas.org/>`_ dataframe.
113+
114+
even if the name of the parameter does not exactly match
115+
the name of the GeoPandas geometry column (``geometry`` by default).
116+
This automatic linking is only done if there is exactly one
117+
GeoJSON parameter and one geometry column in the dataframe.
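As a rough illustration of this rule (not the library's actual code), the following sketch decides whether a GeoJSON parameter can be paired with a geometry column. The helper names ``is_geojson_parameter`` and ``find_geometry_link`` are hypothetical, parameters are represented as plain dicts with ``name`` and ``schema`` keys, and the geometry columns are passed in as a list of column names rather than read from a real GeoPandas dataframe.

```python
from typing import Any, Dict, List, Optional, Tuple


def is_geojson_parameter(parameter: Dict[str, Any]) -> bool:
    """Check for a schema like {"type": "object", "subtype": "geojson"}."""
    schema = parameter.get("schema", {})
    return (
        isinstance(schema, dict)
        and schema.get("type") == "object"
        and schema.get("subtype") == "geojson"
    )


def find_geometry_link(
    parameters: List[Dict[str, Any]],
    geometry_columns: List[str],
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (parameter name, column name) or None.

    A link is only made when there is exactly one GeoJSON parameter
    and exactly one geometry column, so the pairing is unambiguous.
    """
    geojson_names = [p["name"] for p in parameters if is_geojson_parameter(p)]
    if len(geojson_names) == 1 and len(geometry_columns) == 1:
        return geojson_names[0], geometry_columns[0]
    return None


# Illustrative parameter listing: names and schemas are made up for this sketch.
parameters = [
    {"name": "aoi", "schema": {"type": "object", "subtype": "geojson"}},
    {"name": "bands", "schema": {"type": "array"}},
]

# The parameter "aoi" is linked to the column "geometry" despite the different names.
link = find_geometry_link(parameters, geometry_columns=["geometry"])
```

With a second GeoJSON parameter or a second geometry column, the helper returns ``None``, because there would be no single obvious pairing.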
118+
119+
120+
.. admonition:: to do
121+
122+
Add example with geometry handling.

docs/rst-cheatsheet.rst

Lines changed: 14 additions & 1 deletion
@@ -50,6 +50,15 @@ More explicit code block with language hint (and no need for double colon)
5050
>>> 3 + 5
5151
8
5252
53+
Code block with additional features (line numbers, caption, highlighted lines,
54+
for more see https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-code-block)
55+
56+
.. code-block:: python
57+
:linenos:
58+
:caption: how to say hello
59+
:emphasize-lines: 1
60+
61+
print("hello world")
5362
5463
5564
References:
@@ -60,4 +69,8 @@ References:
6069

6170
- refer to the reference with::
6271

63-
:ref:`target`
72+
:ref:`target` or :ref:`custom text <target>`
73+
74+
- inline URL references::
75+
76+
`Python <https://www.python.org/>`_
