predict_class and predict_probabilities #368

@soxofaan

@JeroenVerstraelen is working on an implementation of CatBoost-based ML in the VITO backend, and while discussing details a couple of things came up:

  • the predict_catboost process would be practically identical to predict_random_forest, except for some textual differences in title and descriptions. It turns out that it is not really necessary to define a dedicated predict_ process for each kind of machine learning model: all the model details are embedded in the ml-model object, so a single predict(data: array, model: ml-model) could serve all kinds of ML models.
  • for some use cases we want to predict the probability of each class instead of a single class prediction. We first considered adding a parameter to toggle between class output and probabilities output, but that would mean that the output type changes with the parameter: a scalar for class prediction and an array for probability prediction. Moreover, the former has to be used in reduce_dimension and the latter in apply_dimension. It felt error-prone and confusing to let these two different patterns depend on a rather inconspicuous boolean parameter. It might be better to have separate processes for class prediction and probabilities prediction.
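To make the second point concrete, here is a minimal Python sketch of why the two outputs call for different patterns. The `reduce_dimension` and `apply_dimension` functions below are toy stand-ins named after the openEO processes, not their actual implementation, and the `toy_class` / `toy_probabilities` callbacks are hypothetical placeholders for a real model.

```python
from typing import Callable, List, Sequence

Pixel = Sequence[float]  # feature values of one pixel along the "bands" dimension


def reduce_dimension(cube: List[Pixel], reducer: Callable[[Pixel], float]) -> List[float]:
    """Toy stand-in: collapse the feature dimension to one scalar per pixel."""
    return [reducer(pixel) for pixel in cube]


def apply_dimension(cube: List[Pixel], process: Callable[[Pixel], List[float]]) -> List[List[float]]:
    """Toy stand-in: keep a dimension, returning one array per pixel."""
    return [process(pixel) for pixel in cube]


def toy_probabilities(pixel: Pixel) -> List[float]:
    """Hypothetical two-class model: probability of class 1 is the clamped mean."""
    p1 = min(1.0, max(0.0, sum(pixel) / len(pixel)))
    return [1.0 - p1, p1]


def toy_class(pixel: Pixel) -> float:
    """Hypothetical class prediction: index of the most probable class."""
    probs = toy_probabilities(pixel)
    return float(probs.index(max(probs)))


cube = [[0.1, 0.2, 0.3], [0.8, 0.9, 1.0]]

# Class prediction is a reducer: the feature dimension disappears.
classes = reduce_dimension(cube, toy_class)

# Probability prediction keeps a dimension, now indexed by class instead of band.
probabilities = apply_dimension(cube, toy_probabilities)
```

With a boolean toggle, the same callback would have to be valid in both positions, which is exactly the ambiguity described above.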

So with this background, the proposal is to introduce two generic ML prediction processes:

  • predict_class(data: array, model: ml-model) -> number
  • predict_probabilities(data: array, model: ml-model) -> array

Both can be easily spec'ed based on the current https://github.com/Open-EO/openeo-processes/blob/draft/proposals/predict_random_forest.json
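As a rough illustration of the "one process for all model kinds" idea, the sketch below lets two unrelated toy models go through the same `predict_class` and `predict_probabilities` functions. The `MLModel` protocol, its `classes` attribute and its `class_probabilities` method are assumptions made for this example only; they are not part of the openEO ml-model definition or of any backend.

```python
from typing import List, Protocol, Sequence


class MLModel(Protocol):
    """Hypothetical stand-in for the ml-model object (not an openEO API)."""

    classes: Sequence[float]

    def class_probabilities(self, data: Sequence[float]) -> List[float]: ...


class ThresholdModel:
    """Toy two-class model: hard threshold on the mean feature value."""

    classes = (0, 1)

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def class_probabilities(self, data: Sequence[float]) -> List[float]:
        mean = sum(data) / len(data)
        return [0.0, 1.0] if mean >= self.threshold else [1.0, 0.0]


class VotingModel:
    """Toy three-class model: share of feature values falling in each bin."""

    classes = (10, 20, 30)

    def class_probabilities(self, data: Sequence[float]) -> List[float]:
        counts = [0, 0, 0]
        for value in data:
            counts[0 if value < 0.33 else 1 if value < 0.66 else 2] += 1
        return [count / len(data) for count in counts]


def predict_probabilities(data: Sequence[float], model: MLModel) -> List[float]:
    """Generic: returns one probability per class, whatever the model kind."""
    return list(model.class_probabilities(data))


def predict_class(data: Sequence[float], model: MLModel) -> float:
    """Generic: returns the label of the most probable class."""
    probs = predict_probabilities(data, model)
    best = max(range(len(probs)), key=probs.__getitem__)
    return model.classes[best]


features = [0.1, 0.2, 0.9]
threshold_label = predict_class(features, ThresholdModel(0.5))
voting_probs = predict_probabilities(features, VotingModel())
voting_label = predict_class(features, VotingModel())
```

Neither generic function knows which kind of model it received; everything model-specific lives in the model object, which mirrors the argument for dropping per-model predict_ processes.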
