@JeroenVerstraelen is working on an implementation of CatBoost-based ML in the VITO backend, and while discussing the details a couple of things came up:
- The `predict_catboost` process would be practically identical to `predict_random_forest`, except for some textual differences in title and descriptions. It turns out that it is not really necessary to define a dedicated `predict_` process for each kind of machine learning model: all the model details are embedded in the `ml-model` object, so a single `predict(data: array, model: ml-model)` could serve all kinds of ML models.
- For some use cases we want to predict the probability of each class instead of a single class prediction. We first considered adding a parameter to toggle between class output and probabilities output, but that would mean that the output type changes: a scalar for class prediction and an array for probability prediction. Moreover, the former has to be used in `reduce_dimension` and the latter in `apply_dimension`. It felt error-prone and confusing to let these two different patterns depend on a rather inconspicuous boolean parameter. It might be better to have separate processes for class prediction and probabilities prediction.
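As a rough illustration of both points, here is a minimal Python sketch. It is not openEO code: the two model classes, their `class_scores` method, and the `probabilities` flag are all hypothetical stand-ins. It only shows that one `predict` function can serve different model types when the model object carries its own details, and that a boolean toggle silently changes the return type.

```python
from typing import List, Sequence, Union


class NearestCentroidModel:
    """Toy stand-in for an `ml-model` object (hypothetical, not an openEO API)."""

    def __init__(self, centroids: List[List[float]]):
        self.centroids = centroids

    def class_scores(self, features: Sequence[float]) -> List[float]:
        # Inverse squared distance to each centroid, normalised to sum to 1.
        weights = [
            1.0 / (1e-9 + sum((f - c) ** 2 for f, c in zip(features, centroid)))
            for centroid in self.centroids
        ]
        total = sum(weights)
        return [w / total for w in weights]


class MajorityModel:
    """Second toy model: ignores the features and returns fixed class frequencies."""

    def __init__(self, frequencies: Sequence[float]):
        self.frequencies = list(frequencies)

    def class_scores(self, features: Sequence[float]) -> List[float]:
        return list(self.frequencies)


def predict(data: Sequence[float], model, probabilities: bool = False) -> Union[int, List[float]]:
    """Single generic predict: works for any model exposing `class_scores`.

    The return type depends on the boolean flag, which is the problem
    described above: a scalar class index or an array of probabilities.
    """
    scores = model.class_scores(data)
    if probabilities:
        return scores  # array: one probability per class
    return max(range(len(scores)), key=lambda i: scores[i])  # scalar: class index


if __name__ == "__main__":
    pixel = [0.1, 0.9]
    for model in (NearestCentroidModel([[0.0, 0.0], [0.0, 1.0]]), MajorityModel([0.7, 0.3])):
        as_class = predict(pixel, model)
        as_probs = predict(pixel, model, probabilities=True)
        print(type(model).__name__, type(as_class).__name__, type(as_probs).__name__)
```

The caller has to know which flag value was used in order to pick a reducer-style or an apply-style wrapper, and nothing in the function name makes that visible.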
So with this background, the proposal is to introduce two generic ML prediction processes:
- `predict_class(data: array, model: ml-model) -> number`
- `predict_probabilities(data: array, model: ml-model) -> array`

Both can easily be specified based on the current `predict_random_forest` proposal: https://github.com/Open-EO/openeo-processes/blob/draft/proposals/predict_random_forest.json
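To make the two usage patterns concrete, the following Python sketch builds openEO process graphs as plain dicts. `predict_class` and `predict_probabilities` are the processes proposed here and do not exist yet. The collection id, the model id, the `"class"` target dimension name, and the use of `load_ml_model` with an `id` argument are assumptions made for illustration. The model is passed to the callback through `context`, since a child process graph cannot reference nodes of its parent.

```python
import json


def prediction_callback(process_id: str) -> dict:
    """Child process graph calling one of the proposed processes on the band values."""
    return {
        "process_graph": {
            "predict": {
                "process_id": process_id,  # proposed process, not yet specified
                "arguments": {
                    "data": {"from_parameter": "data"},
                    "model": {"from_parameter": "context"},
                },
                "result": True,
            }
        }
    }


def base_nodes() -> dict:
    """Nodes shared by both graphs: a data cube and a stored model."""
    return {
        "cube": {
            "process_id": "load_collection",
            "arguments": {
                "id": "EXAMPLE_COLLECTION",  # hypothetical collection id
                "spatial_extent": None,
                "temporal_extent": None,
            },
        },
        "model": {
            "process_id": "load_ml_model",  # assumed to be available on the backend
            "arguments": {"id": "example-model-id"},  # hypothetical model id
        },
    }


def class_prediction_graph() -> dict:
    """Scalar output per pixel, so the bands dimension is reduced away."""
    nodes = base_nodes()
    nodes["predict"] = {
        "process_id": "reduce_dimension",
        "arguments": {
            "data": {"from_node": "cube"},
            "dimension": "bands",
            "reducer": prediction_callback("predict_class"),
            "context": {"from_node": "model"},
        },
        "result": True,
    }
    return nodes


def probabilities_graph() -> dict:
    """Array output per pixel, so the bands dimension is replaced by a new one."""
    nodes = base_nodes()
    nodes["predict"] = {
        "process_id": "apply_dimension",
        "arguments": {
            "data": {"from_node": "cube"},
            "dimension": "bands",
            "target_dimension": "class",  # hypothetical dimension name
            "process": prediction_callback("predict_probabilities"),
            "context": {"from_node": "model"},
        },
        "result": True,
    }
    return nodes


if __name__ == "__main__":
    print(json.dumps(class_prediction_graph(), indent=2))
    print(json.dumps(probabilities_graph(), indent=2))
```

With separate processes, the choice between `reduce_dimension` and `apply_dimension` follows directly from the process name rather than from a boolean argument buried in the callback.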