Labels
feature request (New feature or request), module:regression (probabilistic regression module), module:transformations (transformations module: feature extraction, pre-/post-processing)
Description
Is your feature request related to a problem? Please describe.
Yes! When running an skpro grid search across distributions, you can run into lots of support issues if your target variable is not scaled to the narrowest support range among the candidates. For example, if you use xgboostlss with a target in (-inf, inf) and search across ["Normal", "Gamma"], the grid search will fail or return NaNs for the Gamma candidate. On the other hand, scaling the target to the narrowest support up front leads to less interpretable skpro/scipy distribution parameters for the Normal distribution in this case.
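To make the failure mode concrete, here is a minimal check (illustrative data, not from the issue) showing why Gamma breaks on unscaled real-valued targets: its scipy support is (0, inf), so any negative target violates it.

```python
import numpy as np
from scipy import stats

# Illustrative targets on the full real line
y = np.array([-3.2, -0.5, 0.7, 2.1])

# Gamma's support is (0, inf) regardless of the shape parameter,
# so the negative targets fall outside it and likelihood-based
# fitting produces NaNs or errors.
lower, upper = stats.gamma.support(1.0)
print((lower, upper))      # (0.0, inf)
print(np.any(y <= lower))  # True -> support violated
```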
Describe the solution you'd like
I have implemented something locally that works rather nicely:
import numpy as np
from sklearn.base import BaseEstimator, OneToOneFeatureMixin, TransformerMixin
from sklearn.preprocessing import MinMaxScaler
# plus scipy/skpro/xgboostlss imports as needed
full_reals = {
    "Laplace",
    "Logistic",
    "Normal",
    "SkewNormal",
    "TDistribution",
    "TruncatedNormal",
}
class SupportTransformer(TransformerMixin, BaseEstimator, OneToOneFeatureMixin):
    def __init__(self, dist=None, rtol=1e-6):
        # dist is typically set later via grid search; rtol default illustrative
        self.dist = dist
        self.rtol = rtol
    def _get_skpro_distr(self, distr):
        """Copied from xgblss code"""
        ...
    def _get_support(self):
        # logic to get scipy rvs which includes support
        # calls _get_skpro_distr
        return rvs.support(**sc_params)
    def fit(self, X, y=None):
        if self.dist in full_reals:
            # no fit needed
            return self
        self.support = self._get_support()
        # check if X is within support
        if any(
            [np.any(X.max() >= self.support[1]), np.any(X.min() <= self.support[0])]
        ):
            # some more implementation logic derives finite bounds
            # (support_lower, support_upper) strictly inside self.support
            self.mms = MinMaxScaler((support_lower, support_upper))
            self.mms.fit(X)
            self.scale_ = self.mms.scale_
        return self
    def transform(self, X):
        if hasattr(self, "mms"):
            return self.mms.transform(X)
        else:
            return X
    def inverse_transform(self, X):
        if hasattr(self, "mms"):
            return self.mms.inverse_transform(X)
        else:
            return X

Usage is as follows:
ttr = TransformedTargetRegressorProba(
    xgboostlss.XGBoostLSS(),
    SupportTransformer(),
)
param_distributions = [
    {"regressor__dist": ["Normal"], "transformer__dist": ["Normal"]},
    {"regressor__dist": ["Gamma"], "transformer__dist": ["Gamma"]},
]
rscv = GridSearchCV(
    estimator=ttr,
    param_grid=param_distributions,
    cv=cv,
    scoring=CRPS(),
    error_score='raise',
)
# for some -inf < y < inf
rscv.fit(X, y)

Describe alternatives you've considered
Horrible, horrible loops 😆
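For reference, the core rescaling step the transformer performs can be sketched in isolation. The margin below the support boundary and the finite upper cap are illustrative assumptions, since that part of the logic is elided in the snippet above:

```python
import numpy as np
from scipy import stats
from sklearn.preprocessing import MinMaxScaler

# Illustrative targets on the full real line, as a 2D column
y = np.array([[-3.2], [-0.5], [0.7], [2.1]])

# Gamma support is (0, inf); pick finite bounds strictly inside it.
# The 1e-6 margin and the y.max() + 1.0 cap are arbitrary choices.
lower, upper = stats.gamma.support(1.0)
lo = lower + 1e-6 if np.isfinite(lower) else y.min()
hi = upper if np.isfinite(upper) else y.max() + 1.0

mms = MinMaxScaler((lo, hi))
y_scaled = mms.fit_transform(y)

# All scaled targets now lie within Gamma's support
print(y_scaled.min() > 0)  # True
```

Predictions can then be mapped back to the original scale with `mms.inverse_transform`, which is what the transformer's `inverse_transform` delegates to.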