[ENH] Add a "SupportScaler" or "SupportTransformer" #588

@joshdunnlime

Description

Is your feature request related to a problem? Please describe.
Yes! When trying to use an skpro grid search across distributions, you can run into lots of support issues if your target variable is not scaled to the smallest support range. For example, if you use XGBoostLSS with a target in (-inf, inf) and search across ["Normal", "Gamma"], the grid search will fail or return NaNs for the Gamma candidate. Scaling the target to the smallest support range, on the other hand, leads to less interpretable skpro/scipy distribution parameters for the Normal distribution in this case.
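
To make the failure mode concrete, here is a minimal sketch (the data and distribution parameters are illustrative assumptions, not taken from this report) of why a Gamma candidate cannot score targets outside its support:

import numpy as np
from scipy import stats

y = np.array([-2.0, 0.5, 3.0])  # hypothetical target with negative values

print(stats.norm().support())        # (-inf, inf): any real-valued y is fine
print(stats.gamma(a=2.0).support())  # (0.0, inf): negative y is outside support

# evaluating the Gamma log-density at y < 0 returns -inf, which is what makes
# the Gamma branch of the grid search fail or return NaNs
print(stats.gamma(a=2.0).logpdf(y))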

Describe the solution you'd like
I have implemented something locally that works rather nicely:

import numpy as np
from sklearn.base import BaseEstimator, OneToOneFeatureMixin, TransformerMixin
from sklearn.preprocessing import MinMaxScaler

full_reals = {
    "Laplace",
    "Logistic",
    "Normal",
    "SkewNormal",
    "TDistribution",
    "TruncatedNormal",
}


class SupportTransformer(TransformerMixin, BaseEstimator, OneToOneFeatureMixin):

    def __init__(self, dist=None, rtol=1e-6):
        # sklearn convention: store constructor params unchanged; the defaults
        # are placeholders so SupportTransformer() can be built bare and
        # configured via set_params in a grid search
        self.dist = dist
        self.rtol = rtol

    def _get_skpro_distr(self, distr):
        """Copied from xgblss code"""
        ...

    def _get_support(self):
        # logic to get scipy rvs which includes support
        # calls _get_skpro_distr
        return rvs.support(**sc_params)

    def fit(self, X, y=None):
        if self.dist in full_reals:
            # no fit needed
            return self

        self.support = self._get_support()

        # check whether X falls outside the distribution's support
        if np.any(X.max() >= self.support[1]) or np.any(X.min() <= self.support[0]):
            # some more implementation logic: derive finite support_lower /
            # support_upper from self.support (tightened by rtol), then rescale
            self.mms = MinMaxScaler((support_lower, support_upper))
            self.mms.fit(X)

            self.scale_ = self.mms.scale_

        return self

    def transform(self, X):
        if hasattr(self, "mms"):
            return self.mms.transform(X)
        else:
            return X

    def inverse_transform(self, X):
        if hasattr(self, "mms"):
            return self.mms.inverse_transform(X)
        else:
            return X
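
For context, the "some more implementation logic" step has to turn the (possibly semi-infinite) scipy support into a finite feature_range for MinMaxScaler. Below is a sketch of one way that could look; the variable names, the rtol margin and the Gamma parametrisation are my own assumptions, not the actual implementation:

# illustrative sketch only -- maps a semi-infinite support to a finite
# MinMaxScaler feature_range using the data range and an rtol margin
import numpy as np
from scipy import stats
from sklearn.preprocessing import MinMaxScaler

y = np.array([[-2.0], [0.5], [3.0]])         # hypothetical target column
lower, upper = stats.gamma(a=2.0).support()  # (0.0, inf) for Gamma
rtol = 1e-3

# replace infinite bounds with the data range; nudge finite bounds inwards by
# rtol so transformed values stay strictly inside the open support
support_lower = y.min() if np.isinf(lower) else lower + rtol
support_upper = y.max() if np.isinf(upper) else upper - rtol

mms = MinMaxScaler(feature_range=(support_lower, support_upper))
y_scaled = mms.fit_transform(y)
print(y_scaled.min(), y_scaled.max())  # e.g. 0.001 and 3.0, inside (0, inf)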

Usage is as follows:

ttr = TransformedTargetRegressorProba(
    xgboostlss.XGBoostLSS(),
    SupportTransformer(),
)

param_grid = [
    {"regressor__dist": ["Normal"], "transformer__dist": ["Normal"]},
    {"regressor__dist": ["Gamma"], "transformer__dist": ["Gamma"]},
]

gscv = GridSearchCV(
    estimator=ttr,
    param_grid=param_grid,
    cv=cv,
    scoring=CRPS(),
    error_score="raise",
)

# for some -inf < y < inf
gscv.fit(X, y)

Describe alternatives you've considered
Horrible, horrible loops 😆
