Commit 41d1345
dennisbader and jonasblanc authored
Fix/categorical comp specific lags (#2852)
* Add categorical cov support to XGBoost, CatBoost
* Add type check for cat features, refactor cat indices logic
* Split cat. comp. validation logic and test it
* Support categorical cov created via an encoder
* Validate categorical features
* Support categorical features for HistGradientBoostingRegressor
* Fix typos
* Apply suggestions, limit cat cov support to LightGBM and CatBoost
* Update changelog and doc
* Fix typo in TS doc
* Rebase cat forecasting PR, on cat covariates PR
* Speed up tests by limiting lgbm and catboost depth and iterations
* Extend test categorical target
* Add categorical cov support to XGBoost, CatBoost
* Fix typo in TS doc
* Rebase cat forecasting PR, on cat covariates PR
* Speed up tests by limiting lgbm and catboost depth and iterations
* Extend test categorical target
* Add classification accuracy metric
* Fix master rebase
* Fix typo rebase
* Keep categorical metrics for separate PR
* Add categorical forecasting models to module __init__
* Refactor MultiOutput to support MultiOutputClassifier, wip
* Move _forecasting_type into CategoricalForecastingMixin
* Further refactoring of multioutput wrapper
* Implement ClassProbabilityLikelihood to forecast categorical probabilities
* Add support for ClassProbabilityLikelihood in XGB, CatBoost and LGB
* Reorder functions
* Create categorical forecasting likelihood specific tests
* Add docstring to CatBoostCategoricalModel
* Add LightGBMCategoricalModel to model module init
* Allow likelihoodType in _check_likelihood
* Update doc
* Update ClassProbability name from class_probability to classprobability
* Remove default model, update doc
* Rename categorical forecasting to classification forecasting
* Set ClassProbabilityLikelihood as default for all classifiers models
* Update darts/models/forecasting/regression_model.py Co-authored-by: Dennis Bader <[email protected]>
* Update darts/utils/multioutput.py Co-authored-by: Dennis Bader <[email protected]>
* Addresses review suggestions
* Move ClassProbabilityLikelihood to sklearn likelihood
* Extends classes_ test to multi-output
* Expose likelihood in classifiers constructor
* Bump test env to macos-14
* Fix test
* Add multioutput validation test
* Fix categorical validation features
* Extend multi-output tests
* fix merge conflicts
* Extend likelihood tests
* Merge _check_likelihood into _get_likelihood
* Address suggestions
* Rename .classes_ to .class_labels, fix tests
* Check estimators for same component have same labels
* Improve ClassProbabilityLikelihood robustness to input format
* Add input format to tests
* Extend test case for multioutput wrapper
* Improve ClassProbability robustness to TS formats
* Move and refactor classes in ClassifierMixin
* Refactor internal class proba representation
* Extend test to component names and warnings
* Fix test randomness
* Extend class probabilities checks to multivariate/multiseries
* Unify CatBoostClassifier prediction shape
* Improve robustness of multi sample prediction
* Refactor probabilistic tests
* Return self on fit
* Test ClassProbability for reproducible output
* Update changelog
* Test edge case multioutput/likelihood
* Address small suggestions from review
* Optimize likelihood sampling
* update class probability likelihood component names
* Fix sample, add tests
* Apply minor suggestions
* Fix lint
* Fix merge
* remove random state params
* update tests
* minor fixes
* add example notebook
* add first backtesting tests
* fix issue with categorical lagged feature extraction for component specific lags
* remove examples
* last updates
* update changelog

---------

Co-authored-by: jonasblanc <[email protected]>
Co-authored-by: Jonas Blanc <[email protected]>
1 parent cf5fa83 commit 41d1345

3 files changed: +132 -42 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
 **Fixed**
 
 - Fixed a bug in `SKLearnModel.get_estimator()` for univariate quantile models that use `multi_models=False` , where using `quantile` did not return the correct fitted quantile model / estimator. [#2838](https://github.com/unit8co/darts/pull/2838) by [Dennis Bader](https://github.com/dennisbader).
+- Fixed a bug in `LightGBMModel` and `CatBoostModel` when using component-specific lags and categorical features, where certain lag scenarios could result in incorrect categorical feature declaration. [#2852](https://github.com/unit8co/darts/pull/2852) by [Dennis Bader](https://github.com/dennisbader).
 
 **Dependencies**
 

darts/models/forecasting/sklearn_model.py

Lines changed: 75 additions & 29 deletions
@@ -457,6 +457,75 @@ def _get_lags(self, lags_type: str):
         else:
             return self.lags.get(lags_type, None)
 
+    def _get_lagged_features(
+        self,
+        series: TimeSeries,
+        past_covariates: Optional[TimeSeries],
+        future_covariates: Optional[TimeSeries],
+    ) -> list[list[tuple[str, str, int]]]:
+        """Returns a list of lagged features for the target, past, future, and static covariates.
+
+        The returned lagged features are in the same order as the features passed to the underlying model.
+        """
+        feature_list = []
+        for series_, lags_name, feature_type, series_name in zip(
+            [series, past_covariates, future_covariates],
+            ["target", "past", "future"],
+            ["target", "past_cov", "fut_cov"],
+            ["series", "past_covariates", "future_covariates"],
+        ):
+            series_ = get_single_series(series_)
+            lags = self._get_lags(lags_name)
+            if lags is None:
+                feature_list.append([])
+                continue
+
+            if series_ is None:
+                raise_log(
+                    ValueError(
+                        f"`{series_name}` cannot be `None` when lags are specified for it."
+                    ),
+                    logger=logger,
+                )
+
+            is_comp_specific = self.component_lags.get(lags_name) is not None
+            series_comps = series_.components.tolist()
+            component_lags = {}
+
+            # create a mapping of lags to components (components are in the same order as in the series)
+            if is_comp_specific:
+                # component-specific lags are specific to each component
+                for comp_name in series_comps:
+                    comp_lags = lags[comp_name]
+                    for lag in comp_lags:
+                        if lag not in component_lags:
+                            component_lags[lag] = [comp_name]
+                        else:
+                            component_lags[lag].append(comp_name)
+            else:
+                # global lags are identical for each component
+                for lag in lags:
+                    component_lags[lag] = series_comps
+
+            # create the feature list (in order of increasing lag)
+            lags_sorted = sorted(component_lags.keys())
+            feature_list.append([
+                (feature_type, component, lag)
+                for lag in lags_sorted
+                for component in component_lags[lag]
+            ])
+
+        # add static covariates at the end
+        target_ts = get_single_series(series)
+        if target_ts.has_static_covariates:
+            feature_list.append([
+                ("static_cov", component, 0)
+                for component in list(target_ts.static_covariates.columns)
+            ])
+        else:
+            feature_list.append([])
+        return feature_list
+
     @property
     def _model_encoder_settings(
         self,
@@ -1723,36 +1792,13 @@ def _get_categorical_features(
         if sum(len(cat_cov) for cat_cov in categorical_covariates) == 0:
             return [], []
 
-        past_covs_ts = get_single_series(past_covariates)
-        fut_covs_ts = get_single_series(future_covariates)
-
-        feature_list = [
-            [
-                ("target", component, lag)
-                for lag in self.lags.get("target", [])
-                for component in target_ts.components
-            ],
-            [
-                ("past_cov", component, lag)
-                for lag in self.lags.get("past", [])
-                for component in past_covs_ts.components
-            ],
-            [
-                ("fut_cov", component, lag)
-                for lag in self.lags.get("future", [])
-                for component in fut_covs_ts.components
-            ],
-            (
-                [
-                    ("static_cov", component, 0)
-                    for component in list(target_ts.static_covariates.columns)
-                ]
-                if target_ts.has_static_covariates
-                else []
-            ),
-        ]
-
         # keep track of feature list index to refer to the columns indices
+        feature_list = self._get_lagged_features(
+            series=series,
+            past_covariates=past_covariates,
+            future_covariates=future_covariates,
+        )
+
         index = 0
         indices = []
         col_names = []
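
Note on the change above: the removed block enumerated a single lag list from `self.lags.get(...)` over every component, while the new `_get_lagged_features` groups components by lag so that a component only appears at its own lags. The effect on column positions can be sketched without darts as follows. This is a minimal stand-alone illustration, not the library code: the function names `lagged_features` and `flat_features`, the component name `past_cov_num`, and the flat lag list passed to the naive variant are assumptions made for the example, and an `isinstance` check stands in for the `self.component_lags` lookup used in the diff.

```python
from typing import Optional, Union

Feature = tuple[str, str, int]


def lagged_features(
    feature_type: str,
    components: list[str],
    lags: Optional[Union[list[int], dict[str, list[int]]]],
) -> list[Feature]:
    """Order features by increasing lag, then by component order within each lag."""
    if lags is None:
        return []
    component_lags: dict[int, list[str]] = {}
    if isinstance(lags, dict):
        # component-specific lags: a component only appears at its own lags
        for comp in components:
            for lag in lags[comp]:
                component_lags.setdefault(lag, []).append(comp)
    else:
        # global lags: every component appears at every lag
        for lag in lags:
            component_lags[lag] = list(components)
    return [
        (feature_type, comp, lag)
        for lag in sorted(component_lags)
        for comp in component_lags[lag]
    ]


def flat_features(
    feature_type: str, components: list[str], flat_lags: list[int]
) -> list[Feature]:
    """Pre-fix style enumeration: one flat lag list applied to every component."""
    return [(feature_type, comp, lag) for lag in flat_lags for comp in components]


# "past_cov_num" is a hypothetical non-categorical component
components = ["past_cov_num", "past_cov_cat_dummy"]
lags = {"past_cov_num": [-3, -2, -1], "past_cov_cat_dummy": [-3, -1]}

fixed = lagged_features("past_cov", components, lags)
# assumed flat lag list for the naive variant (here: the union of all lags)
naive = flat_features("past_cov", components, [-3, -2, -1])

cat_fixed = [i for i, (_, comp, _) in enumerate(fixed) if comp == "past_cov_cat_dummy"]
cat_naive = [i for i, (_, comp, _) in enumerate(naive) if comp == "past_cov_cat_dummy"]
print(cat_fixed)  # positions of the categorical component with per-lag grouping
print(cat_naive)  # positions when every component is assumed to have every lag
```

With per-lag grouping the categorical component occupies two of five columns; the flat enumeration produces six columns and shifts the categorical positions, which is the kind of mismatch the changelog entry describes as an incorrect categorical feature declaration.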

darts/tests/models/forecasting/test_sklearn_models.py

Lines changed: 56 additions & 13 deletions
@@ -3678,27 +3678,74 @@ def test_fit_with_categorical_features_raises_error(self, model_config):
         not lgbm_available and not cb_available, reason="requires lightgbm or catboost"
     )
     @pytest.mark.parametrize(
-        "model_cls",
-        ([CatBoostModel] if cb_available else [])
-        + ([LightGBMModel] if lgbm_available else []),
+        "config",
+        product(
+            ([(CatBoostModel, cb_test_params)] if cb_available else [])
+            + ([(LightGBMModel, lgbm_test_params)] if lgbm_available else []),
+            [
+                (
+                    1,
+                    [1],
+                    [2, 3, 5],
+                    [
+                        "past_cov_past_cov_cat_dummy_lag-1",
+                        "fut_cov_fut_cov_promo_mechanism_lag1",
+                        "static_cov_product_id_lag0",
+                    ],
+                ),
+                (
+                    {"default_lags": [-3, -2, -1], "past_cov_cat_dummy": [-3, -1]},
+                    [1],
+                    [2, 5, 6, 8],
+                    [
+                        "past_cov_past_cov_cat_dummy_lag-3",
+                        "past_cov_past_cov_cat_dummy_lag-1",
+                        "fut_cov_fut_cov_promo_mechanism_lag1",
+                        "static_cov_product_id_lag0",
+                    ],
+                ),
+                (
+                    1,
+                    {"default_lags": [0, 1, 2], "fut_cov_promo_mechanism": [0, 2]},
+                    [2, 3, 6, 8],
+                    [
+                        "past_cov_past_cov_cat_dummy_lag-1",
+                        "fut_cov_fut_cov_promo_mechanism_lag0",
+                        "fut_cov_fut_cov_promo_mechanism_lag2",
+                        "static_cov_product_id_lag0",
+                    ],
+                ),
+            ],
+        ),
     )
-    def test_get_categorical_features_helper(self, model_cls):
+    def test_get_categorical_features_helper(self, config):
         """Test helper function responsible for retrieving indices of categorical features"""
-
+        (
+            (model_cls, model_kwargs),
+            (lags_pc, lags_fc, indices_expected, f_names_expected),
+        ) = config
         model = model_cls(
             lags=1,
-            lags_past_covariates=1,
-            lags_future_covariates=[1],
+            lags_past_covariates=lags_pc,
+            lags_future_covariates=lags_fc,
             output_chunk_length=1,
             categorical_future_covariates=["fut_cov_promo_mechanism"],
             categorical_past_covariates=["past_cov_cat_dummy"],
             categorical_static_covariates=["product_id"],
+            **model_kwargs,
         )
         (
             series,
             past_covariates,
             future_covariates,
         ) = self.inputs_for_tests_categorical_covariates()
+        # fit the model first for component-specific lags
+        model.fit(
+            series=series,
+            past_covariates=past_covariates,
+            future_covariates=future_covariates,
+        )
+
         (
             indices,
             column_names,
@@ -3707,12 +3754,8 @@ def test_get_categorical_features_helper(self, model_cls):
             past_covariates=past_covariates,
             future_covariates=future_covariates,
         )
-        assert indices == [2, 3, 5]
-        assert column_names == [
-            "past_cov_past_cov_cat_dummy_lag-1",
-            "fut_cov_fut_cov_promo_mechanism_lag1",
-            "static_cov_product_id_lag0",
-        ]
+        assert indices == indices_expected
+        assert column_names == f_names_expected
 
     @pytest.mark.skipif(
         not lgbm_available and not cb_available, reason="requires lightgbm or catboost"
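
Note on the expected values in the parametrized cases above: they can be cross-checked by hand with a small stand-alone sketch that does not import darts. Several things here are assumptions rather than facts from the diff: the fixture layout (one target component, two past and two future covariate components with the categorical one second and first respectively, one static covariate), the names `target`, `past_cov_num` and `fut_cov_num`, the handling of the `default_lags` key, and the column-name format, which is inferred from the expected strings in the test. Integer lags such as `lags_past_covariates=1` are written in their assumed expanded form `[-1]`.

```python
DEFAULT_KEY = "default_lags"


def resolve(components, lags):
    """Expand a lags spec into {component: lags}, falling back to the default entry."""
    if isinstance(lags, dict):
        return {c: lags.get(c, lags.get(DEFAULT_KEY, [])) for c in components}
    return {c: list(lags) for c in components}


def ordered(feature_type, components, lags):
    """Order features by increasing lag, then by component order within each lag."""
    per_comp = resolve(components, lags)
    by_lag = {}
    for comp in components:
        for lag in per_comp[comp]:
            by_lag.setdefault(lag, []).append(comp)
    return [(feature_type, c, lag) for lag in sorted(by_lag) for c in by_lag[lag]]


def categorical_columns(lags_pc, lags_fc):
    """Return (indices, names) of categorical columns for an assumed fixture layout."""
    groups = [
        ordered("target", ["target"], [-1]),  # hypothetical target name
        ordered("past_cov", ["past_cov_num", "past_cov_cat_dummy"], lags_pc),
        ordered("fut_cov", ["fut_cov_promo_mechanism", "fut_cov_num"], lags_fc),
        [("static_cov", "product_id", 0)],
    ]
    categorical = {
        ("past_cov", "past_cov_cat_dummy"),
        ("fut_cov", "fut_cov_promo_mechanism"),
        ("static_cov", "product_id"),
    }
    index, indices, names = 0, [], []
    for group in groups:
        for ftype, comp, lag in group:
            if (ftype, comp) in categorical:
                indices.append(index)
                names.append(f"{ftype}_{comp}_lag{lag}")
            index += 1
    return indices, names


cases = [
    ([-1], [1]),
    ({"default_lags": [-3, -2, -1], "past_cov_cat_dummy": [-3, -1]}, [1]),
    ([-1], {"default_lags": [0, 1, 2], "fut_cov_promo_mechanism": [0, 2]}),
]
results = [categorical_columns(pc, fc) for pc, fc in cases]
for indices, names in results:
    print(indices, names)
```

Under these assumptions the three cases yield the index lists `[2, 3, 5]`, `[2, 5, 6, 8]` and `[2, 3, 6, 8]`, matching `indices_expected` in the test parameters.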
