Enhance OrdinalEncoder conversion to handle infrequent categories (#1195)
* Enhance OrdinalEncoder conversion to handle infrequent categories
- Added logic to check if infrequent categories are enabled in the OrdinalEncoder.
- Introduced handling for `infrequent_categories_` to adjust `values_int64s` accordingly.
- Updated conversion process to account for `max_categories` or `min_frequency` by modifying the attribute values for infrequent categories.
Signed-off-by: Danil Petrov <[email protected]>
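For context, a minimal sketch of the kind of remapping the bullets above describe. It assumes scikit-learn's grouping convention, in which frequent categories keep consecutive codes in their original order and every infrequent category shares the next code after them. The helper name `infrequent_mapping` and its arguments are hypothetical and are not taken from the converter.

```python
from typing import Iterable, List


def infrequent_mapping(n_categories: int, infrequent_indices: Iterable[int]) -> List[int]:
    """Map each original category index to its grouped ordinal code.

    Hypothetical helper for illustration only: frequent categories are
    numbered 0..k-1 in their original order, and all infrequent
    categories share the single code k.
    """
    infrequent = set(infrequent_indices)
    n_frequent = n_categories - len(infrequent)
    mapping = []
    next_code = 0
    for idx in range(n_categories):
        if idx in infrequent:
            # Every infrequent category collapses onto the same code.
            mapping.append(n_frequent)
        else:
            mapping.append(next_code)
            next_code += 1
    return mapping


# Four categories, where the ones at index 1 and 3 are infrequent.
values = infrequent_mapping(4, [1, 3])
```

A list like `values` is the sort of per-feature table that could replace the plain `0..n-1` codes in `values_int64s` when `max_categories` or `min_frequency` is set.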
* Refactor handling of infrequent categories in OrdinalEncoder conversion
- Replaced `current_infrequent_categories_` with `default_to_infrequent_mappings` for clarity.
- Updated logic to handle `default_to_infrequent_mappings` when encoding missing values.
- Simplified the assignment of `attrs["values_int64s"]` by using `default_to_infrequent_mappings` where applicable.
- Ensured consistent handling of `max_categories` or `min_frequency` scenarios.
Signed-off-by: Danil Petrov <[email protected]>
* Fix linting
Signed-off-by: Danil Petrov <[email protected]>
* Enable conversion of OrdinalEncoder with max_categories and min_frequency
- Added `max_categories_support` function to check scikit-learn version >= 1.3 for `max_categories` and `min_frequency` support in `OrdinalEncoder`.
- Updated tests to be skipped when `max_categories` and `min_frequency` are not supported.
- Added a check for `_infrequent_enabled` attribute before accessing it to ensure compatibility with older versions of scikit-learn.
Signed-off-by: Danil Petrov <[email protected]>
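The two compatibility checks described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: here `max_categories_support` takes the version string as an argument so that the example runs without scikit-learn installed, and `infrequent_enabled` is a hypothetical wrapper around the attribute check.

```python
import re


def max_categories_support(sklearn_version: str) -> bool:
    """Return True if the version string is at least 1.3.

    Assumed signature for illustration; only the leading
    major.minor numbers are compared.
    """
    match = re.match(r"(\d+)\.(\d+)", sklearn_version)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= (1, 3)


def infrequent_enabled(encoder) -> bool:
    """Check `_infrequent_enabled` without assuming the attribute exists.

    Encoders fitted with older scikit-learn versions may lack it,
    so a missing attribute is treated as False.
    """
    return bool(getattr(encoder, "_infrequent_enabled", False))


class OldEncoder:
    """Stand-in for an encoder from a version without the attribute."""


class NewEncoder:
    """Stand-in for an encoder with infrequent categories enabled."""
    _infrequent_enabled = True
```

Comparing integer tuples rather than raw strings avoids ordering `"1.10"` before `"1.3"`.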
* Improve infrequent category handling and missing value encoding in OrdinalEncoder conversion
- Modified the condition for checking `_infrequent_enabled` to improve readability.
- Ensured correct concatenation of `encoded_missing_value` with `values_int64s` when `default_to_infrequent_mappings` is not None.
- Added a test case `SklearnOrdinalEncoderCatList` to verify the conversion of `OrdinalEncoder` with a list of categories.
- Updated `dump_data_and_model` call in `SklearnOrdinalEncoderCatList` test for better readability.
Signed-off-by: Danil Petrov <[email protected]>
---------
Signed-off-by: Danil Petrov <[email protected]>
Co-authored-by: Xavier Dupré <[email protected]>