[ENH] TransformedDistribution handling of index passing in subsetting #617

@fkiraly

I noticed a few problems with TransformedDistribution subsetting, and the issue appears to be one of principle.

Comments on how to handle this would be appreciated.

Namely, consider a univariate td = TransformedDistribution(distr, trafo, inv_trafo).

Problem 1 - columns forgotten in scalar subsetting

Let's say trafo requires a specific column name to work.

If we do td_subset = td.iat[0,0], this produces a column-less scalar distribution.
However, whenever we pass data to trafo or inv_trafo, it will require the column name, which is no longer stored in td_subset in the current implementation.

It seems we need to remember the column name when we subset to scalar.
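To make Problem 1 concrete, here is a minimal sketch using plain pandas rather than the actual TransformedDistribution internals. The names `trafo`, `rewrap`, and the column name `"y"` are hypothetical and chosen only for illustration; the point is that a scalar obtained via `.iat` carries no column name, so a name-dependent transformation only works again once the name is remembered and reattached.

```python
import pandas as pd


def trafo(df):
    """Hypothetical transformation that looks up its input by column name."""
    return df[["y"]] * 2.0


# stand-in for values that would be passed to trafo, one named column
samples = pd.DataFrame({"y": [1.0, 2.0, 3.0]})

# scalar subsetting: the result is a bare number, the column name "y" is gone
scalar = samples.iat[0, 0]

try:
    trafo(scalar)
    scalar_failed = False
except (TypeError, IndexError, KeyError):
    # a bare scalar cannot be indexed by column name
    scalar_failed = True


def rewrap(value, column, index_label):
    """Hypothetical helper: rebuild a 1x1 frame from a remembered column name."""
    return pd.DataFrame({column: [value]}, index=[index_label])


# one possible remedy (an assumption, not the current implementation):
# store the column name at subsetting time and reattach it before calling trafo
restored = trafo(rewrap(scalar, "y", samples.index[0]))
```

Under this assumption, the subset object would have to carry the column name (and possibly the row label) alongside the scalar distribution.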

Problem 2 - multivariate transformations

Let's now consider td = TransformedDistribution(distr, trafo, inv_trafo) of shape (3, 2), i.e., two columns.

If we subset to a single column, applying trafo or inv_trafo to (3, 1) objects no longer works, since both require an (n, 2) object as per the sklearn contract.

So the multivariate transformation case is even worse: we seem to need to keep a copy of the entire original distribution in memory.
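Problem 2 can be illustrated with a small numpy-only sketch. `WidthCheckingScaler` is a hypothetical stand-in, not an sklearn or skpro class; it only mimics the convention that `transform` expects as many columns as were seen in `fit`, which sklearn transformers also enforce.

```python
import numpy as np


class WidthCheckingScaler:
    """Hypothetical stand-in: transform input must be as wide as fit input."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.n_features_in_ = X.shape[1]  # remember number of columns
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                f"X has {X.shape[1]} columns, expected {self.n_features_in_}"
            )
        return X - self.mean_


# shape (3, 2), matching the two-column distribution in the text
X_full = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = WidthCheckingScaler().fit(X_full)

# works on the full (3, 2) object
full_out = scaler.transform(X_full)

# subsetting to a single column gives shape (3, 1) ...
column = X_full[:, [0]]

try:
    scaler.transform(column)
    column_failed = False
except ValueError:
    # ... which the transformation rejects
    column_failed = True
```

Padding the missing column with placeholder values would only help for transformations that act column by column; a transformation that mixes columns needs the actual values of the other columns, which is why the full original distribution seems to be required.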
