Description
I noticed a few problems with TransformedDistribution subsetting, and they look like problems in principle rather than simple implementation bugs.
Comments on how to handle this would be appreciated.
Namely, consider a univariate td = TransformedDistribution(distr, trafo, inv_trafo).
Problem 1 - columns forgotten in scalar subsetting
Let's say trafo requires a specific column name to work.
If we do td_subset = td.iat[0,0], this produces a column-less scalar distribution.
However, whenever we pass data to trafo or inv_trafo, it will require the column name, which is no longer stored in td_subset in the current implementation.
It seems we need to remember the column name when we subset to scalar.
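To make Problem 1 concrete, here is a minimal pure-Python sketch. It does not use the actual TransformedDistribution API: a plain dict stands in for a pandas DataFrame, and `trafo`, the column name "a", and `remembered_column` are all hypothetical names chosen for illustration.

```python
def trafo(frame):
    """Hypothetical transformation that looks up its input by column name."""
    return {"a": [2.0 * v for v in frame["a"]]}


# Tabular case: the column name travels with the data, so trafo works.
full = {"a": [1.0, 2.0, 3.0]}
out_full = trafo(full)

# Scalar case: after subsetting, only the bare value is left.
scalar = full["a"][0]
try:
    trafo(scalar)
    scalar_ok = True
except TypeError:
    # A float has no columns, so the lookup by name fails.
    scalar_ok = False

# Possible remedy (an assumption, not the current implementation):
# remember the column name at subsetting time and re-wrap before calling trafo.
remembered_column = "a"
rewrapped = {remembered_column: [scalar]}
out_scalar = trafo(rewrapped)[remembered_column][0]
```

The point of the sketch is only that the scalar subset needs the column name from somewhere; whether it should be stored on the subset itself or recovered in another way is the open design question.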
Problem 2 - multivariate transformations
Let's now consider td = TransformedDistribution(distr, trafo, inv_trafo) of shape (3, 2), i.e., two columns.
If we subset to a single column, applying trafo or inv_trafo to (3, 1) objects no longer works, since both require an (n, 2) object as per the sklearn contract.
So the multivariate case is even worse: we seem to need to keep a copy of the entire original distribution in memory.
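Problem 2 can be sketched the same way. The class below is a hypothetical stand-in that mimics the sklearn convention of fixing the number of features at fit time; it is not sklearn code and not the TransformedDistribution implementation, and nested lists stand in for (n, d) arrays.

```python
class ColumnwiseScaler:
    """Hypothetical stand-in for an sklearn-style transformer."""

    def fit(self, X):
        # Remember how many columns were seen, as sklearn transformers do.
        self.n_features_in_ = len(X[0])
        self.scale_ = [
            max(abs(row[j]) for row in X) for j in range(self.n_features_in_)
        ]
        return self

    def transform(self, X):
        if len(X[0]) != self.n_features_in_:
            raise ValueError(
                f"X has {len(X[0])} features, expected {self.n_features_in_}"
            )
        return [[v / s for v, s in zip(row, self.scale_)] for row in X]


X_full = [[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]]  # shape (3, 2)
scaler = ColumnwiseScaler().fit(X_full)

# Subsetting to the first column gives a (3, 1) object, which is rejected.
X_col0 = [[row[0]] for row in X_full]
try:
    scaler.transform(X_col0)
    subset_ok = True
except ValueError:
    subset_ok = False

# Workaround sketch: keep the full (3, 2) data, transform it, subset afterwards.
transformed_col0 = [[row[0]] for row in scaler.transform(X_full)]
```

The workaround only functions because the full two-column data is still available, which is exactly the memory cost described above.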