-
-
Notifications
You must be signed in to change notification settings - Fork 157
ENH: Make DataFrame generic
#1566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
6c3b2d5 to
c8d80ef
Compare
loicdiridollou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I see the whole idea about this PR, also I would have expected more breakages in the code, like for transpose method it should flip the index and columns
| @overload | ||
| def __new__( | ||
| cls, | ||
| data: DataFrame[IndexT0, IndexStrT0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not totally sure if this is gonna break a lot of things, by default DataFrame will have RangeIndex as index and columns, isn't this gonna change that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copying another DataFrame, I suppose, so nothing is changed. Will add tests later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you create a DataFrame from data without an index, RangeIndex will surely take over. That takes further overloads of __new__.
As for now I just want to show a prototype of what can happen.
|
|
||
| if TYPE_CHECKING: # noqa: PYI002 | ||
| IndexT0 = TypeVar("IndexT0", bound=Index, default=Index) | ||
| IndexStrT0 = TypeVar("IndexStrT0", bound=Index, default=Index[str]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the default be RangeIndex here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For df.columns we somehow have a preference for it being Index[str], see def columns(self) -> Index[str]: ... in the old code. That's the reason.
cmp0xff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transpose method it should flip the index and columns
Can certainly do, if the whole idea is viable.
More importantly, we need to figure out if _LocIndexFrame can be fixed at all, before continuing.
Last time when I was trying to add the backend type variable to Series, too many things crashed so that I could not continue. This time it seems more controllable for now.
| @overload | ||
| def __new__( | ||
| cls, | ||
| data: DataFrame[IndexT0, IndexStrT0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copying another DataFrame, I suppose, so nothing is changed. Will add tests later.
|
|
||
| if TYPE_CHECKING: # noqa: PYI002 | ||
| IndexT0 = TypeVar("IndexT0", bound=Index, default=Index) | ||
| IndexStrT0 = TypeVar("IndexStrT0", bound=Index, default=Index[str]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For df.columns we somehow have a preference for it being Index[str], see def columns(self) -> Index[str]: ... in the old code. That's the reason.
| @overload | ||
| def __new__( | ||
| cls, | ||
| data: DataFrame[IndexT0, IndexStrT0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you create a DataFrame from data without an index, RangeIndex will surely take over. That takes further overloads of __new__.
As for now I just want to show a prototype of what can happen.
I'm in agreement. If this were to work, what would be the benefit? Note that |
BenefitsAside from paving the way to resolving #1548, we have the following issue: import pandas as pd
from typing_extensions import reveal_type
df = pd.DataFrame(
{"a": [1, 2, 1, 2], "b": [1, 1, 2, 2], "c": [1, 2, 3, 4], "d": [5, 6, 7, 8]},
)
pivoted = df.pivot_table(["c", "d"], "a", "b")
reveal_type(pivoted["c"]) # pyright gives Series[Any], runtime is DataFrame
reveal_type(pivoted.columns) # pyright gives Index[str], runtime is MultiIndexDefault typing for
|
Towards #1548
setterwork so that the typings can be updated when one assignsindexorcolumns.__getitem__in_LocIndexerFrameneed to be fixedComments and suggestions are welcomed.