ENH: Make `DataFrame` generic #1566

cmp0xff · 2025-12-18T09:26:01Z

Towards #1548

I don't know how to make the setters work so that the typings are updated when one assigns index or columns.
The two overlapping overloads of __getitem__ in _LocIndexerFrame need to be fixed.

Comments and suggestions are welcome.

loicdiridollou

I am not sure I see the whole idea about this PR. Also, I would have expected more breakage in the code; for example, for the transpose method it should flip the index and columns.

loicdiridollou · 2025-12-18T12:59:26Z

pandas-stubs/core/frame.pyi

+    @overload
+    def __new__(
+        cls,
+        data: DataFrame[IndexT0, IndexStrT0],


Not totally sure if this is gonna break a lot of things. By default a DataFrame will have RangeIndex as index and columns; isn't this gonna change that?

This is copying another DataFrame, I suppose, so nothing is changed. Will add tests later.

If you create a DataFrame from data without an index, RangeIndex will surely take over. That takes further overloads of __new__.

For now I just want to show a prototype of what can happen.

loicdiridollou · 2025-12-18T13:00:22Z

pandas-stubs/_typing.pyi


+if TYPE_CHECKING:  # noqa: PYI002
+    IndexT0 = TypeVar("IndexT0", bound=Index, default=Index)
+    IndexStrT0 = TypeVar("IndexStrT0", bound=Index, default=Index[str])


Shouldn't the default be RangeIndex here?

For df.columns we somehow have a preference for it being Index[str], see def columns(self) -> Index[str]: ... in the old code. That's the reason.

cmp0xff

transpose method it should flip the index and columns

Can certainly do, if the whole idea is viable.

More importantly, we need to figure out if _LocIndexerFrame can be fixed at all, before continuing.

Last time, when I tried to add the backend type variable to Series, so many things broke that I could not continue. This time it seems more manageable so far.
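To make the transpose point concrete, here is a minimal sketch of how a frame that is generic over its two axes could swap its type parameters. It uses toy stand-in classes (`Index`, `StrIndex`, `IntIndex`, `Frame` are all hypothetical and unrelated to the real stubs), so it only illustrates the idea, not what the PR does.

```python
from typing import Generic, List, TypeVar


class Index:
    """Stand-in for an index type; not the pandas class."""

    def __init__(self, labels: List[object]) -> None:
        self.labels = labels


class StrIndex(Index):
    """Hypothetical index of string labels."""


class IntIndex(Index):
    """Hypothetical index of integer labels."""


IndexT = TypeVar("IndexT", bound=Index)
ColumnsT = TypeVar("ColumnsT", bound=Index)


class Frame(Generic[IndexT, ColumnsT]):
    """Toy frame parameterised by its row-index and column-index types."""

    def __init__(self, index: IndexT, columns: ColumnsT) -> None:
        self.index = index
        self.columns = columns

    def transpose(self) -> "Frame[ColumnsT, IndexT]":
        # The two type parameters trade places in the return type,
        # mirroring what happens to the axes at runtime.
        return Frame(self.columns, self.index)


frame = Frame(IntIndex([0, 1]), StrIndex(["a", "b"]))
# A type checker should infer Frame[StrIndex, IntIndex] here.
flipped = frame.transpose()
```

In this toy model the annotation on `transpose` is the only place that needs to know about the flip; whether the same holds for the real stubs is an open question.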

cmp0xff · 2025-12-18T14:05:15Z

pandas-stubs/core/frame.pyi

+    @overload
+    def __new__(
+        cls,
+        data: DataFrame[IndexT0, IndexStrT0],


This is copying another DataFrame, I suppose, so nothing is changed. Will add tests later.

cmp0xff · 2025-12-18T14:06:03Z

pandas-stubs/_typing.pyi


+if TYPE_CHECKING:  # noqa: PYI002
+    IndexT0 = TypeVar("IndexT0", bound=Index, default=Index)
+    IndexStrT0 = TypeVar("IndexStrT0", bound=Index, default=Index[str])


For df.columns we somehow have a preference for it being Index[str], see def columns(self) -> Index[str]: ... in the old code. That's the reason.

cmp0xff · 2025-12-18T14:08:33Z

pandas-stubs/core/frame.pyi

+    @overload
+    def __new__(
+        cls,
+        data: DataFrame[IndexT0, IndexStrT0],


If you create a DataFrame from data without an index, RangeIndex will surely take over. That takes further overloads of __new__.

For now I just want to show a prototype of what can happen.

Dr-Irv · 2025-12-18T19:56:03Z

I am not sure I see the whole idea about this PR

I'm in agreement. If this were to work, what would be the benefit?

Note that DataFrame.columns returns Index[str] since most people use strings to name their columns. Technically, they could be any Hashable, or even a MultiIndex. However, if we made columns be Index[Any], then the majority of people would have to use a cast to get the actual string names.

cmp0xff · 2025-12-19T16:48:58Z

Benefits

Aside from paving the way to resolving #1548, we have the following issue:

import pandas as pd
from typing_extensions import reveal_type

df = pd.DataFrame(
    {"a": [1, 2, 1, 2], "b": [1, 1, 2, 2], "c": [1, 2, 3, 4], "d": [5, 6, 7, 8]},
)

pivoted = df.pivot_table(["c", "d"], "a", "b")

reveal_type(pivoted["c"])  # pyright gives Series[Any], runtime is DataFrame
reveal_type(pivoted.columns)  # pyright gives Index[str], runtime is MultiIndex
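A rough sketch of how a columns type parameter could let `__getitem__` tell these two cases apart, using overloads on the type of `self`. The classes below are toy stand-ins (this `Index`, `MultiIndex`, `Series` and `Frame` are hypothetical, not the pandas or pandas-stubs ones), and the real stubs may need a different arrangement.

```python
from typing import Any, Generic, List, TypeVar, Union, overload


class Index:
    """Stand-in for a flat column index; not the pandas class."""

    def __init__(self, labels: List[Any]) -> None:
        self.labels = labels


class MultiIndex(Index):
    """Stand-in for a two-level column index holding (top, sub) tuples."""


class Series:
    """Stand-in for a single column."""

    def __init__(self, name: Any) -> None:
        self.name = name


ColumnsT = TypeVar("ColumnsT", bound=Index)


class Frame(Generic[ColumnsT]):
    def __init__(self, columns: ColumnsT) -> None:
        self.columns = columns

    # The overload is chosen from the columns type carried by `self`.
    @overload
    def __getitem__(self: "Frame[MultiIndex]", key: str) -> "Frame[Index]": ...
    @overload
    def __getitem__(self: "Frame[Index]", key: str) -> Series: ...
    def __getitem__(self, key: str) -> "Union[Frame[Index], Series]":
        if isinstance(self.columns, MultiIndex):
            # Selecting a top-level label keeps the matching sub-columns.
            subs = [sub for top, sub in self.columns.labels if top == key]
            return Frame(Index(subs))
        return Series(key)


pivoted = Frame(MultiIndex([("c", 1), ("c", 2), ("d", 1), ("d", 2)]))
flat = Frame(Index(["c", "d"]))
sub = pivoted["c"]  # first overload: a frame
col = flat["c"]  # second overload: a series
```

Because the type variable is invariant, `Frame[MultiIndex]` and `Frame[Index]` are distinct self types, so the two overloads should not overlap in this toy version.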

Default typing for `.columns`

I agree it should be Index[str]

Default typing for `.index`

I think RangeIndex might be too radical. Keeping Index[Any] means keeping the current behaviour.

We can give RangeIndex in the following cases, among others:

when creating a DataFrame from unindexed objects without specifying index
when ignore_index=True
when calling .reset_index()
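The first and last of these cases could be sketched as follows. This is a toy model under stated assumptions: `make_frame` is a hypothetical factory standing in for the extra `__new__` overloads, and `Index`, `RangeIndex` and `Frame` are stand-ins rather than the pandas classes.

```python
from typing import Any, Generic, List, Optional, TypeVar, overload


class Index:
    """Stand-in for a generic row index; not the pandas class."""

    def __init__(self, labels: List[Any]) -> None:
        self.labels = labels


class RangeIndex(Index):
    """Stand-in for the default 0..n-1 index."""

    def __init__(self, n: int) -> None:
        super().__init__(list(range(n)))


IndexT = TypeVar("IndexT", bound=Index)


class Frame(Generic[IndexT]):
    def __init__(self, rows: List[object], index: IndexT) -> None:
        self.rows = rows
        self.index = index

    def reset_index(self) -> "Frame[RangeIndex]":
        # Resetting always yields the default index, whatever IndexT was.
        return Frame(self.rows, RangeIndex(len(self.rows)))


# Hypothetical factory: no index argument means RangeIndex,
# otherwise the type of the given index is carried through.
@overload
def make_frame(rows: List[object]) -> Frame[RangeIndex]: ...
@overload
def make_frame(rows: List[object], index: IndexT) -> Frame[IndexT]: ...
def make_frame(rows: List[object], index: Optional[Index] = None) -> "Frame[Any]":
    if index is None:
        return Frame(rows, RangeIndex(len(rows)))
    return Frame(rows, index)


default = make_frame(["x", "y", "z"])
labelled = make_frame(["x", "y"], Index(["r0", "r1"]))
reset = labelled.reset_index()
```

The ignore_index=True case is not covered here; it would presumably need overloads keyed on a literal True argument.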

make dataframe generic

c8d80ef

cmp0xff force-pushed the feature/make-dataframe-generic branch from 6c3b2d5 to c8d80ef Compare December 18, 2025 09:26

loicdiridollou requested changes Dec 18, 2025

cmp0xff commented Dec 18, 2025

cmp0xff marked this pull request as draft December 18, 2025 16:13
