This repository was archived by the owner on Sep 27, 2025. It is now read-only.

Inconsistent COUNT(*) and GROUP BY behavior in Polars CLI #71

@kwkeefer

Description

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of the Polars CLI.

Reproducible example

# generate test.csv
cat<<EOF > test.csv
a
test
test
test2
test3
EOF

# run group by query
echo "SELECT COUNT(*) AS _count, a FROM read_csv('test.csv') GROUP BY a;" | polars

Output

┌────────┬───────┐
│ _count ┆ a     │
│ ---    ┆ ---   │
│ u32    ┆ str   │
╞════════╪═══════╡
│ 3      ┆ test2 │
│ 3      ┆ test3 │
│ 3      ┆ test  │
└────────┴───────┘

Issue description

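COUNT(*) does not appear to respect the GROUP BY: every group gets the same value, 3, instead of its own row count. The file has four data rows, so 3 is not the total row count either; it matches the number of groups.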
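To make the discrepancy concrete, the sketch below tallies the four data rows of test.csv with the standard library's `collections.Counter` and checks which simple quantity the printed `_count` values happen to match. The observed values are transcribed from the CLI output above; `diagnose` is a hypothetical helper for illustration, not part of Polars.

```python
from collections import Counter

# Data rows of test.csv from the reproducible example (header "a" excluded).
rows = ["test", "test", "test2", "test3"]

# _count values printed by the Polars CLI, keyed by the value of column a.
observed = {"test2": 3, "test3": 3, "test": 3}

# Correct per-group counts: one tally per distinct value of column a.
expected = Counter(rows)


def diagnose(expected, observed, total_rows):
    """Hypothetical helper: report which simple quantity the observed counts match."""
    values = set(observed.values())
    if dict(expected) == observed:
        return "correct per-group counts"
    if values == {total_rows}:
        return "total row count"
    if values == {len(expected)}:
        return "number of groups"
    return "unexplained"


print(dict(expected))                         # {'test': 2, 'test2': 1, 'test3': 1}
print(diagnose(expected, observed, len(rows)))  # number of groups
```

The per-group counts should also sum to the number of data rows (2 + 1 + 1 = 4), which the CLI's 3 + 3 + 3 does not.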

Expected behavior

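# polarstest.py
import polars as pl

df = pl.read_csv('test.csv')

# register_globals=True exposes the DataFrame `df` as a table named "df";
# eager=True makes execute() return a DataFrame instead of a LazyFrame.
with pl.SQLContext(register_globals=True, eager=True) as ctx:
    df_small = ctx.execute("SELECT COUNT(*) AS _count, a FROM df GROUP BY a")
    print(df_small)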
python3 polarstest.py
shape: (3, 2)
┌────────┬───────┐
│ _count ┆ a     │
│ ---    ┆ ---   │
│ u32    ┆ str   │
╞════════╪═══════╡
│ 2      ┆ test  │
│ 1      ┆ test3 │
│ 1      ┆ test2 │
└────────┴───────┘
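The two outputs also list the groups in different orders (test2, test3, test from the CLI versus test, test3, test2 from SQLContext). SQL does not define a row order for GROUP BY without ORDER BY, so a regression check for this bug should compare results order-insensitively. A minimal sketch, assuming each result is transcribed as a list of (_count, a) rows and treated as a multiset; `same_result` is a hypothetical name, not a Polars function.

```python
from collections import Counter

# Rows transcribed from the two outputs in this issue, as (_count, a) pairs.
cli_rows = [(3, "test2"), (3, "test3"), (3, "test")]
sqlcontext_rows = [(2, "test"), (1, "test3"), (1, "test2")]


def same_result(left, right):
    """Compare two query results as multisets of rows, ignoring row order."""
    return Counter(left) == Counter(right)


print(same_result(cli_rows, sqlcontext_rows))                   # False: counts differ
print(same_result(sqlcontext_rows, list(reversed(sqlcontext_rows))))  # True: only order differs
```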

Installed version

0.8.0

Labels

bug (Something isn't working)