Skip to content

Conversation

@mesejo
Copy link
Contributor

@mesejo mesejo commented Nov 17, 2025

Description of changes

Reported in #11726, this PR set the base changes for FirstValue and LastValue. Namely:

  1. Adding the include_null field to both classes
  2. Updating the function first_to_firstvalue to account for the new field
  3. Updating the base compiler to ignore the field (for those backends that do not support IGNORE / RESPECT NULLS)
  4. Implement IGNORE NULLS for the DuckDB backend

Once this change gets merged, the idea is to add the implementation of at least the following backends (progressively):

Issues closed

N/A

@github-actions github-actions bot added tests Issues or PRs related to tests sql Backends that generate SQL labels Nov 17, 2025
@mesejo mesejo marked this pull request as draft November 17, 2025 11:40
@mesejo mesejo force-pushed the fix/implement_ignore_nulls_last_first_value_duckdb branch 4 times, most recently from b1ed9f6 to c14cdbb Compare November 17, 2025 21:43
@mesejo mesejo marked this pull request as ready for review November 17, 2025 22:13
@mesejo mesejo force-pushed the fix/implement_ignore_nulls_last_first_value_duckdb branch from c14cdbb to c69158e Compare November 17, 2025 22:47
op.name, table=table.alias_or_name, quoted=self.quoted, copy=False
)

def visit_FirstLastValue(self, op, *, arg, include_null):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer more verbose but less tricky, and just list them separately.

Also, currently handling include_null is opt-in: the base compiler silently ignores that arg, so individual backends like duckdb need to implement it themselves. This leads to the risk that a backend that DOES handle IGNORE NULL will forget to do it (eg pyspark, as in this PR?) and we will get incorrect behavior.

I would prefer to handle the include_null at the SqlGlotCompiler level and then deriving compilers to opt-out. Then users will get an error, instead of the incorrect result, which I think is a better outcome.

So can you change this bit to this:

    def visit_FirstValue(self, op, *, arg, include_null):
        if include_null:
            return sge.RespectNulls(this=self.f.first_value(arg))
        else:
            return sge.IgnoreNulls(this=self.f.first_value(arg))

    def visit_LastValue(self, op, *, arg, include_null):
        if include_null:
            return sge.RespectNulls(this=self.f.last_value(arg))
        else:
            return sge.IgnoreNulls(this=self.f.last_value(arg))

and then I'll put some other comments in the other files for the related changes

@NickCrews
Copy link
Contributor

See #11311, where I tried to implement this.

I'm not sure which version we should abandon and which we should pursue, but in general I think

  • We should error during compile time if we try to use a value of include_null that is unsupported on the backend. eg sqlite doesn't support include_null=False. We should error, instead of returning the incorrect result.
  • We should make it so that any new implementing backends are forced to think about handling the include_null argument, if they forget to handle it then it should error instead silently giving the wrong result.
  • I don't think we should implement this only for duckdb in this PR and then fix for other backends in followups, I think we should do it all atomically in a single PR/commit

This PR has inspired me to resurrect that PR, once that test is passing we can have two options to choose between

@NickCrews
Copy link
Contributor

NickCrews commented Nov 20, 2025

Also related, I'm not sure if this unsupported error should get pushed down in sqlglot: tobymao/sqlglot#6376

and then possibly surfaced in ibis automatically: #11772

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql Backends that generate SQL tests Issues or PRs related to tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants