-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Improve get_smallest_cron_interval
#32626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
get_smallest_cron_interval
| assert not is_valid_cron_string("0 0 32 1 *") | ||
|
|
||
|
|
||
| def test_get_smallest_cron_interval_basic(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved these tests to a dedicated test file and consolidated with other tests
|
what bad thing happens if we remove the sampling-based approach entirely? |
|
With the current implementation, we would lose the ability to accurately determine the smallest cron interval for more complex cron strings such as i can try to have claude improve the approach so it can handle more of these cases |
|
what would the impact of that be in practice though? this is just to do some validation when the freshness policy is created right, do we need that to be there for all cron strings? |
|
or could it happen later when the cronstring is actually evaluated maybe? |
|
Right now dagster/python_modules/dagster/dagster/_core/definitions/freshness.py Lines 171 to 172 in 5a786ba
specifically the validation is there to make sure we allow the user to set the cron freshness policy's |
|
we could definitely try a conservative "best guess" approach for the more complex cases - so for a cron string with time intervals of varying length, we make a conservative guess at the smallest time interval. it's not perfect but probably good enough for this use case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think aiming to get the sampling path out entirely would be a good idea. The main thing that troubles me about it is that it is called every time one of these objects is instantiated, including, for example, when we pull one out of the database when generating the asset graph in the daemon, for example. The fact that its non-deterministic is also a little troubling. For the first thing, if there was some way of distiniguishing between the codepaht when they are defined in code and the codepath when they are pulled out of the database that could also work.
|
In I can move the |

Summary & Motivation
Makes calculating the smallest time interval in a cron string deterministic for most common/simple cron strings. For more complex crons we fall back to the old sampling-based approach.
Also adds a guard against negative cron intervals to the sampling-based approach that can occur when the clocks roll back in a DST transition.
How I Tested These Changes
Lots of unit tests. Claude Code did a lot of heavy lifting here.
Changelog