[SPARK-54216] Fix cache refresh for DataSource V2 tables with immutable Table instances #52914
+573
−5
What changes were proposed in this pull request?
This commit fixes cache refresh operations (recacheByPlan and recacheTableOrView) to properly handle DataSource V2 tables that use immutable Table instances.
Changes:
The fix ensures that when a cache refresh is triggered, the fresh plan (with updated table metadata/snapshot) is used both for re-execution and for updating the cached plan. Previously the old cached plan was re-executed, which for V2 tables returns stale data.
Note: V1 tables use mutable file indexes that implicitly refresh when queried, so re-executing the old plan picks up new files. V2 tables use immutable Table instances that capture a specific snapshot at resolution time, so re-executing the old plan reads the same old snapshot.
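To make the V1/V2 difference in the note concrete, here is a minimal toy model in Python. It is not Spark code: `Storage`, `MutableFileIndex`, `ImmutableTable`, and `resolve_v2` are hypothetical names invented for illustration, and the "plan" is reduced to an object with a `scan()` method.

```python
from dataclasses import dataclass


class Storage:
    """Hypothetical stand-in for the files backing a table."""

    def __init__(self):
        self.rows = []


class MutableFileIndex:
    """V1-style (toy): consults storage every time it is scanned."""

    def __init__(self, storage):
        self._storage = storage

    def scan(self):
        return list(self._storage.rows)


@dataclass(frozen=True)
class ImmutableTable:
    """V2-style (toy): pins the rows that were visible at resolution time."""

    snapshot: tuple

    def scan(self):
        return list(self.snapshot)


def resolve_v2(storage):
    """Toy 'resolution': capture the current state as an immutable snapshot."""
    return ImmutableTable(snapshot=tuple(storage.rows))


storage = Storage()
storage.rows.append(1)

v1_plan = MutableFileIndex(storage)   # plan built before the write
v2_plan = resolve_v2(storage)         # plan built before the write

storage.rows.append(2)                # the table is modified

v1_result = v1_plan.scan()            # old V1 plan sees the new row
v2_stale = v2_plan.scan()             # old V2 plan still reads the old snapshot
v2_fresh = resolve_v2(storage).scan() # only a re-resolved V2 plan sees the new row
```

In this sketch, re-running the old V1 plan is enough to observe the write, while the V2 plan has to be resolved again, which is the behavior the note describes.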
Why are the changes needed?
These changes are needed to fix cache refresh for DataSource V2 tables. Currently, when a V2 table is modified and a cache refresh is triggered, the cache manager re-executes the old cached plan, which contains an immutable Table instance pointing to the previous table snapshot. As a result, stale data is re-cached instead of fresh data. The fix ensures that a freshly resolved plan with updated table metadata is used for the refresh, so queries read the latest data after table modifications.
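The failure mode and the fix can be sketched with a toy cache manager. This is an illustrative Python model under stated assumptions, not the Spark implementation: `ToyCatalog`, `ToyCacheManager`, and the two `recache_*` methods are hypothetical and do not mirror the signatures of recacheByPlan or recacheTableOrView.

```python
class ToyCatalog:
    """Hypothetical catalog; load_table returns an immutable snapshot."""

    def __init__(self):
        self._rows = {}

    def write(self, name, row):
        self._rows.setdefault(name, []).append(row)

    def load_table(self, name):
        # A "plan" here is just (table name, frozen tuple of rows).
        return (name, tuple(self._rows.get(name, [])))


class ToyCacheManager:
    """Hypothetical cache: table name -> (plan, materialized rows)."""

    def __init__(self):
        self._entries = {}

    def cache(self, plan):
        name, snapshot = plan
        self._entries[name] = (plan, list(snapshot))

    def recache_with_old_plan(self, name):
        # Pre-fix behavior (toy): re-execute the plan already in the cache.
        old_plan, _ = self._entries[name]
        self.cache(old_plan)

    def recache_with_fresh_plan(self, name, catalog):
        # Post-fix behavior (toy): re-resolve, then use the fresh plan both
        # for execution and as the new cached plan.
        self.cache(catalog.load_table(name))

    def cached_plan(self, name):
        return self._entries[name][0]

    def cached_rows(self, name):
        return self._entries[name][1]


catalog = ToyCatalog()
manager = ToyCacheManager()

catalog.write("t", 1)
manager.cache(catalog.load_table("t"))

catalog.write("t", 2)                       # table modified after caching

manager.recache_with_old_plan("t")
stale_rows = manager.cached_rows("t")       # still the old snapshot

manager.recache_with_fresh_plan("t", catalog)
fresh_rows = manager.cached_rows("t")       # now reflects the write
fresh_plan = manager.cached_plan("t")       # cached plan was replaced as well
```

The point of replacing the stored plan, and not only the materialized rows, is that any later refresh in this model starts from the newest snapshot instead of the original one.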
Does this PR introduce any user-facing change?
No
How was this patch tested?
New test suite
Was this patch authored or co-authored using generative AI tooling?
No