-
Notifications
You must be signed in to change notification settings - Fork 6.1k
planner: optimize decorrelate by removing redundant LIMIT when join key is unique #64214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #64214 +/- ##
================================================
+ Coverage 72.7334% 73.3901% +0.6567%
================================================
Files 1859 1862 +3
Lines 503870 508936 +5066
================================================
+ Hits 366482 373509 +7027
+ Misses 115127 113214 -1913
+ Partials 22261 22213 -48
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
pkg/planner/core/rule_decorrelate.go
Outdated
| canRemove = true | ||
| } | ||
| } | ||
| } else if proj, ok := mChild.(*logicalop.LogicalProjection); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When will this happen? Can you provide some cases for this situations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/planner/core/rule_decorrelate.go
Outdated
| apply.SetChildren(outerPlan, innerPlan) | ||
| return s.optimize(ctx, p, groupByColumn) | ||
| } else if m, ok := innerPlan.(*logicalop.LogicalMaxOneRow); ok { | ||
| // Check if MaxOneRow's child is Limit or TopN, and if we can remove it for LeftOuterJoin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this PR doesn't handle the TopN case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this stage, topn is still just a LIMIT, so it doesn’t matter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is that if the middle node is topN (limit + sort), it seems can not to decorrelate. Because sort will change the meaning? Could you add a test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the single-row result is determined by the join key, its final result isn’t affected by any ordering. I’ve already added an ORDER BY test case in TestDecorrelateLimitOptimization.
pkg/planner/core/rule_decorrelate.go
Outdated
| if decExpr := apply.DeCorColFromEqExpr(cond); decExpr != nil { | ||
| if sf, ok := decExpr.(*expression.ScalarFunction); ok && sf.FuncName.L == ast.EQ { | ||
| args := sf.GetArgs() | ||
| if len(args) == 2 { | ||
| if innerCol, ok := args[1].(*expression.Column); ok { | ||
| if sel.Schema().Contains(innerCol) { | ||
| innerJoinKeys = append(innerJoinKeys, innerCol) | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this a function? It has been used several times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/planner/core/rule_decorrelate.go
Outdated
| if sf, ok := decExpr.(*expression.ScalarFunction); ok && sf.FuncName.L == ast.EQ { | ||
| args := sf.GetArgs() | ||
| if len(args) == 2 { | ||
| if innerCol, ok := args[1].(*expression.Column); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we guarantee the args[1] rather than args[0] is the column from the inner side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeCorColFromEqExpr preserves the order of args[1], which is on the inner side.
pkg/planner/core/rule_decorrelate.go
Outdated
| break | ||
| } | ||
| } | ||
| if allMatch && len(keyInfo) == len(innerJoinKeys) && len(keyInfo) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems no need to check len(keyInfo) == len(innerJoinKeys). For example, the unique key is (a, b). And the filter columns contain (a, b, c). The (a, b) can guarantee the uniqueness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good idea~
| // isJoinKeyUniqueKey checks if join key is unique key. | ||
| // Returns true if the join key forms a unique key constraint. | ||
| func isJoinKeyUniqueKey(apply *logicalop.LogicalApply, plan base.LogicalPlan) bool { | ||
| var hasMultiRowOperator func(base.LogicalPlan) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function needs to guarantee contains all the cases which will generate more rows. If there lacks some cases, it will generate the wring answer. For example, please add some cases related to the unnest function, it will generate more rows? So here should be considered more seriously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there may be mis-deletions down the road or in certain cases. But the NoDecorrelate hint lets us sidestep the issue, even if we miss maintaining the list when new funcs are introduced.
|
/retest |
pkg/planner/core/rule_decorrelate.go
Outdated
| } else if m, ok := innerPlan.(*logicalop.LogicalMaxOneRow); ok { | ||
| // Check if MaxOneRow's child is Limit or TopN, and if we can remove it for LeftOuterJoin | ||
| // Also handle the case where there's a Projection between MaxOneRow and Limit: MaxOneRow -> Projection -> Limit | ||
| if apply.JoinType == base.LeftOuterJoin { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use switch to rewrite this code? it can make code readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
pkg/planner/core/rule_decorrelate.go
Outdated
|
|
||
| // Find the underlying DataSource to get PKOrUK | ||
| var findDataSource func(base.LogicalPlan) *logicalop.DataSource | ||
| findDataSource = func(p base.LogicalPlan) *logicalop.DataSource { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make it a unique function? Don't use an anonymous function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
/retest |
What problem does this PR solve?
Issue Number: close #64200
Problem Summary:
When decorrelating correlated subqueries, if the join key forms a unique key constraint, the LIMIT operator in the subquery becomes redundant since the join key guarantees at most one row. This optimization removes such redundant LIMIT operators and also eliminates redundant MaxOneRow wrappers to improve query performance.
What changed and how does it work?
Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.