Conversation

@King-Dylan
Member

@King-Dylan King-Dylan commented Oct 31, 2025

What problem does this PR solve?

Issue Number: close #64200

Problem Summary:
When decorrelating correlated subqueries, if the join key forms a unique key constraint, the LIMIT operator in the subquery becomes redundant since the join key guarantees at most one row. This optimization removes such redundant LIMIT operators and also eliminates redundant MaxOneRow wrappers to improve query performance.

What changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 31, 2025
@ti-chi-bot
ti-chi-bot bot commented Oct 31, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign 0xpoe for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov
codecov bot commented Oct 31, 2025

Codecov Report

❌ Patch coverage is 90.55375% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.3901%. Comparing base (bdd2b6f) to head (b596777).
⚠️ Report is 23 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #64214        +/-   ##
================================================
+ Coverage   72.7334%   73.3901%   +0.6567%     
================================================
  Files          1859       1862         +3     
  Lines        503870     508936      +5066     
================================================
+ Hits         366482     373509      +7027     
+ Misses       115127     113214      -1913     
+ Partials      22261      22213        -48     
Flag Coverage Δ
integration 41.9102% <85.6677%> (?)
unit 72.6106% <86.9706%> (+0.3115%) ⬆️

Components Coverage Δ
dumpling 52.8700% <ø> (ø)
parser ∅ <ø> (∅)
br 45.8089% <ø> (-0.5764%) ⬇️

canRemove = true
}
}
} else if proj, ok := mChild.(*logicalop.LogicalProjection); ok {
Contributor

When would this happen? Can you provide some cases for this situation?

Member Author

Done

apply.SetChildren(outerPlan, innerPlan)
return s.optimize(ctx, p, groupByColumn)
} else if m, ok := innerPlan.(*logicalop.LogicalMaxOneRow); ok {
// Check if MaxOneRow's child is Limit or TopN, and if we can remove it for LeftOuterJoin
Contributor

It seems this PR doesn't handle the TopN case?

Member Author

At this stage, TopN is still just a Limit, so it doesn't matter.

Contributor

My concern is that if the middle node is a TopN (limit + sort), it seems the subquery cannot be decorrelated, because the sort would change the meaning. Could you add a test for this?

Member Author

Since the single-row result is determined by the join key, its final result isn’t affected by any ordering. I’ve already added an ORDER BY test case in TestDecorrelateLimitOptimization.

Comment on lines 549 to 560
if decExpr := apply.DeCorColFromEqExpr(cond); decExpr != nil {
	if sf, ok := decExpr.(*expression.ScalarFunction); ok && sf.FuncName.L == ast.EQ {
		args := sf.GetArgs()
		if len(args) == 2 {
			if innerCol, ok := args[1].(*expression.Column); ok {
				if sel.Schema().Contains(innerCol) {
					innerJoinKeys = append(innerJoinKeys, innerCol)
				}
			}
		}
	}
}
Contributor

Can we make this a function? It is used several times.

Member Author

Done
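
A minimal sketch of what such an extracted helper might look like. The types `Expr`, `Column`, and `ScalarFunc`, the map-based schema, and the name `innerJoinKeyFromEq` are hypothetical stand-ins chosen so the example runs on its own; they are not the TiDB types or the helper that was actually added in this PR. The sketch also assumes, as the snippet above does, that `args[1]` is the inner-side column.

```go
package main

import "fmt"

// Expr is a hypothetical stand-in for TiDB's expression.Expression.
type Expr interface{ isExpr() }

// Column is a hypothetical stand-in for expression.Column.
type Column struct{ Name string }

// ScalarFunc is a hypothetical stand-in for expression.ScalarFunction.
type ScalarFunc struct {
	FuncName string
	Args     []Expr
}

func (*Column) isExpr()     {}
func (*ScalarFunc) isExpr() {}

// innerJoinKeyFromEq (hypothetical name) returns the inner-side column of an
// equality condition, or nil when the condition has a different shape or the
// column is not in the inner schema. It assumes args[1] is the inner side.
func innerJoinKeyFromEq(cond Expr, innerSchema map[string]bool) *Column {
	sf, ok := cond.(*ScalarFunc)
	if !ok || sf.FuncName != "eq" || len(sf.Args) != 2 {
		return nil
	}
	col, ok := sf.Args[1].(*Column)
	if !ok || !innerSchema[col.Name] {
		return nil
	}
	return col
}

func main() {
	innerSchema := map[string]bool{"t2.a": true, "t2.b": true}
	conds := []Expr{
		&ScalarFunc{FuncName: "eq", Args: []Expr{&Column{Name: "t1.a"}, &Column{Name: "t2.a"}}},
		&ScalarFunc{FuncName: "gt", Args: []Expr{&Column{Name: "t1.b"}, &Column{Name: "t2.b"}}},
	}
	var innerJoinKeys []*Column
	for _, cond := range conds {
		// Early returns in the helper replace the nested ifs at each call site.
		if col := innerJoinKeyFromEq(cond, innerSchema); col != nil {
			innerJoinKeys = append(innerJoinKeys, col)
		}
	}
	fmt.Println(len(innerJoinKeys), innerJoinKeys[0].Name)
}
```

Guard clauses keep the helper flat, and each call site shrinks to a single nil check.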

if sf, ok := decExpr.(*expression.ScalarFunction); ok && sf.FuncName.L == ast.EQ {
args := sf.GetArgs()
if len(args) == 2 {
if innerCol, ok := args[1].(*expression.Column); ok {
Contributor

@Reminiscent Reminiscent Oct 31, 2025

Can we guarantee that args[1], rather than args[0], is the column from the inner side?

Member Author

DeCorColFromEqExpr preserves the argument order, so args[1] is the column on the inner side.

break
}
}
if allMatch && len(keyInfo) == len(innerJoinKeys) && len(keyInfo) > 0 {
Contributor

It seems there is no need to check len(keyInfo) == len(innerJoinKeys). For example, if the unique key is (a, b) and the filter columns are (a, b, c), then (a, b) alone guarantees uniqueness.

Member Author

Yes, good idea~
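
The relaxed condition discussed above can be sketched as a subset check: the join keys guarantee at most one row as long as they cover every column of some unique key, even if they include extra columns. The function name `coversUniqueKey` and the string-based column representation are assumptions made for this illustration, not the PR's actual code.

```go
package main

import "fmt"

// coversUniqueKey (hypothetical name) reports whether the join key columns
// contain every column of at least one non-empty unique key. Extra join key
// columns do not break uniqueness, so no length equality is required.
func coversUniqueKey(uniqueKeys [][]string, joinKeys []string) bool {
	joined := make(map[string]struct{}, len(joinKeys))
	for _, c := range joinKeys {
		joined[c] = struct{}{}
	}
	for _, key := range uniqueKeys {
		if len(key) == 0 {
			continue // an empty key proves nothing
		}
		covered := true
		for _, c := range key {
			if _, ok := joined[c]; !ok {
				covered = false
				break
			}
		}
		if covered {
			return true
		}
	}
	return false
}

func main() {
	uniqueKeys := [][]string{{"a", "b"}}
	// Join keys (a, b, c) are a superset of the unique key (a, b).
	fmt.Println(coversUniqueKey(uniqueKeys, []string{"a", "b", "c"}))
	// Join key (a) alone covers only part of the unique key.
	fmt.Println(coversUniqueKey(uniqueKeys, []string{"a"}))
}
```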

// isJoinKeyUniqueKey checks if join key is unique key.
// Returns true if the join key forms a unique key constraint.
func isJoinKeyUniqueKey(apply *logicalop.LogicalApply, plan base.LogicalPlan) bool {
var hasMultiRowOperator func(base.LogicalPlan) bool
Contributor

This function needs to guarantee that it covers every case that can generate more rows. If some cases are missing, it will produce the wrong answer. For example, please add some cases related to the unnest function: will it generate more rows? This should be considered more carefully.

Member Author

Yes, there may be mis-deletions down the road or in certain cases. But the NoDecorrelate hint lets us sidestep the issue, even if we miss maintaining the list when new funcs are introduced.
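
One possible way to reduce the risk raised in this thread, shown only as a sketch and not as what the PR does, is to invert the check: list the operators known not to multiply rows and treat anything else as potentially multi-row. The `PlanNode` type, the operator names, and the contents of the allowlist are all assumptions made for illustration.

```go
package main

import "fmt"

// PlanNode is a hypothetical, simplified logical plan node.
type PlanNode struct {
	Op       string
	Children []*PlanNode
}

// rowPreserving lists operators assumed never to emit more rows than their
// input. The exact set is an assumption for this sketch.
var rowPreserving = map[string]bool{
	"DataSource": true,
	"Selection":  true,
	"Projection": true,
	"Limit":      true,
}

// mayProduceMoreRows returns true unless every operator in the subtree is on
// the allowlist, so an operator nobody has classified yet is rejected.
func mayProduceMoreRows(p *PlanNode) bool {
	if p == nil {
		return false
	}
	if !rowPreserving[p.Op] {
		return true
	}
	for _, child := range p.Children {
		if mayProduceMoreRows(child) {
			return true
		}
	}
	return false
}

func main() {
	safe := &PlanNode{Op: "Projection", Children: []*PlanNode{
		{Op: "Selection", Children: []*PlanNode{{Op: "DataSource"}}},
	}}
	unknown := &PlanNode{Op: "Projection", Children: []*PlanNode{
		{Op: "SomeNewOperator", Children: []*PlanNode{{Op: "DataSource"}}},
	}}
	fmt.Println(mayProduceMoreRows(safe), mayProduceMoreRows(unknown))
}
```

With this default, forgetting to update the list when a new operator is introduced costs a missed optimization rather than a wrong result.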

@King-Dylan
Member Author

/retest

@King-Dylan King-Dylan requested review from AilinKid, D3Hunter and qw4990 and removed request for AilinKid, D3Hunter and qw4990 October 31, 2025 20:13
} else if m, ok := innerPlan.(*logicalop.LogicalMaxOneRow); ok {
// Check if MaxOneRow's child is Limit or TopN, and if we can remove it for LeftOuterJoin
// Also handle the case where there's a Projection between MaxOneRow and Limit: MaxOneRow -> Projection -> Limit
if apply.JoinType == base.LeftOuterJoin {
Member

Can we rewrite this code with a switch? It would make the code more readable.

Member Author

Ok
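
A small sketch of the suggested shape, using a Go type switch to cover both layouts mentioned in the code comment (MaxOneRow -> Limit and MaxOneRow -> Projection -> Limit). The `Plan`, `Limit`, `Projection`, and `DataSource` types and the function name `limitBelowMaxOneRow` are hypothetical simplifications, not the `logicalop` types used in the PR.

```go
package main

import "fmt"

// Plan is a hypothetical stand-in for base.LogicalPlan.
type Plan interface{ isPlan() }

// Limit, Projection, and DataSource are simplified stand-ins for the
// corresponding logical operators.
type Limit struct {
	Count uint64
	Child Plan
}

type Projection struct{ Child Plan }

type DataSource struct{ Table string }

func (*Limit) isPlan()      {}
func (*Projection) isPlan() {}
func (*DataSource) isPlan() {}

// limitBelowMaxOneRow (hypothetical name) returns the Limit directly under
// MaxOneRow's child position, looking through at most one Projection.
func limitBelowMaxOneRow(child Plan) *Limit {
	switch x := child.(type) {
	case *Limit:
		return x
	case *Projection:
		if l, ok := x.Child.(*Limit); ok {
			return l
		}
	}
	return nil
}

func main() {
	ds := &DataSource{Table: "t2"}
	direct := &Limit{Count: 1, Child: ds}
	viaProjection := &Projection{Child: &Limit{Count: 1, Child: ds}}

	fmt.Println(limitBelowMaxOneRow(direct) != nil)
	fmt.Println(limitBelowMaxOneRow(viaProjection) != nil)
	fmt.Println(limitBelowMaxOneRow(ds) != nil)
}
```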


// Find the underlying DataSource to get PKOrUK
var findDataSource func(base.LogicalPlan) *logicalop.DataSource
findDataSource = func(p base.LogicalPlan) *logicalop.DataSource {
Member

Can we make this a separate named function instead of an anonymous function?

Member Author

ok
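
For illustration, the closure can become an ordinary package-level recursive function, which removes the need to declare the variable before assigning it. The `Plan` type below is a hypothetical simplification, and stopping at nodes that do not have exactly one child is an assumption of this sketch rather than the PR's confirmed behavior.

```go
package main

import "fmt"

// Plan is a hypothetical, simplified logical plan node.
type Plan struct {
	Op       string
	Table    string
	Children []*Plan
}

// findDataSource walks down a single-child chain and returns the DataSource
// at the bottom, or nil if the chain branches or ends without one.
func findDataSource(p *Plan) *Plan {
	if p == nil {
		return nil
	}
	if p.Op == "DataSource" {
		return p
	}
	if len(p.Children) != 1 {
		return nil
	}
	return findDataSource(p.Children[0])
}

func main() {
	plan := &Plan{Op: "Limit", Children: []*Plan{
		{Op: "Selection", Children: []*Plan{
			{Op: "DataSource", Table: "t2"},
		}},
	}}
	if ds := findDataSource(plan); ds != nil {
		fmt.Println(ds.Table)
	}
}
```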

@King-Dylan
Member Author

/retest

Labels

release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove redundant LIMIT 1 to enable decorrelation of correlated scalar subqueries

3 participants