-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add another tablet load balancer algorithm: random #18787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
|
Neat! ❤️ |
ef98dfe to
7679c95
Compare
Signed-off-by: Nick Van Wiggeren <[email protected]>
7679c95 to
d127f64
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #18787 +/- ##
==========================================
+ Coverage 69.70% 69.72% +0.01%
==========================================
Files 1607 1608 +1
Lines 214608 214698 +90
==========================================
+ Hits 149592 149689 +97
+ Misses 65016 65009 -7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I noticed the previous strategy wasn't covered by end to end tests, so I'm going to write up a suite of them that should cover everything. |
238e02c to
617520c
Compare
Signed-off-by: Nick Van Wiggeren <[email protected]>
617520c to
74072b5
Compare
Signed-off-by: Nick Van Wiggeren <[email protected]>
c593a74 to
02a9ecb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good! Just some minor comments. Happy to chat about them. I can then go through the tests.
0871339 to
3128164
Compare
Signed-off-by: Nick Van Wiggeren <[email protected]>
3128164 to
6c8590e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! My only issues are with the tests as I think they will prove to be flaky over time. Let me know what you think about my comments so far, as I think they will apply throughout the others as well.
| cell1Count := 0 | ||
| cell2Count := 0 | ||
| for _, tablet := range allTablets { | ||
| if tablet.Type != "replica" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend using topodata.TabletType_REPLICA.String() and similar throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there's a caps vs lowercase mixup, so I added a test-level variable that should work
Signed-off-by: Nick Van Wiggeren <[email protected]>
|
@mattlord thanks for the reviews! I've cleaned up the test suite significantly - there's no more I think I've addressed all of the naming feedback as well, but let me know if you think there's anything else missing! |
Signed-off-by: Nick Van Wiggeren <[email protected]>
b80e217 to
0db6fb6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work on this, @nickvanw ! I had a couple of very minor comments that you can deal with as you choose.
Signed-off-by: Nick Van Wiggeren <[email protected]>
Signed-off-by: Nick Van Wiggeren <[email protected]>
Description
This PR implements #18786 and adds a cell-agnostic "random" tablet load balancer algorithm.
At PlanetScale, we often find that traffic into VTGates is very uneven - either because of individual applications that might be single-AZ, a lack of AZ overlap, or a specific piece of infrastructure is singular, but produces a lot of load on replicas.
When this happens, it ends up that an individual cell's worth of VTTablets can get very overloaded, causing humans to hae to step in and manually scale components, etc. If the particular overloaded cell changes, it requires continued manual access.
Many times, we'd rather pay for the cross-zone latency and bandwidth costs and just spread it evenly across every replica. That is what this PR implements.
Related Issue(s)
RFC: #18786
Checklist
AI Disclosure
This PR was written by a human and Claude Code - all changes were reviewed by a human.