-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Initialize pool from scratch after TableGC.Close
#18281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
TableGC.Close
Signed-off-by: Mohamed Hamza <[email protected]>
6d3e840 to
9d81683
Compare
|
I have not yet been able to reproduce this in a test, but will be testing it on a cluster that is "stuck" in this state and |
Signed-off-by: Mohamed Hamza <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #18281 +/- ##
=======================================
Coverage 67.46% 67.47%
=======================================
Files 1602 1602
Lines 262245 262254 +9
=======================================
+ Hits 176930 176944 +14
+ Misses 85315 85310 -5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Mohamed Hamza <[email protected]>
181ff4d to
31b9e0e
Compare
|
Does this need backport labels? |
Yep, was wrangling with some tests before opening this up for review. Should be all supported versions. I've tested it on one of our clusters and PRS has been successful a few times. I can't definitively say this fixed it, but the bug does happen most of the time so it gives us a decent level of confidence. |
| if collector.pool == nil { | ||
| // Pass empty name so that metrics are not re-registered. | ||
| collector.pool = newPool(collector.env, "") | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we still get metrics recorded correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first I had thought so, but on a second glance, this new instance will not have the timings set up properly. Was getting this error when reinitializing the pool:
panic: Reuse of exported var name: TableGCPoolGetConnTime
Will come up with a different way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed. The expvar package in Go doesn't allow deleting a registered var, so I instead chose to skip re-registering an already registered var.
Signed-off-by: Mohamed Hamza <[email protected]>
a4bd55f to
e9b56b6
Compare
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
|
Looking at test failures |
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
…y exist Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
Signed-off-by: Mohamed Hamza <[email protected]>
|
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed. |
|
This PR was closed because it has been stale for 7 days with no activity. |
Description
When re-opening a
TableGC's pool, initialize a new pool rather than reusing the existing pool instance. This is to help prevent issues whereClosefails and the pool is left in an inconsistent state, causing any subsequentCloses (such as duringPlannedReparentShard) to fail.Related Issue(s)
Closes #18202
Checklist
Deployment Notes