-
Notifications
You must be signed in to change notification settings - Fork 2.3k
grpctmclient: validate tablet record on dial
#19009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
grpctmclient: validate tablet record on dial
#19009
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
grpctmclient: validate tablet record on dial
|
📝 Documentation updates detected! New suggestion: Add changelog entry for grpctmclient tablet validation |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #19009 +/- ##
==========================================
+ Coverage 69.81% 69.82% +0.01%
==========================================
Files 1610 1610
Lines 215362 215434 +72
==========================================
+ Hits 150358 150436 +78
+ Misses 65004 64998 -6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
In hindsight, it appears it is valid for a tablet record to be in the topo with zero'd out values: https://github.com/vitessio/vitess/blob/main/go/vt/vttablet/tabletmanager/tm_init.go#L508-L510 So as it stands, this PR will just create Moving this back to draft 🤔 |
| DbNameOverride: initDbNameOverride, | ||
| Tags: mergeTags(buildTags, initTags), | ||
| DefaultConnCollation: uint32(charset), | ||
| IsShutdown: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vttablet must write this record to the topo to be able to successfully start (that's not new), so we will always set IsShutdown: false if vttablet starts 👍
The opposite is not guaranteed, because processes can crash, OOM, be-killed, etc. This is documented in comments
| tablet.Hostname = "" | ||
| tablet.MysqlHostname = "" | ||
| tablet.PortMap = nil | ||
| tablet.IsShutdown = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not guaranteed to run, and callers need to expect this (this is also not new)
If a tablet stops uncleanly, the existing/pre-v24 behaviour is seen: log noise and a fruitless connection to "":0
|
Please let me know when the tests are passing. FWIW, I would recommend keeping something in draft until the tests are passing and you're sure that you're happy with it (done self review, etc). It makes it much easier on everyone that way. ❤️ |
Good point, I was a bit too confident in this one. It seems the downgrade tests don't align 👀 |
ec07715 to
fa81a31
Compare
Signed-off-by: Tim Vaillancourt <[email protected]>
fa81a31 to
4397174
Compare
Signed-off-by: Tim Vaillancourt <[email protected]>
…te-tablet Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
|
📝 Documentation updates detected! New suggestion: Add changelog entry for tablet shutdown tracking and validation |
mattlord
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work on this, @timvaillancourt !
| } | ||
|
|
||
| // <= v23 compatibility to similuate missing TabletShutdownTime field. Remove in v25. | ||
| if tablet.Hostname == "" && tablet.MysqlHostname == "" && tablet.PortMap == nil && tablet.Type != topodatapb.TabletType_UNKNOWN { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the Type check be == ? TabletType_UNKNOWN is the zero value for that field, so equivalent to "" for a string and nil for a pointer. I guess from the test you added that we don't reset the type on shutdown. Which makes sense because we recently added support to restart a tablet with the type in the record. :-) I wonder when it would be UNKNOWN then...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattlord what I'm matching here is what the tabletmanager method .Close() does on shutdown, for tablets in <= v23 that don't have the new "shutdown time" field
Currently it does not change the tablet type and the tablets it .Close()s always have a type that is not UNKNOWN. It's kind of a brittle match, which is why I added the more-intentional TabletShutdownTime field
Co-authored-by: Matt Lord <[email protected]> Signed-off-by: Tim Vaillancourt <[email protected]>
Co-authored-by: Matt Lord <[email protected]> Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Signed-off-by: Tim Vaillancourt <[email protected]>
Description
This PR causes the
grpctmclientlibrary to validate the tablet record before trying to connect to a tablet. This to address this error (connecting to":0"🫤) whenHostnameand thegrpcport mapping are empty values (""and0respectively):[component.go:39] [core] [Channel #33 SubChannel #34]grpc: addrConn.createTransport failed to connect to {Addr: ":0", ServerName: "localhost:0", }. Err: connection error: desc = "transport: Error while dialing: dial tcp :0: connect: connection refused"Also, a new
TabletShutdownTimefield was added totopodatapb.Tablet. This field indicates to topo clients that a tablet cleanly shut itself down (and "when") - which is something hard to reliably match today because tablet shutdown nil/zero's many record values. This field is best-effort, and used to avoid useless connections and log noise when we know a connection will be unsuccessful. If a tablet is stopped uncleanly, this will remainnilin the topo and the current, <= v23 behaviour will be seen (a log error + useless connection to the downed tablet)For
vttabletto start, it must succeed in updating this value tonil, so there should be no risk of a running tablet havingShutdownTimeset to a non-nilvalue - it wouldn't be able to startTesting: before tablet shutdown
Testing: after shutdown
Related Issue(s)
Checklist
Deployment Notes
AI Disclosure