-
Notifications
You must be signed in to change notification settings - Fork 2.3k
grpctmclient: validate tablet record on dial
#19009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
grpctmclient: validate tablet record on dial
#19009
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
grpctmclient: validate tablet record on dial
|
📝 Documentation updates detected! New suggestion: Add changelog entry for grpctmclient tablet validation |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #19009 +/- ##
=======================================
Coverage 69.81% 69.82%
=======================================
Files 1610 1610
Lines 215362 215392 +30
=======================================
+ Hits 150358 150399 +41
+ Misses 65004 64993 -11 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
In hindsight, it appears it is valid for a tablet record to be in the topo with zero'd out values: https://github.com/vitessio/vitess/blob/main/go/vt/vttablet/tabletmanager/tm_init.go#L508-L510 So as it stands, this PR will just create Moving this back to draft 🤔 |
| DbNameOverride: initDbNameOverride, | ||
| Tags: mergeTags(buildTags, initTags), | ||
| DefaultConnCollation: uint32(charset), | ||
| IsShutdown: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vttablet must write this record to the topo to be able to successfully start (that's not new), so we will always set IsShutdown: false if vttablet starts 👍
The opposite is not guaranteed, because processes can crash, OOM, be-killed, etc. This is documented in comments
| tablet.Hostname = "" | ||
| tablet.MysqlHostname = "" | ||
| tablet.PortMap = nil | ||
| tablet.IsShutdown = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not guaranteed to run, and callers need to expect this (this is also not new)
If a tablet stops uncleanly, the existing/pre-v24 behaviour is seen: log noise and a fruitless connection to "":0
|
Please let me know when the tests are passing. FWIW, I would recommend keeping something in draft until the tests are passing and you're sure that you're happy with it (done self review, etc). It makes it much easier on everyone that way. ❤️ |
Good point, I was a bit too confident in this one. It seems the downgrade tests don't align 👀 |
ec07715 to
fa81a31
Compare
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
fa81a31 to
4397174
Compare
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
…te-tablet Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
|
📝 Documentation updates detected! New suggestion: Add changelog entry for tablet shutdown tracking and validation |
Description
This PR causes the
grpctmclientlibrary to validate the tablet record before trying to connect to a tablet. This to address this error (connecting to":0"🫤) whenHostnameand thegrpcport mapping are empty values (""and0respectively):[component.go:39] [core] [Channel #33 SubChannel #34]grpc: addrConn.createTransport failed to connect to {Addr: ":0", ServerName: "localhost:0", }. Err: connection error: desc = "transport: Error while dialing: dial tcp :0: connect: connection refused"Also, a new
TabletShutdownTimefield was added totopodatapb.Tablet. This field indicates to topo clients that a tablet cleanly shut itself down (and "when") - which is something hard to reliably match today because tablet shutdown nil/zero's many record values. This field is best-effort, and used to avoid useless connections and log noise when we know a connection will be unsuccessful. If a tablet is stopped uncleanly, this will remainnilin the topo and the current, <= v23 behaviour will be seen (a log error + useless connection to the downed tablet)For
vttabletto start, it must succeed in updating this value tonil, so there should be no risk of a running tablet havingShutdownTimeset to a non-nilvalue - it wouldn't be able to startTesting: before tablet shutdown
Testing: after shutdown
Related Issue(s)
Checklist
Deployment Notes
AI Disclosure