[Reviewer EM] Upgrade etcd to V3.2.16 #570
base: dev
eleanor-merry
left a comment
LGTM - one question inline
  # Run the real etcdctl.
- /usr/share/clearwater/clearwater-etcd/$etcd_version/etcdctl -C $target_ip:4000 "$@"
+ /usr/share/clearwater/clearwater-etcd/$etcd_version/etcdctl -C http://$target_ip:4000 "$@"
Why does this need to change? Does it work with all versions? I assume you've tested the 3.x ones - I wonder if we should be just removing the 2.x one entirely.
I'm not 100% sure, but I assume it's related to the following section from the upgrade documentation:
Change in --listen-peer-urls and --listen-client-urls
3.2 now rejects domains names for --listen-peer-urls and --listen-client-urls (3.1 only prints out warnings), since domain name is invalid for network interface binding. Make sure that those URLs are properly formated as scheme://IP:port.
I've checked it works against 3.1.7, but I couldn't be bothered to check with 2.2.5 (partly because I was pretty sure it would do and partly because I wasn't sure why I cared). Any idea what the last CC version we shipped with 2.2.5 was? V10?
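For illustration only, here is a minimal sketch of how the wrapper could make sure the endpoint always carries a scheme before it reaches etcdctl. The `normalize_endpoint` helper is hypothetical and not part of this PR; it assumes plain HTTP is the right default for a bare IP:port.

```shell
# Hypothetical helper (not in this PR): make sure an endpoint is written as
# scheme://IP:port. Endpoints that already have http:// or https:// pass
# through unchanged; a bare IP:port gets http:// prepended (an assumption).
normalize_endpoint() {
  case "$1" in
    http://*|https://*) printf '%s\n' "$1" ;;
    *)                  printf 'http://%s\n' "$1" ;;
  esac
}

# Possible use in the wrapper script, reusing its existing variables:
#   etcdctl -C "$(normalize_endpoint "$target_ip:4000")" "$@"
```

This would keep the wrapper tolerant of callers that pass either form, at the cost of silently assuming HTTP when no scheme is given.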
As discussed, I'm going to remove support for 2.2.5
Whilst doing some live testing of this change, I noticed that etcd proxies in the cluster would run for around 5 minutes before consuming all of their available file handles and being restarted by monit. This was happening reliably and repeatedly. I’ve collected some diags from a single etcd proxy. The etcd proxy’s IP address is 10.230.11.141, the etcd masters’ IP addresses are 10.230.11.136, 10.230.11.137 and 10.230.11.138.
Here’s what I can see happening from these diags…
So, in conclusion…
There’s one other thing I’m confused about. In both etcd log files I’ve attached, there are lots of logs that look like the following…
This log appears 5 seconds after the GET is sent to the etcd master, and on etcd v3.1.7 it appears just after the FIN-ACK is sent. Do we just spam this log out on all our systems? I don’t remember ever seeing it before. Is it because I have debug logging turned on? FTR, I’ve also tried v3.2.1, but it exhibits the same problem.
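For anyone wanting to reproduce the file handle exhaustion described above, a rough way to watch a process's descriptor count on Linux is sketched below. The `count_fds` helper is hypothetical, and the way the proxy's PID is found in the usage comment is an assumption about the deployment.

```shell
# Hypothetical helper: print the number of open file descriptors for a PID.
# Linux-specific: each entry under /proc/<pid>/fd is one open descriptor.
# Prints 0 if the process does not exist or /proc is unavailable.
count_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Possible use, sampling every 10 seconds (the pgrep pattern is an assumption):
#   while sleep 10; do echo "$(date +%T) $(count_fds "$(pgrep -o etcd)")"; done
```

A steadily climbing count over the roughly 5 minutes before monit restarts the proxy would be consistent with a descriptor leak rather than a one-off spike.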
Some notes...