Open
Labels: RHEL 9, kind/bug
Description
What happened?
Kubespray's etcd-metrics Endpoints template makes kube-state-metrics report the same endpoint port three times (once per subset):
kube_endpoint_ports{namespace="kube-system",endpoint="etcd-metrics",port_name="http-metrics",port_protocol="TCP",port_number="2381"} 1
kube_endpoint_ports{namespace="kube-system",endpoint="etcd-metrics",port_name="http-metrics",port_protocol="TCP",port_number="2381"} 1
kube_endpoint_ports{namespace="kube-system",endpoint="etcd-metrics",port_name="http-metrics",port_protocol="TCP",port_number="2381"} 1
This makes Prometheus log "Duplicate sample for timestamp" errors:
time=2025-12-12T13:56:54.465Z level=DEBUG source=scrape.go:2029 msg="Duplicate sample for timestamp" component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://10.233.69.115:8080/metrics series="kube_endpoint_ports{namespace=\"kube-system\",endpoint=\"etcd-metrics\",port_name=\"http-metrics\",port_protocol=\"TCP\",port_number=\"2381\"}"
time=2025-12-12T13:56:54.465Z level=DEBUG source=scrape.go:2029 msg="Duplicate sample for timestamp" component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://10.233.69.115:8080/metrics series="kube_endpoint_ports{namespace=\"kube-system\",endpoint=\"etcd-metrics\",port_name=\"http-metrics\",port_protocol=\"TCP\",port_number=\"2381\"}"
time=2025-12-12T13:56:54.473Z level=WARN source=scrape.go:1923 msg="Error on ingesting samples with different value but same timestamp" component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://10.233.69.115:8080/metrics num_dropped=2
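To illustrate why the warning fires, here is a minimal sketch (not Prometheus code) of the deduplication behavior: within one scrape, samples are keyed by their full label set, and a second sample with an identical label set at the same timestamp is dropped. The three Endpoints subsets all render to the same labels, so two of the three samples collide.

```python
# Minimal sketch of scrape-time deduplication, assuming samples are keyed
# by their full label set. This is an illustration, not Prometheus code.

def ingest(samples):
    """Return (kept, dropped) counts, dropping samples whose label set repeats."""
    seen = set()
    kept = dropped = 0
    for labels, value in samples:
        key = tuple(sorted(labels.items()))
        if key in seen:
            dropped += 1  # same label set already ingested this scrape
        else:
            seen.add(key)
            kept += 1
    return kept, dropped

series = {
    "namespace": "kube-system",
    "endpoint": "etcd-metrics",
    "port_name": "http-metrics",
    "port_protocol": "TCP",
    "port_number": "2381",
}
# kube-state-metrics emits one kube_endpoint_ports sample per subset port
# entry, so three identical subsets yield three identical samples.
print(ingest([(series, 1)] * 3))  # → (1, 2): one kept, two dropped
```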
What did you expect to happen?
The fix I'm proposing is to modify roles/kubernetes-apps/ansible/templates/etcd_metrics-endpoints.yml.j2:
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-metrics
  namespace: kube-system
  labels:
    k8s-app: etcd
    app.kubernetes.io/managed-by: Kubespray
subsets:
  - addresses:
{% for etcd_metrics_address, etcd_host in etcd_metrics_addresses.split(',') | zip(etcd_hosts) %}
      - ip: {{ etcd_metrics_address | urlsplit('hostname') }}
        targetRef:
          kind: Node
          name: {{ etcd_host }}
{% endfor %}
    ports:
      - name: http-metrics
        {# Jinja2 loop variables are scoped to the loop, so derive the port from the first address #}
        port: {{ etcd_metrics_addresses.split(',') | first | urlsplit('port') }}
        protocol: TCP
This changes the rendered Endpoints object from this:
subsets:
  - addresses:
      - ip: 10.5.2.81
        targetRef:
          kind: Node
          name: etcd1
    ports:
      - name: http-metrics
        port: 2381
        protocol: TCP
  - addresses:
      - ip: 10.5.2.80
        targetRef:
          kind: Node
          name: etcd2
    ports:
      - name: http-metrics
        port: 2381
        protocol: TCP
  - addresses:
      - ip: 10.5.2.79
        targetRef:
          kind: Node
          name: etcd3
    ports:
      - name: http-metrics
        port: 2381
        protocol: TCP
to this:
subsets:
  - addresses:
      - ip: 10.5.2.81
        targetRef:
          kind: Node
          name: etcd1
      - ip: 10.5.2.80
        targetRef:
          kind: Node
          name: etcd2
      - ip: 10.5.2.79
        targetRef:
          kind: Node
          name: etcd3
    ports:
      - name: http-metrics
        port: 2381
        protocol: TCP
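The template change is equivalent to collapsing per-host subsets that share an identical ports list into a single subset. A sketch of that transformation (merge_subsets is a hypothetical helper for illustration, not part of Kubespray):

```python
# Collapse Endpoints subsets that share the same ports list into one subset.
# merge_subsets is a hypothetical helper illustrating the proposed change.

def ports_key(subset):
    """Hashable key identifying a subset's ports list."""
    return tuple(sorted((p["name"], p["port"], p["protocol"])
                        for p in subset["ports"]))

def merge_subsets(subsets):
    merged = []
    for subset in subsets:
        for existing in merged:
            if ports_key(existing) == ports_key(subset):
                existing["addresses"].extend(subset["addresses"])
                break
        else:
            merged.append({"addresses": list(subset["addresses"]),
                           "ports": list(subset["ports"])})
    return merged

# The three per-host subsets rendered by the current template:
before = [
    {"addresses": [{"ip": ip, "targetRef": {"kind": "Node", "name": name}}],
     "ports": [{"name": "http-metrics", "port": 2381, "protocol": "TCP"}]}
    for ip, name in [("10.5.2.81", "etcd1"),
                     ("10.5.2.80", "etcd2"),
                     ("10.5.2.79", "etcd3")]
]
after = merge_subsets(before)
print(len(before), len(after))  # → 3 1
```

With one subset, kube-state-metrics emits the kube_endpoint_ports series once instead of three times, so the duplicate-sample warning disappears.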
How can we reproduce it (as minimally and precisely as possible)?
Deploy Kubespray v2.29 with at least:
etcd_deployment_type: host
etcd_listen_metrics_urls: http://0.0.0.0:2381
etcd_metrics_port: 2381
etcd_metrics_service_labels:
  k8s-app: etcd
  app.kubernetes.io/managed-by: Kubespray
  app: kube-prometheus-stack-kube-etcd
  release: kube-prometheus-stack
Then install the kube-prometheus-stack chart.
OS
RHEL 9
Version of Ansible
ansible [core 2.16.3]
config file = /kubernetes-mvp/kubespray/ansible.cfg
configured module search path = ['/kubernetes-mvp/kubespray/library']
ansible python module location = /usr/lib/python3/dist-packages/ansible
ansible collection location = ~/.ansible/collections:/usr/share/ansible/collections
executable location = /usr/bin/ansible
python version = 3.12.3 (main, Nov 6 2025, 13:44:16) [GCC 13.3.0] (/usr/bin/python3)
jinja version = 3.1.2
libyaml = True
Version of Python
3.12.3
Version of Kubespray (commit)
Network plugin used
cilium
Full inventory with variables
"groups": {
"all": [
"node1",
"node2",
"node3",
"etcd1",
"etcd2",
"etcd3",
"node4",
"node5",
"node6"
],
"etcd": [
"etcd1",
"etcd2",
"etcd3"
],
"kube_control_plane": [
"node1",
"node2",
"node3"
],
"kube_node": [
"node4",
"node5",
"node6"
],
Command used to invoke ansible
ansible-playbook -i inventory/mycluster/inventory.ini -b -v -u <remote_user> cluster.yml
Output of ansible run
It doesn't affect standard deployment operations.
Anything else we need to know
No response