PrometheusDuplicateTimestamps with kubespray's etcd-metrics endpoint template #12788

@raqqun

Description

What happened?

Kubespray's etcd-metrics Endpoints template renders one subset per etcd member, which makes kube-state-metrics report the same endpoint port series three times:

kube_endpoint_ports{namespace="kube-system",endpoint="etcd-metrics",port_name="http-metrics",port_protocol="TCP",port_number="2381"} 1
kube_endpoint_ports{namespace="kube-system",endpoint="etcd-metrics",port_name="http-metrics",port_protocol="TCP",port_number="2381"} 1
kube_endpoint_ports{namespace="kube-system",endpoint="etcd-metrics",port_name="http-metrics",port_protocol="TCP",port_number="2381"} 1

This causes Prometheus to reject the samples with "Duplicate sample for timestamp":

time=2025-12-12T13:56:54.465Z level=DEBUG source=scrape.go:2029 msg="Duplicate sample for timestamp" component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://10.233.69.115:8080/metrics series="kube_endpoint_ports{namespace=\"kube-system\",endpoint=\"etcd-metrics\",port_name=\"http-metrics\",port_protocol=\"TCP\",port_number=\"2381\"}"
time=2025-12-12T13:56:54.465Z level=DEBUG source=scrape.go:2029 msg="Duplicate sample for timestamp" component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://10.233.69.115:8080/metrics series="kube_endpoint_ports{namespace=\"kube-system\",endpoint=\"etcd-metrics\",port_name=\"http-metrics\",port_protocol=\"TCP\",port_number=\"2381\"}"
time=2025-12-12T13:56:54.473Z level=WARN source=scrape.go:1923 msg="Error on ingesting samples with different value but same timestamp" component="scrape manager" scrape_pool=serviceMonitor/monitoring/kube-prometheus-stack-kube-state-metrics/0 target=http://10.233.69.115:8080/metrics num_dropped=2
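Prometheus keys a sample by its metric name plus full label set, so identical series scraped in one pass collide on the same timestamp. A minimal Python sketch of that collision check, run against the exposition lines above (this is an illustration, not Prometheus' actual scrape code):

```python
from collections import Counter

# The three identical kube_endpoint_ports series from this issue,
# as they appear in the kube-state-metrics exposition text.
exposition = """\
kube_endpoint_ports{namespace="kube-system",endpoint="etcd-metrics",port_name="http-metrics",port_protocol="TCP",port_number="2381"} 1
kube_endpoint_ports{namespace="kube-system",endpoint="etcd-metrics",port_name="http-metrics",port_protocol="TCP",port_number="2381"} 1
kube_endpoint_ports{namespace="kube-system",endpoint="etcd-metrics",port_name="http-metrics",port_protocol="TCP",port_number="2381"} 1
"""

def duplicate_series(text: str) -> dict:
    """Return series identifiers (name + label set) seen more than once.

    Two samples with the same identifier in a single scrape share a
    timestamp, which is the "Duplicate sample for timestamp" condition.
    """
    counts = Counter(
        line.rsplit(" ", 1)[0]  # drop the sample value, keep the series id
        for line in text.splitlines()
        if line and not line.startswith("#")
    )
    return {series: n for series, n in counts.items() if n > 1}

for series, n in duplicate_series(exposition).items():
    print(f"{n}x {series}")
```

Here one series identifier appears three times, matching the `num_dropped=2` in the warning above (two of the three samples are dropped).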

What did you expect to happen?

The fix I'm proposing is to modify roles/kubernetes-apps/ansible/templates/etcd_metrics-endpoints.yml.j2 so that all etcd addresses share a single subset:

apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-metrics
  namespace: kube-system
  labels:
    k8s-app: etcd
    app.kubernetes.io/managed-by: Kubespray
subsets:
  - addresses:
{% for etcd_metrics_address, etcd_host in etcd_metrics_addresses.split(',') | zip(etcd_hosts) %}
      - ip: {{ etcd_metrics_address | urlsplit('hostname') }}
        targetRef:
          kind: Node
          name: {{ etcd_host }}
{% endfor %}
    ports:
      - name: http-metrics
        port: {{ etcd_metrics_addresses.split(',') | first | urlsplit('port') }}
        protocol: TCP
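What the template's `zip` and `urlsplit` pipeline produces can be sketched in plain Python; the addresses and hostnames below are stand-ins for the Ansible variables, not values from a real inventory:

```python
from urllib.parse import urlsplit

# Stand-in values for the Ansible variables the template consumes.
etcd_metrics_addresses = "http://10.5.2.81:2381,http://10.5.2.80:2381,http://10.5.2.79:2381"
etcd_hosts = ["etcd1", "etcd2", "etcd3"]

# Mirrors `etcd_metrics_addresses.split(',') | zip(etcd_hosts)`: each metrics
# URL is paired with the node it runs on, and urlsplit extracts the hostname
# the same way Ansible's urlsplit('hostname') filter does.
addresses = [
    {"ip": urlsplit(url).hostname, "targetRef": {"kind": "Node", "name": host}}
    for url, host in zip(etcd_metrics_addresses.split(","), etcd_hosts)
]

# The port is taken from the first URL, since every member exposes the same one.
port = urlsplit(etcd_metrics_addresses.split(",")[0]).port

subsets = [{
    "addresses": addresses,
    "ports": [{"name": "http-metrics", "port": port, "protocol": "TCP"}],
}]
print(subsets)
```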

This changes the rendered subsets from this:

subsets:
- addresses:
  - ip: 10.5.2.81
    targetRef:
      kind: Node
      name: etcd1
  ports:
  - name: http-metrics
    port: 2381
    protocol: TCP
- addresses:
  - ip: 10.5.2.80
    targetRef:
      kind: Node
      name: etcd2
  ports:
  - name: http-metrics
    port: 2381
    protocol: TCP
- addresses:
  - ip: 10.5.2.79
    targetRef:
      kind: Node
      name: etcd3
  ports:
  - name: http-metrics
    port: 2381
    protocol: TCP

to this:

subsets:
- addresses:
  - ip: 10.5.2.81
    targetRef:
      kind: Node
      name: etcd1
  - ip: 10.5.2.80
    targetRef:
      kind: Node
      name: etcd2
  - ip: 10.5.2.79
    targetRef:
      kind: Node
      name: etcd3
  ports:
  - name: http-metrics
    port: 2381
    protocol: TCP
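The single-subset layout removes the duplication at the source: kube-state-metrics emits roughly one kube_endpoint_ports series per (subset, port) pair, so three subsets with an identical port produce three identical series while one subset produces exactly one. A rough sketch of that derivation (not kube-state-metrics' actual code):

```python
def endpoint_port_series(subsets):
    """Emit one label tuple per port entry in each subset."""
    return [
        (p["name"], p["port"], p["protocol"])
        for subset in subsets
        for p in subset.get("ports", [])
    ]

port = {"name": "http-metrics", "port": 2381, "protocol": "TCP"}

three_subsets = [{"ports": [port]}] * 3  # current template output
one_subset = [{"ports": [port]}]         # proposed template output

print(len(endpoint_port_series(three_subsets)))  # one series per subset
print(len(endpoint_port_series(one_subset)))     # a single series
```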

How can we reproduce it (as minimally and precisely as possible)?

Deploy Kubespray v2.29 with at least the following variables:

etcd_deployment_type: host

etcd_listen_metrics_urls: http://0.0.0.0:2381
etcd_metrics_port: 2381
etcd_metrics_service_labels:
  k8s-app: etcd
  app.kubernetes.io/managed-by: Kubespray
  app: kube-prometheus-stack-kube-etcd
  release: kube-prometheus-stack

Then install the kube-prometheus-stack chart.

OS

RHEL 9

Version of Ansible

ansible [core 2.16.3]
  config file = /kubernetes-mvp/kubespray/ansible.cfg
  configured module search path = ['/kubernetes-mvp/kubespray/library']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  ansible collection location = ~/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.12.3 (main, Nov  6 2025, 13:44:16) [GCC 13.3.0] (/usr/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Version of Python

3.12.3

Version of Kubespray (commit)

9991412

Network plugin used

cilium

Full inventory with variables

"groups": {
            "all": [
                "node1",
                "node2",
                "node3",
                "etcd1",
                "etcd2",
                "etcd3",
                "node4",
                "node5",
                "node6"
            ],
            "etcd": [
                "etcd1",
                "etcd2",
                "etcd3"
            ],
            "kube_control_plane": [
                "node1",
                "node2",
                "node3"
            ],
            "kube_node": [
                "node4",
                "node5",
                "node6"
            ],

Command used to invoke ansible

ansible-playbook -i inventory/mycluster/inventory.ini -b -v -u <remote_user> cluster.yml

Output of ansible run

It doesn't affect standard deployment operations.

Anything else we need to know

No response

Metadata

Assignees: none
Labels: kind/bug
Milestone: none