Ankur Rawal, CTO at Zenduty. I handle all things engineering at Zenduty. Aspiring sysadmin. Constantly learning and trying to keep up with the pace of innovation.

[Kubernetes tip] Prometheus for multi-cluster setups

This tip is for those who are using Prometheus federation to monitor multiple clusters.

How should Alertmanager be configured for multiple clusters? Say there is an issue on cluster A: how do you make sure the alert is sent only for cluster A?

    alerting_rules.yml:
      groups:
        - name: Instances
          rules:
            - alert: TEST ALERT FROM PROMETHEUS PLEASE ACKNOWLEDGE
              expr: prometheus_build_info{instance="localhost:9090"} == 1
              for: 10s
              labels:
                severity: page
              annotations:
                description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
                summary: 'Instance {{ $labels.instance }} down'
                action: TESTING PLEASE ACKNOWLEDGE, NO FURTHER ACTION REQUIRED ONLY A TEST
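
The rule above carries no label that says which cluster fired it. One common way to add that information is to give each cluster's Prometheus an external label, which Prometheus attaches to alerts it sends to Alertmanager and to series it exposes for federation. The fragment below is only a sketch: the label name `cluster` and the value `cluster-a` are assumptions chosen for illustration, not something this setup prescribes.

```yaml
# prometheus.yml fragment for the Prometheus running in cluster A.
# The label name "cluster" and the value "cluster-a" are illustrative assumptions.
global:
  external_labels:
    cluster: cluster-a   # added to every alert sent to Alertmanager
```

Each additional cluster would use the same label name with its own value, so that alerts from different clusters can be told apart downstream.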

In such cases, every alert should be routed to the proper team based on its labels (if there is a problem with application A on cluster B, the team responsible should be notified). In the above case, two alerts are triggered by the same rule, so you'll have to deduplicate them. And if you don't wish to be notified on every trigger of very similar alerts, you can treat them as a group.
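
Label-based routing of this kind could look roughly like the Alertmanager configuration below. It is a minimal sketch that assumes every alert arrives with a `cluster` label; the receiver names and webhook URLs are hypothetical placeholders, and the `matchers` syntax needs Alertmanager 0.22 or later.

```yaml
# alertmanager.yml sketch: route alerts to a team based on the "cluster" label.
# Receiver names and URLs are hypothetical placeholders.
route:
  receiver: default-team          # fallback when no child route matches
  routes:
    - matchers:
        - cluster="cluster-a"
      receiver: team-cluster-a
    - matchers:
        - cluster="cluster-b"
      receiver: team-cluster-b

receivers:
  - name: default-team
  - name: team-cluster-a
    webhook_configs:
      - url: "http://example.internal/hooks/cluster-a"
  - name: team-cluster-b
    webhook_configs:
      - url: "http://example.internal/hooks/cluster-b"
```

With a tree like this, an alert labelled `cluster="cluster-a"` only reaches `team-cluster-a`, and anything without a matching label falls back to `default-team`.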

If you know an app on node A has disk issues, and all other apps on that node have the same issue (the same cause), you probably don't want to receive 10 alerts. You'd rather be notified once when the conditions are met (for example, the alerts were triggered by similar rules, in a similar place, and within a given time interval).
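
Grouping like this is controlled on the Alertmanager route. The sketch below assumes alerts carry `cluster` and `node` labels (both label names are assumptions, adjust them to whatever your alerts actually expose), and the timing values are illustrative rather than recommended.

```yaml
# alertmanager.yml sketch: collapse related alerts into one notification.
# Label names and timings are illustrative assumptions.
route:
  receiver: default-team
  group_by: ['cluster', 'node']   # alerts sharing these label values form one group
  group_wait: 30s                 # wait before the first notification so related alerts can join
  group_interval: 5m              # minimum gap before notifying about new alerts in the same group
  repeat_interval: 4h             # how often to resend while the group keeps firing

receivers:
  - name: default-team
```

Here ten disk alerts from apps on the same node in the same cluster would arrive as a single grouped notification instead of ten separate ones.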

Do read up on the Alertmanager docs for more information on alert grouping.
