EtcdServerHasNoLeader

Level: Critical FIXME

Purpose: Scheduled discoveries used in the MDC rely on etcd for leader election, i.e. deciding which RDF Agent performs the scheduled discovery. If etcd is not running, neither agent performs the discovery, which is why this alert is important.

Scenario: The etcd server has had no leader for 120s. The etcd service failed to start, or was stopped for an unknown reason, on a particular server.
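To confirm the condition by hand, one option is to read the etcd_server_has_leader gauge that etcd exposes on its /metrics endpoint. The helper below is a minimal sketch, not part of the original procedure: the function name has_leader is hypothetical, and it only parses Prometheus-format text supplied on standard input.

```shell
# has_leader (hypothetical helper): reads Prometheus-format metrics on stdin.
# Exits 0 if etcd_server_has_leader is present with value 1, otherwise exits 1.
has_leader() {
  awk '
    # Comment lines start with "#", so only the sample line matches here.
    $1 == "etcd_server_has_leader" { found = 1; ok = ($2 == 1) }
    END { exit (found && ok) ? 0 : 1 }
  '
}
```

Assuming the metrics endpoint is reachable on the default client port without TLS (an assumption about this deployment), it could be used as: curl -s http://localhost:2379/metrics | has_leader && echo "leader present" || echo "no leader"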

Resolution: Check the status of the etcd service on each server. etcd must be running on all three of the following servers in order for it to work:

  • oat1
  • oat2
  • rdflb

Manual Action Steps:

  • Check etcd service status: sudo systemctl status etcd
  • Restart etcd: sudo systemctl restart etcd
  • Examine the service logs as the service starts and look for any errors or mismatches
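The status check above can be repeated across all three servers in one pass. The following is a rough sketch under stated assumptions: it presumes SSH access from the current host to oat1, oat2 and rdflb, and the names ETCD_HOSTS, run_remote and etcd_states are hypothetical, introduced only for illustration.

```shell
# Hostnames are taken from the server list above; SSH access to them is assumed.
ETCD_HOSTS="oat1 oat2 rdflb"

# run_remote (hypothetical hook): run a command on a host. Defaults to ssh.
run_remote() { ssh "$1" "$2"; }

# etcd_states: print one "<host> <state>" line per server.
etcd_states() {
  local host state
  for host in $ETCD_HOSTS; do
    # systemctl is-active prints a state such as "active", "inactive" or "failed"
    # and exits non-zero when the unit is not active, hence the "|| true".
    state=$(run_remote "$host" "systemctl is-active etcd" 2>/dev/null) || true
    # An empty result means the command produced no output (e.g. host unreachable).
    echo "$host ${state:-unreachable}"
  done
}
```

Calling etcd_states prints the state of each server, so a stopped or failed instance stands out before any restart. If the service runs under systemd with journald logging (which the systemctl commands above suggest, but which is still an assumption), the startup logs can usually be read on the affected server with: sudo journalctl -u etcd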

Auto Clear: Yes

resolution_area/prometheus_resolutions/res-p1901.txt · Last modified: 2021/07/05 20:18 by 10.91.120.28