

ElastAlertNoScrape

Level: Critical

Purpose: Notify Errigal Operations staff that the ElastAlert application on prometheus prod has not completed a scrape of the DF-80 (80% disk full) or DF-60 (60% disk full) ElastAlert rules in the last 2 minutes.

Resolution: ElastAlert runs in a Docker container on prometheus prod. It monitors NUC disk space in KLA's network by querying the Elasticsearch data that holds the operating system stats. If these rules do not catch log files filling up the HDD, the NUC will go offline.

Manual Action Steps:

sudo docker logs --tail 100 elastalert --follow
sudo docker restart elastalert

A restart usually resolves the issue if an error caused the application to disable the rules.

Observe the logs and confirm that they show ElastAlert running the query for both DF-80 and DF-60. Lines like these two should be present:

INFO:elastalert:Queried rule df-80 from 2024-08-06 19:41 UTC to 2024-08-06 19:52 UTC: 0 / 0 hits
INFO:elastalert:Queried rule df-60 from 2024-08-06 19:41 UTC to 2024-08-06 19:53 UTC: 0 / 0 hits
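
The log check above can be scripted rather than read by eye. The following is a minimal sketch, not part of the original procedure: the function name check_rules_queried is hypothetical, and it assumes the log lines keep the "Queried rule <name> from" format shown above.

```shell
# Hypothetical helper: takes log text as its first argument and reports
# whether a "Queried rule" line is present for each of df-80 and df-60.
check_rules_queried() {
  logs="$1"
  status=0
  for rule in df-80 df-60; do
    # Match the format shown in the sample log lines above.
    if ! printf '%s\n' "$logs" | grep -q -- "Queried rule $rule from"; then
      echo "missing: $rule"
      status=1
    fi
  done
  if [ "$status" -eq 0 ]; then
    echo "ok: both rules queried"
  fi
  return "$status"
}

# Usage (2>&1 is included because docker logs replays the container's
# stderr stream on stderr):
#   check_rules_queried "$(sudo docker logs --tail 100 elastalert 2>&1)"
```

A non-zero return code means at least one rule has no recent query line in the captured log window; a restart followed by a second check is then the next step.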

Also check the rule YAML files:

cd /home/scotty/elastalert_config/rules
sudo vi df-60.yaml
sudo vi df-80.yaml

Ensure the setting is_enabled is set to true.
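
To inspect the setting without opening an editor, a read-only check along these lines may help. This is a sketch under assumptions: the function name show_is_enabled is hypothetical, and it assumes is_enabled appears as a top-level key on its own line in each rule file.

```shell
# Hypothetical helper: prints the is_enabled line from df-60.yaml and
# df-80.yaml in the given rules directory, without modifying the files.
show_is_enabled() {
  rules_dir="$1"
  for rule in df-60 df-80; do
    file="$rules_dir/$rule.yaml"
    if [ ! -r "$file" ]; then
      echo "$rule: cannot read $file"
      continue
    fi
    # Look for a line of the form "is_enabled: <value>".
    line=$(grep -E '^[[:space:]]*is_enabled:' "$file")
    if [ -n "$line" ]; then
      echo "$rule: $line"
    else
      echo "$rule: is_enabled not set"
    fi
  done
}

# Usage:
#   show_is_enabled /home/scotty/elastalert_config/rules
```

If the files are readable only by root, the same grep can be run directly with sudo instead of through the function.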

Auto Clear: The alert clears automatically once the elastalert_scrapes_total counter increases again.

resolution_area/prometheus_resolutions/res-p9129.1722974261.txt.gz · Last modified: 2024/08/06 20:57 by 10.91.120.100