


KLA NUC DF 60/80

Level: Major/Critical

Purpose: Notify operations that disk usage on a KLA NUC has exceeded 60 or 80 percent.

Resolution: Take the entry point ID from the alert and look it up in the DB:

select * from snmp_manager.network_element where id = <ID>

Check the “name” field of the network element. It should contain an acronym that you can match to a host in “nuc-hosts.ini” under “prodkla” in the env-configuration repo.
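
The lookup described above can be sketched as a case-insensitive search of the inventory file. This is only a minimal sketch: the function name find_nuc_host is hypothetical, and the path to nuc-hosts.ini depends on where your checkout of the env-configuration repo lives.

```shell
# Hypothetical helper: print the lines of an inventory file that mention an acronym.
# $1 = acronym taken from the network element "name" field
# $2 = path to nuc-hosts.ini in your local checkout (location is an assumption)
find_nuc_host() {
    # -i: ignore case, -n: show line numbers, -F: treat the acronym as a literal string
    grep -i -n -F -- "$1" "$2"
}
```

Usage would look like `find_nuc_host <ACRONYM> <path to nuc-hosts.ini>`. Each matching line is printed with its line number, so you can find the entry under “prodkla”.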

Manual Action Steps: SSH into the related server. The username and password are in pwsafe under “KLA NUC errigal usr”. Nine times out of ten, the RDFAGENT logs are the culprit. Do the following to delete the excess rdfagent logs:

sudo su rdfagent
cd ~/rdfagent/logs/agent
rm agent.log.*

You should be left with only one file, the currently active log, agent.log.
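
Before running the rm step, it can help to preview which files the agent.log.* pattern will match. The following is a sketch under assumptions, not part of the established procedure: list_rotated_logs is a hypothetical helper name, and -maxdepth is a find option available in GNU and BSD find.

```shell
# Hypothetical helper: list rotated agent logs (agent.log.*) without deleting them.
# $1 = log directory, e.g. ~/rdfagent/logs/agent
list_rotated_logs() {
    # Only regular files directly inside the directory; the active agent.log
    # is not matched because the pattern requires a suffix after "agent.log."
    find "$1" -maxdepth 1 -type f -name 'agent.log.*' | sort
}
```

Running `list_rotated_logs ~/rdfagent/logs/agent` as the rdfagent user shows the files that rm agent.log.* would remove, so you can confirm that agent.log itself is not among them.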

If the agent logs are not causing the issue, do the standard disk space checks:

du -sh ./* | sort -h
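
The du command above only covers the current directory, so it is usually worth running df -h first to see which filesystem is filling up, then drilling down from its mount point. A minimal sketch of that drill-down follows; largest_entries is a hypothetical helper name, and the default of 10 entries is an arbitrary choice.

```shell
# Hypothetical helper: show the N largest entries directly under a directory.
# $1 = directory to inspect
# $2 = number of entries to show (defaults to 10)
largest_entries() {
    # du -sk reports each entry's size in kilobytes; sort numerically and
    # keep the last N lines, so the largest entry is printed last.
    du -sk -- "$1"/* 2>/dev/null | sort -n | tail -n "${2:-10}"
}
```

For example, `largest_entries <directory> 5` prints the five largest entries under that directory. Repeat the call on the largest subdirectory until you find what is consuming the space.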

Auto Clear: This alert is somewhat flappy. That cannot be helped, given how the data is transmitted, how we alert on it, and our current Alertmanager settings. It will flap to resolved and back again, even shortly after you have done the work, and should clear fully about 15 minutes later. Once disk usage is below 60 percent, it will eventually stay resolved.
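
Because the alert can flap after the cleanup, you can confirm on the server itself that usage really is below the threshold instead of relying on the alert state. This is a sketch under assumptions: disk_used_percent is a hypothetical helper, and it reads the standard POSIX output of df -P, not whatever metric the alert is actually built on.

```shell
# Hypothetical helper: print the used percentage (as an integer) of the
# filesystem that holds a given path.
# $1 = any path on the filesystem to check (defaults to /)
disk_used_percent() {
    # df -P prints one header line, then one line per filesystem;
    # field 5 is the capacity column, e.g. "42%".
    df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}
```

If `disk_used_percent /` prints a value below 60, the cleanup has worked and the alert should settle once the flapping described above has passed.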

resolution_area/prometheus_resolutions/res-p1408.1716226635.txt.gz · Last modified: 2024/05/20 18:37 by 10.91.120.100