resolution_area:prometheus_resolutions:res-p1408

Differences

This shows you the differences between two versions of the page.

resolution_area:prometheus_resolutions:res-p1408 [2024/05/20 18:33] – created 10.91.120.100
resolution_area:prometheus_resolutions:res-p1408 [2024/05/21 11:12] (current) 10.91.120.100
Line 14: Line 14:
  
 **Manual Action Steps:**
-SSH into the related server; the username and password are in pwsafe. 9 times out of 10 the RDFAGENT logs are the culprit. Do the following to delete excessive rdfagent logs.
+SSH into the related server; the username and password are in pwsafe under "KLA NUC errigal usr". There is a port specified alongside the IP address.
 + 
 +<code> ssh errigal@<IP-ADDRESS> -p <PORT></code> 
 + 
 +9 times out of 10 the RDFAGENT logs are the culprit. Do the following to delete excessive rdfagent logs.
  
 <code>sudo su rdfagent</code>
Line 20: Line 24:
 <code>cd ~/rdfagent/logs/agent</code>
  
-<code>rm agent.log.*</code>
+<code>rm spring.log.*</code>
  
 You should be left with only one file, the currently active log, agent.log.
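The manual cleanup above could be wrapped in a small helper, sketched here. The function name `clean_rotated_logs` and its two parameters are hypothetical and are not part of rdfagent. The sketch assumes a `find` that supports `-maxdepth` and `-delete` (GNU and BSD both do). The prefix is a parameter because the two revisions of this page name different files (`agent.log.*` and `spring.log.*`).

```shell
#!/usr/bin/env bash
# Sketch only: clean_rotated_logs is a hypothetical helper, not an rdfagent tool.
# Deletes rotated logs (prefix plus a suffix) and leaves the active log alone.
clean_rotated_logs() {
    local log_dir="$1"   # directory holding the logs, e.g. ~/rdfagent/logs/agent
    local prefix="$2"    # active log file name, e.g. agent.log or spring.log

    # The pattern "<prefix>.*" requires a dot and a suffix after the prefix,
    # so the active file named exactly "<prefix>" never matches.
    # -print lists each file as it is removed, which leaves a record in the terminal.
    find "$log_dir" -maxdepth 1 -type f -name "${prefix}.*" -print -delete
}
```

It would be run as the rdfagent user, for example `clean_rotated_logs ~/rdfagent/logs/agent agent.log`. Dropping `-delete` from the `find` call turns it into a dry run that only lists what would be removed.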
  
-**Auto Clear:** This alert is a little bit flappy; it can't be helped due to how the data is transmitted and how we are alerting on it. It will flap to resolved and back again, even after you have done the work. It will clear about 15 mins later. Once the disk space is below 60 it will stay resolved eventually.
+If it's not the agent logs causing the issue, do the standard checks for disk space:
 + 
 +<code>du -sh ./* | sort -h</code> 
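One limitation of the `du` check is that `./*` skips hidden files and directories. A variant that includes them and lists the largest entries first is sketched below. `largest_entries` is a hypothetical helper name, and the sketch assumes a `du` that accepts `-d` for depth (GNU and BSD both do).

```shell
#!/usr/bin/env bash
# Sketch only: largest_entries is a hypothetical helper name.
# Prints the biggest first-level entries of a directory, largest first, in KiB.
largest_entries() {
    local dir="${1:-.}"     # directory to inspect, defaults to the current one
    local count="${2:-10}"  # number of lines to show

    # -x stays on one filesystem, -k reports KiB so a numeric sort works,
    # -d 1 limits output to first-level entries (hidden ones included).
    # The first output line is the total for the directory itself.
    du -xk -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

For example, `largest_entries /var 5` prints the total for `/var` followed by its four largest first-level entries.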
 + 
 +**Auto Clear:** This alert is a little bit flappy. It can't be helped, given how the data is transmitted, how we are alerting on it, and our current alertmanager settings. We would have to change our alertmanager auto resolve from 3 minutes to 10 minutes, a change that might affect other alerts, and I'd rather not find out the hard way. It will flap to resolved and back again, even shortly after you have done the work, and will clear fully about 15 mins later. Once the disk space is below 60 it will stay resolved for good.
resolution_area/prometheus_resolutions/res-p1408.1716226405.txt.gz · Last modified: 2024/05/20 18:33 by 10.91.120.100