resolution_area:prometheus_resolutions:res-p1408

Differences

This shows you the differences between two versions of the page.

resolution_area:prometheus_resolutions:res-p1408 [2024/05/20 18:37] 10.91.120.100
resolution_area:prometheus_resolutions:res-p1408 [2024/05/21 11:12] (current) 10.91.120.100
Line 14: Line 14:
  
 **Manual Action Steps:**
-SSH into the related server; the username and password are in pwsafe under "KLA NUC errigal usr". 9 times out of 10 it is the RDFAGENT logs that are the culprit. Do the following to delete excessive rdfagent logs.
+SSH into the related server; the username and password are in pwsafe under "KLA NUC errigal usr". There is a port specified alongside the IP address.
+
+<code>ssh errigal@<IP-ADDRESS> -p <PORT></code>
+
+9 times out of 10 it is the RDFAGENT logs that are the culprit. Do the following to delete excessive rdfagent logs.
  
 <code>sudo su rdfagent</code>
Line 20: Line 24:
 <code>cd ~/rdfagent/logs/agent</code>

-<code>rm agent.log.*</code>
+<code>rm spring.log.*</code>

 You should be left with only one file, the currently active log, agent.log
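
The cleanup step can be made a little safer by listing the rotated logs before removing them. The following is a minimal sketch, not part of the original runbook: the function name `clean_rotated_logs` and its `--delete` flag are hypothetical, and the glob to pass (`agent.log.*` or `spring.log.*`) depends on which revision of the step applies to the server.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the runbook): list or remove rotated logs
# in one directory while leaving the active log untouched.
# Usage: clean_rotated_logs <log_dir> <glob> [--delete]
clean_rotated_logs() {
    local log_dir="$1" pattern="$2" mode="${3:-}"
    if [ "$mode" = "--delete" ]; then
        # -maxdepth 1 keeps the search inside log_dir; -type f skips directories.
        # Each file name is printed just before it is deleted.
        find "$log_dir" -maxdepth 1 -type f -name "$pattern" -print -delete
    else
        # Dry run: only print what would be removed.
        find "$log_dir" -maxdepth 1 -type f -name "$pattern" -print
    fi
}
```

Running `clean_rotated_logs ~/rdfagent/logs/agent 'spring.log.*'` first shows the candidates; adding `--delete` as a third argument removes them. A glob such as `agent.log.*` requires a suffix after `agent.log`, so the active `agent.log` file itself is never matched.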
Line 28: Line 32:
 <code>du -sh ./* | sort -h</code>
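
To confirm the cleanup had the intended effect, the overall usage of the filesystem can be checked as well as the per-directory sizes. This is a rough sketch under assumptions: the helper names are hypothetical, the threshold of 60 is read as a percentage (the page only says "below 60"), and the mount point to check is a guess that should be replaced with the one the alert refers to.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the runbook).

# Print the used-space percentage (digits only) for the filesystem holding a path.
# df -P forces the single-line POSIX layout, where field 5 is the capacity column.
# Assumes the filesystem name contains no spaces.
disk_used_percent() {
    df -P "${1:-/}" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when the first number is strictly below the second.
below_threshold() {
    [ "$1" -lt "$2" ]
}

# Example: report whether / is under the assumed 60 percent mark.
pct=$(disk_used_percent /)
if below_threshold "$pct" 60; then
    echo "disk usage ${pct}% is below the assumed threshold"
else
    echo "disk usage ${pct}% is still at or above the assumed threshold"
fi
```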
  
-**Auto Clear:** This alert is a little bit flappy; it can't be helped due to how the data is transmitted, how we are alerting on it, and our current alertmanager settings. It will flap to resolved and back again, even shortly after you have done the work. It will clear fully about 15 mins later. Once the disk space is below 60 it will stay resolved for good eventually.
+**Auto Clear:** This alert is a little bit flappy; it can't be helped due to how the data is transmitted, how we are alerting on it, and our current alertmanager settings. We would have to change our alertmanager auto resolve from 3 minutes to 10 minutes, a change that might affect other alerts, and I'd rather not find out the hard way about that. It will flap to resolved and back again, even shortly after you have done the work. It will clear fully about 15 mins later. Once the disk space is below 60 it will stay resolved for good eventually.
resolution_area/prometheus_resolutions/res-p1408.1716226635.txt.gz · Last modified: 2024/05/20 18:37 by 10.91.120.100