CriticalCPULoad

Level: Critical FIXME

Purpose: The alert fires when CPU usage on any one server stays above 96% for more than 2 minutes.

Scenario: CPU usage on a server has been above 96% for more than 2 minutes.
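The firing and clearing behaviour described on this page can be sketched in Python as below. This is only an illustration of the stated condition (above 96% for more than 2 minutes, clearing once usage is no longer above 96%), not the actual Prometheus rule. The names AlertState and update, and the input of (timestamp in seconds, CPU percent) samples, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD_PERCENT = 96.0  # threshold stated on this page
HOLD_SECONDS = 120.0      # "more than 2 minutes"


@dataclass
class AlertState:
    # Hypothetical state holder: when usage first went above the threshold,
    # and whether the alert is currently firing.
    pending_since: Optional[float] = None
    firing: bool = False


def update(state: AlertState, timestamp: float, cpu_percent: float) -> AlertState:
    """Return the new alert state after one CPU sample."""
    if cpu_percent <= THRESHOLD_PERCENT:
        # Usage is no longer above the threshold: clear the alert.
        return AlertState()
    # Keep the original start time if usage was already above the threshold.
    since = state.pending_since if state.pending_since is not None else timestamp
    # Fire only once usage has stayed high for more than the hold period.
    return AlertState(pending_since=since, firing=(timestamp - since) > HOLD_SECONDS)


if __name__ == "__main__":
    state = AlertState()
    for ts, cpu in [(0, 97.0), (60, 98.0), (120, 99.0), (150, 98.5), (180, 40.0)]:
        state = update(state, ts, cpu)
        print(ts, cpu, "FIRING" if state.firing else "ok")
```

With samples taken every 30 or 60 seconds, the alert only starts firing on the first sample taken more than 120 seconds after usage first exceeded 96%.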

Resolution: Monitor the server and any related alerts. Check the RAM usage on the server. Check the running processes for anything that should not be running. Check the Grafana CPU percentage metrics for the last few days for any patterns. The alert could be caused by a process that runs at the same time every day, e.g. key stats.
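When looking for daily patterns, it can help to group high-CPU samples by hour of day. The sketch below assumes the CPU percentage series has been exported from Grafana as (datetime, percent) pairs; the function name recurring_hours and the min_days parameter are hypothetical and only illustrate the idea.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Set, Tuple


def recurring_hours(
    samples: Iterable[Tuple[datetime, float]],
    threshold: float = 96.0,
    min_days: int = 2,
) -> List[int]:
    """Return the hours of day (0-23) in which CPU usage exceeded
    the threshold on at least min_days distinct days."""
    days_by_hour: Dict[int, Set] = {}
    for ts, cpu_percent in samples:
        if cpu_percent > threshold:
            # Record the calendar day on which this hour saw high usage.
            days_by_hour.setdefault(ts.hour, set()).add(ts.date())
    return sorted(h for h, days in days_by_hour.items() if len(days) >= min_days)


if __name__ == "__main__":
    exported = [
        (datetime(2021, 7, 1, 2, 0), 97.0),
        (datetime(2021, 7, 2, 2, 5), 98.5),
        (datetime(2021, 7, 2, 14, 0), 99.0),
        (datetime(2021, 7, 3, 2, 1), 50.0),
    ]
    print(recurring_hours(exported))
```

An hour that shows up on several different days suggests a scheduled job, which can then be matched against the process list on the server.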

Manual Action Steps: Kill any processes that should not be running. The server may require maintenance if it continues to report high CPU usage.
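To find candidate processes, the heaviest CPU consumers can be listed first. The sketch below assumes a Linux server with the procps ps command available; the function names parse_ps_output and top_cpu_processes are hypothetical. It only lists processes. Deciding which of them should not be running, and killing them, remains a manual step.

```python
import subprocess
from typing import List, Tuple

# Assumes procps "ps" on Linux: PID, CPU percentage and command name,
# sorted by CPU usage in descending order.
PS_COMMAND = ["ps", "-eo", "pid,pcpu,comm", "--sort=-pcpu"]


def parse_ps_output(text: str, limit: int = 5) -> List[Tuple[int, float, str]]:
    """Parse ps output into (pid, cpu_percent, command) rows, highest CPU first."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)     # the command name may contain spaces
        if len(parts) != 3:
            continue
        pid, pcpu, comm = parts
        rows.append((int(pid), float(pcpu), comm.strip()))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


def top_cpu_processes(limit: int = 5) -> List[Tuple[int, float, str]]:
    """Run ps on the local machine and return the top CPU consumers."""
    result = subprocess.run(PS_COMMAND, capture_output=True, text=True, check=True)
    return parse_ps_output(result.stdout, limit)


if __name__ == "__main__":
    for pid, cpu, command in top_cpu_processes():
        print(f"{pid:>8} {cpu:>6.1f}% {command}")
```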

Auto Clear: When CPU usage drops below 96%.

resolution_area/prometheus_resolutions/res-p1302.txt · Last modified: 2021/07/05 11:42 by 10.91.120.28