resolution_area:prometheus_resolutions:res-p1302
Current revision: 2021/07/05 11:42, edited by 10.91.120.28. Previous revision: 2021/06/24 14:18, edited by btobin.

=====CriticalCPULoad=====
**Level:** __Critical__ FIXME

**Purpose:**
The alert reports when CPU usage on one of the servers stays above 96% for more than 2 minutes.

**Scenario:**
The CPU has been over 96% for more than 2 minutes.

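The firing condition described above can be sketched as a small check over timestamped CPU samples. This is only an illustration of the "above 96% for more than 2 minutes" logic, not the actual Prometheus rule behind this alert. The function name ''is_firing'', the sample format, and the strict "greater than" comparisons are all assumptions.

```python
from typing import Sequence, Tuple

THRESHOLD_PERCENT = 96.0  # from this page: CPU usage above 96%
HOLD_SECONDS = 120        # from this page: for more than 2 minutes


def is_firing(
    samples: Sequence[Tuple[float, float]],
    threshold: float = THRESHOLD_PERCENT,
    hold_seconds: float = HOLD_SECONDS,
) -> bool:
    """Return True if CPU stayed above the threshold for longer than hold_seconds.

    samples holds (unix_timestamp_seconds, cpu_percent) pairs ordered from
    oldest to newest. This format is hypothetical, chosen for illustration.
    """
    breach_start = None
    for timestamp, cpu_percent in samples:
        if cpu_percent > threshold:
            # Remember when the current unbroken run above the threshold began.
            if breach_start is None:
                breach_start = timestamp
        else:
            # Any sample at or below the threshold resets the run.
            breach_start = None
    if breach_start is None:
        return False
    latest_timestamp = samples[-1][0]
    return latest_timestamp - breach_start > hold_seconds
```

A single sample at or below the threshold resets the timer, so a server that briefly dips and then climbs again has to stay above 96% for another full 2 minutes before this check reports a breach.
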
**Resolution:**
Monitor the server and its alerts. Check the RAM usage on the server. Check the running processes to see if anything is running that shouldn't be.

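One way to carry out the process and RAM check above is to rank the output of ''ps'' by CPU share. The sketch below assumes a Linux server with the procps ''ps'' command. The helper names (''read_ps'', ''parse_ps_output'', ''top_cpu_processes'') are hypothetical and not part of any existing tooling for this alert.

```python
import os
import subprocess
from typing import List, NamedTuple


class ProcessUsage(NamedTuple):
    pid: int
    cpu_percent: float
    mem_percent: float
    command: str


def read_ps() -> str:
    """Run ps and return its raw output (assumes procps ps on Linux)."""
    # LC_ALL=C keeps the decimal separator a dot so float() can parse it.
    env = {**os.environ, "LC_ALL": "C"}
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,args"],
        check=True, capture_output=True, text=True, env=env,
    )
    return result.stdout


def parse_ps_output(text: str) -> List[ProcessUsage]:
    """Parse 'PID %CPU %MEM COMMAND' rows, skipping the header line."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        parts = line.split(None, 3)  # the command may itself contain spaces
        if len(parts) < 4:
            continue
        pid, cpu, mem, command = parts
        rows.append(ProcessUsage(int(pid), float(cpu), float(mem), command.strip()))
    return rows


def top_cpu_processes(text: str, limit: int = 5) -> List[ProcessUsage]:
    """Return the processes with the highest CPU share, highest first."""
    ranked = sorted(parse_ps_output(text), key=lambda p: p.cpu_percent, reverse=True)
    return ranked[:limit]
```

Calling top_cpu_processes(read_ps()) on the affected server lists the heaviest processes together with their memory share. Note that procps reports %CPU as CPU time divided by the time the process has been running, so a process that only recently started spinning may rank lower than expected. An interactive tool such as top gives a more current view.
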
**Manual Action Steps:**
Kill any processes that should not be running. The server may require maintenance if it continues to report high usage.

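If a process has been confirmed as one that should not be running, the kill step above could look like the following sketch. It assumes a POSIX system and sufficient permissions to signal the process. The function name ''terminate_process'' and the 5-second grace period are illustrative choices, not values taken from this page.

```python
import os
import signal
import time


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID still exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def terminate_process(pid: int, grace_seconds: float = 5.0,
                      poll_interval: float = 0.1) -> str:
    """Ask a process to exit with SIGTERM, then force it with SIGKILL.

    Returns the name of the last signal sent. Raises PermissionError if the
    caller is not allowed to signal the process.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not pid_exists(pid):
            return "SIGTERM"
        time.sleep(poll_interval)
    try:
        # Still present after the grace period: force termination.
        # An exited child that its parent has not yet reaped also counts
        # as present, so this branch can run for an already dead process.
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "SIGTERM"
    return "SIGKILL"
```

Sending SIGTERM first gives the process a chance to shut down cleanly. SIGKILL cannot be caught, so it is kept as the fallback. Double-check the PID before running anything like this on a production server.
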
**Auto Clear:**
When CPU usage drops below 96%.