Page resolution_area:prometheus_resolutions:res-p1403, current revision 2021/07/05 13:36 by wflaherty (previous revision 2021/06/21 11:47, external edit from 127.0.0.1).
=====DiskWillFillIn4Hours=====

**Level:** Warning

**Purpose:**
This is a common alert that notifies operations when a server is writing to disk so heavily that the disk is projected to fill up within four hours.
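
The alert name suggests a projection: at the current write rate, the disk will be full within four hours. The Prometheus rule itself is not reproduced on this page, so the sketch below only assumes a simple linear extrapolation between two free-space samples; the real rule may look at a longer window. hours_until_full is a hypothetical helper for illustration, not an existing tool.
<code bash>
# Hypothetical helper (not an existing tool): estimate the hours until a
# filesystem is full, assuming free space keeps shrinking at the same rate
# observed between two samples.
# Usage: hours_until_full FREE_KB_BEFORE FREE_KB_AFTER SECONDS_BETWEEN
hours_until_full() {
  awk -v before="$1" -v after="$2" -v secs="$3" 'BEGIN {
    consumed = before - after                    # KB written during the interval
    if (consumed <= 0) { print "never"; exit }   # free space is not shrinking
    rate = consumed / secs                       # KB per second
    printf "%.1f\n", after / rate / 3600         # remaining KB / rate, in hours
  }'
}

# Example: sample free space on / twice, one minute apart.
#   before=$(df -Pk / | awk 'NR == 2 { print $4 }')
#   sleep 60
#   after=$(df -Pk / | awk 'NR == 2 { print $4 }')
#   hours_until_full "$before" "$after" 60
</code>

A result below 4.0 means that, under this linear assumption, the disk would fill within the alert's four-hour horizon.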

**Scenario:**

**Resolution:**
This alert very commonly clears itself within a few moments, but treat it seriously: a full disk can be very difficult to recover from.

**Manual Action Steps:**
You can check the disk and RAM status with a few tools.

Using the Application Version Monitor, you can quickly check for potential disk problems across all the servers. This is crucial when a disk is filling up quickly and little space is left.
https://

Using SSH for a quick peek:
<code>
ssh scotty@server.err "df -h"
</code>
Example using Ironman:
<code>
ssh scotty@ironmanapps1.err "df -h"
ssh scotty@ironmanapps2.err "df -h"
ssh scotty@ironmanlb1.err "df -h"
</code>
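
When several servers need checking, reading each df table by eye gets slow. The sketch below filters df output down to the filesystems at or above a usage threshold. It assumes the POSIX df -P column layout (df -P is used rather than df -h because its POSIX output format puts each filesystem on one line with a fixed column order); the 90% default threshold and the name flag_full_filesystems are hypothetical choices, and the loop in the comments assumes the same SSH access as the commands above.
<code bash>
# Hypothetical helper: read `df -P` output on stdin and print each mount point
# whose Capacity column is at or above THRESHOLD percent (default 90).
# Assumes the POSIX column layout: Filesystem, blocks, Used, Available,
# Capacity, Mounted on. Mount points containing spaces are not handled.
flag_full_filesystems() {
  awk -v threshold="${1:-90}" 'NR > 1 {
    pct = $5
    sub(/%/, "", pct)                            # "93%" -> "93"
    if (pct + 0 >= threshold + 0) print $6, $5   # mount point and usage
  }'
}

# Example across the Ironman servers listed above:
#   for host in ironmanapps1.err ironmanapps2.err ironmanlb1.err; do
#     echo "== $host"
#     ssh "scotty@$host" "df -P" | flag_full_filesystems 90
#   done
</code>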

Using Grafana. Grafana shows the history of disk usage and can give more detail about the situation on a specific server.
http://

http://

If the disk is filling up too quickly and the application writing to it absolutely must be stopped before the disk fills, SSH into the server and find the application's PID.
You can get the PID quickly with the command below; replace APPLICATION with the name of the app you are looking for. The output shows the PID and the command line that launched the app, for context. Lines belonging to the grep command itself will also match; ignore them.
<code>
ssh scotty@SERVER "ps aux | grep -i 'APPLICATION'"
</code>

Or SSH in first, then run the command:
<code>
ssh scotty@SERVER
ps aux | grep -i 'APPLICATION'
</code>
The PID you want is typically the one belonging to a Java process.
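
Picking the right line out of the ps output can also be done with a small filter. This is a minimal sketch, assuming the usual ps aux column layout (PID in column 2, command starting at column 11) and, as noted above, that the application runs as a Java process; find_app_pids is a hypothetical helper. Because only lines whose executable is java are considered, the filter never matches its own process, unlike the grep commands above. Processes started through a wrapper whose command does not begin with java would be missed.
<code bash>
# Hypothetical helper: read `ps aux` output on stdin and print the PID of every
# Java process whose command line contains PATTERN (case-insensitive).
# Assumes the usual `ps aux` layout: PID in column 2, command from column 11.
find_app_pids() {
  awk -v pat="$1" 'NR > 1 && ($11 == "java" || $11 ~ /\/java$/) {
    cmd = ""
    for (i = 11; i <= NF; i++) cmd = cmd " " $i   # rebuild the full command line
    if (index(tolower(cmd), tolower(pat)) > 0) print $2
  }'
}

# Example: run ps on the server and filter the output locally.
#   ssh scotty@SERVER "ps aux" | find_app_pids APPLICATION
</code>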

Then kill the application using the PID you just found. The -9 option sends SIGKILL, which stops the process immediately without giving it a chance to shut down cleanly.
<code>
sudo kill -9 PID
</code>
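
SIGKILL cannot be caught, but a process blocked in disk I/O may not disappear immediately, so it can be worth confirming the process is really gone before deciding whether to restart it. A minimal sketch, assuming a Linux server: it checks for the /proc/PID directory, which exists until the process is gone and is normally visible to every user. wait_for_exit and its 10-second default timeout are hypothetical choices.
<code bash>
# Hypothetical helper: wait until PID no longer exists, polling once per
# second for up to TIMEOUT seconds (default 10). Linux-specific: relies on
# the /proc/PID directory, which disappears once the process is gone.
# Usage: wait_for_exit PID [TIMEOUT]
wait_for_exit() {
  local pid=$1 timeout=${2:-10} waited=0
  while [ -d "/proc/$pid" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "PID $pid is still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "PID $pid has exited"
}

# Example, on the server:
#   sudo kill -9 PID && wait_for_exit PID
</code>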

Now decide whether you want to start the application back up.
You can use Ansible to start it.

**Auto Clear:**
Hopefully the alert will not last long and will clear itself automatically. However, if it doesn't