=====DiskWillFillIn4Hours=====
**Level:** Warning
**Purpose:**
This is a common alert that notifies operations when a server is writing to disk fast enough to fill it.
**Scenario:** The disk will fill within 4 hours if writes continue at the rate observed over the past 5 minutes.
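The alert rule itself is not reproduced in this runbook. Assuming it is a linear extrapolation (similar in spirit to Prometheus's `predict_linear()`, which fits a regression over the whole window rather than using two points), the arithmetic behind "will fill in 4 hours" can be sketched as follows. `hours_until_full` is a hypothetical helper, not part of any existing tooling.

```shell
# hours_until_full FREE_BYTES_BEFORE FREE_BYTES_NOW INTERVAL_SECONDS
# Hypothetical helper: linear extrapolation from two free-space samples.
# Prints the projected hours until free space reaches zero, or "never"
# if free space is not shrinking.
hours_until_full() {
  awk -v before="$1" -v now="$2" -v secs="$3" 'BEGIN {
    rate = (before - now) / secs            # bytes consumed per second
    if (rate <= 0) { print "never"; exit }
    printf "%.2f\n", now / rate / 3600      # seconds remaining -> hours
  }'
}
```

For example, `hours_until_full 7500000 7200000 300` prints `2.00`: 300,000 bytes written in 5 minutes is 1,000 bytes per second, so 7,200,000 free bytes last 2 hours, which is under the 4-hour threshold.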
**Resolution:**
This alert very commonly clears itself within a few moments, but do not ignore it: a full disk can be very difficult to recover from.
**Manual Action Steps:**
You can check the disk and RAM status with a few tools.
Using the Application Version Monitor, you can quickly check for potential disk problems across all servers. This is crucial when a disk is filling up quickly and there isn't much space left.
https://docs.google.com/spreadsheets/d/1Ebj0kWPl63Q4L3f_oo_kvWoh5ZRq2vFI8FmrWbjoPlo/edit#gid=1606019603
Using SSH for a quick peek:
ssh scotty@server.err "df -h"
Example using Ironman:
ssh scotty@ironmanapps1.err "df -h"
ssh scotty@ironmanapps2.err "df -h"
ssh scotty@ironmanlb1.err "df -h"
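To check several servers in one pass, the `df` step can be wrapped in a loop that only reports filesystems that are nearly full. This is a minimal sketch: the 80% threshold and the helper names `over_threshold` and `check_hosts` are assumptions, and `-P` (POSIX output format) is added so each filesystem stays on a single line for parsing.

```shell
# Percent-used level at which a filesystem is reported (assumed value; adjust as needed).
THRESHOLD=80

# Reads `df -P` style output on stdin and prints the mount point and Use%
# of every filesystem at or above THRESHOLD. The header line is skipped.
over_threshold() {
  awk -v limit="$THRESHOLD" 'NR > 1 {
    used = $5
    sub(/%/, "", used)                      # "92%" -> "92"
    if (used + 0 >= limit) print $6, $5
  }'
}

# Runs df on each host given as an argument and keeps only the full-ish filesystems.
check_hosts() {
  for host in "$@"; do
    echo "== $host"
    ssh "scotty@$host" "df -hP" | over_threshold
  done
}
```

Usage with the Ironman servers:
check_hosts ironmanapps1.err ironmanapps2.err ironmanlb1.err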
Using Grafana. Grafana is an excellent tool for reviewing a server's history and gives more detail about the situation on a specific server.
http://grafana.errigal.com:3000/
http://grafana.errigal.com:3000/d/T9Z85H2iz/node-exporter-full?orgId=1&var-job=node-exporter&var-node=cerberus&var-port=10000
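To open the same dashboard for a different server, only the `var-node` parameter in the link above needs to change. A small helper, assuming the Grafana node name matches the server's short host name and that `var-port=10000` applies to every node (neither is confirmed here); `grafana_node_url` is a hypothetical name.

```shell
# Builds the "Node Exporter Full" dashboard URL for one node. The base URL and
# the fixed parameters are copied from the cerberus link; only var-node varies.
grafana_node_url() {
  printf 'http://grafana.errigal.com:3000/d/T9Z85H2iz/node-exporter-full?orgId=1&var-job=node-exporter&var-node=%s&var-port=10000\n' "$1"
}
```

Running `grafana_node_url cerberus` reproduces the link above.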
If the disk is filling up too quickly and the writes absolutely must be stopped before it is full, SSH into the server and stop the application responsible.
Replace APPLICATION in the commands below with the name of the app you are searching for. The output shows each matching PID followed by the first word of its command (usually the executable) for context.
You can quickly get the PID without logging in:
ssh scotty@SERVER "ps aux | grep -i 'APPLICATION'" | awk '{ print $2 " " $11 }'
Or SSH in first, then run the command:
ssh scotty@SERVER
ps aux | grep -i 'APPLICATION' | awk '{ print $2 " " $11 }'
The right PID is typically the one belonging to a Java process.
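Note that `grep -i 'APPLICATION'` also matches the grep process itself (and, over SSH, the remote shell running the pipeline), so extra lines can appear in the output. A minimal sketch that filters in awk instead and keeps only Java processes; `java_pids` is a hypothetical helper, and APPLICATION is treated as a case-insensitive regular expression.

```shell
# Reads `ps aux` output on stdin and prints "PID COMMAND" for Java processes
# whose full command line mentions the given app name (case-insensitive).
# Requiring the executable ($11) to end in "java" drops grep/awk/shell lines.
java_pids() {
  awk -v app="$1" 'NR > 1 && tolower($0) ~ tolower(app) && $11 ~ /java$/ {
    print $2, $11
  }'
}
```

Usage, running ps remotely and filtering locally (the function must be defined in your local shell):
ssh scotty@SERVER "ps aux" | java_pids APPLICATION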
Then, using the PID you just found, kill the application:
sudo kill -9 PID
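`kill -9` sends SIGKILL, which stops the process immediately but gives it no chance to flush or close files cleanly. If a few seconds can be spared, a gentler sketch sends SIGTERM first and only falls back to SIGKILL if the process is still running after a grace period. `stop_pid` and its 10-second default are assumptions, not part of the existing tooling.

```shell
# stop_pid PID [GRACE_SECONDS]
# Sends SIGTERM, waits up to GRACE_SECONDS (default 10) for the process to exit,
# then sends SIGKILL if it is still alive. Returns non-zero if SIGTERM could not
# be sent (wrong PID or insufficient permissions).
stop_pid() {
  pid=$1
  grace=${2:-10}
  kill -TERM "$pid" || return 1
  while [ "$grace" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
    sleep 1
    grace=$((grace - 1))
  done
  if kill -0 "$pid" 2>/dev/null; then
    echo "PID $pid ignored SIGTERM, sending SIGKILL" >&2
    kill -KILL "$pid"
  fi
}
```

The function does not call sudo itself, so run it from a root shell (for example after `sudo -i`):
stop_pid PID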
Now decide whether to start the application back up.
If so, you can use Ansible to start it.
**Auto Clear:**
Hopefully the alert won't last long and will clear itself automatically. If it doesn't, treat it as very serious: the disk may genuinely be about to fill.