resolution_area:prometheus_resolutions:res-p1201
Differences
This shows you the differences between two versions of the page.
| resolution_area:prometheus_resolutions:res-p1201 [2021/06/25 10:09] – external edit 127.0.0.1 | resolution_area:prometheus_resolutions:res-p1201 [2021/07/05 12:56] (current) – wflaherty | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| =====HeapLow===== | =====HeapLow===== | ||
| - | **Level:** __Critical__ | + | **Level:** __Critical__ |
| **Purpose: | **Purpose: | ||
| + | This alert identifies applications whose allocated JVM memory (heap) is running low. This can happen for a number of reasons. | ||
| + | There may be: | ||
| + | - A blocking event. | ||
| + | - An excessive amount of data being generated by or pushed to the application. | ||
| + | - RAM running low on the server. | ||
| + | - A script that is holding a lot of data in memory. | ||
| **Scenario: | **Scenario: | ||
| **Resolution: | **Resolution: | ||
| + | Check whether the application has crashed as a result of the HeapLow condition. If it has, it will need to be restarted. | ||
| **Manual Action Steps:** | **Manual Action Steps:** | ||
| + | Ensure you have the latest version of the env-config repo and the deployments-playbook repo from Bitbucket. | ||
| + | Assuming both repositories are in the same folder, run: | ||
| + | < | ||
| + | cd / | ||
| + | git checkout master && git pull && cd ../ | ||
| + | </ | ||
| + | |||
| + | Restart the application with | ||
| + | < | ||
| + | ansible-playbook -i ../ | ||
| + | </ | ||
| + | NOTE: Replace ServerName and AppName with the actual server and application names. | ||
| + | |||
| + | If the application refuses to stop with Ansible, you will have to log into the server manually. | ||
| + | First, log into the handler to find the service or application: | ||
| + | < | ||
| + | ssh scotty@server | ||
| + | </ | ||
| + | |||
| + | Check if the application is a service. | ||
| + | < | ||
| + | systemctl status < | ||
| + | </ | ||
| + | |||
| + | If it is a service and you can see that it is active or stopped, try to restart it with | ||
| + | < | ||
| + | sudo systemctl restart < | ||
| + | </ | ||
| + | |||
| + | If it doesn' | ||
| + | |||
| + | Get the application PID with | ||
| + | < | ||
| + | ps aux | grep application | grep -v grep | ||
| + | </ | ||
| + | |||
| + | For just the PID use | ||
| + | < | ||
| + | ps aux | grep application | grep -v grep | awk '{ print $2 }' | ||
| + | </ | ||
| + | |||
| + | Then kill the process with | ||
| + | < | ||
| + | sudo kill -9 PID | ||
| + | </ | ||
| + | |||
| + | Depending on the application, you may have to repeat this on the other handler or on any other server it runs on. | ||
| + | Then try the Ansible restart again. | ||
| **Auto Clear:** | **Auto Clear:** | ||
| + | This alert may be temporary, caused by a brief period of excess load on the server or on a dependency such as RabbitMQ or Elasticsearch. In that case it may clear by itself. | ||
| + | |||
| + | |||
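The manual fallback in the Manual Action Steps (find the PID, then kill it) could be wrapped in a small shell helper. The sketch below is not part of the original procedure: the function name `stop_pid` and its timeout argument are hypothetical, and it assumes that sending SIGTERM first and only falling back to `kill -9` (SIGKILL) after a timeout is acceptable for the application in question. It should be run as root or as the user that owns the process, because `kill -0` cannot see a process the caller has no permission to signal.

```shell
#!/bin/sh
# Hypothetical helper (not from the original runbook):
# stop a process by PID, trying SIGTERM before SIGKILL.
# Usage: stop_pid PID [TIMEOUT_SECONDS]
stop_pid() {
    pid=$1
    timeout=${2:-10}   # assumed default of 10 seconds

    # Nothing to do if the process is already gone.
    kill -0 "$pid" 2>/dev/null || return 0

    # Ask the process to shut down cleanly first.
    kill -TERM "$pid" 2>/dev/null || return 0

    # Poll once per second until the process exits or the timeout expires.
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done

    # Still running after the timeout: force it, equivalent to kill -9.
    kill -KILL "$pid" 2>/dev/null || true
}
```

With the function loaded in the current shell, `stop_pid PID 15` would wait up to 15 seconds for a clean exit before forcing the kill, where PID is the value found with the `ps` command in the steps above.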
resolution_area/prometheus_resolutions/res-p1201.1624612196.txt.gz · Last modified: 2021/06/25 10:09 by 127.0.0.1