HeapLow

Level: Critical

Purpose: This alert identifies applications whose JVM heap (allocated memory) is running low. This can happen for a number of reasons, including:
- a blocking event.
- an excessive amount of data being generated by or pushed to the application.
- RAM running low on the server.
- a script that is holding a lot of data in memory.

Scenario: <application> on <server> used more than 95% of its Heap for 120s.
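To confirm the figure by hand, one option is the jstat tool that ships with the JDK. The sketch below is an assumption, not part of the alert itself: it assumes jstat is installed on the server and is run as the same user as the JVM. The helper name heap_pct is hypothetical. It sums the used and capacity columns of `jstat -gc` output by header name. Capacity here is the currently committed heap rather than the configured maximum, so the result may differ from the percentage the alert reports.

```shell
# heap_pct (hypothetical helper): read `jstat -gc PID` output on stdin and
# print heap usage as a whole-number percentage of the committed heap.
heap_pct() {
  awk '
    # First line is the header: remember the position of each column name.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Second line holds the values (in KB).
    NR == 2 {
      used = 0; cap = 0
      n = split("S0U S1U EU OU", u, " ")
      for (i = 1; i <= n; i++) if (u[i] in col) used += $(col[u[i]])
      n = split("S0C S1C EC OC", c, " ")
      for (i = 1; i <= n; i++) if (c[i] in col) cap += $(col[c[i]])
      if (cap > 0) printf "%d\n", (used * 100) / cap
    }
  '
}

# Usage (PID is a placeholder for the JVM process ID):
#   jstat -gc PID | heap_pct
```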

Resolution: Check whether the application has crashed as a result of the low heap. If it has, it will need a restart.

Manual Action Steps: Make sure you have the latest version of the env-config repo and the deployment-playbooks repo from Bitbucket. Assuming both repositories are in the same folder, run

cd /your/env-config/location
git checkout master && git pull && cd ../deployment-playbooks && git checkout master && git pull

Restart the application with

ansible-playbook -i ../env-configuration/ServerName/hosts.ini Application.yml --vault-id @prompt -e "actions='stop,start'"

NOTE: Replace ServerName and Application with the actual server and application names.

If the application refuses to stop via Ansible, you will have to log into the server manually. First, log into the handler to find the service or application

ssh scotty@server

Check if the application is a service.

systemctl status <service> 

If it is a service and you can see that it is active or stopped, try to restart it with

sudo systemctl restart <service> 
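When it is not obvious from the status output whether a service exists at all, the exit code of systemctl status can help. The sketch below maps the codes listed in the systemctl man page for recent systemd versions (0 active, 3 not active, 4 no such unit); older versions may behave differently, so treat it as a hint rather than a rule. The function name describe_status_code is hypothetical.

```shell
# describe_status_code (hypothetical helper): translate the exit code of
# `systemctl status <service>` into a short description.
describe_status_code() {
  case "$1" in
    0) echo "active" ;;
    3) echo "not active" ;;
    4) echo "no such unit" ;;
    *) echo "unknown ($1)" ;;
  esac
}

# Usage:
#   systemctl status <service> >/dev/null 2>&1; describe_status_code "$?"
```

A result of "no such unit" suggests the application is not managed by systemd and has to be handled as a plain process.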

If there is no service, the application is running as a plain process and must be killed and then started again. Since Ansible failed to stop and start the application, proceed as follows:

Get the application PID with

ps aux | grep application 

For just the PID use

ps aux | grep application | grep -v grep | awk '{ print $2 }'

Then kill the process with

sudo kill -9 PID
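kill -9 gives the JVM no chance to shut down cleanly, so it may be worth trying a normal termination signal first and only forcing the kill if the process is still there after a short wait. The following is a minimal sketch of that idea, not an established part of this procedure. The function name stop_pid and the 10 second default are assumptions, and it uses plain kill, so run it from a shell with sufficient privileges or add sudo to the kill calls.

```shell
# stop_pid (hypothetical helper): send SIGTERM to PID $1, wait up to $2
# seconds (default 10) for it to exit, then fall back to SIGKILL.
stop_pid() {
  pid=$1
  remaining=${2:-10}
  if ! kill -TERM "$pid" 2>/dev/null; then
    echo "could not signal $pid (already gone, or no permission)" >&2
    return 1
  fi
  while [ "$remaining" -gt 0 ]; do
    kill -0 "$pid" 2>/dev/null || return 0   # process has exited
    sleep 1
    remaining=$((remaining - 1))
  done
  # Still running after the grace period: force it.
  kill -KILL "$pid" 2>/dev/null
  return 0
}

# Usage (PID is the value found in the previous step):
#   stop_pid PID 15
```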

You may have to repeat this on the other handler, or on whichever other server the application runs on. Then try Ansible again.

Auto Clear: This warning may be temporary, for example during a moment of excess load on the server or on a service such as RabbitMQ or Elasticsearch. It may clear by itself.