resolution_area:prometheus_resolutions:res-p1201

resolution_area:prometheus_resolutions:res-p1201 [2021/06/24 15:08] btobin
resolution_area:prometheus_resolutions:res-p1201 [2021/07/05 12:56] (current) wflaherty
 +=====HeapLow=====
 +
 +**Level:** __Critical__
 +
 +
 +**Purpose:**
 +This alert identifies applications whose JVM heap (allocated memory) is running low. This can happen for a number of reasons.
 +There may be:
 +- A blocking event.
 +- An excessive amount of data being generated by or pushed to the application.
 +- RAM running low on the server.
 +- A script that is holding a lot of data in memory.
 +
 +**Scenario:** <application> on <server> used more than 95% of its Heap for 120s.
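
The 95% threshold in the scenario can be checked by hand with integer arithmetic. The sketch below is illustrative only: `is_heap_low` and its arguments are hypothetical names, and the used and maximum heap sizes (in bytes) are assumed to come from whatever JVM metrics you already collect for the application. It does not model the 120s duration, which is left to the alert itself.

```shell
# Hypothetical helper (not part of any existing tooling).
# Succeeds (exit status 0) when used heap is MORE than the threshold
# percentage of the maximum heap. The threshold defaults to 95.
# Arguments: used_bytes max_bytes [threshold_percent]
is_heap_low() {
    # Compare used/max > threshold/100 without floating point.
    [ $(( $1 * 100 )) -gt $(( $2 * ${3:-95} )) ]
}

# Example values (assumed): 3900 MiB used of a 4096 MiB heap, roughly 95.2%.
used=$(( 3900 * 1024 * 1024 ))
max=$(( 4096 * 1024 * 1024 ))
if is_heap_low "$used" "$max"; then
    echo "HeapLow"
else
    echo "ok"
fi
```

Exactly 95% does not count as low here, matching the "more than 95%" wording of the scenario.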
 +
 +**Resolution:**
 +Check whether the application has crashed as a result of the HeapLow condition. If it has, it will need to be restarted.
 +
 +**Manual Action Steps:**
 +Ensure you have the latest version of the env-config and deployment-playbooks repos from Bitbucket.
 +Assuming both repositories are checked out in the same parent folder, run
 +<code>
 +cd /your/env-config/location
 +git checkout master && git pull && cd ../deployment-playbooks && git checkout master && git pull
 +</code>
 +
 +Restart the application with
 +<code>
 +ansible-playbook -i ../env-configuration/ServerName/hosts.ini Application.yml --vault-id @prompt -e"actions='stop,start'" 
 +</code>
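 +NOTE: Replace ServerName and Application with the actual server and application names, and adjust the inventory path if your env-config checkout lives in a differently named folder.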
 +
 +If the application refuses to stop via Ansible, you will have to log into the server manually.
 +First, log into the handler to find the service or application:
 +<code>
 +ssh scotty@server
 +</code>
 +
 +Check if the application is a service.
 +<code>
 +systemctl status <service> 
 +</code>
 +
 +If it is a service and you can see that it is active or stopped, try to restart it with
 +<code>
 +sudo systemctl restart <service> 
 +</code>
 +
 +If it is not managed as a service, it is running as a plain process and must be killed and then started again. Since Ansible has failed to stop and start the application, proceed as follows:
 +
 +Get the application PID with 
 +<code>
 +ps aux | grep application 
 +</code>
 +
 +For just the PID use 
 +<code>
 +ps aux | grep application | grep -v grep | awk '{ print $2 }'
 +</code>
 +
 +Then kill the process with
 +<code>
 +sudo kill -9 PID
 +</code>
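
`kill -9` gives the JVM no chance to shut down cleanly, so it can be worth trying a normal termination first. The following is a minimal sketch, not an established procedure from this runbook: `stop_process` is a hypothetical helper, and the 30 second default timeout is an assumption. It sends SIGTERM, waits for the process to exit, and falls back to SIGKILL only if the timeout expires. Run it with sufficient privileges to signal the process (for example from a root shell).

```shell
# Hypothetical helper: try a clean shutdown before forcing one.
# Arguments: pid [timeout_seconds]
# Returns 1 if the process could not be signalled at all
# (no such process, or not permitted), 0 otherwise.
stop_process() {
    local pid=$1 timeout=${2:-30} waited=0

    # Ask the process to terminate cleanly.
    kill -s TERM "$pid" 2>/dev/null || return 1

    # Poll once per second until the process has gone.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            # Timeout reached: force termination, as kill -9 would.
            kill -s KILL "$pid" 2>/dev/null
            break
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0
}
```

For example, `stop_process 12345 60` would give a process with PID 12345 (a placeholder) up to 60 seconds to exit before it is killed.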
 +
 +Depending on the application, you may have to repeat this on the other handler or on any other server it runs on.
 +Then run the Ansible restart again.
 +
 +**Auto Clear:**
 +This alert may be temporary, for example during a spike in load on the server or on a dependency such as RabbitMQ or Elasticsearch, and may clear on its own.
 +