Differences

--- resolution_area:prometheus_resolutions:res-p1201 [2021/06/25 10:09] – external edit 127.0.0.1
+++ resolution_area:prometheus_resolutions:res-p1201 [2021/07/05 12:56] (current) – wflaherty
@@ Line 1: / Line 1: @@
 =====HeapLow=====
-**Level:** __Critical__ FIXME
+**Level:** __Critical__
 **Purpose:**
+This alert identifies applications whose JVM heap (the memory allocated to the JVM) is running low. Common causes include:
+- A blocking event.
+- An excessive amount of data being generated by, or pushed to, the application.
+- Low free RAM on the server.
+- A script holding a large amount of data in memory.
 **Scenario:** <application> on <server> used more than 95% of its heap for 120s.
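The Prometheus rule behind this alert is not shown on this page. As a rough illustration only, the sketch below applies the same condition (more than 95% of the heap used, continuously, for 120 seconds) to a series of samples, which can help when reasoning about why the alert did or did not fire. heap_low_check is a hypothetical function, and the input format "unix_seconds used_bytes max_bytes", one sample per line, is an assumption.

```shell
# Hypothetical helper illustrating the HeapLow condition on sampled data.
# Input: one sample per line, "unix_seconds used_bytes max_bytes".
# Usage: heap_low_check [THRESHOLD_PERCENT] [HOLD_SECONDS]  (defaults 95, 120)
# Prints FIRING if, at the last sample, usage has stayed above the threshold
# continuously for at least the hold time; otherwise prints OK.
heap_low_check() {
  awk -v threshold="${1:-95}" -v hold="${2:-120}" '
    {
      # Skip the percentage when max is zero or missing (no division by zero).
      if ($3 > 0 && $2 * 100 / $3 > threshold) {
        # Remember when the current run of high samples started.
        if (!active) { active = 1; start = $1 }
        firing = ($1 - start >= hold)
      } else {
        # Any sample at or below the threshold resets the timer.
        active = 0
        firing = 0
      }
    }
    END { print (firing ? "FIRING" : "OK") }
  '
}
```

For example, printf '0 96 100\n60 97 100\n120 98 100\n' | heap_low_check prints FIRING, because the samples stay above 95% for the full 120 seconds; a single sample at or below 95% in between would reset the timer.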
 **Resolution:**
+Check whether the application has crashed as a result of the low heap. If it has, it will need to be restarted.
 **Manual Action Steps:**
+Make sure you have the latest versions of the env-config and deployment-playbooks repositories from Bitbucket.
+Assuming both repositories are cloned into the same parent folder, run:
+<code>
+cd /your/env-config/location
+git checkout master && git pull && cd ../deployment-playbooks && git checkout master && git pull
+</code>
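The update step above can also be wrapped in a small function that stops at the first repository that fails to check out or pull, instead of silently carrying on. This is a sketch: update_repos is a hypothetical name, and it assumes, as the commands above do, that the repositories sit side by side under one parent folder and track master.

```shell
# Hypothetical helper: update several repositories that are cloned side by
# side under one parent folder, checking out and pulling master in each.
# Usage: update_repos PARENT_DIR REPO [REPO...]
update_repos() {
  if [ "$#" -lt 2 ]; then
    echo "usage: update_repos PARENT_DIR REPO [REPO...]" >&2
    return 2
  fi
  local parent_dir="$1"
  shift
  local repo
  for repo in "$@"; do
    # git -C runs each command inside the repository without changing
    # the caller's working directory.
    if ! git -C "$parent_dir/$repo" checkout master ||
       ! git -C "$parent_dir/$repo" pull; then
      echo "update_repos: failed to update $repo" >&2
      return 1
    fi
  done
}
```

For example: update_repos /your/parent/folder env-config deployment-playbooks. Passing the repository names explicitly avoids hard-coding folder names that may differ between clones.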
+Restart the application with
+<code>
+ansible-playbook -i ../env-configuration/ServerName/hosts.ini Application.yml --vault-id @prompt -e "actions='stop,start'"
+</code>
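+NOTE: Replace ServerName with the actual server name and Application.yml with the playbook for the affected application. Run the command from the deployment-playbooks folder, where the previous step leaves you.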
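A wrapper like the hypothetical restart_app below can check that the inventory and playbook files exist before Ansible prompts for the vault password, which catches a mistyped server name early. It reuses the inventory path and options from the command above unchanged and assumes it is run from the deployment-playbooks folder. RESTART_APP_CMD is an illustrative hook for a dry run, not an Ansible setting.

```shell
# Hypothetical wrapper around the ansible-playbook command above.
# Run it from the deployment-playbooks folder: the inventory path is
# relative, exactly as in the original command.
# RESTART_APP_CMD is an illustrative hook (not an Ansible setting): set it
# to "echo" to print the command instead of running it.
restart_app() {
  if [ "$#" -ne 2 ]; then
    echo "usage: restart_app SERVER_NAME PLAYBOOK" >&2
    return 2
  fi
  local inventory="../env-configuration/$1/hosts.ini"
  local playbook="$2"
  # Fail fast on a mistyped name instead of after the vault password prompt.
  if [ ! -f "$inventory" ]; then
    echo "restart_app: inventory not found: $inventory" >&2
    return 1
  fi
  if [ ! -f "$playbook" ]; then
    echo "restart_app: playbook not found: $playbook" >&2
    return 1
  fi
  "${RESTART_APP_CMD:-ansible-playbook}" -i "$inventory" "$playbook" \
    --vault-id @prompt -e "actions='stop,start'"
}
```

For example, RESTART_APP_CMD=echo restart_app ServerName Application.yml prints the full command without running it; drop the RESTART_APP_CMD prefix to run it for real.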
+If the application refuses to stop via Ansible, you will have to log into the server manually.
+First, log into the handler to find the service or application:
+<code>
+ssh scotty@server
+</code>
+Check whether the application runs as a systemd service:
+<code>
+systemctl status <service>
+</code>
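To decide between the systemctl route and the manual route from a script, one option is to rely on the exit status of systemctl status, which follows the LSB convention: 0 when the unit is running, 3 when it is not running and 4 when no such unit exists. The hypothetical service_state function below is a sketch built on that assumption; it is worth confirming the codes against the systemd version on the handler.

```shell
# Hypothetical helper: classify a unit from the exit status of
# `systemctl status` (LSB convention: 0 = running, 3 = not running,
# 4 = no such unit). Prints active, stopped, missing or unknown.
service_state() {
  systemctl status "$1" >/dev/null 2>&1
  case $? in
    0) echo active ;;
    3) echo stopped ;;
    4) echo missing ;;
    *) echo unknown ;;
  esac
}
```

An answer of active or stopped points to restarting with systemctl; missing means the application is not a service and has to be handled as a plain process.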
+If it is a service and its status shows it as active or stopped, try restarting it with:
+<code>
+sudo systemctl restart <service>
+</code>
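After a restart it is worth confirming that the service actually came back up. The hypothetical restart_service sketch below restarts the unit and then polls systemctl is-active --quiet, which exits 0 only while the unit is active, for up to a given number of seconds (30 by default, an arbitrary choice).

```shell
# Hypothetical helper: restart a systemd unit, then wait for it to be active.
# Usage: restart_service UNIT [TIMEOUT_SECONDS]   (timeout defaults to 30)
restart_service() {
  local svc="$1" timeout="${2:-30}" waited=0
  sudo systemctl restart "$svc" || return 1
  while [ "$waited" -lt "$timeout" ]; do
    # is-active --quiet prints nothing and exits 0 only when the unit is active.
    if systemctl is-active --quiet "$svc"; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "restart_service: $svc is not active after ${timeout}s" >&2
  return 1
}
```

If the function gives up, systemctl status on the unit usually shows the reason.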
+If it is not a service, it is running as a plain process and must be killed and then started again. Since Ansible has failed to stop and start the application, proceed as follows.
+Get the application's PID with:
+<code>
+ps aux | grep application | grep -v grep
+</code>
+For just the PID, use:
+<code>
+ps aux | grep application | grep -v grep | awk '{ print $2 }'
+</code>
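A plain ps aux | grep can also match the grep command itself. As an alternative, pgrep -f matches against full command lines and never lists its own process. The hypothetical app_pids helper below wraps it and refuses an empty pattern, since an empty pattern would match every process and is dangerous to feed into kill.

```shell
# Hypothetical helper: print the PIDs of processes whose full command line
# matches PATTERN, one per line. Exits 1 if nothing matches.
# Usage: app_pids PATTERN
app_pids() {
  if [ -z "$1" ]; then
    echo "app_pids: refusing an empty pattern (it would match every process)" >&2
    return 2
  fi
  # -f matches against the full command line, not just the process name.
  # PATTERN is an extended regular expression, so escape dots if needed.
  pgrep -f -- "$1"
}
```

For example, app_pids application lists the candidate PIDs; check them before passing any to kill.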
+Then kill the process with
+<code>
+sudo kill -9 PID
+</code>
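kill -9 sends SIGKILL, which gives the JVM no chance to run its shutdown hooks. An optional, gentler variant is sketched below as a hypothetical stop_pid function: it sends SIGTERM first, waits up to a timeout (30 seconds by default, an arbitrary choice) and only then falls back to SIGKILL.

```shell
# Hypothetical helper: stop a process gracefully, escalating to SIGKILL.
# Usage: stop_pid PID [TIMEOUT_SECONDS]   (timeout defaults to 30)
stop_pid() {
  local pid="$1" timeout="${2:-30}" waited=0
  # SIGTERM lets the JVM run its shutdown hooks; SIGKILL does not.
  if ! sudo kill -TERM "$pid"; then
    echo "stop_pid: could not signal $pid" >&2
    return 1
  fi
  while [ "$waited" -lt "$timeout" ]; do
    # kill -0 sends no signal; it only checks that the PID still exists.
    sudo kill -0 "$pid" 2>/dev/null || return 0
    sleep 1
    waited=$((waited + 1))
  done
  echo "stop_pid: $pid ignored SIGTERM for ${timeout}s, sending SIGKILL" >&2
  sudo kill -KILL "$pid"
}
```

A JVM that is stuck because its heap is exhausted may not respond to SIGTERM, in which case the function ends with the same SIGKILL as the command above.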
+Depending on the application, you may need to repeat this on the other handler or on any other server it runs on.
+Then run the Ansible playbook again.
 **Auto Clear:**
+This alert may be temporary, for example during a brief spike in load on the server or on a service such as RabbitMQ or Elasticsearch, and may clear on its own.

Internal Errigal Collaboration Wiki
