resolution_area:prometheus_resolutions:res-p1109

Differences

This shows you the differences between two versions of the page.

resolution_area:prometheus_resolutions:res-p1109 [2021/06/25 10:09] – external edit 127.0.0.1
resolution_area:prometheus_resolutions:res-p1109 [2021/12/17 13:09] (current) wflaherty
Line 2:
  
 **Level:** Warning
- 
  
 **Purpose:**
 +To ensure that services which are supposed to be running stay running.
  
 **Scenario:** A systemd service has been in a crashed state for 5 minutes.
  
 **Resolution:**
 +The service is running in a stable state.
  
 **Manual Action Steps:**
 +  - SSH onto the affected server.
 +  - Use `ps aux | grep <app>` to see whether the application is still running.
 +  - Use `systemctl status <app>` to check the status of the application.
 +    - If the application is restarting in a loop, open /etc/systemd/system/<app>.service.
 +    - Change the `Restart=` line to `no` rather than `on-failure` or `always`, then run `sudo systemctl daemon-reload` so systemd picks up the change.
 +  - Use `sudo journalctl -ex` to see the server logs after attempting to restart the application.
 +  - A past problem with applications that weren't written by Errigal was that the application needs its own user on the server.
 +    - The ELK stack and MySQL require `elasticsearch`, `logstash`, `kibana` and `mysql` users respectively.
 +  - Sometimes fully shutting down the service with `sudo systemctl stop <app>`, waiting a minute, and then starting it again with `sudo systemctl start <app>` helps the application recover.
 +  - If the bash prompt is behaving strangely, the server is likely running out of RAM.
 +  - Also check the disk space with `df -h`.
 +  - If this is an Errigal app, you can check moros.err:5601/app/kibana to see the application logs from before the service died.
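
The manual steps above can be gathered into one pass. The following is a minimal sketch, not an official Errigal script: the function names `next_step`, `full_mounts` and `triage` are hypothetical, the 90 percent disk threshold is an assumed value, and the `NRestarts` property is only reported by newer systemd versions.

```shell
#!/bin/sh
# Triage sketch for a crashed systemd service (illustrative names and thresholds).

# Map the output of `systemctl is-active` to a suggested next step.
next_step() {
  case "$1" in
    active)     echo "running: confirm it stays up" ;;
    activating) echo "restarting: check for a restart loop" ;;
    failed)     echo "failed: read the journal, then stop and start" ;;
    *)          echo "not running: check status and journal" ;;
  esac
}

# Read `df -P` output on stdin and print mount points whose usage is at or
# above the given percentage.
full_mounts() {
  awk -v limit="$1" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= limit) print $6 }'
}

# Collect the diagnostics from the manual steps for one service.
triage() {
  app="$1"
  state="$(systemctl is-active "$app" 2>/dev/null || true)"
  echo "state: ${state:-unknown}"
  next_step "$state"
  # Restart policy and restart count help spot a restart loop.
  systemctl show "$app" --property=Restart --property=NRestarts || true
  # Last 50 journal lines for this unit only.
  journalctl -u "$app" -n 50 --no-pager || true
  # Memory and disk pressure.
  free -h || true
  echo "mounts at or above 90% (assumed threshold):"
  df -P | full_mounts 90
}
```

Run it on the affected server as `triage <app>`, with `sudo` if the journal is not readable by your user. It only reads state and does not stop, start or edit the service.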
  
 **Auto Clear:**
 +It's entirely possible the service will recover automatically.
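
Whether the service recovered on its own or was restarted by hand, it helps to confirm that it stays up rather than checking it once. The sketch below is one possible way to do that, assuming a hypothetical helper `is_stable` that requires a check command to succeed several times in a row; the number of checks and the interval are arbitrary choices.

```shell
#!/bin/sh
# is_stable CHECKS INTERVAL CMD...
# Succeeds only if CMD succeeds CHECKS times in a row, sleeping INTERVAL
# seconds between attempts. Returns 1 on the first failure.
is_stable() {
  checks="$1"
  interval="$2"
  shift 2
  i=0
  while [ "$i" -lt "$checks" ]; do
    "$@" || return 1
    i=$((i + 1))
    if [ "$i" -lt "$checks" ]; then
      sleep "$interval"
    fi
  done
  return 0
}
```

For example, `is_stable 5 60 systemctl is-active --quiet <app>` checks the service five times over roughly four minutes and fails as soon as one check reports it is not active.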
resolution_area/prometheus_resolutions/res-p1109.1624612196.txt.gz · Last modified: 2021/06/25 10:09 by 127.0.0.1