SystemdServiceCrashed
Level: Warning
Purpose: To ensure the services that are supposed to be running stay running
Scenario: A systemd service has been in a crashed state for 5 minutes.
Resolution: The service is running in a stable state
Manual Action Steps:
- ssh onto the affected server.
- Use `ps aux | grep <app>` to see if the application is still running.
- Use `systemctl status <app>` to check the status of the application.
- If the application is restarting over and over, open `/etc/systemd/system/<app>.service`.
- Change the `Restart=` line to `no` rather than `on-failure` or `always`, then run `sudo systemctl daemon-reload` so systemd picks up the edited unit file.
- Use `sudo journalctl -ex` to see the logs of the server after attempting to restart the application.
- In the past, a common problem with applications that weren't written by Errigal was a missing system user that the application requires.
- The ELK stack and MySQL require `elasticsearch`, `logstash`, `kibana` and `mysql` users respectively.
- Sometimes just fully shutting down the service with `sudo systemctl stop <app>` for a minute before trying to start it again with `sudo systemctl start <app>` can help the application recover.
- If the bash prompt is behaving strangely, the server is likely running out of RAM. `free -h` shows the current memory usage.
- Also check the disk space with `df -h`.
- If this is an Errigal app, you can check Kibana at moros.err:5601/app/kibana to see the application logs from before the service died.
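The manual checks above can be gathered into a few shell helpers. This is only a rough sketch, not an official Errigal script: the function names (`missing_users`, `full_filesystems`, `triage`) and the 90% disk threshold are made up for illustration, `journalctl -u` is used here to narrow the logs to one unit, and the `NRestarts` property is only available on newer systemd versions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for triaging a crashed systemd service.
# Names and thresholds are illustrative assumptions, not part of any existing tooling.

# Print each of the given user names that does not exist on this host.
missing_users() {
    local user
    for user in "$@"; do
        if ! id -u "$user" >/dev/null 2>&1; then
            echo "$user"
        fi
    done
}

# Print "mountpoint usage%" for filesystems at or above the given
# usage percentage (defaults to 90, an arbitrary choice).
full_filesystems() {
    local threshold="${1:-90}"
    df -P | awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= t) print $6 " " $5
    }'
}

# Run the read-only checks from the steps above for one service.
triage() {
    local app="$1"
    systemctl status "$app" --no-pager             # current state of the unit
    systemctl show "$app" -p Restart -p NRestarts  # restart policy and restart count
    sudo journalctl -u "$app" -x -n 50 --no-pager  # last 50 log lines for the unit
    free -h                                        # memory usage
    full_filesystems 90                            # nearly full filesystems, if any
}
```

For example, `triage <app>` prints the status, recent logs, memory and disk usage in one go, and `missing_users elasticsearch logstash kibana mysql` lists whichever of those users are absent. Nothing here changes the service state, so stopping, starting or editing the unit file is still done by hand.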
Auto Clear: It's entirely possible the service will recover automatically.
resolution_area/prometheus_resolutions/res-p1109.txt · Last modified: 2021/12/17 13:09 by wflaherty