SystemdServiceCrashed

Level: Warning

Purpose: To ensure the services that are supposed to be running stay running

Scenario: A systemd service has been in a crashed state for 5 minutes.

Resolution: The service is running in a stable state

Manual Action Steps:

  1. ssh onto the affected server.
  2. Use `ps aux | grep <app>` to see if the application is still running.
  3. Use `systemctl status <app>` to check the status of the application.
    1. If the application is restarting over and over, open /etc/systemd/system/<app>.service.
    2. Change the `Restart=` line to `no` rather than `on-failure` or `always`, then run `sudo systemctl daemon-reload` so systemd picks up the change.
  4. After attempting to restart the application, use `sudo journalctl -ex` to see the most recent server logs. Add `-u <app>` to limit the output to the affected service.
  5. Applications that weren't written by Errigal have failed in the past because the system user they run as was missing.
    1. The ELK stack and MySQL require elasticsearch, logstash, kibana and mysql users respectively.
  6. Sometimes fully stopping the service with `sudo systemctl stop <app>`, waiting a minute, and then starting it again with `sudo systemctl start <app>` helps the application recover.
  7. If the bash prompt is behaving strangely, the server is likely running out of RAM. Check memory usage with `free -h`.
  8. Check the disk space as well with `df -h`.
  9. If this is an Errigal app, check moros.err:5601/app/kibana for the application logs from before the service died.
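
The checks in the steps above can be gathered into a small shell sketch. It is an illustration only, not an existing Errigal tool: the function names (`full_filesystems`, `missing_users`, `triage`), the 90% disk threshold and the 50-line log tail are all assumptions, and it assumes the systemd unit is named after `<app>`.

```shell
# Hypothetical helpers for triaging a crashed systemd service.
# Source this file, then run: triage <app>

# Print mount points at or above a usage threshold (default 90%).
# Reads `df -P` style output on stdin; mount points containing
# spaces are not handled.
full_filesystems() {
  awk -v limit="${1:-90}" '
    NR > 1 {
      use = $5
      sub(/%/, "", use)
      if (use + 0 >= limit + 0) print $6
    }'
}

# Print each given user name that does not exist on this host (step 5).
missing_users() {
  for user in "$@"; do
    id -u "$user" >/dev/null 2>&1 || echo "$user"
  done
}

# Run the read-only checks from the steps above for one service.
triage() {
  app="$1"
  # Step 2: is a matching process still running?
  ps aux | grep -- "$app" | grep -v grep
  # Step 3: what does systemd report, and what is the restart policy?
  systemctl status "$app" --no-pager
  systemctl show -p Restart "$app"
  # Step 4: recent log lines for this unit only.
  sudo journalctl -u "$app" -n 50 --no-pager
  # Steps 7 and 8: memory, then any nearly full filesystems.
  free -h
  df -P | full_filesystems 90
}
```

For example, `missing_users elasticsearch logstash kibana mysql` prints only the names that are absent from the host, and prints nothing when all four users exist. None of the functions change the service or its unit file.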

Auto Clear: It is entirely possible the service will recover automatically.
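
To see whether the service has in fact come back on its own, one option is to poll it for a while before intervening. The sketch below assumes a hypothetical helper named `wait_until`; the attempt count and delay shown in the usage comment are arbitrary choices, not values taken from the alert rule.

```shell
# wait_until ATTEMPTS DELAY COMMAND...
# Re-run COMMAND until it succeeds, sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (assumed values): check every 30 seconds, up to 10 times.
#   wait_until 10 30 systemctl is-active --quiet <app> && echo "recovered"
```

`systemctl is-active --quiet` exits with status 0 only while the unit is active, so the loop ends as soon as the service is running again.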

resolution_area/prometheus_resolutions/res-p1109.txt · Last modified: 2021/12/17 13:09 by wflaherty