**Watchdog Agent**

**Alarm**

CRITICAL - WatchdogAgent : errigalWatchdogStateApplicationFailureNotification - WatchdogApplicationFailureAlarm

**Context**

This is one of the most important Watchdog alerts you will see. When a Watchdog cannot start, it generates the Watchdog Agent alarm. The Watchdog Agent is part of a Watchdog.

**Decision**

When this alarm is received, __you must__:

* First, check that the Watchdog is running and, if it is not, take the relevant action.
* Manually clear the alarm, because the Watchdog Agent is not currently set up to clear itself.
* Select "Alarm Clear received" on the related Ticket(s) as well.
* On the Node monitor, you can open "Review the logs" to find Watchdog Agent alarms and clear them there.
* Alternatively, you can clear the alarm via the database (this option is preferred in this circumstance).

To clear the alarm via the database, run the following from a terminal. With `-p` and no value, `mysql` prompts for the password. Note that this clears every uncleared alarm whose context contains `Watchdog`:

```shell
mysql -uroot -p -hatlas.err -e "update snmp_manager.active_alarm set cleared = True where cleared is False and context like '%Watchdog%';"
```

Then use the following query on Atlas to check that the alarm has cleared. It should return no rows:

```sql
select * from active_alarm where cleared = false and context like '%Watchdog%';
```

**Consequences**

If a Watchdog Agent alarm is not actioned, we could miss an important alert. This can happen as follows:

* The Watchdog is running but has an active alarm on the Watchdog Agent.
* The Watchdog then fails to start.
* Because the Watchdog Agent alarm is already active, we are not alerted that the Watchdog has failed.
* With no Watchdog running for the system in question, Operations may not be informed of a critical system problem.
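The database route (check for active Watchdog alarms, clear them, then check again) can be wrapped in a small helper so the steps always run in the same order. This is a minimal sketch, not an official tool. The function names, the `ATLAS_HOST` variable and the `DRY_RUN` preview switch are hypothetical. It also assumes the check query reads the same `snmp_manager.active_alarm` table that the update command names.

```shell
#!/usr/bin/env bash
# Sketch only: check_sql, clear_sql, clear_watchdog_alarms, ATLAS_HOST and
# DRY_RUN are hypothetical names. The SQL mirrors the runbook commands.
set -euo pipefail

ATLAS_HOST="${ATLAS_HOST:-atlas.err}"

# List Watchdog alarms that are still active (not cleared).
check_sql() {
  echo "select * from snmp_manager.active_alarm where cleared = false and context like '%Watchdog%';"
}

# Mark every uncleared alarm whose context mentions Watchdog as cleared.
clear_sql() {
  echo "update snmp_manager.active_alarm set cleared = true where cleared = false and context like '%Watchdog%';"
}

# Check, clear, then re-check in a single mysql session so the password
# is requested only once (-p with no value makes mysql prompt for it).
clear_watchdog_alarms() {
  local sql
  sql="$(check_sql) $(clear_sql) $(check_sql)"
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    # Print the command that would run, without touching the database.
    printf 'mysql -uroot -p -h%s -e "%s"\n' "$ATLAS_HOST" "$sql"
  else
    mysql -uroot -p -h"$ATLAS_HOST" -e "$sql"
  fi
}
```

Source the file, run `DRY_RUN=1 clear_watchdog_alarms` to preview the command, then run `clear_watchdog_alarms` to apply it, only after confirming the Watchdog itself is running. If the clear worked, the second `select` returns no rows.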