===== Watchdogs / Prometheus =====

==== Prerequisites to being on-call / watchdog rota ====

  * [[onboarding:introduction:watchdog|Errigal Watchdog]]
  * [[watchdogs:watchdog_overview|Watchdog Updated Process]]
  * [[watchdogs:watchdog_alarm_summary_report|Watchdog Alarm Summary Report]]
  * [[watchdogs:watchdog_alarms|Current watchdog alarms]]
  * [[watchdogs:on_call_process|Watchdog On Call Process]]
  * [[watchdogs:informing_the_customer|What alarms do customers need to be informed of?]]
  * [[support:outage_process_backups|Outage Process Backups]]
  * [[support:tunneling|Tunneling]]

==== General ====

  * [[watchdogs:installing_watchdog_on_a_server|Installing Watchdog on a server]]
  * [[watchdogs:configuring_watchdog_texts|Configuring Watchdog Texts]]
  * [[watchdogs:upgrade_watchdog|Upgrade Watchdog Install on a Server]]
  * [[watchdogs:logs|Looking at the logs for key indicators of potential issues]]
  * [[watchdogs:ping|Linux - Ping, trace route and TCP dump diagnostic utilities]]
  * [[watchdogs:clickatell_texts|Clickatell - Watchdog Text Messages]]
  * [[watchdogs:smoke_tests|Geb Smoke Tests]]
  * [[watchdogs:sanity_checks|Geb Sanity Checks]]
  * [[:atc_noc_portal_alarm|ATC NOC Portal Alarm Email]]
  * [[watchdogs:TicketerEmailFailedDelivery - RabbitMQ]]
  * [[watchdogs:RemoteTicketFailedToCreate - RabbitMQ]]

==== Prometheus / Grafana ====

  * [[Creating a New Metric]]
  * [[Creating a New Alert]]

==== Common Watchdogs ====

  * [[mysqlSlaveReplicationFailure]]
  * [[watchdogs:resolving_replication_timeout_issue|Resolving Replication Timeout Issue]]
  * [[QuartzJobsBlocked]]
  * [[:watchdog_agent|Watchdog Agent]]
  * [[:resolution_area:start|Watchdog Resolution Area]]