<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="FeedCreator 1.8" -->
<?xml-stylesheet href="http://3.86.49.49/lib/exe/css.php?s=feed" type="text/css"?>
<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel rdf:about="http://3.86.49.49/feed.php">
        <title>Internal Errigal Collaboration Wiki watchdogs</title>
        <description></description>
        <link>http://3.86.49.49/</link>
        <image rdf:resource="http://3.86.49.49/lib/tpl/docnavwiki/images/favicon.ico" />
        <dc:date>2026-04-17T13:30:26+00:00</dc:date>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:clickatell_texts&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:configuring_watchdog_texts&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:creating_a_new_alert&amp;rev=1678120264&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:creating_a_new_metric&amp;rev=1678119399&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:informing_the_customer&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:installing_watchdog_on_a_server&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:logs&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:mysqlslavereplicationfailure&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:on_call_process&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:ping&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:quartzjobsblocked&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:remoteticketfailedtocreate_-_rabbitmq&amp;rev=1678095073&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:resolving_replication_timeout_issue&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:sanity_checks&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:smoke_tests&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:start&amp;rev=1677861837&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:ticketeremailfaileddelivery_-_rabbitmq&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:upgrade_watchdog&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:watchdog_alarms&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:watchdog_alarm_summary_report&amp;rev=1624612196&amp;do=diff"/>
                <rdf:li rdf:resource="http://3.86.49.49/doku.php?id=watchdogs:watchdog_overview&amp;rev=1624612196&amp;do=diff"/>
            </rdf:Seq>
        </items>
    </channel>
    <image rdf:about="http://3.86.49.49/lib/tpl/docnavwiki/images/favicon.ico">
        <title>Internal Errigal Collaboration Wiki</title>
        <link>http://3.86.49.49/</link>
        <url>http://3.86.49.49/lib/tpl/docnavwiki/images/favicon.ico</url>
    </image>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:clickatell_texts&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:clickatell_texts</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:clickatell_texts&amp;rev=1624612196&amp;do=diff</link>
        <description>Clickatell - Watchdog Text Messages

Most of the setup is handled by Clickatell, and there is how-to information on their site.

We send an email to Clickatell, and Clickatell sends us the texts.

&lt;https://atlas.err:8083/Ticketer/groovlet/show/233&gt;

SendSMSandEmail

The AlarmId to send the text is at the end of the - in the Watchdog emails.</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:configuring_watchdog_texts&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:configuring_watchdog_texts</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:configuring_watchdog_texts&amp;rev=1624612196&amp;do=diff</link>
        <description>Configuring Watchdog Texts

Author: Anna Dowling

There is currently one source of text messages within Errigal.

	*  These all come from the Ticketer used by Watchdog (currently Atlas).
	*  The texts originate from a groovlet in the Ticketer with ID number 233.</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:creating_a_new_alert&amp;rev=1678120264&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2023-03-06T16:31:04+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:creating_a_new_alert</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:creating_a_new_alert&amp;rev=1678120264&amp;do=diff</link>
        <description>Creating a New Alert on Prometheus

Author: Máté Domonics on 06/03/2023

If you haven&#039;t created a new metric yet, please refer to this wiki page.

Creating a Wiki entry for your alarm

When creating a new alarm, it is invaluable to also create a wiki page entry detailing what action should be taken when the alarm is firing.
Follow these steps to create a new wiki page:</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:creating_a_new_metric&amp;rev=1678119399&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2023-03-06T16:16:39+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:creating_a_new_metric</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:creating_a_new_metric&amp;rev=1678119399&amp;do=diff</link>
        <description>Creating a New Metric on Prometheus

Author: Máté Domonics on 06/03/2023

SQL Exporter

SQL Exporter is used to test your SQL queries before deploying them. Use this link to download it (I downloaded the darwin-amd64 version).

Usage of SQL Exporter</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:informing_the_customer&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:informing_the_customer</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:informing_the_customer&amp;rev=1624612196&amp;do=diff</link>
        <description>What alarms do customers need to be informed of?

In general, we don&#039;t have to inform customers of every minor problem we deal with. What we must inform them of is anything classed as an “outage”. It&#039;s also good practice to inform them of any issues they are likely to run into themselves anyway: telling them before they encounter the issue and have to report it to us reflects better on us and saves face.</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:installing_watchdog_on_a_server&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:installing_watchdog_on_a_server</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:installing_watchdog_on_a_server&amp;rev=1624612196&amp;do=diff</link>
        <description>Installing Watchdog on a server

Author: Eoin Joy   Edited by: Andrey Shevyakov, Colm Carew

Ensure that mysqld is running on the server and that a watchdog database exists on the instance. If you need to install mysqld, see: Install MySQL on a Server - RHEL 6.6</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:logs&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:logs</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:logs&amp;rev=1624612196&amp;do=diff</link>
        <description>Looking at the logs for key indicators of potential issues

Investigation as a whole is difficult to summarise, as many different things could be causing the issue. It is important to keep a level head and not become overwhelmed. A good method for actually solving a problem is: 1) recreate the problem and verify that it still occurs; 2) apply your proposed solution and try to recreate the problem multiple times. If it does not happen again, you can say you&#039;ve fixed the issue, but if i…</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:mysqlslavereplicationfailure&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:mysqlslavereplicationfailure</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:mysqlslavereplicationfailure&amp;rev=1624612196&amp;do=diff</link>
        <description>mysqlSlaveReplicationFailure

Author: Yanjun Wang

	*  Steps to Resolve Query Running Timeout Issue

	*  Log in to the slave database.
	*  Run the following commands and copy the results, which will be used for further investigation:


show slave status\G
show processlist;
show engine innodb status\G
show open tables where In_Use &gt; 0;</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:on_call_process&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:on_call_process</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:on_call_process&amp;rev=1624612196&amp;do=diff</link>
        <description>On Call Process

Author: Eileen Dillon

The current escalation process that is communicated to our customers is as follows:
Customer Errigal Escalation Process

Support Rota Dexcomm Rota sheet

Outages can come to our attention in the following ways:

	*  The customer reports an issue via critical text message or email.
	*  Critical Watchdogs: text messages &amp; emails. Be aware that the customer may be copied on some Watchdogs.</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:ping&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:ping</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:ping&amp;rev=1624612196&amp;do=diff</link>
        <description>Linux - Ping, trace route and TCP dump diagnostic utilities

Ping

Used to determine whether you can reach a server.

A successful ping:


ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=57 time=12.892 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=57 time=12.521 ms</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:quartzjobsblocked&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:quartzjobsblocked</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:quartzjobsblocked&amp;rev=1624612196&amp;do=diff</link>
        <description>Author: Stephanie Song - Sept 12, 2018
Edited: Bartosz Osowski - April 10th, 2020

Sample QuartzJobsBlocked Watchdog:

WATCHDOG-787755 - Alarm Received : extdb1 - 10.230.10.14 - CRITICAL - ticketerDbCheck : errigalDatabaseMySqlQuartzCheckJobFailureAlarm - QuartzJobsBlocked</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:remoteticketfailedtocreate_-_rabbitmq&amp;rev=1678095073&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2023-03-06T09:31:13+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:remoteticketfailedtocreate_-_rabbitmq</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:remoteticketfailedtocreate_-_rabbitmq&amp;rev=1678095073&amp;do=diff</link>
        <description>RabbitMQ - RemoteTicketFailedToCreate

Author: Eoin Hearne

A Prometheus alert similar to the following will come in: [FIRING:1] RemoteTicketFailedToCreate prodext (extlb1:15692 rabbitmq_3_8 remote.ticket.dead.letter.queue critical admin). This signifies that a remote ticket request failed to create a ticket for an unknown reason. The application sends JSON to the RabbitMQ remote.ticket.create.queue and tries to create a ticket, retrying a configurable number of times (currently 5 by default). If it …</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:resolving_replication_timeout_issue&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:resolving_replication_timeout_issue</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:resolving_replication_timeout_issue&amp;rev=1624612196&amp;do=diff</link>
        <description>Resolving Replication Timeout Issue

Author: Yan Wang

Steps to resolve replication timeout issue:

1. Log in to the slave database.

2. Run show slave status\G

3. Check Last_Errno and Last_Error to identify whether it&#039;s a timeout issue.

4. Run SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; to skip that query (it will be run manually later). For AWS RDS, check here for the solution:</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:sanity_checks&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:sanity_checks</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:sanity_checks&amp;rev=1624612196&amp;do=diff</link>
        <description>Geb Sanity Checks

Project Location

Project:&lt;https://bitbucket.org/errigal/sanity-checks-geb/src/master/&gt;

Overview

This project runs the basic sanity checks against a given environment. This used to be a manual task after deployments.

When to Use

In the event of an outage, or if any of the customer&#039;s servers reboots unexpectedly, the sanity checks can be used to confirm that the restarted applications pass the basic sanity checks.</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:smoke_tests&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:smoke_tests</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:smoke_tests&amp;rev=1624612196&amp;do=diff</link>
        <description>Geb Smoke Tests

Project Location

Project:&lt;https://bitbucket.org/errigal/smoke-tests-geb/src/master/&gt;

Overview

This project&#039;s main goal is to implement the basic suite of smoke tests to verify the sanity of all apps in a given environment. Please refer to the project&#039;s README for further information and installation instructions.</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:start&amp;rev=1677861837&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2023-03-03T16:43:57+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:start</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:start&amp;rev=1677861837&amp;do=diff</link>
        <description>Watchdogs / Prometheus

Prerequisites to being on-call / watchdog rota

	*  Errigal Watchdog
	*  Watchdog Updated Process
	*  Watchdog Alarm Summary Report
	*  Current watchdog alarms
	*  Watchdog On Call Process
	*  What alarms do customers need to be informed of?
	*  Outage Process Backups
	*  Tunneling

General

	*  Installing Watchdog on a server
	*  Configuring Watchdog Texts
	*  Upgrade Watchdog Install on a Server
	*  Looking at the logs for key indicators of potential issues
	*  Linux …</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:ticketeremailfaileddelivery_-_rabbitmq&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:ticketeremailfaileddelivery_-_rabbitmq</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:ticketeremailfaileddelivery_-_rabbitmq&amp;rev=1624612196&amp;do=diff</link>
        <description>RabbitMQ - TicketerEmailFailedDelivery

Author: Aaron Mooney

The interface for the RabbitMQ email queues is at loadbalancer:15672 -&gt; `&lt;http://extlb.ext:15672&gt;`, and the credentials are in pwsafe.
Once logged in, you will be presented with this interface. Click on the</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:upgrade_watchdog&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:upgrade_watchdog</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:upgrade_watchdog&amp;rev=1624612196&amp;do=diff</link>
        <description>Upgrade Watchdog Install on a Server

Author: Eoin Joy

	*  Get the latest Watchdog release build from Artifactory (&lt;http://errigalartifactory.err:8080/artifactory&gt;) (as of this writing: WatchDog-REL-2.2.0.tar.gz).
	*  Transfer this to the server you wish to install on.
			*  scp WatchDog-REL-2.2.0.tar.gz scotty@qaerrigalapps1.crc:/export/home/scotty/temp</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:watchdog_alarms&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:watchdog_alarms</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:watchdog_alarms&amp;rev=1624612196&amp;do=diff</link>
        <description>Watchdog Alarm Identifiers

Here is an in-progress list of all the current watchdog alarms, what they mean and what you should do if we get them.

UnableToRunCommand

UnableToRunCommand alarms are a special case that can show up with any alarm ID. Rather than being triggered by a threshold breach, such an alarm means the threshold check itself failed to run correctly. This is serious, as it means Watchdog is currently broken on that server. Possible causes: A database can&#039;t be reac…</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:watchdog_alarm_summary_report&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:watchdog_alarm_summary_report</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:watchdog_alarm_summary_report&amp;rev=1624612196&amp;do=diff</link>
        <description>Watchdog Alarm Summary Report

Author: Sophie Renshaw

Both Crown Castle and ExteNet have the Watchdog Alarm Summary Report. It is hidden from the customer but lets us see which watchdogs have not cleared on the relevant servers.

As part of the support process, whoever is on Level 2 (Watchdog Rota) in the morning should run the report for both customers to check for any watchdogs that haven’t cleared. Any that appear on either report need to be investigated and reported to ensure…</description>
    </item>
    <item rdf:about="http://3.86.49.49/doku.php?id=watchdogs:watchdog_overview&amp;rev=1624612196&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2021-06-25T10:09:56+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>watchdogs:watchdog_overview</title>
        <link>http://3.86.49.49/doku.php?id=watchdogs:watchdog_overview&amp;rev=1624612196&amp;do=diff</link>
        <description>The Watchdog Process

Author: Sophie Renshaw

	*  We receive a Watchdog indicating that there is an issue with one of the servers that we are monitoring. We will receive this notification via email and, if it is a critical alarm, a text message.
	*  The person on the Watchdog Rota (or On Call, if the notification arrives at the weekend) clicks the link in the email. This opens the watchdog ticket in the mobile ticketer, which is publicly accessible, so it can be accessed outside…</description>
    </item>
</rdf:RDF>
