The Watchdog Process

Author: Sophie Renshaw

  1. We receive a Watchdog notification indicating that there is an issue with one of the servers we monitor. The notification arrives by email and, if it is a critical alarm, also by text message.
  2. The person on the Watchdog Rota (or the person On Call, if the notification arrives at the weekend) clicks the link in the email. This opens the Watchdog ticket in the mobile ticketer, which is publicly accessible, so it can be reached from outside the office.
  3. NOTE: DO NOT LOG IN WITH THE ADMIN ACCOUNT - PLEASE USE THE ACCOUNT PROVIDED TO YOU
  4. The person investigating the Watchdog should move the ticket to the Under Investigation state. This triggers another email, sent to the developers, confirming that the ticket has been moved to Under Investigation. The subject of that email shows the login of the person who moved the ticket.
  5. Once the investigation is complete, reply to the Under Investigation email with any findings.
  6. We will get another email when the alarm has cleared.

Once you have finished onboarding, you can find more detail on the Watchdog process here: Watchdogs


Errigal Watchdog - Resource Alarm Workflow


SendSMSandEmail Groovlet (ID: 233)

arg, defaultArg, ticket ->
 
/*
This groovlet is used to send a ticket email using a custom email template (defined in the workflow node
entrance rule) to a group of users who should be notified about a ticket.
 
If the email sends, the ticket will enter a successInSendingEmailState state and if it fails to send the ticket
will enter a failedToSendEmailState state. These are also defined in the workflow node rule as arguments.
 
If the groovlet fails to fully execute, any exceptions thrown will be sent to the "Failure email address" listed
*/
 
/*
Save the users first. This resolves the issue seen in SUPPORT-216, where the groovlet
failed because different users had interacted with the ticket.
*/
  def ticketStatusesByUsers = ticket.statuses.statusBy
  ticketStatusesByUsers.each(){
    com.errigal.ticketer.User user = it
    user.save()
  }
  log.info "Ticket Statuses By: $ticketStatusesByUsers"
 
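  // sendEmail closure: sends a plain-text message over SMTP (used below to reach the SMS gateways)
  def sendEmail = { email, body ->
        final String username = "ticketer@errigal.com"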
        final String password = "notsame1"
 
        Properties props = new Properties()
        props.put("mail.smtp.starttls.enable", "true")
        props.put("mail.smtp.auth", "true")
        props.put("mail.smtp.host", "smtp.gmail.com")
        props.put("mail.smtp.port", "587")
 
        javax.mail.Session session = javax.mail.Session.getInstance(props,
          new javax.mail.Authenticator() {
            protected javax.mail.PasswordAuthentication getPasswordAuthentication() {
                return new javax.mail.PasswordAuthentication(username, password)
            }
          })
 
        try {
 
            javax.mail.Message message = new javax.mail.internet.MimeMessage(session)
            message.setFrom(new javax.mail.internet.InternetAddress("peter.phelan@errigal.com"))
            message.setRecipients(javax.mail.Message.RecipientType.TO,
                                  javax.mail.internet.InternetAddress.parse(email))
            message.setSubject("")
            message.setText(body)
 
            javax.mail.Transport.send(message)
 
        } catch (javax.mail.MessagingException e) {
            throw new RuntimeException(e)
        } 
  }
 
 
log.info "Default arg:  $defaultArg"
log.info "Argument:  $arg"
log.info "Ticket: $ticket"
 
def args = arg.split(",")
 
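def stateName
def successInSendingEmailState // declared but not used below
def failedToSendEmailState     // declared but not used below; TODO: what happens if no fail state is provided?

// NOTE: the number of arguments is not actually validated before this message is logged
log.info "Groovlet has received correct amount of arguments. Executing"

stateName = args[0].trim()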
 
 def emailService = com.errigal.ticketer.utils.DomainUtils.getGrailsService("emailService")
 
com.errigal.ticketer.Visibility ticketVisibility = com.errigal.ticketer.Visibility.findByName(ticket.visibility)
log.info "Visibility: $ticketVisibility"
 
com.errigal.ticketer.EmailAccount fromEmailAccount = ticketVisibility.emailAccount
log.info "Email Account: $fromEmailAccount"
 
//TODO: Support exit email too
def entranceEmail = ticket.workflow.nodes.find{it.name == ticket.currentStatus.status}.entranceEmail
 
log.info "Email to use: ${entranceEmail}"
log.info "Ticket ID: ${ticket.id}"
 
def emailVo = emailService.getCustomEmailVOWithoutSecurityCheck(ticket.id, entranceEmail)
log.info "Email VO: $emailVo"
 
try {
  emailService.sendEmail(emailVo)
} catch (Exception e) {
  log.error "Unable to send email in Groovlet", e
}
 
Calendar cal = Calendar.getInstance(); // creates calendar
cal.setTime(new Date()); // sets calendar time/date
Boolean hasAlarmID = false
int currentHour =  cal.get(Calendar.HOUR_OF_DAY)
def acceptedAlarmIDs = ["remoteTicketCreationInactive", "LinkPollerInactive","TrapForwarderInactive","QuartzJobsBlocked","NextFireTimeInThePast","DefaultEmailUpdatePollJob","JBossProcessInactive","ApplicationServerProcessInactive","OutOfMemoryErrorFound",
                        "PermGenErrorFound","mysqlSlaveReplicationFailure","ScriptNotRunning", "TrapParsingInactive",
                        "mysqlFailure","heartbeatMissing", "TrapCountBreached", "IrisLicenseCheckFailed", "UnableToRunCommand","MobiTrapInactivity","MobiTicketInactivity", "handlerFailover", "MySQLJDBCCommunicationsExceptionCommunicationLinkFailure"]
// A text is only sent when the ticket summary contains one of the accepted alarm IDs
hasAlarmID = acceptedAlarmIDs.any { ticket.summary.contains(it) }
 
if(hasAlarmID){
  log.info "Sending Text for ${ticket.id}"
} else {
  log.info "Could not find any of ${acceptedAlarmIDs} in '${ticket.summary}' for ${ticket.id}"
}
 
 
String body = """api_id:3513334 \r
user:errigalwaterford \r
password:#errigal321!# \r
from:Errigal \r
to:353851190587,353863997160,353870574522,353861568567,353871246696,353868441173,353861703468, 353861703468, 353872777914, 353871477066 \r
text:$ticket.idWithPrefix - $ticket.currentStatus.status : $ticket.summary"""
 
if(currentHour >= 16 && currentHour <= 23 && hasAlarmID)
{
sendEmail("4157480023@vtext.com","$ticket.idWithPrefix - $ticket.currentStatus.status : $ticket.summary")
sendEmail("4155598389@vtext.com","$ticket.idWithPrefix - $ticket.currentStatus.status : $ticket.summary")
sendEmail("4157264987@vtext.com","$ticket.idWithPrefix - $ticket.currentStatus.status : $ticket.summary")
sendEmail("4157206583@vtext.com","$ticket.idWithPrefix - $ticket.currentStatus.status : $ticket.summary")
sendEmail("6282287260@tmomail.net","$ticket.idWithPrefix - $ticket.currentStatus.status : $ticket.summary")
}
else if (currentHour >= 0 && currentHour <= 7 && hasAlarmID)
{
  sendEmail("sms@messaging.clickatell.com",body)
}
else if (hasAlarmID)
{
sendEmail("4157480023@vtext.com","$ticket.idWithPrefix - $ticket.currentStatus.status : $ticket.summary")
sendEmail("4155598389@vtext.com","$ticket.idWithPrefix - $ticket.currentStatus.status : $ticket.summary")
sendEmail("4157264987@vtext.com","$ticket.idWithPrefix - $ticket.currentStatus.status : $ticket.summary")
sendEmail("4157206583@vtext.com","$ticket.idWithPrefix - $ticket.currentStatus.status : $ticket.summary")
sendEmail("6282287260@tmomail.net","$ticket.idWithPrefix - $ticket.currentStatus.status : $ticket.summary")
sendEmail("sms@messaging.clickatell.com",body)
}
 
 
// The code below changes the ticket state once the email has been sent (i.e. "Alarm Received" -> "Notification Sent")
if (ticket.parent) {
  log.info "Ticket $ticket.id is no longer a top-level Ticket; cancelling automatic State Change."
} else {
  // Change the Ticket State if it is a top-level Ticket
  if (com.errigal.ticketer.GNode.findByWorkflowAndName(ticket.workflow, stateName)) {
    log.info "Updating Ticket $ticket.id Status from $ticket.currentStatus.status to $stateName."
    ticket.updateStatus(stateName)
    def saveResult = ticket.save(failOnError: true)
    if (saveResult) {
      log.info "State change result:  $saveResult"
    } else {
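      def sb = new StringBuilder("Save failed ")
      // hasErrors() and errors belong to the ticket domain object, so call them on the ticket
      if (ticket.hasErrors()) {
        sb.append " errors:  "
        ticket.errors.allErrors.each {sb.append it}
      } else {
        sb.append " no error messages."
      }
      log.error sb.toString()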
    }
  } else {
    log.error "$stateName was not found in the $ticket.workflow.name Workflow.  Aborting."
  }
}
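
The alarm filter and the time-of-day routing in the groovlet above can be summarised in a small standalone sketch. This is only an illustration: the closure names `matchesAcceptedAlarm` and `routesFor` and the two route labels are hypothetical and do not appear in the groovlet. The hour ranges are copied from the groovlet, and the hour is assumed to be the server's local hour, as returned by `Calendar.HOUR_OF_DAY`.

```groovy
// Hypothetical labels for the two delivery routes used by the groovlet.
def CARRIER_GATEWAYS = 'carrierEmailGateways' // the vtext.com / tmomail.net addresses
def CLICKATELL = 'clickatell'                 // sms@messaging.clickatell.com

// True if the ticket summary contains any of the accepted alarm IDs.
def matchesAcceptedAlarm = { String summary, List<String> acceptedAlarmIDs ->
    acceptedAlarmIDs.any { summary.contains(it) }
}

// Routes that receive a text for the given hour (0-23), mirroring the if/else chain above.
def routesFor = { int hour, boolean hasAlarmID ->
    if (!hasAlarmID) { return [] }                                     // no accepted alarm ID: no text
    if (hour >= 16 && hour <= 23) { return [CARRIER_GATEWAYS] }        // 16:00-23:59
    if (hour >= 0 && hour <= 7) { return [CLICKATELL] }                // 00:00-07:59
    return [CARRIER_GATEWAYS, CLICKATELL]                              // 08:00-15:59: both routes
}

// Example: a summary containing an accepted alarm ID, received at 03:00.
def ids = ['heartbeatMissing', 'mysqlFailure']
def hit = matchesAcceptedAlarm('Server X: heartbeatMissing for 10 minutes', ids)
println routesFor(3, hit)
```

Keeping the routing decision separate from the sending makes it possible to check the hour boundaries without sending any real texts.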

Creating a Watchdog

  1. Log on to the server on which you want to create the watchdog.
  2. Go to the following directory: /export/home/scotty/watchdog/resources
  3. Add your new rule to the bottom of the ResourceConfig.groovy file in this directory, e.g.
    'GeneralTrapSummaryCreation' {
        type = 'MYSQLDB'
        parameters {
            driver = 'com.mysql.jdbc.Driver'
            host = 'localhost'
            port = '3306'
            database = 'snmp_manager'
            user = 'root'
            password = 'ozzrules'
            checkRowCountQuery = 'select count(*) from general_trap_summary where received_date > now() - interval 5 minute;'
        }
        thresholds {
            a = [name: 'mysqlRowCount', type: 'MIN', value: [0], level: ['CRITICAL'], alarmId: 'generalTrapSummaryCreationInactive']
        }
    }
  4. Double-check the query against the database to ensure that it is correct (be careful, as it is production!).
  5. Once the rule is added and your query is checked, add the rule to the main method (called localSystem in the code) so that it is invoked, e.g.
    localSystem {
        ....
        dd = 'ActiveAlarmCreation'
        ee = 'GeneralTrapSummaryCreation'
    }
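
Before editing the production file, it may help to check a new rule for typos in isolation. The sketch below parses a rule with Groovy's `ConfigSlurper` and reports missing fields. It assumes the rule syntax is compatible with `ConfigSlurper`; whether the watchdog itself loads ResourceConfig.groovy this way is not confirmed here. The rule name, database, table and alarm ID are hypothetical placeholders.

```groovy
// Assumption: the rule can be parsed on its own with ConfigSlurper.
// All names inside the rule (ExampleRowCountCheck, example_db, ...) are placeholders.
def ruleText = '''
'ExampleRowCountCheck' {
    type = 'MYSQLDB'
    parameters {
        host = 'localhost'
        port = '3306'
        database = 'example_db'
        checkRowCountQuery = 'select count(*) from example_table where received_date > now() - interval 5 minute;'
    }
    thresholds {
        a = [name: 'mysqlRowCount', type: 'MIN', value: [0], level: ['CRITICAL'], alarmId: 'exampleRowCountInactive']
    }
}
'''

def config = new ConfigSlurper().parse(ruleText)
def rule = config.ExampleRowCountCheck

// Collect the basic problems that a typo or a missing field would cause.
def problems = []
if (!rule.type) { problems << 'missing type' }
if (!rule.parameters.checkRowCountQuery) { problems << 'missing checkRowCountQuery' }
rule.thresholds.each { key, threshold ->
    if (!threshold.alarmId) { problems << "threshold ${key} has no alarmId" }
}

println(problems ? "Problems: ${problems}" : 'Rule parsed cleanly')
```

A check like this only confirms that the rule is syntactically valid and has the expected fields. The query itself still has to be verified against the database as described in step 4.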