onboarding:snmpmanager:alarm_-_the_basics [2017/05/26 15:00] mmcc
onboarding:snmpmanager:alarm_-_the_basics [2021/06/25 10:09] (current) – external edit 127.0.0.1
Line 1: Line 1:
 +====== Alarms - The Basics ======
 +
 +Author: John Rellis
 +
 +Alarms are the backbone of the IDMS because they drive ticket creation.  The list of currently active alarms in the system is available in the SnmpManager Node Monitor.  Alarms are represented in the domain model by the ActiveAlarm class.
 +
 +<code>
 +/**
 + * This represents an uncleared trap in the system and will put a
 + * network into an alarm until the cleared flag is set. An active alarm
 + * will reference the trap that created it and the trap that cleared it (if it is cleared)
 + */
 +class ActiveAlarm {
 +  static optionals = ["networkElement", "clearedDate"]
 +  static hasMany = [tickets: RemoteTicket]
 +  static transients = ['foundGtsComponent','createNeutralJson']
 +
 +  //static searchable=true
 +  NetworkElement networkElement
 +  // The gts that created this alarm
 +  GeneralTrapSummary creatingGTS
 +  // The gts that cleared the alarm if the alarm is cleared
 +  GeneralTrapSummary clearingGTS = null
 +  // Boolean to determine if the alarm is cleared or not
 +  boolean cleared = false
 +  // This is an alarm type. It will also need architecture info
 +  // e.g. type=AIMOS_ALARM:3045
 +  String type = "n/a"
 +  // This is some kind of id that will connect a clearTrap with an original alarm
 +  // e.g. context=AIMOS_PHILADELPHIA:72340
 +  String context = "n/a"
 +  // If the same alarm comes in multiple times for the same network element
 +  // this counter will be set rather than start a new alarm
 +  int repeatsReceived = 0;
 +  // Date alarm is created
 +  Date createdDate = new Date()
 +  // If the alarm has changed status (e.g. repeat received, severity changed or cleared), this will update
 +  Date statusUpdatedDate = new Date();
 +  Date clearedDate = null
 +  // This reason is pulled from the clear trap or is set as "manual" if cleared manually
 +  String clearedReason = "n/a"
 +  // This is trap specific. This is used for the purposes of clearing so that networkElement-alarmIdentifier is
 +  // a unique key to locate and clear an alarm
 +  String alarmIdentifier = "n/a"
 +  // This is usually something like "ok","minor", "major" "critical" etc
 +  String status = "n/a"
 +  // This was to store the previous status of the alarm so that after a "clear" you can see what it was originally
 +  String previousStatus = "n/a"
 +  boolean acknowledged = false
 +  Date acknowledgedDate
 +
 +  transient Boolean createNeutralJson = false
 +  transient GeneralTrapSummaryComponent foundGtsComponent = null
 +
 +....
 +
 +}
 +</code>
 +
 +Understanding the fields reveals how an ActiveAlarm behaves inside the application.  Some self-explanatory fields, such as cleared and clearedDate, are left out.
 +
 +  * alarmIdentifier
 +    * This is the main identifier for the alarm, for example "Unit Unavailable" or a numerical alarmCode.  Vendors typically provide a unique alarm identifier for each of their alarms in a single varbind; where they do not, a combination of varbinds may be used to build this field.  The alarm identifier needs to be unique per MIB: if Commscope uses an alarmIdentifier of "Unit Unavailable", no other alarm on Commscope equipment should use that alarmIdentifier.  This is because GeneralTrapSummaryWrapper#getActiveAlarm queries the activeAlarm table for a matching networkElement and alarmIdentifier combination.  If it finds one, it assumes this is a repeat alarm and GeneralTrapSummaryWrapper#getActiveAlarmWrapper increments the repeatsReceived field; if it does not, a new ActiveAlarm is created.  Therefore, if different alarms share the same alarmIdentifier, some alarms will be masked.
 +  * status
 +    * This is the current severity of the alarm and should be one of the values of com.errigal.snmpmanager.trap.AlarmStatus#toString.  It is set from the creatingGTS's severity in GeneralTrapSummaryWrapper#getActiveAlarmWrapper.  Typically the status is one of CLEARED, INFORMATION, MINOR, MAJOR, CRITICAL.  The severity may be determined from the varbinds of the creating trap or from TrapKnowledge#overrideSeverity.  A status of CLEARED indicates that the alarm is no longer present and triggers clearing of any remote tickets or removal from the scheduler.
 +  * networkElement
 +    * This is the network element that the alarm occurred on.  It is typically assigned in the trap rule or the trap rule helper class.
 +  * creatingGTS
 +    * This is the GeneralTrapSummary that initiated the creation of this alarm.  It is set in GeneralTrapSummaryWrapper#getActiveAlarmWrapper.
 +  * clearingGTS
 +    * This is the GeneralTrapSummary that cleared the alarm.  It is set in GeneralTrapSummaryWrapper#getActiveAlarmWrapper.
 +  * context
 +    * This field has less meaning in the current IDMS implementation and was used more in previous releases.
 +  * repeatsReceived
 +    * If an alarm is received with the same status and alarmIdentifier on the same networkElement as an existing activeAlarm, this count is incremented instead of creating a new alarm.
 +  * clearedReason
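 +    * The reason this alarm was cleared.  If it was cleared by a clear trap, this is "Clear Received".  According to the comments in the class, it is set to "manual" when the alarm is cleared manually.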
 +  * acknowledged
 +    * If there has been an attempt to create a RemoteTicket from this activeAlarm, the alarm is marked as acknowledged.
 +  * createNeutralJson
 +    * If the creatingGTS is found to have come from a Neutral Host installation, this is set to true and the affected carriers are included in the NodeMonitor JSON.  Note that this is a transient field.
 +  * foundGtsComponent
 +    * If it is determined during creation of the GeneralTrapSummary that the alarm targets a specific component inside the NetworkElement, that component is set here.  Note that this is a transient field.
 +  * tickets
 +    * An activeAlarm can generate tickets in the form of RemoteTickets.  Each one represents a ticket created in the IDMS ticketer; depending on the severity of the alarm, there may be a time delay before the ticket is created.  A ticket can also be created from an alarm with the Acknowledge button in the SnmpManager NodeMonitor, which only succeeds if the alarm has a task in the SnmpManager Scheduler.  Note that the code supports many tickets per activeAlarm, but in practice there is typically one ticket per alarm.  Other parts of the IDMS may assume this, so before introducing multiple tickets per active alarm it must be confirmed that, for example, the NocPortal supports it.
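 +
 +The repeat handling described for alarmIdentifier and repeatsReceived can be sketched in plain Groovy as below.  This is a minimal illustration only, not the SnmpManager implementation: AlarmSketch, AlarmStoreSketch, findActive and raise are hypothetical names, the activeAlarm table is replaced by an in-memory list, and matching is assumed to use only networkElement and alarmIdentifier.
 +
 +<code groovy>
 +// Hypothetical, simplified stand-in for the ActiveAlarm domain class (no GORM).
 +class AlarmSketch {
 +  String networkElement
 +  String alarmIdentifier
 +  String status
 +  int repeatsReceived = 0
 +}
 +
 +// Hypothetical in-memory replacement for the activeAlarm table.
 +class AlarmStoreSketch {
 +  List<AlarmSketch> alarms = []
 +
 +  // Return the alarm for this element/identifier pair, or null if there is none.
 +  AlarmSketch findActive(String networkElement, String alarmIdentifier) {
 +    alarms.find { it.networkElement == networkElement && it.alarmIdentifier == alarmIdentifier }
 +  }
 +
 +  // A match is treated as a repeat; otherwise a new alarm is created.
 +  AlarmSketch raise(String networkElement, String alarmIdentifier, String status) {
 +    AlarmSketch existing = findActive(networkElement, alarmIdentifier)
 +    if (existing != null) {
 +      existing.repeatsReceived++
 +      return existing
 +    }
 +    AlarmSketch created = new AlarmSketch(networkElement: networkElement,
 +                                          alarmIdentifier: alarmIdentifier,
 +                                          status: status)
 +    alarms << created
 +    return created
 +  }
 +}
 +
 +def store = new AlarmStoreSketch()
 +store.raise('remote-unit-1', 'Unit Unavailable', 'MAJOR')
 +store.raise('remote-unit-1', 'Unit Unavailable', 'MAJOR')   // repeat: counter goes up
 +store.raise('remote-unit-2', 'Unit Unavailable', 'MAJOR')   // different element: new alarm
 +
 +assert store.alarms.size() == 2
 +assert store.findActive('remote-unit-1', 'Unit Unavailable').repeatsReceived == 1
 +</code>
 +
 +In this sketch the second trap for remote-unit-1 only changes the counter.  Two genuinely different alarms that shared an alarmIdentifier on the same element would be absorbed in the same way, which is the masking effect described under alarmIdentifier.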
 +  
 +
 +----
 +
 +
 +===== Scheduling of Alarms =====
 +
 +
 +Active Alarm creation is typically triggered from trap rules via the GeneralTrapSummaryWrapper#scheduleAlarmAndTicketForLaterWithForms method.  
 +
 +This schedules an alarm and a ticket to be created after a set delay.
 +
 +It is possible to have a different time delay for the alarm and the ticket. 
 +
 +A typical setup is to wait 10 seconds before creating an alarm and X minutes before creating a ticket.  X is typically decided by the severity of the alarm: the higher the severity, the shorter the delay.
 +
 +This is achieved via the SnmpManager Scheduler.  
 +
 +If a clear is received within these 10 seconds, the alarm is never created and is removed from the Scheduler.
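 +
 +The delayed creation described above can be sketched as follows.  This is an illustration under stated assumptions rather than the SnmpManager Scheduler itself: SchedulerSketch, PendingTaskSketch and their methods are hypothetical, the per-severity ticket delays are invented example values, time is passed in explicitly so the behaviour is deterministic, and a clear is assumed to remove the pending ticket as well as the pending alarm.
 +
 +<code groovy>
 +// Hypothetical pending entry; key stands for the networkElement/alarmIdentifier pair.
 +class PendingTaskSketch {
 +  String key
 +  String kind          // 'ALARM' or 'TICKET'
 +  long dueAtSeconds
 +}
 +
 +// Hypothetical in-memory scheduler driven by explicit timestamps (seconds).
 +class SchedulerSketch {
 +  static final int ALARM_DELAY_SECONDS = 10
 +  // Assumed example values only: the higher the severity, the shorter the delay.
 +  static final Map<String, Integer> TICKET_DELAY_MINUTES = [CRITICAL: 5, MAJOR: 15, MINOR: 60]
 +
 +  List<PendingTaskSketch> pending = []
 +  List<String> created = []
 +
 +  // Queue one alarm task and one ticket task, each with its own delay.
 +  void schedule(String key, String severity, long nowSeconds) {
 +    long ticketDelay = TICKET_DELAY_MINUTES[severity] * 60L
 +    pending << new PendingTaskSketch(key: key, kind: 'ALARM', dueAtSeconds: nowSeconds + ALARM_DELAY_SECONDS)
 +    pending << new PendingTaskSketch(key: key, kind: 'TICKET', dueAtSeconds: nowSeconds + ticketDelay)
 +  }
 +
 +  // A clear removes whatever is still pending for that alarm.
 +  void clear(String key) {
 +    pending.removeAll { it.key == key }
 +  }
 +
 +  // Create everything whose delay has elapsed by nowSeconds.
 +  void tick(long nowSeconds) {
 +    List<PendingTaskSketch> due = pending.findAll { it.dueAtSeconds <= nowSeconds }
 +    due.each { created << (it.kind + ':' + it.key) }
 +    pending.removeAll(due)
 +  }
 +}
 +
 +def scheduler = new SchedulerSketch()
 +scheduler.schedule('ru-1|Unit Unavailable', 'CRITICAL', 0)
 +scheduler.schedule('ru-2|Unit Unavailable', 'MAJOR', 0)
 +scheduler.clear('ru-2|Unit Unavailable')   // clear arrives inside the 10 second window
 +
 +scheduler.tick(10)
 +assert scheduler.created == ['ALARM:ru-1|Unit Unavailable']
 +
 +scheduler.tick(300)
 +assert scheduler.created == ['ALARM:ru-1|Unit Unavailable', 'TICKET:ru-1|Unit Unavailable']
 +</code>
 +
 +The alarm for ru-2 is never created because its tasks were removed before they came due, while the CRITICAL alarm on ru-1 produces an alarm after 10 seconds and a ticket after the assumed 5 minutes.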
 +
 +
 +----
 +
 +
 +===== Self Assessment =====
 +
 +  * List all the possible alarm statuses in order from lowest to highest severity.  Where in the SnmpManager codebase are they defined?
 +  * How are active alarms and tickets scheduled in the SnmpManager?  What Groovy construct are they stored as in the Scheduler?
 +  * If a trap is received indicating a "Unit Unavailable" on a Commscope ION-M or ION-U remote unit, how long do we wait for the alarm and ticket to be created in the SnmpManager?  Note that the trap that creates these is aimosAlarmNew.  Use the Errigal QA system to research this, and include an existing Alarm and Ticket pair in your answer.
 +  * How is the status of the alarm determined?  (There's more than one way)
 +  * If a network element has a CRITICAL "RMS Level Low" alarm active on it, what happens if the alarm is received again with the same severity on the same network element?