Alarm Audit

tail -f ~/logs/grails/SnmpManager.log | grep AlarmAudit

tail -f ~/logs/grails/SnmpManager.log | grep VRZ_MD_WolfSt_01

Before the Alarm Audit grabs open alarms, it polls the Andrews controller for the 'Alarm Severity' settings and stores them against the poll as DiscoveredAlarmSeverity entries. During the actual audit, it is supposed to exclude any discrepancy whose alarm matches a disabled alarm for that site in DiscoveredAlarmSeverity. Currently, the alarm audit process stores ONLY DISABLED ALARMS in the DiscoveredAlarmSeverity table, as storing every setting is far too much data. Naturally, I went looking for the relevant DiscoveredAlarmSeverity entries for today's poll, which was AutodiscoveryPoll entry 8738, in case there was a matching issue or something else easy to fix. To look for them:
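The exclusion step described above can be sketched roughly as follows. This is only an illustration, not the actual SnmpManager code: the class names, the `filter_discrepancies` function, and the choice of (hub_id, element_type, alarm_text) as the matching key are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisabledAlarm:
    """Hypothetical stand-in for one DiscoveredAlarmSeverity row."""
    hub_id: int
    element_type: str
    alarm_text: str


@dataclass(frozen=True)
class Discrepancy:
    """Hypothetical stand-in for one discrepancy found by the audit."""
    hub_id: int
    element_type: str
    alarm_text: str
    detail: str


def filter_discrepancies(discrepancies, disabled_alarms):
    """Drop discrepancies that match a disabled alarm on the same hub.

    Assumption: a match means identical hub_id, element_type and
    alarm_text. The real matching rule may be looser or stricter.
    """
    disabled_keys = {
        (d.hub_id, d.element_type, d.alarm_text) for d in disabled_alarms
    }
    return [
        x for x in discrepancies
        if (x.hub_id, x.element_type, x.alarm_text) not in disabled_keys
    ]
```

If nothing is stored as disabled for a hub, nothing is excluded for it, so every discrepancy for that hub ends up in the report.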

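mysql> select * from autodiscovery_poll where start_date > '2014-11-25';
+------+---------+---------------------+---------------------+-----------------------+
| id   | version | start_date          | end_date            | type                  |
+------+---------+---------------------+---------------------+-----------------------+
| 8737 |       0 | 2014-11-25 00:40:01 | 2014-11-25 06:32:14 | AutodiscoveryAlarmJob |
| 8738 |       1 | 2014-11-25 06:44:44 | 2014-11-25 11:35:16 | AutodiscoveryAlarmJob |
+------+---------+---------------------+---------------------+-----------------------+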

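mysql> desc discovered_alarm_severity;
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| version        | bigint(20)   | NO   |     | NULL    |                |
| alarm_code     | varchar(255) | YES  |     | NULL    |                |
| alarm_severity | varchar(255) | YES  |     | NULL    |                |
| alarm_text     | varchar(255) | NO   |     | NULL    |                |
| element_type   | varchar(255) | NO   |     | NULL    |                |
| hub_id         | bigint(20)   | NO   | MUL | NULL    |                |
| poll_id        | bigint(20)   | NO   | MUL | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+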

The domain has a poll_id and a hub_id. Nice, easy to look up. So, find the hub_id:

mysql> select id, name from network_element where name like 'ATT_AZ_ParadiseValley_01(X463)';
+-------+--------------------------------+
| id    | name                           |
+-------+--------------------------------+
| 72680 | ATT_AZ_ParadiseValley_01(X463) |
+-------+--------------------------------+

Ok, now let's see what was discovered for disabled alarms for this hub in today's poll.

mysql> select * from discovered_alarm_severity where poll_id=8738 and hub_id=72680;
Empty set (0.00 sec)
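The hub lookup and the disabled-alarm lookup can also be folded into one join, so the hub_id never has to be copied by hand. The sketch below is a rough illustration only: it uses an in-memory SQLite database as a stand-in for the real MySQL schema, keeps just the columns needed for the join, and the `disabled_alarms_for_hub` helper and the `HYPOTHETICAL_HUB_01` row are made up for the example.

```python
import sqlite3


def disabled_alarms_for_hub(conn, poll_id, hub_name):
    """Return (alarm_text, element_type) rows stored for one hub in one poll."""
    query = """
        SELECT das.alarm_text, das.element_type
        FROM discovered_alarm_severity AS das
        JOIN network_element AS ne ON ne.id = das.hub_id
        WHERE das.poll_id = ? AND ne.name = ?
        ORDER BY das.element_type, das.alarm_text
    """
    return conn.execute(query, (poll_id, hub_name)).fetchall()


# Stand-in schema and sample rows (assumptions, not production data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE network_element (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE discovered_alarm_severity (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        alarm_text TEXT NOT NULL,
        element_type TEXT NOT NULL,
        hub_id INTEGER NOT NULL,
        poll_id INTEGER NOT NULL
    );
    INSERT INTO network_element VALUES (72680, 'ATT_AZ_ParadiseValley_01(X463)');
    INSERT INTO network_element VALUES (1, 'HYPOTHETICAL_HUB_01');
    INSERT INTO discovered_alarm_severity (alarm_text, element_type, hub_id, poll_id)
        VALUES ('External 2 Alarm ({User Text})', 'RU', 1, 8738);
""")

# An empty list here corresponds to the "Empty set" seen in mysql.
rows = disabled_alarms_for_hub(conn, 8738, "ATT_AZ_ParadiseValley_01(X463)")
print(rows)
```

The same SELECT should run unchanged in the mysql client once the two `?` placeholders are replaced with the poll id and the quoted hub name.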

4. UH OH! There were no disabled alarms discovered for this hub today? But Rich said that there were disabled alarms in Arizona… so, something must be up. Perhaps there is an issue in the code?

5. I checked all of the results for today, and sure enough, it did discover quite a few disabled alarms and many of the 'External 2' type for other hubs, but not this one…

mysql> select alarm_text, element_type, count(*) from discovered_alarm_severity where element_type='RU' and alarm_text like '%External%' and poll_id=8738 group by 1,2;
+---------------------------------+--------------+----------+
| alarm_text                      | element_type | count(*) |
+---------------------------------+--------------+----------+
| External 1 Alarm ({User Text})  | RU           |       19 |
| External 1 Output ({User Text}) | RU           |        9 |
| External 2 Alarm ({User Text})  | RU           |       20 |
| External 2 Output ({User Text}) | RU           |        9 |
| External 3 Alarm ({User Text})  | RU           |       18 |
| External 3 Output ({User Text}) | RU           |        9 |
| External 4 Alarm ({User Text})  | RU           |       19 |
| External 4 Output ({User Text}) | RU           |        9 |
+---------------------------------+--------------+----------+

Hmmm… not a lot, but they are in there…

6. Well, either there is some random discovery issue, or the customer was confused and the alarms are not actually disabled? Let's check that. Visit the controller: http://10.24.8.246:8080/

   >  Select 'ION System' in the tree.
   >  Select 'Settings' at the middle-top
   >  Select 'Alarm Severity'
   >  Select 'RU'

Aw… COME ON MAN!!!! They are not disabled. In fact, they are labelled Critical!!! Ugh. Back to the drawing board.

Well, there may very well still be a problem, but apparently there is some confusion about which alarms are actually disabled. If there is no problem, great, but we still have to bill for investigation time and show the customer that the alarms they believed were disabled are in fact enabled on the controller.

Again, I am not saying that there is not a problem, just that you can use the above knowledge to investigate further. For example, you can look up all discovered alarm text types for element_type='RU', and look for those in the report, and cherry pick some to investigate.
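The cherry-picking idea can be sketched as below. This is a rough illustration under stated assumptions: the audit report format is not shown in these notes, so the report is treated as a plain list of text lines, and the `alarm_prefix` and `cherry_pick` helpers are hypothetical. It also assumes that the '({User Text})' suffix in alarm_text is a placeholder that the controller replaces with site-specific text, so only the fixed part before it is matched.

```python
PLACEHOLDER = "({User Text})"  # suffix seen in the discovered alarm_text values


def alarm_prefix(alarm_text):
    """Return the fixed part of a template like 'External 2 Alarm ({User Text})'."""
    return alarm_text.split(PLACEHOLDER)[0].strip()


def cherry_pick(report_lines, discovered_texts, limit=5):
    """Return up to `limit` report lines that mention a discovered alarm text.

    Assumption: a simple substring match on the fixed prefix is good
    enough to find candidates worth a manual look on the controller.
    """
    prefixes = sorted({alarm_prefix(t) for t in discovered_texts} - {""})
    picked = []
    for line in report_lines:
        if any(prefix in line for prefix in prefixes):
            picked.append(line)
            if len(picked) == limit:
                break
    return picked
```

The discovered_texts list would come from a query such as `select distinct alarm_text from discovered_alarm_severity where element_type='RU' and poll_id=8738`, and each picked line can then be checked by hand against the controller's 'Alarm Severity' page.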