tail -f ~/logs/grails/SnmpManager.log | grep AlarmAudit
tail -f ~/logs/grails/SnmpManager.log | grep VRZ_MD_WolfSt_01
The way the Alarm Audit works is that, before it grabs open alarms, it polls the Andrews controller for the 'Alarm Severity' settings and stores them against the poll as DiscoveredAlarmSeverity entries. When it does the actual auditing, it is supposed to exclude discrepancies for alarms that match a disabled alarm for that site in DiscoveredAlarmSeverity. Currently, the alarm audit process ONLY STORES DISABLED ALARMS in the DiscoveredAlarmSeverity table, as storing everything is way too much information. So, naturally, I went looking for the relevant DiscoveredAlarmSeverity entries for today's poll, which was AutodiscoveryPoll entry 8738. Maybe there was a matching issue or something easy to fix. To look for them:
mysql> select * from autodiscovery_poll where start_date > '2014-11-25';
+------+---------+---------------------+---------------------+-----------------------+
| id   | version | start_date          | end_date            | type                  |
+------+---------+---------------------+---------------------+-----------------------+
| 8737 |       0 | 2014-11-25 00:40:01 | 2014-11-25 06:32:14 | AutodiscoveryAlarmJob |
| 8738 |       1 | 2014-11-25 06:44:44 | 2014-11-25 11:35:16 | AutodiscoveryAlarmJob |
+------+---------+---------------------+---------------------+-----------------------+
mysql> desc discovered_alarm_severity;
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | bigint(20)   | NO   | PRI | NULL    | auto_increment |
| version        | bigint(20)   | NO   |     | NULL    |                |
| alarm_code     | varchar(255) | YES  |     | NULL    |                |
| alarm_severity | varchar(255) | YES  |     | NULL    |                |
| alarm_text     | varchar(255) | NO   |     | NULL    |                |
| element_type   | varchar(255) | NO   |     | NULL    |                |
| hub_id         | bigint(20)   | NO   | MUL | NULL    |                |
| poll_id        | bigint(20)   | NO   | MUL | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+
The domain has a poll_id and a hub_id. Nice. Easy to look up. So, find the hub_id.
mysql> select id, name from network_element where name like 'ATT_AZ_ParadiseValley_01(X463)';
+-------+--------------------------------+
| id    | name                           |
+-------+--------------------------------+
| 72680 | ATT_AZ_ParadiseValley_01(X463) |
+-------+--------------------------------+
Ok, now let's see what was discovered for disabled alarms for this hub in today's poll.
mysql> select * from discovered_alarm_severity where poll_id=8738 and hub_id=72680;
Empty set (0.00 sec)
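The lookups above (find the hub, then its disabled alarms for a poll) can be collapsed into a single join, which saves a step when checking several hubs. Below is a minimal sketch, not the application's code: it uses Python's sqlite3 with in-memory stand-in tables that carry only the columns needed here. The helper name disabled_alarms, the hub EXAMPLE_HUB_01, and its row are made up for illustration; only hub 72680 and poll 8738 come from the investigation.

```python
import sqlite3

# In-memory stand-in for the real MySQL tables (assumption: only the
# columns used by the lookup are modelled here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE network_element (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE discovered_alarm_severity (
        id INTEGER PRIMARY KEY, alarm_text TEXT, element_type TEXT,
        hub_id INTEGER, poll_id INTEGER);
    INSERT INTO network_element VALUES
        (72680, 'ATT_AZ_ParadiseValley_01(X463)'),
        (1, 'EXAMPLE_HUB_01');  -- made-up hub for illustration
    INSERT INTO discovered_alarm_severity
        (alarm_text, element_type, hub_id, poll_id) VALUES
        ('External 2 Alarm ({User Text})', 'RU', 1, 8738);  -- made-up row
""")

def disabled_alarms(conn, hub_name, poll_id):
    """Return (alarm_text, element_type) rows stored for a hub in one poll."""
    return conn.execute(
        """SELECT das.alarm_text, das.element_type
           FROM discovered_alarm_severity das
           JOIN network_element ne ON ne.id = das.hub_id
           WHERE ne.name = ? AND das.poll_id = ?
           ORDER BY das.alarm_text""",
        (hub_name, poll_id),
    ).fetchall()

# Mirrors the result above: nothing stored for the Paradise Valley hub.
print(disabled_alarms(conn, "ATT_AZ_ParadiseValley_01(X463)", 8738))
print(disabled_alarms(conn, "EXAMPLE_HUB_01", 8738))
```

The same SELECT ... JOIN should work as-is in the mysql client if the real tables match the columns shown above; the placeholders just become literal values.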
4. UH OH! There were no disabled alarms discovered for this hub today? But Rich said that there were disabled alarms in Arizona… so, something must be up. Perhaps there is an issue in the code?
5. I checked all of the results for today, and sure enough, it did discover quite a few disabled alarms and many of the 'External 2' type for other hubs, but not this one…
mysql> select alarm_text, element_type, count(*) from discovered_alarm_severity where element_type='RU' and alarm_text like '%External%' and poll_id=8738 group by 1,2;
+---------------------------------+--------------+----------+
| alarm_text                      | element_type | count(*) |
+---------------------------------+--------------+----------+
| External 1 Alarm ({User Text})  | RU           |       19 |
| External 1 Output ({User Text}) | RU           |        9 |
| External 2 Alarm ({User Text})  | RU           |       20 |
| External 2 Output ({User Text}) | RU           |        9 |
| External 3 Alarm ({User Text})  | RU           |       18 |
| External 3 Output ({User Text}) | RU           |        9 |
| External 4 Alarm ({User Text})  | RU           |       19 |
| External 4 Output ({User Text}) | RU           |        9 |
+---------------------------------+--------------+----------+
Hmmm… not a lot, but they are in there…
6. Well, either there is some random discovery issue, or the customer was confused and the alarms are not actually disabled. Let's check that. Visit the controller: http://10.24.8.246:8080/
> Select 'ION System' in the tree.
> Select 'Settings' at the middle-top
> Select 'Alarm Severity'
> Select 'RU'
Aw… COME ON MAN!!!! They are not disabled. In fact, they are labelled Critical!!! Ugh. Back to the drawing board.
Well, there may very well still be a problem, but apparently there is some fog floating around all of this. If there is no problem, great, but we have to bill for investigation time and prove to the customer that they were not aware of what is disabled and were too lazy to check.
Again, I am not saying that there is not a problem, just that you can use the above knowledge to investigate further. For example, you can look up all discovered alarm text types for element_type='RU', and look for those in the report, and cherry pick some to investigate.
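When cherry-picking report entries, it helps to have the exclusion rule from the top of this page in a form you can check by hand: a discrepancy should drop out of the report only if a disabled alarm was stored for the same hub in that poll. Here is a rough sketch of that rule in Python. It is an illustration, not the real audit code: the Alarm tuple, the audit_discrepancies name, and the choice of (hub_id, element_type, alarm_text) as the match key are all assumptions, and hub 1 is a made-up example.

```python
from typing import NamedTuple

class Alarm(NamedTuple):
    # Assumed match key; the real audit may compare different fields.
    hub_id: int
    element_type: str
    alarm_text: str

def audit_discrepancies(discrepancies, disabled):
    """Keep only discrepancies with no matching disabled alarm on the same hub."""
    disabled_keys = set(disabled)
    return [d for d in discrepancies if d not in disabled_keys]

# Hub 72680 had no disabled alarms stored for the poll; hub 1 (made up) did.
disabled = [Alarm(1, "RU", "External 2 Alarm ({User Text})")]
discrepancies = [
    Alarm(72680, "RU", "External 2 Alarm ({User Text})"),
    Alarm(1, "RU", "External 2 Alarm ({User Text})"),
]

remaining = audit_discrepancies(discrepancies, disabled)
print(remaining)
```

With this input only the hub 72680 entry is left, which is consistent with what we saw: nothing was stored as disabled for that hub, so its discrepancies stay in the report.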