Level: Critical (1200) | Major (1000) | Minor (800)
Purpose: Alerts if the active alarm table in the SNMP manager database is out of sync with the active alarm table in the alarm cache database.
Scenario: Alarm cache may not be consuming messages. Check the alarm cache logs to confirm whether messages are being processed. Alarm cache may need a restart and the RabbitMQ queue may need to be purged.
Resolution:
Run the query below to see how far out of sync the two tables are. Follow this wiki article to fix.
select sum(individual_counts) as 'COUNT(*)'
from (
    -- uncleared SNMP manager alarms that are missing from the alarm cache
    (select count(*) as individual_counts
       from snmp_manager.active_alarm smaa
      where !smaa.cleared
        and not (smaa.id in (select id from alarm_cache.active_alarm)))
    union all
    -- alarm cache entries whose SNMP manager alarm is already cleared
    (select count(*)
       from alarm_cache.active_alarm acaa
       join snmp_manager.active_alarm smaa on smaa.id = acaa.id
      where smaa.cleared)
    union all
    -- alarm cache entries with no matching SNMP manager alarm
    (select count(*)
       from alarm_cache.active_alarm acaa
       left join snmp_manager.active_alarm smaa on smaa.id = acaa.id
      where smaa.id is null)
) as tmp_count_table;
Manual Action Steps: Restart alarm cache, purge the RMQ queue, and run the alarm cache audit job to fix the sync issue.
Auto Clear: When the above query's result drops below 800.
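
The sketch below shows one way to map the query's result onto the levels listed above. It is illustrative only: the function name classify_severity is hypothetical, and it assumes each level is raised when the count is at or above its threshold. That matches the auto-clear rule (below 800 means no alert), but the exact comparison used by the monitoring system is not stated here.

```python
from typing import Optional

# Thresholds from the Level line of this alert, highest first.
THRESHOLDS = [(1200, "Critical"), (1000, "Major"), (800, "Minor")]


def classify_severity(out_of_sync_count: int) -> Optional[str]:
    """Return the alert level for an out-of-sync count, or None if below 800.

    Assumption: a level applies when the count is >= its threshold.
    """
    for threshold, level in THRESHOLDS:
        if out_of_sync_count >= threshold:
            return level
    return None  # below the lowest threshold: alert auto-clears
```

For example, a query result of 1050 would be classified as Major under these assumptions, and a result of 799 would return None (cleared).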