
AlarmCacheOutOfSync

Level: Critical (1200) FIXME | Major (1000) :!: | Minor (800)

Purpose: Alerts if the active alarm table in the SNMP manager database is out of sync with the active alarm table in the alarm cache database.

Scenario: Alarm cache may not be consuming messages. Check the alarm cache logs to confirm that messages are being processed. Alarm cache may need a restart, and the RabbitMQ queue may need to be purged.

Resolution: Run the query below to see how far out of sync the two tables are. Follow this wiki article to fix.

  select sum(individual_counts) as 'COUNT(*)' from (
    (select count(*) as individual_counts
       from snmp_manager.active_alarm smaa
      where !smaa.cleared
        and not (smaa.id in (select id from alarm_cache.active_alarm)))
    union all
    (select count(*)
       from alarm_cache.active_alarm acaa
       join snmp_manager.active_alarm smaa on smaa.id = acaa.id
      where smaa.cleared)
    union all
    (select count(*)
       from alarm_cache.active_alarm acaa
       left join snmp_manager.active_alarm smaa on smaa.id = acaa.id
      where smaa.id is null)
  ) as tmp_count_table;
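
The query above adds up three kinds of mismatch. As a rough illustration only, the same logic can be sketched in Python on in-memory data. The function name, the dict/set inputs, and the key names in the result are hypothetical and not part of either database; the sketch also assumes `id` is unique in both tables and that `cleared` is never NULL.

```python
def mismatch_breakdown(snmp_alarms, cache_ids):
    """Count the three mismatch kinds summed by the sync query.

    snmp_alarms: dict mapping alarm id -> cleared flag (stand-in for
                 snmp_manager.active_alarm; hypothetical shape).
    cache_ids:   set of alarm ids (stand-in for alarm_cache.active_alarm).
    """
    # 1. Uncleared in the SNMP manager but absent from the alarm cache.
    missing_from_cache = sum(
        1 for alarm_id, cleared in snmp_alarms.items()
        if not cleared and alarm_id not in cache_ids
    )
    # 2. Present in the alarm cache but already cleared in the SNMP manager.
    cleared_but_cached = sum(
        1 for alarm_id in cache_ids if snmp_alarms.get(alarm_id) is True
    )
    # 3. Present in the alarm cache with no matching SNMP manager row.
    orphaned_in_cache = sum(
        1 for alarm_id in cache_ids if alarm_id not in snmp_alarms
    )
    return {
        "missing_from_cache": missing_from_cache,
        "cleared_but_cached": cleared_but_cached,
        "orphaned_in_cache": orphaned_in_cache,
        "total": missing_from_cache + cleared_but_cached + orphaned_in_cache,
    }


# Toy data: alarm 2 is missing from the cache, alarm 3 is cleared but
# still cached, and alarm 5 exists only in the cache.
result = mismatch_breakdown({1: False, 2: False, 3: True, 4: True}, {1, 3, 5})
```

Seeing which of the three counts dominates may hint at the cause: a large first count would be consistent with alarm cache not consuming new alarms, while large second or third counts would be consistent with missed clear messages.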

Manual Action Steps: Restart alarm cache, purge the RabbitMQ (RMQ) queue, and run the alarm cache audit job to fix the sync issue.

Auto Clear: When the above query's result drops below 800.
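
A minimal sketch of how the query result might map to the alarm levels listed under Level. It assumes the numbers in parentheses are inclusive lower bounds on the query result, which this page does not state explicitly; the function name is hypothetical.

```python
def alarm_level(out_of_sync_count):
    """Map the sync query result to a level name, or None when clear.

    Assumption: 1200, 1000 and 800 are inclusive lower bounds.
    """
    if out_of_sync_count >= 1200:
        return "Critical"
    if out_of_sync_count >= 1000:
        return "Major"
    if out_of_sync_count >= 800:
        return "Minor"
    # Below 800 matches the Auto Clear condition.
    return None
```

Under that assumption, a result of 950 would be reported as Minor, and a result of 799 would clear the alarm.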

resolution_area/watchdog_resolutions/res-w9104.txt · Last modified: 2021/07/05 12:30 by 10.91.120.28