====== ATC Availability Reports ====== ====== Reports ====== * Availability Raw Data * . BreakDown /List of Alarms – Quite Heavy, should only be run for a specific site or for a short period (e.g. 2 days) * Availability Report by Region - 1 report for the whole system, can be filtered by regions * Availability Report by Carrier - Availability of each site is calculated separately for each carrier. * Network Summary - A quick and formatted overview displaying the overall system availability and the comparison year to date. Sabrina sends this to upper management. * Widgets ====== Type of Alarms====== Alarms need to attend all requirements below to affect availability. * Associated to remotes and hosts) * Severity Critical * Not be created during a maintenance period are ignored * Belong to the following technologies:'ADC','ADC_CXD','ADC_FUSION','ADC_LGCY','ADC_SPECTRUM', 'ADC_URH', 'ADC_PRISM','ANDREWS','ANDREWS_IONB','ANDREWS_IONU','ANDREWS_TSUN','ANDREW_BDA','MOBILE_ACCESS','Mobile Access BR One','POWERWAVE','SOLID','SOLID_REL6','TEKO_TELECOM','TEKO TELECOM' * 'On Air' and 'Off-air accepted' status * ne.show_in_monitor = TRUE ====== Cache ====== Because of the high amount of data, the query to calculate the availability for long periods can take extremely long to execute. To allow the report and widgets to run faster (and to make it easier to maintain the data consistent), a 'cache' was implemented. These are run daily and monthly and saved in the following database tables * reporting_manager.availability_by_region_cache_monthly * reporting_manager.availability_by_region_and_carrier_cache_monthly * reporting_manager.availability_by_region_cache * reporting_manager.availability_by_region_and_carrier_cache A cron job populates these tables daily and monthly. The queries used to populate this cache is where the calculation are being done. The reports have basic queries to read from the cache tables #Network Availability region cache */15 * * * * mysql -uroot -pPASSWORD ticketer -h atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_cache.sql > ~/script/availability_cache_scripts/regioncachelog.txt 2>&1 #Network Availability region and carrier cache */15 * * * * mysql -uroot -PASSWORD ticketer -h atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_and_carrier_cache.sql > ~/script/availability_cache_scripts/regionandacarriercachelog.txt 2>&1 1 0 1 * * mysql -uroot -pPASSWORD ticketer -h atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_monthly_cache.sql > ~/script/availability_cache_scripts/regionMonthlycachelog.txt 2>&1 1 0 1 * * mysql -uroot -pPASSWORD ticketer -h atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_and_carrier_monthly_cache.sql > ~/script/availability_cache_scripts/regionCarrierMonthlycachelog.txt 2>&1 0), (remoteCount * 900) - fullJoinedList.alarmDurationSum, (remoteCount * 900)) active , (fullJoinedList.remoteCount * 900) base, IF(fullJoinedList.alarmDurationSum IS NOT NULL, ROUND(((fullJoinedList.remoteCount * 900) - SUM(alarmDurationSum)) * 100 / (fullJoinedList.remoteCount * 900), 2), 100) AS uptime, fullJoinedList.techType FROM (SELECT controller.*, SUM(tickets.alarmDuration) alarmDurationSum, IF(controller.technology = 'Opto22',1, IF( COUNT(CASE WHEN (controller.ne_type = 'Node' or controller.ne_type = 'wifi' ) THEN 1 END) > 0, COUNT(CASE WHEN (controller.ne_type = 'Node' or controller.ne_type = 'wifi' ) THEN 1 END), 1 ) )AS remoteCount, count(*) elementCount, controller.technologyType techType, group_concat(distinct controller.technology) vendors, group_concat(distinct controller.neId) elementIds, IF (SUM(tickets.alarmDuration) IS NOT NULL, ( COUNT(CASE WHEN controller.ne_type = 'Node' THEN 1 END) * 900) - SUM(tickets.alarmDuration), (COUNT(CASE WHEN (controller.ne_type = 'Node') THEN 1 END) * 900)) active, (COUNT(CASE WHEN (controller.ne_type = 'Node') THEN 1 END) * 900) base FROM (SELECT ne.id neId, parent.id parentId, IF(ne.ne_type = 'Controller', ne.name, parent.name) as pName, IF(ne.ne_type = 'Controller', ne.cluster_name, parent.cluster_name) as cluster_name, IF(ne.ne_type = 'Controller', ne.technology, parent.technology) as technology, 'System' carrier, IF(ne.assigned_technology_id = 40, concat(ne.network_identifier,' Opto'), parent.network_identifier) network_identifier, ne.ne_type, tech.product_type as technologyType FROM snmp_manager.network_element parent LEFT JOIN snmp_manager.network_element ne ON ne.parent_id = parent.id LEFT JOIN snmp_manager.technology tech ON tech.id = ne.assigned_technology_id left join snmp_manager.site_network_element sne on sne.network_element_id = parent.id left join snmp_manager.network_element_property nep on (nep.site_id = sne.site_network_elements_id and nep.name = 'Asset Status') WHERE (parent.ne_type = 'Controller' OR ne.assigned_technology_id = 40) and ne.show_in_monitor = TRUE and ne.status_id in (1,5) # ATC-996532 - 'On Air' and 'Off-air accepted' states AND parent.technology not in ('SYM','Westell_UDIT','Shark','Eaton','Emerson','Kentrox_RMM','n/a','Exfo','Voltserver','Mikrotik','Opto22') AND ne.equipment_type is null group by ne.id ) AS controller LEFT JOIN (SELECT aa.created_date,aa.cleared_date,aa.cleared_reason, ne.cluster_name region, tech.equipment_manufacturer, ne.name neName, SUM(CASE WHEN aa.cleared_date IS NOT NULL THEN (UNIX_TIMESTAMP(LEAST(aa.cleared_date,NOW())) - UNIX_TIMESTAMP(GREATEST(aa.created_date,DATE_SUB(NOW(), INTERVAL 15 MINUTE)))) WHEN aa.cleared_date IS NULL THEN (UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(GREATEST(aa.created_date,DATE_SUB(NOW(), INTERVAL 15 MINUTE)))) END) alarmDuration, IF(ne.ne_type = 'Controller', ne.id, parent.id) controller_id, ne.ne_type tNeType,ne.id tNeId, ne.name, gts.trap_name, gts.alarm_identifier FROM snmp_manager.active_alarm aa LEFT JOIN snmp_manager.general_trap_summary gts on aa.creatinggts_id = gts.id LEFT JOIN snmp_manager.network_element ne on ne.id = aa.network_element_id LEFT JOIN snmp_manager.technology tech ON tech.id = ne.assigned_technology_id LEFT JOIN snmp_manager.network_element parent ON (ne.parent_id = parent.id AND parent.ne_type = 'Controller') LEFT JOIN snmp_manager.site_network_element site on (ne.id = site.network_element_id or parent.id = site.network_element_id ) LEFT JOIN #Needed an alias table as Null MaintenanceAlarmIdentifier value could either mean 'all alarms' or no maintenance rule at all (select maintenance_rule.* ,COALESCE(alarm_identifiers_string,'ALL ALARMS') alarm_identifiers_string from snmp_manager.maintenance_rule LEFT JOIN snmp_manager.maintenance_rule_alarm_identifiers on maintenance_rule_alarm_identifiers.maintenance_rule_id = maintenance_rule.id where cancelled_manually is false ) AS m ON (ne.id = m.network_element_id or site.site_network_elements_id = m.site_id or m.network_element_id = ne.parent_id) and aa.created_date BETWEEN m.start_date and m.finish_date and (m.alarm_identifiers_string = aa.alarm_identifier or m.alarm_identifiers_string = 'ALL ALARMS') LEFT JOIN snmp_manager.site_network_element sne on sne.network_element_id = parent.id LEFT JOIN snmp_manager.network_element_property nep on (nep.site_id = sne.site_network_elements_id and nep.name = 'Asset Status') -- This is for the site status on polaris WHERE parent.technology not in ('SYM','Westell_UDIT','Shark','Eaton','Emerson','Kentrox_RMM','n/a','Exfo','Voltserver','Mikrotik','Opto22') AND (aa.status = 'CRITICAL' or aa.previous_status = 'CRITICAL') AND (aa.cleared_date IS NULL OR aa.cleared_date > DATE_SUB(NOW(), INTERVAL 15 MINUTE) ) AND aa.created_date < NOW() AND m.id is null AND ne.ne_type in ('node','host','wifi') AND (nep.value = 'Active' or nep.value is null ) AND ne.show_in_monitor = TRUE GROUP BY ne.id ) AS tickets ON tickets.tNeId = controller.neId group by controller.network_identifier,controller.carrier,technologyType ) as fullJoinedList GROUP BY fullJoinedList.network_identifier,fullJoinedList.carrier,technologyType having region <> 'Admin' ORDER BY uptime;> ====== Calculation formula ====== ( (Count of Remotes * TotalTime) - Duration of Critical Alarms) / (Count of Remotes * TotalTime) ) * 100 The above is run for each site, than it is run for the sum of all remotes in the system to calculate the overall system availability ===== Schedules ===== One for each carrier ==== South America ====