User Tools
Table of Contents
ATC Availability Reports
Reports
- Availability Raw Data
- . BreakDown /List of Alarms – Quite Heavy, should only be run for a specific site or for a short period (e.g. 2 days)
- Availability Report by Region - 1 report for the whole system, can be filtered by regions
- Availability Report by Carrier - Availability of each site is calculated separately for each carrier.
- Network Summary - A quick and formatted overview displaying the overall system availability and the comparison year to date. Sabrina sends this to upper management.
- Widgets
Type of Alarms
Alarms need to attend all requirements below to affect availability.
- Associated to remotes and hosts)
- Severity Critical
- Not be created during a maintenance period are ignored
- Belong to the following technologies:'ADC','ADC_CXD','ADC_FUSION','ADC_LGCY','ADC_SPECTRUM', 'ADC_URH', 'ADC_PRISM','ANDREWS','ANDREWS_IONB','ANDREWS_IONU','ANDREWS_TSUN','ANDREW_BDA','MOBILE_ACCESS','Mobile Access BR One','POWERWAVE','SOLID','SOLID_REL6','TEKO_TELECOM','TEKO TELECOM'
- 'On Air' and 'Off-air accepted' status
- ne.show_in_monitor = TRUE
Cache
Because of the high amount of data, the query to calculate the availability for long periods can take extremely long to execute. To allow the report and widgets to run faster (and to make it easier to maintain the data consistent), a 'cache' was implemented. These are run daily and monthly and saved in the following database tables
- reporting_manager.availability_by_region_cache_monthly
- reporting_manager.availability_by_region_and_carrier_cache_monthly
- reporting_manager.availability_by_region_cache
- reporting_manager.availability_by_region_and_carrier_cache
A cron job populates these tables daily and monthly. The queries used to populate this cache is where the calculation are being done. The reports have basic queries to read from the cache tables
#Network Availability region cache */15 * * * * mysql -uroot -pPASSWORD ticketer -h atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_cache.sql > ~/script/availability_cache_scripts/regioncachelog.txt 2>&1
#Network Availability region and carrier cache */15 * * * * mysql -uroot -PASSWORD ticketer -h atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_and_carrier_cache.sql > ~/script/availability_cache_scripts/regionandacarriercachelog.txt 2>&1
1 0 1 * * mysql -uroot -pPASSWORD ticketer -h atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_monthly_cache.sql > ~/script/availability_cache_scripts/regionMonthlycachelog.txt 2>&1
1 0 1 * * mysql -uroot -pPASSWORD ticketer -h atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_and_carrier_monthly_cache.sql > ~/script/availability_cache_scripts/regionCarrierMonthlycachelog.txt 2>&1
<SET group_concat_max_len=15000; INSERT INTO reporting_manager.availability_by_region_cache SELECT
DATE_SUB(NOW(), INTERVAL 15 MINUTE) startDate,
fullJoinedList.network_identifier site,
fullJoinedList.carrier,
fullJoinedList.cluster_name region,
fullJoinedList.vendors vendors,
fullJoinedList.remoteCount remotes,
fullJoinedList.elementCount,
fullJoinedList.alarmDurationSum alarmDuration,
IF (alarmDurationSum IS NOT NULL and ((remoteCount * 900) - fullJoinedList.alarmDurationSum > 0),
(remoteCount * 900) - fullJoinedList.alarmDurationSum,
(remoteCount * 900)) active ,
(fullJoinedList.remoteCount * 900) base,
IF(fullJoinedList.alarmDurationSum IS NOT NULL,
ROUND(((fullJoinedList.remoteCount * 900) - SUM(alarmDurationSum)) * 100 / (fullJoinedList.remoteCount * 900),
2),
100) AS uptime,
fullJoinedList.techType
FROM
(SELECT
controller.*, SUM(tickets.alarmDuration) alarmDurationSum,
IF(controller.technology = 'Opto22',1,
IF(
COUNT(CASE WHEN (controller.ne_type = 'Node' or controller.ne_type = 'wifi' ) THEN 1 END) > 0,
COUNT(CASE WHEN (controller.ne_type = 'Node' or controller.ne_type = 'wifi' ) THEN 1 END),
1
)
)AS remoteCount,
count(*) elementCount,
controller.technologyType techType,
group_concat(distinct controller.technology) vendors,
group_concat(distinct controller.neId) elementIds,
IF (SUM(tickets.alarmDuration) IS NOT NULL, ( COUNT(CASE WHEN controller.ne_type = 'Node' THEN 1 END) * 900) - SUM(tickets.alarmDuration),
(COUNT(CASE WHEN (controller.ne_type = 'Node') THEN 1 END) * 900)) active,
(COUNT(CASE WHEN (controller.ne_type = 'Node') THEN 1 END) * 900) base
FROM
(SELECT ne.id neId,
parent.id parentId,
IF(ne.ne_type = 'Controller', ne.name, parent.name) as pName,
IF(ne.ne_type = 'Controller', ne.cluster_name, parent.cluster_name) as cluster_name,
IF(ne.ne_type = 'Controller', ne.technology, parent.technology) as technology,
'System' carrier,
IF(ne.assigned_technology_id = 40, concat(ne.network_identifier,' Opto'), parent.network_identifier) network_identifier,
ne.ne_type,
tech.product_type as technologyType
FROM
snmp_manager.network_element parent
LEFT JOIN snmp_manager.network_element ne ON ne.parent_id = parent.id
LEFT JOIN snmp_manager.technology tech ON tech.id = ne.assigned_technology_id
left join snmp_manager.site_network_element sne on sne.network_element_id = parent.id
left join snmp_manager.network_element_property nep on (nep.site_id = sne.site_network_elements_id and nep.name = 'Asset Status')
WHERE
(parent.ne_type = 'Controller'
OR ne.assigned_technology_id = 40)
and ne.show_in_monitor = TRUE and ne.status_id in (1,5) # ATC-996532 - 'On Air' and 'Off-air accepted' states
AND parent.technology not in ('SYM','Westell_UDIT','Shark','Eaton','Emerson','Kentrox_RMM','n/a','Exfo','Voltserver','Mikrotik','Opto22')
AND ne.equipment_type is null
group by ne.id
) AS controller
LEFT JOIN
(SELECT
aa.created_date,aa.cleared_date,aa.cleared_reason,
ne.cluster_name region,
tech.equipment_manufacturer,
ne.name neName,
SUM(CASE
WHEN
aa.cleared_date IS NOT NULL
THEN
(UNIX_TIMESTAMP(LEAST(aa.cleared_date,NOW())) - UNIX_TIMESTAMP(GREATEST(aa.created_date,DATE_SUB(NOW(), INTERVAL 15 MINUTE))))
WHEN
aa.cleared_date IS NULL
THEN
(UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(GREATEST(aa.created_date,DATE_SUB(NOW(), INTERVAL 15 MINUTE))))
END) alarmDuration,
IF(ne.ne_type = 'Controller', ne.id, parent.id) controller_id,
ne.ne_type tNeType,ne.id tNeId, ne.name,
gts.trap_name, gts.alarm_identifier
FROM
snmp_manager.active_alarm aa
LEFT JOIN snmp_manager.general_trap_summary gts on aa.creatinggts_id = gts.id
LEFT JOIN snmp_manager.network_element ne on ne.id = aa.network_element_id
LEFT JOIN snmp_manager.technology tech ON tech.id = ne.assigned_technology_id
LEFT JOIN snmp_manager.network_element parent ON (ne.parent_id = parent.id
AND parent.ne_type = 'Controller')
LEFT JOIN snmp_manager.site_network_element site on (ne.id = site.network_element_id or parent.id = site.network_element_id )
LEFT JOIN #Needed an alias table as Null MaintenanceAlarmIdentifier value could either mean 'all alarms' or no maintenance rule at all
(select maintenance_rule.* ,COALESCE(alarm_identifiers_string,'ALL ALARMS') alarm_identifiers_string
from
snmp_manager.maintenance_rule
LEFT JOIN snmp_manager.maintenance_rule_alarm_identifiers on maintenance_rule_alarm_identifiers.maintenance_rule_id = maintenance_rule.id
where
cancelled_manually is false )
AS m ON
(ne.id = m.network_element_id or site.site_network_elements_id = m.site_id or m.network_element_id = ne.parent_id)
and aa.created_date BETWEEN m.start_date and m.finish_date
and (m.alarm_identifiers_string = aa.alarm_identifier or m.alarm_identifiers_string = 'ALL ALARMS')
LEFT JOIN snmp_manager.site_network_element sne on sne.network_element_id = parent.id
LEFT JOIN snmp_manager.network_element_property nep on (nep.site_id = sne.site_network_elements_id and nep.name = 'Asset Status') -- This is for the site status on polaris
WHERE
parent.technology not in ('SYM','Westell_UDIT','Shark','Eaton','Emerson','Kentrox_RMM','n/a','Exfo','Voltserver','Mikrotik','Opto22')
AND (aa.status = 'CRITICAL' or aa.previous_status = 'CRITICAL')
AND (aa.cleared_date IS NULL OR aa.cleared_date > DATE_SUB(NOW(), INTERVAL 15 MINUTE) )
AND aa.created_date < NOW()
AND m.id is null
AND ne.ne_type in ('node','host','wifi')
AND (nep.value = 'Active' or nep.value is null )
AND ne.show_in_monitor = TRUE
GROUP BY ne.id
) AS tickets
ON tickets.tNeId = controller.neId
group by controller.network_identifier,controller.carrier,technologyType
) as fullJoinedList
GROUP BY fullJoinedList.network_identifier,fullJoinedList.carrier,technologyType
having region <> 'Admin'
ORDER BY uptime;>
Calculation formula
( (Count of Remotes * TotalTime) - Duration of Critical Alarms) / (Count of Remotes * TotalTime) ) * 100
The above is run for each site, than it is run for the sum of all remotes in the system to calculate the overall system availability
Schedules
One for each carrier