User Tools

Site Tools


Writing /app/www/public/data/meta/support/atc_availability.meta failed
support:atc_availability

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
support:atc_availability [2020/05/25 20:23] – [Cache] scostasupport:atc_availability [2021/06/25 10:09] (current) – external edit 127.0.0.1
Line 1: Line 1:
 +====== ATC Availability Reports ======
 +
 +
 +====== Reports ======
 +
 +  * Availability Raw Data
 +      * .  BreakDown /List of Alarms – Quite Heavy, should only be run for  a specific site or for a short period (e.g. 2 days)
 +  * Availability Report by Region - 1 report for the whole system, can be filtered by regions
 +  * Availability Report by Carrier - Availability of each site is calculated separately for each carrier. 
 +  * Network Summary - A quick and formatted overview displaying the overall system availability and the comparison year to date. Sabrina sends this to upper management.
 +  * Widgets
 +
 +
 +====== Type of Alarms====== 
 + Alarms need to attend all requirements below to affect availability.
 +  * Associated to remotes and hosts)
 +  * Severity Critical
 +  * Not be created during a maintenance period are ignored
 +  * Belong to the following technologies:'ADC','ADC_CXD','ADC_FUSION','ADC_LGCY','ADC_SPECTRUM', 'ADC_URH', 'ADC_PRISM','ANDREWS','ANDREWS_IONB','ANDREWS_IONU','ANDREWS_TSUN','ANDREW_BDA','MOBILE_ACCESS','Mobile Access BR One','POWERWAVE','SOLID','SOLID_REL6','TEKO_TELECOM','TEKO TELECOM'
 +  * 'On Air' and 'Off-air accepted' status 
 +  * ne.show_in_monitor = TRUE
 +
 +
 +
 +====== Cache ======
 +Because of the high amount of data, the query to calculate the availability for long periods can take extremely long to execute. To allow the report and widgets to run faster (and  to make it easier to maintain the data consistent), a 'cache' was implemented.  These are run daily and monthly and saved in the following database tables 
 +  * reporting_manager.availability_by_region_cache_monthly
 +  * reporting_manager.availability_by_region_and_carrier_cache_monthly
 +  * reporting_manager.availability_by_region_cache
 +  * reporting_manager.availability_by_region_and_carrier_cache
 +
 +A cron job populates these tables daily and monthly. The queries used to populate this cache is where the calculation are being done. The reports have basic queries to read from the cache tables
 +
 +#Network Availability region cache
 +*/15 * * * *  mysql -uroot -pPASSWORD ticketer -h  atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_cache.sql > ~/script/availability_cache_scripts/regioncachelog.txt 2>&1
 +
 +
 +#Network Availability region and carrier cache
 +*/15 * * * *  mysql -uroot -PASSWORD ticketer -h  atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_and_carrier_cache.sql > ~/script/availability_cache_scripts/regionandacarriercachelog.txt 2>&1
 +
 +
 +1 0 1 * *  mysql -uroot -pPASSWORD ticketer -h  atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_monthly_cache.sql > ~/script/availability_cache_scripts/regionMonthlycachelog.txt 2>&1
 +
 +
 +1 0 1 * *  mysql -uroot -pPASSWORD ticketer -h  atc-rds-instance-master-1.c6hdidhzxqy5.us-east-1.rds.amazonaws.com < ~/script/availability_cache_scripts/availability_by_region_and_carrier_monthly_cache.sql > ~/script/availability_cache_scripts/regionCarrierMonthlycachelog.txt 2>&1
 +
 +<SET  group_concat_max_len=15000;
 +INSERT INTO reporting_manager.availability_by_region_cache SELECT
 +         DATE_SUB(NOW(), INTERVAL 15 MINUTE) startDate,
 +            fullJoinedList.network_identifier site,
 +            fullJoinedList.carrier,
 +            fullJoinedList.cluster_name region,
 +            fullJoinedList.vendors vendors,
 +            fullJoinedList.remoteCount remotes,
 +            fullJoinedList.elementCount,
 +            fullJoinedList.alarmDurationSum alarmDuration,
 +                    IF (alarmDurationSum IS NOT NULL and ((remoteCount * 900) - fullJoinedList.alarmDurationSum > 0),
 +                        (remoteCount * 900) - fullJoinedList.alarmDurationSum,
 +                       (remoteCount * 900)) active ,
 +                   (fullJoinedList.remoteCount * 900) base,
 +                    IF(fullJoinedList.alarmDurationSum IS NOT NULL,
 +                        ROUND(((fullJoinedList.remoteCount * 900) - SUM(alarmDurationSum)) * 100 / (fullJoinedList.remoteCount * 900),
 +                                2),
 +                        100) AS uptime,
 +                                    fullJoinedList.techType
 +        FROM
 +            (SELECT
 +                controller.*, SUM(tickets.alarmDuration) alarmDurationSum,
 +                    IF(controller.technology = 'Opto22',1,
 +                      IF(
 +                          COUNT(CASE WHEN (controller.ne_type = 'Node' or controller.ne_type = 'wifi' ) THEN 1 END) > 0,
 +                          COUNT(CASE WHEN (controller.ne_type = 'Node' or controller.ne_type = 'wifi' ) THEN 1 END),
 +                          1
 +                    )
 +                        )AS remoteCount,
 +                count(*) elementCount,
 +                controller.technologyType techType,
 +                group_concat(distinct controller.technology) vendors,
 +                group_concat(distinct controller.neId) elementIds,
 +                IF (SUM(tickets.alarmDuration) IS NOT NULL, ( COUNT(CASE WHEN controller.ne_type = 'Node' THEN 1 END) * 900) - SUM(tickets.alarmDuration),
 +                    (COUNT(CASE WHEN (controller.ne_type = 'Node') THEN 1 END) * 900)) active,
 +                (COUNT(CASE WHEN (controller.ne_type = 'Node') THEN 1 END) * 900) base
 +
 +            FROM
 +                    (SELECT ne.id neId,
 +                        parent.id parentId,
 +                            IF(ne.ne_type = 'Controller', ne.name, parent.name) as pName,
 +                            IF(ne.ne_type = 'Controller', ne.cluster_name, parent.cluster_name) as cluster_name,
 +                            IF(ne.ne_type = 'Controller', ne.technology, parent.technology) as technology,
 +                            'System' carrier,
 +                            IF(ne.assigned_technology_id = 40, concat(ne.network_identifier,' Opto'), parent.network_identifier) network_identifier,
 +                        ne.ne_type,
 +                        tech.product_type as technologyType
 +                    FROM
 +                snmp_manager.network_element parent
 +                LEFT JOIN snmp_manager.network_element ne ON ne.parent_id = parent.id
 +                LEFT JOIN snmp_manager.technology tech ON tech.id = ne.assigned_technology_id
 +                left join snmp_manager.site_network_element  sne on sne.network_element_id = parent.id
 +                left join snmp_manager.network_element_property nep on (nep.site_id = sne.site_network_elements_id and nep.name = 'Asset Status')
 +                    WHERE
 +                        (parent.ne_type = 'Controller'
 +                            OR ne.assigned_technology_id = 40)
 +                        and ne.show_in_monitor = TRUE and ne.status_id in (1,5) # ATC-996532 - 'On Air' and 'Off-air accepted' states
 +                        AND parent.technology not in ('SYM','Westell_UDIT','Shark','Eaton','Emerson','Kentrox_RMM','n/a','Exfo','Voltserver','Mikrotik','Opto22')
 +                        AND ne.equipment_type is null
 +                        group by ne.id
 +            ) AS controller
 +        LEFT JOIN
 +            (SELECT
 +    aa.created_date,aa.cleared_date,aa.cleared_reason,
 +                ne.cluster_name region,
 +                tech.equipment_manufacturer,
 +                ne.name neName,
 +                           SUM(CASE
 +                        WHEN
 +                            aa.cleared_date IS NOT NULL
 +                        THEN
 +                            (UNIX_TIMESTAMP(LEAST(aa.cleared_date,NOW())) - UNIX_TIMESTAMP(GREATEST(aa.created_date,DATE_SUB(NOW(), INTERVAL 15 MINUTE))))
 +                        WHEN
 +                            aa.cleared_date IS NULL
 +                        THEN
 +                            (UNIX_TIMESTAMP(NOW()) - UNIX_TIMESTAMP(GREATEST(aa.created_date,DATE_SUB(NOW(), INTERVAL 15 MINUTE))))
 +                    END) alarmDuration,
 +                    IF(ne.ne_type = 'Controller', ne.id, parent.id) controller_id,
 +                ne.ne_type tNeType,ne.id tNeId, ne.name,
 +              gts.trap_name, gts.alarm_identifier
 +            FROM
 +                snmp_manager.active_alarm aa
 +            LEFT JOIN  snmp_manager.general_trap_summary gts on aa.creatinggts_id = gts.id
 +            LEFT JOIN snmp_manager.network_element ne on ne.id = aa.network_element_id
 +            LEFT JOIN snmp_manager.technology tech ON tech.id = ne.assigned_technology_id
 +            LEFT JOIN snmp_manager.network_element parent ON (ne.parent_id = parent.id
 +                AND parent.ne_type = 'Controller')
 +            LEFT JOIN snmp_manager.site_network_element site on (ne.id = site.network_element_id or parent.id = site.network_element_id )
 +            LEFT JOIN                    #Needed an alias table as Null MaintenanceAlarmIdentifier value could either mean 'all alarms' or no maintenance rule  at all
 +              (select maintenance_rule.* ,COALESCE(alarm_identifiers_string,'ALL ALARMS') alarm_identifiers_string
 +                        from
 +                            snmp_manager.maintenance_rule
 +                            LEFT JOIN snmp_manager.maintenance_rule_alarm_identifiers on maintenance_rule_alarm_identifiers.maintenance_rule_id = maintenance_rule.id
 +                                    where
 +                                       cancelled_manually is false  )
 +                AS m ON
 +            (ne.id = m.network_element_id or site.site_network_elements_id = m.site_id or m.network_element_id = ne.parent_id)
 +            and aa.created_date BETWEEN m.start_date and m.finish_date
 +            and (m.alarm_identifiers_string = aa.alarm_identifier or m.alarm_identifiers_string = 'ALL ALARMS')
 +
 +            LEFT JOIN snmp_manager.site_network_element  sne on sne.network_element_id = parent.id
 +            LEFT JOIN snmp_manager.network_element_property nep on (nep.site_id = sne.site_network_elements_id and nep.name = 'Asset Status') -- This is for the site status on polaris
 +            WHERE
 +                    parent.technology not in ('SYM','Westell_UDIT','Shark','Eaton','Emerson','Kentrox_RMM','n/a','Exfo','Voltserver','Mikrotik','Opto22')
 +                    AND (aa.status = 'CRITICAL' or aa.previous_status = 'CRITICAL')
 +                    AND (aa.cleared_date IS NULL OR aa.cleared_date >  DATE_SUB(NOW(), INTERVAL 15 MINUTE) )
 +                    AND aa.created_date < NOW()
 +                    AND m.id is null
 +                    AND ne.ne_type in ('node','host','wifi')
 +                    AND (nep.value = 'Active' or nep.value is null )
 +                    AND ne.show_in_monitor = TRUE
 +                    GROUP BY ne.id
 +                    ) AS tickets
 +                    ON tickets.tNeId = controller.neId
 +           group by controller.network_identifier,controller.carrier,technologyType
 +          ) as fullJoinedList
 +            GROUP BY fullJoinedList.network_identifier,fullJoinedList.carrier,technologyType
 +having region <> 'Admin'
 +        ORDER BY uptime;>
 +
 +====== Calculation formula ======
 +
 +( (Count of Remotes * TotalTime) - Duration of Critical Alarms)  / (Count of Remotes * TotalTime) )  * 100 
 +
 +The above is run for each site, than it is run for the sum of all remotes in the system to calculate the overall system availability 
 +
 +
 +===== Schedules =====
 +
 +One for each carrier 
 +
 +
 +==== South America ====
 +
 +