Trap Rule Testing
Author: Eoin Joy
Before a Trap Rule is deployed to the production system, it must first be tested on the QA system that holds that customer's data.
Configuring An Environment For Script Writing
Correct syntax highlighting is very useful for scripts that refer closely to core Errigal Application code, as Trap Rules do with the SNMP Manager. To get it, you could write your script inline with the SNMP Manager code (don't do this), or you could make a new project with a dependency on the SNMP Manager.
IntelliJ IDEA Project
IntelliJ IDEA allows you to add modules to a project so that the project can depend on them.
It is advised to create a Project in IDEA to contain any trap rule edits you need to make. Adding the SnmpManager codebase to this project as a module gives correct syntax highlighting for Trap Rules, including the methods and classes you might want to 'import'.
Bear in mind that you cannot import into a Trap Rule, but you can fully qualify any such calls.
Working On Edits To Existing Rules
Before you begin work on a trap rule edit, make sure that no other edit to that rule is currently being worked on or tested.
If one is, either merge the changes so that one person makes and tests both at the same time, or hold the new changes back until the version currently being changed has been given the go-ahead for production.
In either case, work from the rule taken from the filesystem on the production apps server, located at appfiles/SnmpManagerFiles/rules/<mib_name>/<trap_name>_Core.groovy
Determine whether that folder of rules contains only one distinct rule text. If it does not, find out whether the rules differ in minor ways or fall into entirely different groups. For entirely different groups, consider whether they differ because they use the same varbinds in different ways or because their sets of varbinds are completely different. If the same varbinds are used in different ways, you are encouraged to merge the rules into one script to ease future maintenance.
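One way to see how many distinct rule texts a folder holds is to checksum every Core rule and count the groups. This is a minimal sketch: the function name distinct_rule_texts is hypothetical, and it assumes the rules folder has been copied somewhere you can read it.

```shell
# Print one line per distinct rule text in a rules folder:
# a checksum-size key, then how many *_Core.groovy files share that text.
# A single output line means every rule in the folder is identical.
distinct_rule_texts() {
  rules_dir=$1
  for f in "$rules_dir"/*_Core.groovy; do
    [ -f "$f" ] || continue
    # Read from stdin so the file name is left out of the output.
    cksum < "$f"
  done | sort | uniq -c | awk '{ print $2 "-" $3, $1 }'
}
```

Where more than one line is printed, running diff on one file from each group shows whether the differences are minor or whether the groups are entirely different.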
Deploying To QA
A Trap Rule file to be deployed should have the following as its first line of code (comments and whitespace before it are fine):
com.errigal.snmpmanager.Trap trap ->
As you can guess, the script behaves like a closure.
The script does not need to return anything on its final line.
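The first-line requirement can be checked before a file is deployed. This is a minimal sketch: the function name has_trap_header is hypothetical, and it assumes any leading comments are single-line // comments (block comments are not handled).

```shell
# Succeeds when the first line that is neither blank nor a // comment
# starts with the closure header required for a deployable Trap Rule.
has_trap_header() {
  rule_file=$1
  # Drop blank lines and // comment lines, keep the first remaining line,
  # and strip its leading whitespace.
  first=$(grep -v -E '^[[:space:]]*(//.*)?$' "$rule_file" | head -n 1 \
            | sed 's/^[[:space:]]*//')
  case $first in
    "com.errigal.snmpmanager.Trap trap ->"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `has_trap_header my_rule_Core.groovy || echo "header missing"` flags a file that would not deploy correctly.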
With your new trap rule file ready, deploy it on QA, replacing the corresponding files already in place.
Using The Trap Rule Creation Script
On the applications servers, ~/script/trap_rule_creation/ contains a script that creates a suite of trap rule files from one source, all with the same text and the correct naming convention.
Two files are needed to run this script: the list of trap names to use (mobileAccess.txt) and the contents of the rule (contents_mobileAccess.txt). Optionally, if some of the trap names in the list end with the text Clear, a separate contents file (contents_clear_mobileAccess.txt) is used so that dedicated clear rules can carry their own changes.
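The file requirements above can be checked before the creation script is run. This is a minimal sketch: the function name check_rule_inputs is hypothetical, it assumes one trap name per line in the list file, and it treats the clear contents file as required whenever a name ends in Clear, which is stricter than the description above states.

```shell
# Check that the files create_rules.csh needs sit next to the trap name list.
# Prints each missing file and returns non-zero if any is missing.
check_rule_inputs() {
  list_file=$1                      # e.g. mobileAccess.txt
  dir=$(dirname "$list_file")
  base=$(basename "$list_file")
  missing=0
  for needed in "$list_file" "$dir/contents_$base"; do
    [ -f "$needed" ] || { echo "missing: $needed"; missing=1; }
  done
  # A dedicated clear contents file only matters when a trap name ends in Clear.
  if [ -f "$list_file" ] && grep -q 'Clear[[:space:]]*$' "$list_file"; then
    [ -f "$dir/contents_clear_$base" ] || {
      echo "missing: $dir/contents_clear_$base"; missing=1;
    }
  fi
  return $missing
}
```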
For this example, the script is run as follows:
./create_rules.csh mobileAccess.txt
This creates an archive, new_rules.tar.gz, containing all the Core rules in a directory called mobileAccess.
The next steps to deployment are as follows:
- Move the archive to the trap rules folder
mv new_rules.tar.gz ~/appfiles/SnmpManagerFiles/rules/
- Extract the new rules from the archive
tar -zxf new_rules.tar.gz
- Move the existing rules to a backup
mv ma_events_2_26.mib ma_events_2_26.mib.backup.2016-08-15
- Move the new rules into place
mv new_rules/mobileAccess ma_events_2_26.mib
- Alter the permissions on these new rules to allow automatic services like lsync to manage their synchronisation
chmod 774 ma_events_2_26.mib/*
- Clean up
rmdir new_rules
rm new_rules.tar.gz
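The steps above can be collected into one function. This is a minimal sketch, not a script that exists on the servers: the function name deploy_rules and its arguments are hypothetical, it assumes the archive unpacks to new_rules/<name> as the mv command above implies, and it stamps the backup with today's date.

```shell
# Deploy a new_rules.tar.gz archive that is already in the rules folder.
# Arguments: rules folder, directory name inside new_rules/, target MIB folder.
deploy_rules() {
  rules_dir=$1; source_name=$2; mib_dir=$3
  backup="$mib_dir.backup.$(date +%Y-%m-%d)"
  (
    cd "$rules_dir" || exit 1
    [ -f new_rules.tar.gz ] || { echo "no archive in $rules_dir" >&2; exit 1; }
    # Refuse to overwrite a backup taken earlier the same day.
    [ ! -e "$backup" ] || { echo "$backup already exists" >&2; exit 1; }
    tar -zxf new_rules.tar.gz || exit 1
    [ -d "new_rules/$source_name" ] || {
      echo "archive lacks new_rules/$source_name" >&2; exit 1;
    }
    # Move the existing rules aside, then put the new ones in place.
    if [ -d "$mib_dir" ]; then mv "$mib_dir" "$backup" || exit 1; fi
    mv "new_rules/$source_name" "$mib_dir" || exit 1
    chmod 774 "$mib_dir"/* || exit 1
    # Clean up the now-empty extraction folder and the archive.
    rmdir new_rules && rm new_rules.tar.gz
  )
}
```

With the archive already moved into the rules folder, the example above would be `deploy_rules ~/appfiles/SnmpManagerFiles/rules mobileAccess ma_events_2_26.mib`. The body runs in a subshell so that the cd does not change the caller's working directory.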
At this stage, once the trap rule cache has been cleared from the trapRule controller, the system will look in the filesystem for the most up-to-date trap rule the next time a trap is received.
lsync
lsync is a daemon process that manages synchronising files across servers in a cluster upon file creation, modification, or deletion.
Be aware that in most lsync clusters the *apps1 server is set up as master to the *apps2 server, but changes made on either one are pushed to the other. The relationship with the *lb1 server, which contains the distributor, is one way: changes flow to it but not back from it. This means that new rules should not be worked on on the distributor, as the errigal user may believe they have edited a rule when the edit never reached the apps servers.
File permissions may also play a part in blocking the propagation of Trap Rules through a cluster. Ensure that rules are owned by scotty:scotty and have sufficient permissions for unison to act on them. The unison log at ~/unison.log will be helpful in diagnosing these issues.
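A quick scan can list the rule files most likely to block propagation. This is a minimal sketch: the function name list_sync_blockers is hypothetical, and reading "sufficient permissions" as group read and write is an assumption rather than something the cluster setup is known to require.

```shell
# List rule files that may not synchronise: any file under the rules folder
# that is not owned by the expected user and group, or that lacks group
# read/write permission. Owner and group default to scotty:scotty.
list_sync_blockers() {
  rules_dir=$1; owner=${2:-scotty}; group=${3:-scotty}
  find "$rules_dir" -type f \
    \( ! -user "$owner" -o ! -group "$group" -o ! -perm -060 \) -print
}
```

An empty result from `list_sync_blockers ~/appfiles/SnmpManagerFiles/rules` suggests ownership and permissions are not the cause, and the unison log is the next place to look.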
Re-Sending Traps
To test a rule change properly, you need to establish that trap processing has not been adversely affected and, of course, that your changes had the intended effect. To do this, we test with recycled trap packets on QA.
Manually Re-Sending Traps
Once you find the trap you wish to re-send (see What Traps Do I Need To Re-Send? below), you can manually re-send it from the Trap controller's show page, e.g. https://qaerrigallb1.crc/SnmpManager/trap/show/<trap_id>
In the general case, the trap resolves its network element's parent hub by IP address, so you will need to temporarily set the IP address of that hub to the IP address of the handler you are re-sending from.
Example:
- The trap we decide to resend came from 10.20.30.40
- Find which network element corresponds to that ip address, eg. NE-NY-HUB_001-OPN
- Record the current ip address of this network element
- Determine which handler you are currently using.
Use Developer Tools (Ctrl+Shift+I) and view the cookies in the Application tab. Result shows SnmpManagerWorker2, implying you are using apps2
- Determine there are currently no network elements using that ip address
- Update the network element to have the ip address of that handler, apps2: 10.40.30.20
- Insert the Load Balancer IP address in the field “Please enter IP address”
- Send your traps
- Reset the ip address of the hub to be its true ip address
Please note that the trap controller re-sends traps as Version 2 (V2) traps. A few vendors use Version 1 (V1) traps (e.g. OPTO22). If the trap you need to test is V1 (check the 'type' field on the trap table), re-sending it from the trap controller will not work. In that case, you can use the iReasoning MIB Browser (http://www.ireasoning.com/mibbrowser.shtml) to re-send the trap as V1.
The Trap Emulator
Trap Emulator Documentation can be found at: https://bitbucket.org/errigal/trap-emulator
The trap emulator can be used to send one trap at a time, or multiple traps if the right traps happen to appear consecutively in the database. This is not the recommended course of action in this case.
What Traps Do I Need To Re-Send?
To test a single case for a trap rule, you must verify the following:
- Received Trap immediately creates a General Trap Summary on the correct Network Element
- Received Trap creates an Active Alarm entry <= 10 seconds after processing
- Active Alarm entry appears on the correct element in the Node Monitor
- Repeat traps do not create separate Active Alarms
- Active Alarm entry when acknowledged (manually or automatically) creates a ticket if all of the following are true:
  - It has Status of CRITICAL, MAJOR, or MINOR
  - There exists no unresolved Ticket that has an SNMP Trap Form matching the details of the Active Alarm
  - The Network Element it applies to is ON AIR
- A received clear trap with the same alarm identifier and Network Element clears the Active Alarm in the Node Monitor, and moves the Ticket into an Alarm Clear Received state
- A received trap or clear for an element not appearing in the IDMS as a child of an already set up hub should result in an errigalMonitoredCarrierDeviceMissingAlarm on that hub.
- A received trap or clear for an element on a hub that does not appear in the IDMS should create this hub and attempt to create an errigalMonitoredCarrierDeviceMissingAlarm on this new hub.
The cases you need to test for a trap rule include testing every type of equipment that a given rule covers. It is also preferred to test every branch of the code in the rule.
Determining Success
During testing, no Exceptions should be triggered while processing any part of the trap.
The database values for the general trap summary, active alarm, and remote ticket should all be created correctly, and you should be able to follow the breadcrumbs all the way from the trap and its id through GTS, active alarm, remote ticket, ticket, ticket change, and to a form in the ticket like a NOC Form or the SNMP Trap Form.
Logging
During normal execution, logging is done with the log variable and is printed to the application logs, most often found in the ~/logs/grails/SnmpManager.log file.
During Trap Rule execution, the log variable cannot be used, so all logging is done via print method calls, the output of which can be found in the /var/tomcat/SnmpManager/logs/catalina.out file.
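Because catalina.out collects output from everything running in Tomcat, it helps to filter it for your rule's print output. This is a minimal sketch: the function name rule_log_tail is hypothetical, and it assumes your print calls include a distinctive tag (for example the rule name), which is a convention you would have to add yourself.

```shell
# Show the most recent log lines that carry a given tag.
# Arguments: tag text, log file (defaults to the catalina.out path above),
# and how many lines to show (defaults to 50).
rule_log_tail() {
  tag=$1
  log_file=${2:-/var/tomcat/SnmpManager/logs/catalina.out}
  count=${3:-50}
  # -F treats the tag as plain text, so brackets and dots need no escaping.
  grep -F -- "$tag" "$log_file" | tail -n "$count"
}
```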
Assessment
Ensure you are on QA and NOT production
- Find the alarm-clear pairs for alarms for a hub in the TMobile - New York cluster for the past week
- Make an edit that will add some useful information to the summary of the general trap summary
- Test this change for several different variations of traps that would be affected by this change
- Restore the old rule and determine it is still working
- Find a rule that does not print all its varbinds. Make an edit to print each varbind name and value upon the start of processing.