development:applications:snmpmanager:snmpintro:start
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| development:applications:snmpmanager:snmpintro:start [2016/08/19 16:01] – cokeeffe | development:applications:snmpmanager:snmpintro:start [2021/06/25 10:09] (current) – external edit 127.0.0.1 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | =====SNMP & Traps in a Nutshell===== | ||
| + | SNMP stands for Simple Network Management Protocol. As the name suggests, it is a standard protocol designed for managing devices. Wikipedia has a brief description of the theory [[https:// | ||
| + | |||
| + | ====Traps==== | ||
| + | |||
| + | The part of the protocol we're most interested in is traps. These are how we monitor devices for alarms and issues. The devices we monitor are all connected over private networks to our servers. When they experience an issue they send us a small UDP packet called a trap. This shifts the burden away from the monitoring software: we don't have to constantly poll devices for faults (though we also do that to some extent), because the devices send us alarms when they experience faults. | ||
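
To make the "small UDP packet" idea concrete, here is a minimal sketch of a listener that waits for one datagram and reports who sent it. It is not how SnmpManager itself is implemented. The function names are made up for illustration, and the payload is left as raw bytes (a real receiver would decode the BER-encoded SNMP message with an SNMP library). Port 162 is the standard trap port.

```python
import socket

SNMP_TRAP_PORT = 162  # standard SNMP trap port; binding to it usually needs elevated privileges


def open_trap_socket(host: str = "0.0.0.0", port: int = SNMP_TRAP_PORT) -> socket.socket:
    """Open a UDP socket that devices can send traps to (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_datagram(sock: socket.socket, timeout: float = 5.0, max_size: int = 65535):
    """Wait for one datagram and return (raw_bytes, sender_ip).

    Raises socket.timeout if nothing arrives within `timeout` seconds.
    The bytes are NOT decoded here; that is the job of an SNMP library.
    """
    sock.settimeout(timeout)
    payload, (sender_ip, _sender_port) = sock.recvfrom(max_size)
    return payload, sender_ip
```

Note that nothing here polls the device: the listener simply waits, and the device decides when to speak.
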
| + | |||
| + | Traps are simple data objects. They are identified by an object identifier (OID). The OID is a series of numbers separated by dots. An example OID is '' | ||
| + | |||
| + | Now that's not terribly informative. What does it mean? We use a file called a management information base (MIB) to translate it. A MIB is really just a text file with a specific standard structure that defines what the OID values mean. The [[http:// | ||
| + | |||
| + | MIBs contain a tree structure that is used to translate the numbers of the OID into something sensible. But how do we know what MIB translates each trap? The long number in the middle there is the enterprise number of the OID. It signifies which company made the device or software the trap is being sent from and therefore who you can get the MIB from. These numbers are registered with the IANA and there' | ||
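
As a rough sketch of how the enterprise number can be picked out of an OID: vendor-specific OIDs live under the ''1.3.6.1.4.1'' (enterprises) branch, and the number immediately after that prefix is the enterprise number. The helper names below are made up for illustration and are not part of our codebase.

```python
ENTERPRISES_PREFIX = (1, 3, 6, 1, 4, 1)  # iso.org.dod.internet.private.enterprises


def parse_oid(oid: str) -> tuple:
    """Turn '.1.3.6.1' or '1.3.6.1' into (1, 3, 6, 1)."""
    return tuple(int(part) for part in oid.strip().strip(".").split("."))


def enterprise_number(oid: str):
    """Return the enterprise number, or None if the OID is not under the enterprises branch."""
    parts = parse_oid(oid)
    n = len(ENTERPRISES_PREFIX)
    if len(parts) > n and parts[:n] == ENTERPRISES_PREFIX:
        return parts[n]
    return None


print(enterprise_number(".1.3.6.1.4.1.33582.1.1.1.1.0"))  # 33582
```

The number returned is what you would look up in the IANA registry to find out whose MIB you need.
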
| + | |||
| + | If you look up which company is identified by 33582 you'll find it's us! This is one of our own internal traps. The MIB file to translate it is '' | ||
| + | |||
| + | ====Varbinds==== | ||
| + | |||
| + | But a trap that only tells us a device is missing isn't very useful. What's missing? Traps also contain variable bindings (varbinds). Varbinds, like traps themselves, have an OID that identifies their name. They also have a value. A varbind' | ||
| + | |||
| + | |||
| + | |||
| + | ^ Name OID | ||
| + | | .1.3.6.1.4.1.33582.1.1.1.1.0 | ||
| + | | .1.3.6.1.4.1.33582.1.1.1.2.0 | ||
| + | | .1.3.6.1.4.1.33582.1.1.1.3.0 | ||
| + | | .1.3.6.1.4.1.33582.1.1.1.4.0 | ||
| + | | .1.3.6.1.4.1.33582.1.1.1.5.0 | ||
| + | | .1.3.6.1.4.1.33582.1.1.1.6.0 | ||
| + | | .1.3.6.1.4.1.33582.1.1.1.7.0 | ||
| + | | .1.3.6.1.4.1.33582.1.1.1.8.0 | ||
| + | |||
| + | The varbinds each have a specific meaning that is usually documented within the MIB itself. In this case this would signify that we received a btscRfSwOn alarm from a piece of equipment called VD_WD_PeoplesPark_03-RM_20, which is attached to a host VD_WD_PeoplesPark_03 from IP address 9.9.9.9, which we have processed as being part of the MOBILE_ACCESS technology and belonging to customer VODA. If not all of that information makes sense yet, don't worry about it too much. Just know that each different type of trap will have different varbinds with different values that have to be interpreted by our software. We take all this information and put it into a standardized format using [[development: | ||
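
The interpretation step can be sketched as a lookup from varbind OID to a readable field name. This is only an illustration, not the actual standardized format. The ''Varbind'' class, the ''normalize'' function, the field names, and above all the pairing of OIDs to fields are assumptions made up for the example. The MIB is the authority on what each OID really carries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Varbind:
    oid: str    # the varbind's name
    value: str  # the varbind's value, kept as plain text in this sketch


# HYPOTHETICAL pairing of OIDs to field names, invented for illustration.
# Check the MIB for the real meaning of each OID.
HYPOTHETICAL_FIELD_NAMES = {
    ".1.3.6.1.4.1.33582.1.1.1.1.0": "alarm_name",
    ".1.3.6.1.4.1.33582.1.1.1.2.0": "customer",
}


def normalize(varbinds, field_names):
    """Map varbind OIDs to readable field names.

    OIDs that are not in `field_names` keep their OID as the key, so nothing is lost.
    """
    return {field_names.get(vb.oid, vb.oid): vb.value for vb in varbinds}


trap_varbinds = [
    Varbind(".1.3.6.1.4.1.33582.1.1.1.1.0", "btscRfSwOn"),
    Varbind(".1.3.6.1.4.1.33582.1.1.1.2.0", "VODA"),
]
print(normalize(trap_varbinds, HYPOTHETICAL_FIELD_NAMES))
```

Keeping the mapping as data rather than code means each trap type can bring its own table of names without changing the function.
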
| + | |||
| + | ====Resolving Exceptions==== | ||
| + | ===StaleObjectStateException (" | ||
| + | I added a new entry, “Missing Unit Alarm Received”, to active_alarm_exclusion_criteria.
I think alarm sync tried to clear the device-missing alarm at the same moment a device-missing trap was sent to SnmpManager.
Before the clear completed, the repeat count went up, so the version number (the version column in the active_alarm table) was incremented.
The process clearing the device-missing alarm was still holding an active_alarm instance with the previous version number.
Thus StaleObjectStateException: | ||
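
The race described above can be reproduced with a small in-memory simulation of version-based (optimistic) locking. This is a sketch of the suspected mechanism only, not the real persistence code. ''ActiveAlarmTable'', ''StaleObjectError'' and ''simulate_race'' are hypothetical stand-ins written in Python; the only detail taken from the real system is that active_alarm rows carry a version column.

```python
class StaleObjectError(Exception):
    """Raised when a row's version changed after the snapshot was loaded."""


class ActiveAlarmTable:
    """In-memory stand-in for the active_alarm table (hypothetical)."""

    def __init__(self):
        self._rows = {}

    def insert(self, alarm_id):
        self._rows[alarm_id] = {"repeat_count": 1, "cleared": False, "version": 0}

    def load(self, alarm_id):
        # Return a copy, like an entity instance held by one process.
        return dict(self._rows[alarm_id])

    def save(self, alarm_id, snapshot):
        current = self._rows[alarm_id]
        if snapshot["version"] != current["version"]:
            raise StaleObjectError(
                f"{alarm_id}: loaded version {snapshot['version']}, "
                f"row is now at version {current['version']}"
            )
        # Successful write: store the changes and bump the version.
        self._rows[alarm_id] = {**snapshot, "version": snapshot["version"] + 1}


def simulate_race():
    table = ActiveAlarmTable()
    table.insert("device-missing")

    clear_snapshot = table.load("device-missing")  # alarm sync loads the row
    trap_snapshot = table.load("device-missing")   # a repeat trap arrives meanwhile

    trap_snapshot["repeat_count"] += 1
    table.save("device-missing", trap_snapshot)    # version goes 0 -> 1

    clear_snapshot["cleared"] = True
    try:
        table.save("device-missing", clear_snapshot)  # still holds version 0
        return "cleared"
    except StaleObjectError:
        return "stale"


print(simulate_race())  # stale
```

In this sketch, reloading the row and applying the clear again succeeds, because the fresh snapshot carries the current version.
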