Modules without switch ports are marked as down

Bug #258219 reported by Morten Brekkevold
Affects: Network Administration Visualized
Status: Fix Released
Importance: High
Assigned to: Morten Brekkevold
Milestone: (none)

Bug Description

NAV repeatedly marks as down any module for which it has
not registered any switch ports. Typically, it marks such
a module as down for five hours, then as up again for one
hour, and then as down again for the next five.

[http://sourceforge.net/tracker/index.php?func=detail&aid=1583453&group_id=107608&atid=648170]

Morten Brekkevold (mbrekkevold) wrote :

This bug is related to getDeviceData's module monitoring
plugins.

Module status is set by two different plugins. One plugin
performs actual module probing, either by querying specific
module status OIDs (such as for HP and 3Com devices), or by
a generic probe of an arbitrary ifindex known to exist on a
previously seen module. If a probe succeeds, the module is
marked as up. Another plugin tries to discover which
modules an IP device actually consists of. Any module this
plugin discovers is marked as up. This latter method is
also how modules are initially discovered.

After this probing, the module monitor plugin compares the
list of modules the other plugins marked as up against the
list of previously seen modules on the IP device. Any
previously seen module that is not in the up-list is then
considered to be down.
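
The comparison described above amounts to a set difference. The
following is only a minimal sketch in Python, not NAV's actual code;
the function name and the module names are hypothetical and chosen for
illustration.

```python
def find_down_modules(previously_seen, marked_up):
    """Return the previously seen modules that no plugin marked as up.

    Both arguments are iterables of module identifiers (hypothetical
    names; the real plugin's data structures may differ).
    """
    return set(previously_seen) - set(marked_up)


# A probe-only run: the generic probe can only confirm the module
# that has switch ports, so the other one ends up in the down set.
previously_seen = {"switch-module", "router-module"}
marked_up_by_probe = {"switch-module"}
down = find_down_modules(previously_seen, marked_up_by_probe)
```

Note that in this scheme "not confirmed up" and "down" are the same
thing, which is why a probe that cannot reach a module at all has the
same effect as a failed probe.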

This bug first appeared when the moduleMon probe OIDs were
rescheduled to be collected at one hour intervals, instead
of the regular six hours. This was done because one doesn't
want to wait six hours for an alert about a module going
down. The generic ifindex probe only probes ifindexes of
switch ports, not router ports. Also, there is no generic
way to probe a module with no interfaces. Now, this probe
runs every hour, while the full module discovery only runs
every six hours. Since the module probe doesn't probe
router modules or interfaceless modules, these will be
considered down whenever the probe runs on its own.

This is why the symptoms are a pattern of five-hour module
downtimes: a probe and a full discovery are first run at
the same time, then the probe runs on its own one hour
later, and the modules are marked as down for the remaining
five hours until a full discovery is run again.
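
The resulting pattern can be reproduced with a small simulation. This
is a hedged sketch in Python under the assumptions stated in this
comment (hourly probe, six-hourly discovery, both starting at hour 0);
the names are made up for illustration and do not come from
getDeviceData.

```python
DISCOVERY_INTERVAL_H = 6  # full module discovery, per the description
PROBE_INTERVAL_H = 1      # rescheduled moduleMon probe


def status_timeline(hours, probe_covers_module):
    """Return the module status recorded at each whole hour.

    probe_covers_module is False for a module without switch ports,
    which the generic ifindex probe cannot confirm as up.
    """
    timeline = []
    for hour in range(hours):
        confirmed_up = False
        if hour % PROBE_INTERVAL_H == 0 and probe_covers_module:
            confirmed_up = True
        if hour % DISCOVERY_INTERVAL_H == 0:
            confirmed_up = True  # discovery marks every found module up
        timeline.append("up" if confirmed_up else "down")
    return timeline


# Module without switch ports: up at hours 0 and 6, down in between.
router_module = status_timeline(12, probe_covers_module=False)
# Module with switch ports: confirmed up on every hourly probe.
switch_module = status_timeline(12, probe_covers_module=True)
```

Each entry is the status set at that hour and held until the next run,
so an "up" at hour 0 followed by "down" at hours 1 through 5 matches
the one hour up, five hours down cycle described in the report.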

Morten Brekkevold (mbrekkevold) wrote :

Fixed in r3688.

Excerpt from commit log:
- ModuleMon now probes router ports as well as switch ports.
- Modules are now explicitly marked down when probes fail.
- Add ability to ignore modules whose status is unknown,
instead of considering them down.
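
One way to read the last two commit log items is that module status
becomes a three-way result instead of "up or else down". The sketch
below is an assumption about the general idea only, written in Python;
it is not the code from r3688, and every name in it is hypothetical.

```python
from enum import Enum


class ProbeResult(Enum):
    UP = "up"            # probe answered and the module is present
    DOWN = "down"        # probe explicitly failed
    UNKNOWN = "unknown"  # module could not be probed at all


def next_status(current, result, ignore_unknown=True):
    """Return the new module status given the latest probe result.

    With ignore_unknown=True, an unprobeable module keeps its current
    status; with False, it is considered down, as before the fix.
    """
    if result is ProbeResult.UP:
        return "up"
    if result is ProbeResult.DOWN:
        return "down"
    return current if ignore_unknown else "down"
```

With this distinction, an interfaceless module that yields no probe
result keeps its last known status between full discoveries rather
than flapping.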
