offMaintenance alerts for same device every 5 minutes

Bug #1403365 reported by Ingeborg Hellemo
This bug affects 1 person
Affects: Network Administration Visualized
Status: Fix Released
Importance: High
Assigned to: Morten Brekkevold
Milestone: 4.2.5

Bug Description

NAV 4.2.1

A maintenance task ended 2014-12-12 20:44. Since then an alert has been generated every 5 minutes saying that the device is no longer on maintenance.

The maintenance task is not listed as Active, but is found in the archive with

Start 2012-12-05 20:44
End 2014-12-12 20:44
...
State Passed

I have tried to cancel the task to see whether that would trigger something, but the only change is that the state is now Canceled.

Under "Recent alerts" for the box I find a maintenanceState alert with an "Unresolved" end time.

After turning on debug for maintengine we get this in the log every 5 minutes:

[2014-12-17 08:30:00,607] [DEBUG] [pid=10229 nav.maintengine] ------------------------------------------------------------
[2014-12-17 08:30:00,641] [DEBUG] [pid=10229 nav.maintengine] Endless maintenance task 347: Things that haven't been up longer than the threshold: [<Netbox: kulthLscene-sw.infra>]
[2014-12-17 08:30:00,647] [DEBUG] [pid=10229 nav.maintengine] Endless maintenance task 525: Things that haven't been up longer than the threshold: [<Netbox: sval-afscon3-sw.infra>]
[2014-12-17 08:30:00,652] [DEBUG] [pid=10229 nav.maintengine] Endless maintenance task 560: Things that haven't been up longer than the threshold: [<Netbox: m2m-host-108-121.osl255.netcom.no>]
[2014-12-17 08:30:00,655] [DEBUG] [pid=10229 nav.maintengine] Tasks transitioned to passed state: []
[2014-12-17 08:30:00,658] [DEBUG] [pid=10229 nav.maintengine] Tasks transitioned to active state: []
[2014-12-17 08:30:00,681] [DEBUG] [pid=10229 nav.maintengine] Subjects that should be on maintenance but wasn't: set([])
[2014-12-17 08:30:00,681] [DEBUG] [pid=10229 nav.maintengine] Subjects that should not be on maintenance but was: set([<Netbox: kulthverksted-sw.infra>])
[2014-12-17 08:30:00,696] [DEBUG] [pid=10229 nav.maintengine] Event posted: <EventQueue: event_type_id='maintenanceState', source_id='maintenance', target_id='eventEngine', netbox=<Netbox: kulthverksted-sw.infra>, subid='', state='e', time=datetime.datetime(2014, 12, 17, 8, 30, 0, 682149)>
[2014-12-17 08:30:00,719] [DEBUG] [pid=10229 nav.maintengine] Finished in 0.062s
[2014-12-17 08:30:00,719] [DEBUG] [pid=10229 nav.maintengine] ------------------------------------------------------------

Morten Brekkevold (mbrekkevold) wrote :

Has the kulthverksted-sw.infra device been physically replaced during the maintenance period? Maintengine seems to work as it should, but it may be that the eventengine cannot match the offMaintenance alert to the onMaintenance alert because the netbox has switched device IDs. Check the eventengine logs...

Also, you can forcibly close the maintenance alert from the new Status page (check the "on maintenance" filter checkbox to see these alerts).

Changed in nav:
status: New → Incomplete
assignee: nobody → Morten Brekkevold (mbrekkevold)
Morten Brekkevold (mbrekkevold) wrote :

This has been re-reported by another customer, along with sufficient information to reproduce.

It appears there has been a subtle change between NAV versions in how maintenance events are generated, which affects the ability to match old maintenance states with new offMaintenance alerts.

Older events seem to have been posted with the event subid attribute set to a NULL value, while new events are posted with the subid attribute set to an empty string. A NULL value and an empty string do not compare as equal, so the offMaintenance alert goes unmatched.

The result is that the existing maintenance state is not closed, but the event itself is forwarded as an alert to anyone who subscribes to it.
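To illustrate the mismatch described above, here is a minimal sketch. It is not NAV's actual code or schema: the `state` table, the `subids_match` helper, and the use of an in-memory SQLite database are assumptions made only for the demonstration. It shows that a strict comparison fails to find a row stored with a NULL subid, while a comparison that normalizes NULL to an empty string finds it.

```python
import sqlite3


def subids_match(old_subid, new_subid):
    """Hypothetical helper: treat NULL (None) and '' as the same 'no subid'."""
    return (old_subid or "") == (new_subid or "")


# In Python, None and the empty string are different values.
print(None == "")              # False
print(subids_match(None, ""))  # True

# The same mismatch in SQL, using a throwaway table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state (netbox TEXT, subid TEXT)")
# An "old" maintenance state, stored with a NULL subid.
conn.execute("INSERT INTO state VALUES ('kulthverksted-sw.infra', NULL)")

# A "new" event carries subid = ''. A strict comparison finds no match,
# because NULL = '' does not evaluate to true in SQL.
strict = conn.execute(
    "SELECT COUNT(*) FROM state WHERE subid = ''"
).fetchone()[0]

# Normalizing NULL to '' before comparing finds the old state.
lenient = conn.execute(
    "SELECT COUNT(*) FROM state WHERE COALESCE(subid, '') = ''"
).fetchone()[0]

print(strict, lenient)  # 0 1
```

Whether the actual fix normalizes the value when matching or when posting events is not stated in this report; the sketch only demonstrates why the two representations fail to match.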

Changed in nav:
status: Incomplete → Confirmed
summary: - maintengine alerting about off maintenance
+ offMaintenance alerts for same device every 5 minutes
Changed in nav:
importance: Undecided → High
status: Confirmed → In Progress
Morten Brekkevold (mbrekkevold) wrote :
Changed in nav:
milestone: none → 4.2.4
status: In Progress → Confirmed
status: Confirmed → Fix Committed
Changed in nav:
status: Fix Committed → Fix Released
Changed in nav:
milestone: 4.2.4 → 4.2.5