Status page does not show current IP devices down

Bug #906849 reported by Ingeborg Hellemo
This bug affects 1 person
Affects: Network Administration Visualized
Status: Fix Released
Importance: High
Assigned to: Morten Brekkevold

Bug Description

Version 3.9.4

The "Status now" box on the front page correctly shows a device as down.

On the Status page <https://nav.example.com/status/>, the "IP devices down" section is empty. You have to click on the history link to find the IP devices that are currently down.

Morten Brekkevold (mbrekkevold) wrote :

I'm unable to reproduce this. Does this happen for every device that goes down on your installation, or just specific ones? Is maintenance involved?

Changed in nav:
status: New → Incomplete
Ingeborg Hellemo (ingeborg-hellemo) wrote :

This is our production NAV and we do not have that many boxes down. But whenever I have observed this (during the past week), there has indeed been a device on maintenance at the same time, which correctly shows up in "IP devices on maintenance".

Morten Brekkevold (mbrekkevold) wrote :

Still unable to reproduce. Can you verify that the devices aren't actually reported as being in shadow?

Can you also verify that your status preferences haven't been modified from the default? I.e. go to Status -> Preferences -> IP Devices down, and the filters should be set to all organizations, all categories and the down state.

Ingeborg Hellemo (ingeborg-hellemo) wrote :

Upgraded to NAV 3.10.1. Checked that all preferences are (all) (all)

At the moment I have one device on maintenance (shows on status page) and one service down (does not show). If I go to Services down - history, I can see the service that is down. In other words, this bug affects services as well as devices.

Morten Brekkevold (mbrekkevold) wrote :

I was granted access to Ingeborg's NAV installation and finally confirmed the issue.

Ingeborg's database contains a series of unresolved maintenanceState and boxState entries in the alerthist table, where the corresponding netbox has been deleted, i.e. the netboxid field is NULL.

The ORM queries built by the Status page do not take into account that some rows in alerthist may have netboxid=NULL, meaning that one or two of the SQL JOINs produced by the Django ORM will return no result rows when such rows are present in alerthist.

An immediate fix is to make the Status page take this properly into account.
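The effect and the kind of fix described here can be reproduced with a small sketch. Everything in it is an assumption made for illustration: the reduced two-table schema, the helper names `down_naive` and `down_null_safe`, the use of SQLite instead of NAV's real database, and above all the shape of the query (a "down but not on maintenance" filter written with NOT IN). The thread does not show the SQL that the Django ORM actually generated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE netbox (netboxid INTEGER PRIMARY KEY, sysname TEXT);
    CREATE TABLE alerthist (
        alerthistid INTEGER PRIMARY KEY,
        netboxid INTEGER,   -- NULL once the netbox has been deleted
        eventtypeid TEXT
    );
    INSERT INTO netbox VALUES (1, 'sw1'), (2, 'sw2');
    INSERT INTO alerthist VALUES
        (10, 1, 'boxState'),             -- sw1 is down
        (11, 2, 'maintenanceState'),     -- sw2 is on maintenance
        (12, NULL, 'maintenanceState');  -- orphan left by a deleted netbox
""")


def down_naive(conn):
    """Hypothetical query shape: boxes down, excluding boxes on maintenance."""
    rows = conn.execute("""
        SELECT netboxid FROM alerthist
         WHERE eventtypeid = 'boxState'
           AND netboxid NOT IN (SELECT netboxid FROM alerthist
                                 WHERE eventtypeid = 'maintenanceState')
    """)
    return [netboxid for (netboxid,) in rows]


def down_null_safe(conn):
    """Same query, but the subquery ignores rows whose netboxid is NULL."""
    rows = conn.execute("""
        SELECT netboxid FROM alerthist
         WHERE eventtypeid = 'boxState'
           AND netboxid NOT IN (SELECT netboxid FROM alerthist
                                 WHERE eventtypeid = 'maintenanceState'
                                   AND netboxid IS NOT NULL)
    """)
    return [netboxid for (netboxid,) in rows]


print(down_naive(conn))      # [] because the orphan row poisons the NOT IN
print(down_null_safe(conn))  # [1]
```

In Django terms the second variant would roughly correspond to adding a `netbox__isnull=False` filter to the inner queryset, assuming the model field is called `netbox`.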

A followup fix should make sure that unresolved alerthist entries are marked as resolved when netboxes are deleted, via a database rule/trigger.
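A rough sketch of what such a trigger could look like, again using SQLite from Python so that it runs standalone. The schema is a hypothetical reduction, the trigger name `close_alerthist_on_netbox_delete` is made up, and the convention that `end_time IS NULL` marks an unresolved entry is an assumption; NAV's real schema and its PostgreSQL rule or trigger syntax will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for ON DELETE SET NULL
conn.executescript("""
    CREATE TABLE netbox (netboxid INTEGER PRIMARY KEY, sysname TEXT);
    CREATE TABLE alerthist (
        alerthistid INTEGER PRIMARY KEY,
        netboxid INTEGER REFERENCES netbox (netboxid) ON DELETE SET NULL,
        eventtypeid TEXT,
        end_time TEXT  -- assumption: NULL marks an unresolved entry
    );

    -- Close open entries just before their netbox row disappears.
    CREATE TRIGGER close_alerthist_on_netbox_delete
    BEFORE DELETE ON netbox
    FOR EACH ROW
    BEGIN
        UPDATE alerthist
           SET end_time = CURRENT_TIMESTAMP
         WHERE netboxid = OLD.netboxid AND end_time IS NULL;
    END;

    INSERT INTO netbox VALUES (1, 'sw1'), (2, 'sw2');
    INSERT INTO alerthist VALUES
        (10, 1, 'maintenanceState', NULL),
        (11, 2, 'boxState', NULL);

    DELETE FROM netbox WHERE netboxid = 1;
""")


def alert_state(alerthistid):
    """Return (netboxid, is_resolved) for one alerthist row."""
    netboxid, end_time = conn.execute(
        "SELECT netboxid, end_time FROM alerthist WHERE alerthistid = ?",
        (alerthistid,),
    ).fetchone()
    return netboxid, end_time is not None


deleted_box_alert = alert_state(10)  # netboxid set to NULL, entry resolved
other_alert = alert_state(11)        # untouched, still unresolved
print(deleted_box_alert, other_alert)
```

With the trigger in place, deleting a netbox no longer leaves an unresolved entry with a NULL netboxid behind, so the Status page queries would not encounter such rows in the first place.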

Changed in nav:
status: Incomplete → Confirmed
importance: Undecided → High
assignee: nobody → Morten Brekkevold (mbrekkevold)
status: Confirmed → In Progress
Morten Brekkevold (mbrekkevold) wrote :

Just to clarify my imprecise comment about JOINs being the problem: it is of course the SQL IN operator that causes the problem. If the result set matched by IN contains NULL values, then no matches are made.
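For readers unfamiliar with this behaviour, the three-valued logic behind it can be checked directly. The sketch below uses SQLite via Python's standard library purely as a convenient stand-in, and the helper name `evaluate` is made up. Note that in standard SQL it is the negated form, NOT IN, that stops matching anything once the list contains a NULL; a plain IN still matches the non-NULL values. The thread does not say which form the generated query used, so treat the NOT IN reading as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Return the value SQLite computes for a scalar SQL expression."""
    return conn.execute("SELECT " + expression).fetchone()[0]


# A plain IN still finds a value that is present next to a NULL ...
in_hit = evaluate("1 IN (1, NULL)")             # 1 (true)
# ... but a miss becomes NULL (unknown) rather than false.
in_miss = evaluate("3 IN (1, NULL)")            # None
# Negating "unknown" is still "unknown", so NOT IN never evaluates to true.
not_in_null = evaluate("3 NOT IN (1, NULL)")    # None
# Without the NULL, NOT IN behaves as expected.
not_in_clean = evaluate("3 NOT IN (1, 2)")      # 1 (true)

print(in_hit, in_miss, not_in_null, not_in_clean)
```

Since a WHERE clause only keeps rows for which the condition is true, a condition that evaluates to NULL for every row produces an empty result, which matches the empty "IP devices down" section.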

Morten Brekkevold (mbrekkevold) wrote :
Changed in nav:
status: In Progress → Fix Committed
milestone: none → 3.10.3
Changed in nav:
status: Fix Committed → Fix Released