pping crashes when re-creating RRD files

Bug #733115 reported by macrom
This bug affects 1 person
Affects: Network Administration Visualized
Status: Fix Released
Importance: Medium
Assigned to: Morten Brekkevold
Milestone: 3.8.3

Bug Description

NAV : 3.8.2
OS : RHEL 6.0

I'm trying to upgrade from nav-3.5.6 running on RHEL 5.x (32-bit) to nav-3.8.2 running on RHEL 6 (64-bit).
The NAV installation is fresh, and the database was cloned and upgraded from 3.5.6 to 3.8.2.

When starting pping, this error is printed to pping.log :
[2011-03-11 09:00:10] db.py:execute:146 [Critical] Throwing away update...
[2011-03-11 09:00:10] rrd.py:create:61 [Notice] Created rrd file kat-sw3.foo.tld.rrd
[2011-03-11 09:00:10] db.py:cursor:97 [Critical] Could not get cursor. Trying to reconnect...
[2011-03-11 09:00:10] db.py:connect:75 [Notice] Successfully (re)connected to NAVdb
[2011-03-11 09:00:10] db.py:execute:137 [Notice] Executing: INSERT INTO rrd_file
            (rrd_fileid, path, filename, step, netboxid, subsystem) VALUES
            (97812,'/usr/local/nav/var/rrd','kat-sw3.foo.tld.rrd',300,1080,'pping')
[2011-03-11 09:00:10] db.py:execute:145 [Critical] duplicate key value violates unique constraint "rrd_file_path_filename_key"

[2011-03-11 09:00:10] db.py:execute:146 [Critical] Throwing away update...
[2011-03-11 09:00:10] db.py:cursor:97 [Critical] Could not get cursor. Trying to reconnect...
[2011-03-11 09:00:10] db.py:connect:79 [Critical] Couldn't connect to db.
[2011-03-11 09:00:10] db.py:connect:80 [Critical] FATAL: connection limit exceeded for non-superusers

[2011-03-11 09:00:10] db.py:execute:148 [Critical] Could not execute statement: INSERT INTO rrd_datasource
        (rrd_fileid, name, descr, dstype, units) VALUES
        (97812, 'RESPONSETIME', 'Roundtrip time', 'GAUGE', 's')
[2011-03-11 09:00:10] db.py:execute:149 [Notice] 'NoneType' object has no attribute 'cursor'
Traceback (most recent call last):
  File "/usr/local/nav/bin/pping.py", line 271, in <module>
    start(nofork)
  File "/usr/local/nav/bin/pping.py", line 239, in start
    myPinger.main()
  File "/usr/local/nav/bin/pping.py", line 172, in main
    self.generateEvents()
  File "/usr/local/nav/bin/pping.py", line 114, in generateEvents
    rrd.update(netbox.netboxid, netbox.sysname, 'N', 'UP', rtt)
  File "/usr/local/nav/lib/python/nav/statemon/rrd.py", line 123, in update
    create(filename, netboxid, serviceid, handler)
  File "/usr/local/nav/lib/python/nav/statemon/rrd.py", line 62, in create
    register_rrd(filename, netboxid, serviceid, handler)
  File "/usr/local/nav/lib/python/nav/statemon/rrd.py", line 83, in register_rrd
    responsedescr, "GAUGE", "s")
  File "/usr/local/nav/lib/python/nav/statemon/db.py", line 325, in registerDS
    self.execute(statement)
  File "/usr/local/nav/lib/python/nav/statemon/db.py", line 151, in execute
    self.db.rollback()
AttributeError: 'NoneType' object has no attribute 'rollback'

The RRD files were not migrated over from the production server, so pping fails when it tries to create an RRD file that is already registered in the database.
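
To see how many registrations are affected, the rows in rrd_file can be compared against the files on disk. The following is only a sketch: find_stale_rows is a hypothetical helper, not part of NAV, and it assumes the rows have already been fetched with something like SELECT rrd_fileid, path, filename FROM rrd_file (column names taken from the INSERT statement in the log above). What to do with the stale rows is left open here.

```python
import os


def find_stale_rows(rows):
    """Return the rows whose RRD file no longer exists on disk.

    `rows` is an iterable of (rrd_fileid, path, filename) tuples.
    Hypothetical helper for illustration; not part of NAV.
    """
    return [
        (fileid, path, filename)
        for fileid, path, filename in rows
        if not os.path.isfile(os.path.join(path, filename))
    ]
```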

Morten Brekkevold (mbrekkevold) wrote :

pping registers an RRD file in NAV's PostgreSQL database as soon as it is created by pping. If the RRD file once existed in the same location but has since been removed, it may still be referenced in the db. pping doesn't check to see if it's already in the db, and fails on an integrity error.

I guess pping should either check the db before inserting a new row, or handle the integrity error more gracefully.
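
A minimal sketch of the second option, handling the integrity error and reusing the existing row. This is not the fix that went into NAV. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs on its own, the table is trimmed to a few of the rrd_file columns seen in the log, and make_db and register_rrd_file are hypothetical names. It also lets the database assign rrd_fileid, which differs from the explicit id in the logged INSERT.

```python
import sqlite3


def make_db():
    """In-memory stand-in for the rrd_file table (columns trimmed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rrd_file ("
        " rrd_fileid INTEGER PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " filename TEXT NOT NULL,"
        " netboxid INTEGER,"
        " UNIQUE (path, filename))"
    )
    return conn


def register_rrd_file(conn, path, filename, netboxid):
    """Return the rrd_fileid for (path, filename), inserting only if needed."""
    try:
        cur = conn.execute(
            "INSERT INTO rrd_file (path, filename, netboxid) VALUES (?, ?, ?)",
            (path, filename, netboxid),
        )
        conn.commit()
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # The file was registered earlier: roll back the failed insert
        # and reuse the row that is already there.
        conn.rollback()
        row = conn.execute(
            "SELECT rrd_fileid FROM rrd_file WHERE path = ? AND filename = ?",
            (path, filename),
        ).fetchone()
        return row[0]
```

Calling register_rrd_file twice with the same path and filename returns the same id both times and leaves a single row in the table.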

summary: - pping crashes in nav-3.8.2
+ pping crashes when re-creating RRD files
Changed in nav:
assignee: nobody → Morten Brekkevold (mbrekkevold)
importance: Undecided → Low
Morten Brekkevold (mbrekkevold) wrote :

Further analysis shows that the underlying problem is that pping interprets a database error as a lost database connection (!). Each time this happens, it creates a new connection. The old one is discarded, but not closed.

In your case, you likely had hundreds of RRD files created that were already registered in your migrated database, which caused pping to quickly use up the available non-superuser connections to PostgreSQL. When it could no longer open a new connection, the daemon crashed. This was confirmed on my dev server.

This latter bug should probably be filed separately...
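
The two problems described in this comment can be sketched as follows: treat an integrity error as a failed statement rather than a lost connection, and close the old connection before opening a replacement. This is an illustration only, not the code in nav/statemon/db.py. The Database class, its connect factory argument and the opened counter are hypothetical, and sqlite3 stands in for PostgreSQL so the example runs on its own. In sqlite3, OperationalError also covers errors such as a missing table, so the mapping to "connection lost" is an assumption made for the sake of the example.

```python
import sqlite3


class Database:
    """Hypothetical wrapper; `connect` is a zero-argument connection factory."""

    def __init__(self, connect):
        self._connect = connect
        self.db = connect()
        self.opened = 1  # connections opened so far, for illustration

    def reconnect(self):
        # Close the old connection before replacing it, so it stops
        # occupying a connection slot on the server.
        old, self.db = self.db, None
        if old is not None:
            try:
                old.close()
            except sqlite3.Error:
                pass
        self.db = self._connect()
        self.opened += 1

    def execute(self, statement, params=()):
        """Run one statement; return True on success, False if discarded."""
        if self.db is None:
            # A previous reconnect attempt failed; try again first.
            self.reconnect()
        try:
            self.db.execute(statement, params)
            self.db.commit()
            return True
        except sqlite3.IntegrityError:
            # A bad statement, not a bad connection: roll back, carry on.
            self.db.rollback()
            return False
        except sqlite3.OperationalError:
            # Only here is the connection itself treated as suspect.
            self.reconnect()
            return False
```

With this split, a run of duplicate-key failures keeps reusing one connection instead of opening a new one per failure.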

Changed in nav:
importance: Low → Medium
Morten Brekkevold (mbrekkevold) wrote :
Changed in nav:
milestone: none → 3.8.3
status: New → Confirmed
status: Confirmed → Fix Committed
Changed in nav:
status: Fix Committed → Fix Released