getDeviceData and eventEngine stop working after PostgreSQL disconnect

Bug #280145 reported by Morten Brekkevold
Affects | Status | Importance | Assigned to | Milestone
Network Administration Visualized | Fix Released | High | Morten Brekkevold |
3.4 | Fix Released | High | Morten Brekkevold |
3.5 | Fix Released | High | Morten Brekkevold |

Bug Description

There seems to be a problem with the stability of the getDeviceData and eventEngine daemons when there is a loss of database connectivity.

NAV's Java Database library was believed to be designed to recover from temporary losses of connectivity, but this does not seem to be the case.

Several installations report logs full of SQLExceptions with the following error message: "An I/O error occured while sending to the backend".

Morten Brekkevold (mbrekkevold) wrote :

Installations with a separate PostgreSQL server may be more prone to these problems, but they do occur on single-server installations too.

Changed in nav:
assignee: nobody → mvold
importance: Undecided → High
status: New → Confirmed
Morten Brekkevold (mbrekkevold) wrote :

I took a look at the code, and it tries to detect a lost database connection by searching for certain substrings in the error message present in an SQLException. This is just ridiculous, as the error message has obviously changed in newer versions of PostgreSQL or the JDBC driver.

Working on a patch that matches against exception types instead.
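
To illustrate what matching against exception types could look like, here is a minimal sketch. It is not the code from the actual patch: the class and method names are hypothetical, and it assumes that a lost connection surfaces as a java.io.IOException (or a subclass such as java.net.SocketException) somewhere in the cause chain of the SQLException.

```java
import java.io.IOException;
import java.net.SocketException;
import java.sql.SQLException;

// Hypothetical helper, not taken from NAV's source code.
public class ConnectionLossByType {
    /** Returns true if the error or any of its chained causes is an IOException. */
    static boolean causedByIoError(Throwable error) {
        // Bound the walk so that a cyclic cause chain cannot loop forever.
        int depth = 0;
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof IOException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulate a driver that chains the underlying socket error properly.
        SQLException lost = new SQLException("An I/O error occured while sending to the backend");
        lost.initCause(new SocketException("Broken pipe"));

        // An ordinary SQL error with no I/O cause.
        SQLException syntax = new SQLException("syntax error at or near \"SELEC\"");

        System.out.println(causedByIoError(lost));   // prints "true"
        System.out.println(causedByIoError(syntax)); // prints "false"
    }
}
```

A check like this does not depend on the wording of the error message, so it should survive message changes between PostgreSQL or driver versions.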

Changed in nav:
status: Confirmed → In Progress
Morten Brekkevold (mbrekkevold) wrote :

My patch relies on proper exception chaining, which was introduced in Java 1.4. During its configuration phase, PostgreSQL's JDBC driver tries to autodetect whether it should be built with or without exception chaining, mostly based on the Java version.

On Debian, it seems the packaged driver isn't built with Java 1.4 or newer, because it doesn't properly chain exceptions.

This means substring searching still has to be a fallback option for the connection loss detection routine. Instead of using proper chaining, a PostgreSQL JDBC driver built with Java < 1.4 adds stack traces to the error message to indicate chained exceptions.
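
A rough sketch of how such a fallback could be ordered: check the chained cause first, and only search the message text when no cause is available. This is an illustration, not the code from the changeset. The class name, method names and the marker strings are assumptions; the actual text a non-chaining driver embeds in the message may differ.

```java
import java.io.IOException;
import java.sql.SQLException;

// Hypothetical helper, not taken from NAV's source code.
public class ConnectionLossFallback {
    // Assumed markers: the known error text plus exception class names
    // that an embedded stack trace would be expected to contain.
    static final String[] MARKERS = {
        "I/O error",
        "java.io.IOException",
        "java.net.SocketException",
    };

    /** Fallback: search the message text for any of the assumed markers. */
    static boolean messageSuggestsIoError(String message) {
        if (message == null) {
            return false;
        }
        for (String marker : MARKERS) {
            if (message.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    /** Prefer the typed check; fall back to the message only if it fails. */
    static boolean looksLikeConnectionLoss(SQLException e) {
        if (e.getCause() instanceof IOException) {
            return true;
        }
        return messageSuggestsIoError(e.getMessage());
    }

    public static void main(String[] args) {
        // Simulate a driver that flattens the nested exception into the message.
        SQLException flattened = new SQLException(
            "Exception: java.net.SocketException: Broken pipe\n\tat java.net.SocketOutputStream.write");
        SQLException unrelated = new SQLException("relation \"foo\" does not exist");

        System.out.println(looksLikeConnectionLoss(flattened)); // prints "true"
        System.out.println(looksLikeConnectionLoss(unrelated)); // prints "false"
    }
}
```

Keeping the substring search as the second step means it only matters for drivers that do not chain exceptions, which limits the exposure to future message changes.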

*frustrated*

Morten Brekkevold (mbrekkevold) wrote :

Fixed in changeset 8b340b29faac, pushed to series/3.4.x. Will be merged to default and series/3.5.x shortly.

http://metanav.uninett.no/hg/series/3.4.x/rev/8b340b29faac

Changed in nav:
status: In Progress → Fix Committed
status: Fix Committed → Fix Released