Services are not able to handle transient DB errors when attempting to report state

Bug #1466991 reported by Jay Bryant
This bug affects 3 people
Affects   Status         Importance   Assigned to   Milestone
Cinder    Fix Released   Medium       Jay Bryant
Kilo      Fix Released   Undecided    Unassigned

Bug Description

While testing HA configurations for Cinder, we found that services running on the nodes of the HA cluster can hit transient DB errors that are not handled in service.report_state(). The uncaught exception ends the report_state polling loop, so the service stops updating its state in the DB even though it is still usable. Transient DB errors are to be expected in this kind of setup and should not cause the report_state thread to exit.

The problem can be resolved by catching db_exc.DBError in report_state() in addition to db_exc.DBConnectionError, with a change like the following:

        except db_exc.DBConnectionError:
            if not getattr(self, 'model_disconnected', False):
                self.model_disconnected = True
                LOG.exception(_LE('model server went away'))
        except db_exc.DBError:
            if not getattr(self, 'model_disconnected', False):
                self.model_disconnected = True
                LOG.exception(_LE('DBError encountered '))
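
For readers who want to see the handlers in context, here is a minimal, self-contained sketch of the pattern. It is not Cinder's actual service code: the Service class, the update_db callable and the report_count counter are hypothetical, and the two exception classes are local stand-ins for oslo_db.exception.DBError and oslo_db.exception.DBConnectionError so the example runs without oslo.db installed. The recovery branch that clears model_disconnected is also an assumption of this sketch, not something shown in the snippet above.

```python
import logging

LOG = logging.getLogger(__name__)


class DBError(Exception):
    """Local stand-in for oslo_db.exception.DBError (assumption)."""


class DBConnectionError(DBError):
    """Local stand-in for oslo_db.exception.DBConnectionError (assumption)."""


class Service:
    """Hypothetical, stripped-down service; not Cinder's real class."""

    def __init__(self, update_db):
        # update_db: hypothetical callable that writes the heartbeat to the DB.
        self.update_db = update_db
        self.model_disconnected = False
        self.report_count = 0

    def report_state(self):
        """Report state once without letting a DB error reach the caller."""
        try:
            self.update_db()
            self.report_count += 1
            if self.model_disconnected:
                # Assumed recovery path: clear the flag once the DB is back.
                self.model_disconnected = False
                LOG.error('Recovered model server connection!')
        except DBConnectionError:
            # The subclass must be listed before DBError to get its own message.
            if not self.model_disconnected:
                self.model_disconnected = True
                LOG.exception('model server went away')
        except DBError:
            # Any other DB error: log once, then let the caller poll again.
            if not self.model_disconnected:
                self.model_disconnected = True
                LOG.exception('DBError encountered')
```

Because report_state() swallows both exception types, whatever periodic caller drives it keeps running and simply retries on the next interval. The model_disconnected flag keeps the log from filling with one traceback per interval while the DB is unavailable.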

Jay Bryant (jsbryant)
Changed in cinder:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Jay Bryant (jsbryant)
Jay Bryant (jsbryant)
Changed in cinder:
milestone: none → liberty-2
Changed in cinder:
status: Triaged → In Progress
Changed in cinder:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/195427

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/kilo)

Reviewed: https://review.openstack.org/195427
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=6c055943e70675585ef45301318c47ccc1139ce9
Submitter: Jenkins
Branch: stable/kilo

commit 6c055943e70675585ef45301318c47ccc1139ce9
Author: Jay S. Bryant <email address hidden>
Date: Mon Jun 22 11:43:37 2015 -0500

    Add exception catch in report_state for DBError

    We discovered while testing Cinder in an HA
    environment that transient DB errors can be
    encountered that are not currently covered by
    the exception catch in service.py report_state().
    The uncaught exceptions were causing the thread
    for report_state to prematurely exit and causing
    services to no longer update the DB when, in fact,
    they were still usable.

    This change adds an exception catch for DBError
    so that the thread can continue to function.

    NOTE: Commit fcb5068e79490351359afdec9f05f42ebb022edf added
          self.host, self.binary and self.topic to the setUp for
          these test cases. I needed to modify the new test case
          in this cherry pick to account for the above commit not
          being in Kilo.

    Change-Id: I95f79b8d6c8f5d7b3e44665306b500b8d2ce3c7c
    Closes-bug: 1466991
    (cherry picked from commit 3e4caa552969dd298306b124bd2dc02e7a62c835)

tags: added: in-stable-kilo
Changed in cinder:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: liberty-2 → 7.0.0