dhcp agent RPC handler doesn't retry DBError

Bug #1618216 reported by Kevin Benton
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Kevin Benton
Milestone: (not set)

Bug Description

The RPC handler that creates a port catches DBError exceptions before the retry decorator gets a chance to retry them. The agent then treats the failure as a broken network, so the network ends up without any DHCP service, leading to difficult-to-debug failures like this one:

http://logs.openstack.org/82/346282/4/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/d4ea1f6/console.html
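
The interaction described above can be illustrated with a minimal, self-contained sketch. It is not the neutron code: `DBError` here is a local stand-in for oslo.db's exception, and `retry_db_errors`, `FlakyDB`, and the two handlers are hypothetical names invented for illustration. The point is only that a `try/except` inside the decorated function hides the error from the decorator wrapped around it.

```python
import functools


class DBError(Exception):
    """Local stand-in for oslo.db's DBError, so the sketch runs on its own."""


def retry_db_errors(max_retries=3):
    """Hypothetical retry decorator: call again whenever DBError escapes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBError:
                    # Give up and re-raise once the retry budget is spent.
                    if attempt == max_retries:
                        raise
        return wrapper
    return decorator


class FlakyDB:
    """Raises DBError a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def create_port(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise DBError("transient deadlock")
        return {"id": "port-1"}


@retry_db_errors()
def handler_swallowing(db):
    # The except clause runs inside the decorated function, so the
    # decorator only ever sees a normal return value of None.
    try:
        return db.create_port()
    except DBError:
        return None


@retry_db_errors()
def handler_propagating(db):
    # No local catch: DBError reaches the decorator, which retries.
    return db.create_port()
```

With a database that fails once, `handler_swallowing` gives up after a single call, while `handler_propagating` succeeds on the second attempt.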

Changed in neutron:
assignee: nobody → Kevin Benton (kevinbenton)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/362458

Changed in neutron:
status: New → In Progress
Armando Migliaccio (armando-migliaccio) wrote :

This can prevent retries.

Changed in neutron:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/362458
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=85ed7017ff22dd24ef7558010e62598483561354
Submitter: Jenkins
Branch: master

commit 85ed7017ff22dd24ef7558010e62598483561354
Author: Kevin Benton <email address hidden>
Date: Sat Aug 27 05:03:54 2016 -0700

    Don't catch DBError in DHCP action handler

    The DHCP port action handler has been catching DBErrors
    since f1b9ac5a542a3125d757094fccda80c80c6dd420, which is
    well before we had the retry decorator to deal with these.
    With the port action handler catching these, it means there
    will not be retries on deadlocks or connection errors so
    transient situations can result in a permanently broken
    DHCP service for a network.

    This removes the catch for DBError so the decorator can retry
    the operation.

    Closes-Bug: #1618216
    Change-Id: I42031b481958bbfdb8f52902c294022717af7adf

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/363332
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9726a00c908341654f4cdeb444adf2ce5c717b80
Submitter: Jenkins
Branch: master

commit 9726a00c908341654f4cdeb444adf2ce5c717b80
Author: Armando Migliaccio <email address hidden>
Date: Wed Aug 31 01:04:42 2016 +0000

    Narrow down DBError to DBReferenceError in DHCP action handler

    Commit 85ed7017ff22dd24ef7558010e62598483561354 removed the DBError
    handling to let the retry decorator do its magic, however the
    full implications of this change were not evaluated. As a result,
    DBReferenceError (which derives from DBError) is not processed
    correctly and that caused a regression of the existing logic.

    Rather than bloat the retry's responsibility even further, this
    patch partially reverts commit 85ed7017f by narrowing down the
    exception handling to DBReferenceError only.

    Related-bug: #1618216

    Change-Id: Icf4e5e4145dcdcdc710b8e42044467913ed01ec1
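
The effect of narrowing the catch can be sketched as follows. This is an illustration under stated assumptions, not the merged patch: the exception classes are local stand-ins that mirror the hierarchy the commit message describes (`DBReferenceError` derives from `DBError`), `DBDeadlock` stands for any transient error, and `retry_db_errors`, `create_dhcp_port`, and `make_operation` are hypothetical names. Returning `None` on a reference error is also an assumption about what "handled locally" means here.

```python
import functools


class DBError(Exception):
    """Local stand-in for oslo.db's base DBError."""


class DBDeadlock(DBError):
    """Stand-in for a transient error that is worth retrying."""


class DBReferenceError(DBError):
    """Stand-in for a foreign-key failure; a subclass of DBError."""


def retry_db_errors(func, max_retries=3):
    """Hypothetical retry decorator: call func again while DBError escapes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(max_retries + 1):
            try:
                return func(*args, **kwargs)
            except DBError:
                if attempt == max_retries:
                    raise
    return wrapper


@retry_db_errors
def create_dhcp_port(operation):
    try:
        return operation()
    except DBReferenceError:
        # Caught here, so the decorator never retries it (assumed behaviour:
        # the referenced row is gone and a retry would not help).
        return None
    # Any other DBError, such as DBDeadlock, propagates to the decorator.


def make_operation(errors):
    """Return a callable that raises each queued error once, then succeeds."""
    pending = list(errors)
    calls = []

    def operation():
        calls.append(1)
        if pending:
            raise pending.pop(0)
        return "port-created"

    return operation, calls
```

In this sketch a deadlock is retried and the second call succeeds, whereas a reference error ends the operation after one call.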

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.0.0.0b3

This issue was fixed in the openstack/neutron 9.0.0.0b3 development milestone.
