Move db retry logic closer to where DB error occur

Bug #1612798 reported by Armando Migliaccio
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Critical
Kevin Benton

Bug Description

This has caused weird failure modes where DB errors get masked by integrity violation errors.

Tags: gate-failure
Changed in neutron:
status: New → Confirmed
assignee: nobody → Kevin Benton (kevinbenton)
importance: Undecided → Low
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

DB errors get masked by integrity violation errors or vice versa?

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

The former.

Revision history for this message
Kevin Benton (kevinbenton) wrote :

We are supposed to retry on an integrity error because it means there was a race between validation and commit. There isn't any masking issue that I can see.

Revision history for this message
Kevin Benton (kevinbenton) wrote :

There is a problem in general with the retry logic if an operation commits data, does something that triggers a retry, and then fails to rollback what it did on the exception.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

What I mean is that by retrying the whole API operation, it is very likely that the retry is ineffective because of a partial commit (not every method has been designed to be 'retriable', which is something relatively recent). The failure of a retry shows up as an integrity or validation error. In the past, without retry, the failure mode was a lot more recognizable as it was bubbled up as a DB Error of a deadlock nature or equivalent.

Changed in neutron:
status: Confirmed → In Progress
Changed in neutron:
importance: Low → High
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :
Changed in neutron:
importance: High → Critical
tags: added: gate-failure
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Some of the bugs we're seeing in the gate are the direct consequence of this issue.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

or lack thereof...

Since we plan to address those by going full steam with this one, might as well capture the 'duplicate' LP relationship.

Changed in neutron:
milestone: none → newton-rc1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/364622

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/364622
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e2fdeefd37314d3e6c5910fcc7070de34561ec44
Submitter: Jenkins
Branch: master

commit e2fdeefd37314d3e6c5910fcc7070de34561ec44
Author: Armando Migliaccio <email address hidden>
Date: Thu Sep 1 19:33:32 2016 -0700

    Deal with unknown exceptions during auto allocation

    Uncaught exceptions in the core or L3 plugin layers can cause the
    auto_allocate plugin to fail unexpectedly. This patch ensures that
    any unexpected error is properly handled by cleaning up half
    provisioned deployment.

    Related-bug: #1612798
    Closes-bug: #1619497

    Change-Id: I3eb9efd33363045f7b2ccd97fe4a48891f48b161

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote : Re: Move db retry logic away from API layer
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/367060

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/367179

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367180

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367181

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367182

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367183

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367184

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367185

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367186

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367187

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367188

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367189

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367190

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367191

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367192

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367193

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/367194

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/367060
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=047c0b5293f7d63279159343185ce183afaa3574
Submitter: Jenkins
Branch: master

commit 047c0b5293f7d63279159343185ce183afaa3574
Author: Armando Migliaccio <email address hidden>
Date: Wed Sep 7 20:04:33 2016 -0700

    Ensure UnknownProvisioningError can be printed

    Implement __str__ to show the underlying error during log
    traces.

    Change-Id: I829d787ba0d8666b09abc0307abb47fe9699c247
    Related-bug: #1612798

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/356530
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=09c87425fa028dbe669e8c215e334297ccbf1c2a
Submitter: Jenkins
Branch: master

commit 09c87425fa028dbe669e8c215e334297ccbf1c2a
Author: Kevin Benton <email address hidden>
Date: Wed Sep 7 18:27:49 2016 -0700

    Prepare retry decorator to move to plugin level

    Retrying mutating operations at the API layer caused a
    couple of problems. First, when components would call
    the core plugin using the neutron manager, they would
    have to handle the retriable failures on their own or
    undo any work they had done so far and allow retriable
    failures to be propagated up to the API. Second, retrying
    at the API makes composite operations (e.g. auto allocate,
    add_router_interface, etc) painful because they have to
    consider if exceptions are retriable before raising
    fatal exceptions on failures of core plugin calls.

    This patch begins the process of moving them down to the
    core operations with a new decorator called
    'retry_if_session_inactive', which ensures that the
    retry logic isn't triggered if there is an ongoing transaction
    since retrying inside of a transaction is normally ineffective.
    Follow-up patches apply them to various parts of the code-base.

    Additionally, the args and kwargs of the method are automatically
    deep copied in retries to ensure that any mangling the methods
    do to their arguments don't impact their retriability.

    Finally, since we are leaving the API decorators in place for now,
    the retry logic will not be triggered by another decorator if an
    exception has already been retried. This prevents an exponential
    explosion of retries on nested retry decorators.

    The ultimate goal will be to get rid of the API decorators entirely
    so retries are up to each individual plugin.

    Partial-Bug: #1596075
    Partial-Bug: #1612798
    Change-Id: I7b8a4a105aabfa1b5f5dd7a638099007b0933e66

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/367179
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=acbabaa3db2c12a41c52662a82e9a900d5929846
Submitter: Jenkins
Branch: master

commit acbabaa3db2c12a41c52662a82e9a900d5929846
Author: Kevin Benton <email address hidden>
Date: Wed Sep 7 18:42:40 2016 -0700

    Make l2/l3 operations retriable at plugin level

    This adds decorators to ML2/db_base_plugin_v2 and the L3
    DB modules to make L2 and L3 resource operations retriable
    at the plugin level.

    Retrying the L2 operations in particular at this level should
    go a long way to improving the reliability of complex operations
    that require L2 resources in the face of deadlocks under high
    concurrency.

    Partial-Bug: #1612798
    Partial-Bug: #1596075
    Change-Id: I3c9437f7ecdd5ebb188b622144b7bd7bed74231d

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/367191
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a4ccc0ce1cbbd37cb3a2e17432b684557fa51b30
Submitter: Jenkins
Branch: master

commit a4ccc0ce1cbbd37cb3a2e17432b684557fa51b30
Author: Kevin Benton <email address hidden>
Date: Wed Sep 7 19:08:58 2016 -0700

    Move retry decorator in auto allocate

    No longer necessary on cleanup since individual plugin operations
    are protected but it should be on _ensure_network_default_value
    callback since it can happen outside of a transaction.

    Partial-Bug: #1612798
    Change-Id: Iab4f2216e6b138296e8245eb0b1e3e6b5e46561b

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/367180
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=12420c1585abadc741440dcd48a5540ef85357db
Submitter: Jenkins
Branch: master

commit 12420c1585abadc741440dcd48a5540ef85357db
Author: Kevin Benton <email address hidden>
Date: Wed Sep 7 18:51:58 2016 -0700

    Mark quota operations as retriable

    This decorates the quota system operations with
    the retry decorators. This will help significantly
    with the bug this marks as closed since operations
    in the quota engine after commit should no longer
    trigger retries of the full API operation.

    The logic to find the args in the decorator had
    to be adjusted to deal with functions already decorated.
    This just uses the getargspec function from pecan that
    deals with decorated functions.

    Partial-Bug: #1612798
    Closes-Bug: #1596075
    Change-Id: Ib786117dcea08af75551770ea4c30d460382b829

Changed in neutron:
assignee: Kevin Benton (kevinbenton) → Armando Migliaccio (armando-migliaccio)
Changed in neutron:
assignee: Armando Migliaccio (armando-migliaccio) → Kevin Benton (kevinbenton)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/367184
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=39ace4c9e3c67b89e7d42ea7b0359809fe1e0928
Submitter: Jenkins
Branch: master

commit 39ace4c9e3c67b89e7d42ea7b0359809fe1e0928
Author: Kevin Benton <email address hidden>
Date: Wed Sep 7 18:57:38 2016 -0700

    Utilize retry_if_session_inactive in dvr_mac_db

    Adds the retry decorator to the DVR mac DB module and
    leverages its ability to retry on DBDuplicate entries
    to simplify the code.

    We also add a transaction guard to make the previously
    implicit assumption that _create_dvr_mac_address would
    be called outside of a transaction explicit.

    Partial-Bug: #1612798
    Change-Id: I0c919a3f08e0be1aea4e28dc8f6d92b51de129e0

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/367188
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0eafa886d46c5a74f267211cf31df043c3399c60
Submitter: Jenkins
Branch: master

commit 0eafa886d46c5a74f267211cf31df043c3399c60
Author: Kevin Benton <email address hidden>
Date: Wed Sep 7 19:04:41 2016 -0700

    Add retry decorator to provisioning blocks module

    This adds the retry decorator to the provisioning blocks
    module.

    Change-Id: I1abc7618b5431d4c3e0b84c2d59f83f8eb5a4ffe
    Partial-Bug: #1612798

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/367190
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5902d0f174ba89ad1b467f0b01fdcd829c7d76de
Submitter: Jenkins
Branch: master

commit 5902d0f174ba89ad1b467f0b01fdcd829c7d76de
Author: Kevin Benton <email address hidden>
Date: Wed Sep 7 19:06:41 2016 -0700

    Protect security group operations with the retry decorator

    Adds the retry decorator to the security group DB module.

    Partial-Bug: #1612798
    Change-Id: I899ed5cacde769ecbb4057b6cb48e3624b07e149

summary: - Move db retry logic away from API layer
+ Move db retry logic closer to where DB error occur
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/367189
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7317b8c95b59b709ed11cfd4021aeaa2e9c71e88
Submitter: Jenkins
Branch: master

commit 7317b8c95b59b709ed11cfd4021aeaa2e9c71e88
Author: Kevin Benton <email address hidden>
Date: Wed Sep 7 19:05:27 2016 -0700

    Add retry decorator to RBAC module

    This protects the RBAC operations with the retry decorator.

    Change-Id: Iac507483157b4cae34f11fee4eb977e718131f15
    Partial-Bug: #1612798

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

other patches to be reproposed if they burn.

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/367182
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a74cd2f7798413f8da323390213a21fab148b5e4
Submitter: Jenkins
Branch: master

commit a74cd2f7798413f8da323390213a21fab148b5e4
Author: Kevin Benton <email address hidden>
Date: Wed Sep 7 18:35:34 2016 -0700

    Mark agents db mixin operations retriable

    Marks the CRUD operations in the agents DB mixin as
    retriable.

    Change-Id: I1310d85daa29906b18d083e33a11a8b1463c0ad7
    Partial-Bug: #1612798

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/367183
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/367185
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Kevin Benton (<email address hidden>) on branch: master
Review: https://review.openstack.org/367181

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Kevin Benton (<email address hidden>) on branch: master
Review: https://review.openstack.org/367186

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Kevin Benton (<email address hidden>) on branch: master
Review: https://review.openstack.org/367187

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Kevin Benton (<email address hidden>) on branch: master
Review: https://review.openstack.org/367192

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Kevin Benton (<email address hidden>) on branch: master
Review: https://review.openstack.org/367193

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Kevin Benton (<email address hidden>) on branch: master
Review: https://review.openstack.org/367194

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.