DB deadlocks on simultaneous port creation

Bug #1479738 reported by Oleg Bondarev
This bug affects 3 people
Affects   Status         Importance   Assigned to     Milestone
neutron   Fix Released   Undecided    Oleg Bondarev

Bug Description

This was observed during tests on an environment with several controllers: when routers with gateways and subnets are created at a high rate, port creation for the router gateway sometimes fails with DBDeadlock. In the several cases I investigated, the deadlock happened when a router port was created in parallel with DHCP port creation on other servers; in general, the trigger is simultaneous port creation. Port creation involves locking the 'ports' and 'binding' tables via the get_locked_port_and_binding() ml2 db method, which essentially does:
        port = (session.query(models_v2.Port).
                enable_eagerloads(False).
                filter_by(id=port_id).
                with_lockmode('update').
                one())
        binding = (session.query(models.PortBinding).
                   enable_eagerloads(False).
                   filter_by(port_id=port_id).
                   with_lockmode('update').
                   one())

There are also locks taken during IP allocation for the port.
I'm not sure exactly how this leads to a deadlock. It is probably due to the specifics of Galera working in active-active mode: it throws deadlock errors when it fails to validate a change with the other members of the cluster.
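
The Galera hypothesis above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Galera's implementation and not a confirmed root cause for this bug: `ToyCluster`, `ToyTransaction`, `DeadlockError` and the row key `ipallocation:subnet-1` are all hypothetical names. The model assumes that a row lock taken on one node is invisible to the other nodes until commit, and that a commit is rejected with a deadlock-style error if another node has already committed a change to the same row.

```python
class DeadlockError(Exception):
    """Hypothetical stand-in for the deadlock error reported to the client."""


class ToyCluster:
    """Toy model of optimistic, certification-style replication."""

    def __init__(self):
        self.rows = {}      # key -> committed value
        self.versions = {}  # key -> number of committed writes to that key

    def begin(self):
        return ToyTransaction(self)


class ToyTransaction:
    def __init__(self, cluster):
        self.cluster = cluster
        self.seen = {}    # key -> version observed when the row was first locked
        self.writes = {}  # key -> pending value

    def lock_and_write(self, key, value):
        # The local "SELECT ... FOR UPDATE" succeeds: in this model the lock
        # is not visible to transactions running on other nodes.
        self.seen.setdefault(key, self.cluster.versions.get(key, 0))
        self.writes[key] = value

    def commit(self):
        # Certification step: reject the commit if any touched row was
        # committed by someone else after this transaction first saw it.
        for key, version in self.seen.items():
            if self.cluster.versions.get(key, 0) != version:
                raise DeadlockError("certification failed for %r" % (key,))
        for key, value in self.writes.items():
            self.cluster.rows[key] = value
            self.cluster.versions[key] = self.cluster.versions.get(key, 0) + 1


cluster = ToyCluster()
tx_a = cluster.begin()  # e.g. router gateway port creation on one server
tx_b = cluster.begin()  # e.g. DHCP port creation on another server

# Both transactions lock and modify the same (hypothetical) row.
tx_a.lock_and_write("ipallocation:subnet-1", "10.0.0.2")
tx_b.lock_and_write("ipallocation:subnet-1", "10.0.0.3")

tx_a.commit()  # the first committer wins
try:
    tx_b.commit()
    outcome = "committed"
except DeadlockError:
    outcome = "deadlock"

print(outcome)
```

In this model neither transaction ever waits on the other, yet the second committer still sees a deadlock-style failure, which would be consistent with the errors appearing only under concurrent port creation on different servers.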

Examples of tracebacks:
http://paste.openstack.org/show/399624/
http://paste.openstack.org/show/405057/

Oleg Bondarev (obondarev) wrote :

I'm going to apply a fix similar to https://review.openstack.org/#/c/180466/. Though it's more of a workaround, it should fix the issue, with the only downside being a slight delay in port creation in very rare circumstances.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/207532

Changed in neutron:
status: New → In Progress

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Oleg Bondarev (<email address hidden>) on branch: master
Review: https://review.openstack.org/207532

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/207532
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=272768caddb17617e4b5af960075d07a623cd8ca
Submitter: Jenkins
Branch: master

commit 272768caddb17617e4b5af960075d07a623cd8ca
Author: Oleg Bondarev <email address hidden>
Date: Thu Jul 30 19:24:38 2015 +0300

    Add oslo db retry decorator to the RPC handlers

    The decorator was previously added at the API layer
    (commit 4e77442d529d9803ff90de905b846af940eaf382,
    commit d04335c448aa15cf9e1902e22ed4cd17b6ed344b).
    However some RPC handlers are also dealing with port
    create/update/delete operations, like dhcp ports for example.
    We need to cover these cases too.

    Also remove db retry from ml2 plugin delete_port()
    as it's not needed once we retry at the API and RPC layers.
    (there is already a unit test on this)

    The patch also adds a unit test for checking deadlock
    handling during port creation at API layer.
    Though it's not directly related to the current fix,
    I decided to leave it for regression preventing purposes.

    Closes-Bug: #1479738
    Change-Id: I7793a8f7c37ca542b8bc12372168aaaa0826ac4c
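
The retry approach described in the commit message above can be sketched in a few lines. This is a minimal, self-contained illustration and not the actual oslo.db decorator or the Neutron patch: `DBDeadlock` here is a local stand-in for the oslo.db exception of the same name, and `retry_on_deadlock`, `create_dhcp_port` and the parameter values are hypothetical.

```python
import functools
import time


class DBDeadlock(Exception):
    """Local stand-in for the oslo.db DBDeadlock exception (assumption)."""


def retry_on_deadlock(max_retries=3, retry_interval=0.01):
    """Re-run the wrapped callable when it raises DBDeadlock."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    attempt += 1
                    if attempt > max_retries:
                        raise  # give up and surface the error to the caller
                    # The delay is only paid when a deadlock actually occurs.
                    time.sleep(retry_interval)
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on_deadlock(max_retries=3)
def create_dhcp_port(port):
    """Hypothetical RPC handler that hits a deadlock twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise DBDeadlock()
    return dict(port, status="created")


result = create_dhcp_port({"name": "dhcp-port"})
print(result, calls["count"])
```

A decorator like this only helps if the wrapped handler can safely be re-run from the start, that is, each attempt runs in a fresh transaction and the failed attempt left no partial state behind. That is presumably why the retry is placed at the outermost layers (API and RPC entry points) rather than deep inside the plugin.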

Changed in neutron:
status: In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/211492

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/pecan)

Reviewed: https://review.openstack.org/211492
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a7b91632fc65ab9d2687298c68b1d715866d0356
Submitter: Jenkins
Branch: feature/pecan

commit 966203f89dee8fe61fb2dce654e36e510e80380f
Author: Sukhdev Kapur <email address hidden>
Date: Wed Jul 1 16:30:44 2015 -0700

    Neutron-Ironic integration patch

    This patch is in preparation for the integration
    of Ironic and Neutron. A new vnic_type is being
    added so that ML2 drivers can filter for all
    Ironic ports based upon match for 'baremetal'.
    Nova/Ironic will set this vnic_type when issuing
    port-create request to neutron.
    (e.g. binding:vnic_type = 'baremetal' )

    Change-Id: I25dc9472b31db052719db503a10c1fb1a55572ef
    Partial-Implements: blueprint neutron-ironic-integration

commit 236e408272bcb9b8e957524864e571b5afdc4623
Author: Oleg Bondarev <email address hidden>
Date: Tue Jul 7 12:02:58 2015 +0300

    DVR: fix router scheduling

    Fix scheduling of DVR routers to not stop scheduling once
    csnat portion was scheduled. See bug report for failing
    scenario.

    This partially reverts
    commit 3794b4a83e68041e24b715135f0ccf09a5631178
    and fixes bug 1374473 by moving csnat scheduling
    after general dvr router scheduling, so double binding does
    not happen.

    Closes-Bug: #1472163
    Related-Bug: #1374473
    Change-Id: I57c06e2be732e47b6cce7c724f6b255ea2d8fa32

commit e152f93878b9bb6af7cfedc9e045892fcf7d0615
Author: Assaf Muller <email address hidden>
Date: Sat Aug 8 21:15:03 2015 +0300

    TESTING.rst love

    Change-Id: I64b569048f8f87ea2fe63d861302b4020d36493d

commit 633c52cca1b383af2c900e1663c8682114acd177
Author: sridhargaddam <email address hidden>
Date: Wed Aug 5 10:49:33 2015 +0000

    Avoid dhcp_release for ipv6 addresses

    dhcp_release is only supported for IPv4 addresses [1] and not for
    IPv6 addresses [2]. There will be no effect when it is called with
    IPv6 address. This patch adds a corresponding note and avoids calling
    dhcp_release for IPv6 addresses.

    [1] http://manpages.ubuntu.com/manpages/trusty/man1/dhcp_release.1.html
    [2] http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2013q2/007084.html

    Change-Id: I8b8316c9d3d011c2a687a3a1e2a4da5cf1b5d604

commit 2de8fad17402f38bbc30204ee2f4f99cf21cb69d
Author: OpenStack Proposal Bot <email address hidden>
Date: Mon Aug 10 06:11:06 2015 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I2b423e83a7d0ac8b23239f81fe33dd8382c6fff6

commit fef79dc7b9162e03c8891645494c115b52d4d014
Author: Henry Gessau <email address hidden>
Date: Mon Aug 3 23:30:34 2015 -0400

    Consistent layout and headings for devref

    The lack of convention for heading levels among the independently
    written devref documents was starting to make the Table of Contents
    look rather messy when rendered in HTML.

    This patch does not cover the "Neutron Internals" section since its
    layo...

tags: added: in-feature-pecan
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → liberty-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-3 → 7.0.0