Failure to allocate tunnel id when creating networks concurrently

Bug #1382064 reported by Eugene Nikanorov on 2014-10-16
This bug affects 7 people
Affects     Importance   Assigned to
neutron     High         Eugene Nikanorov
Juno        Undecided    Unassigned
Kilo        Undecided    Unassigned

Bug Description

When multiple networks are created concurrently, the following trace is observed:

WARNING neutron.plugins.ml2.drivers.helpers [req-34103ce8-b6d0-459b-9707-a24e369cf9de None] Allocate gre segment from pool failed after 10 failed attempts
DEBUG neutron.context [req-2995f877-e3e6-4b32-bdae-da6295e492a1 None] Arguments dropped when creating context: {u'project_name': None, u'tenant': None} __init__ /usr/lib/python2.7/dist-packages/neutron/context.py:83
DEBUG neutron.plugins.ml2.drivers.helpers [req-3541998d-44df-468f-b65b-36504e893dfb None] Allocate gre segment from pool, attempt 1 failed with segment {'gre_id': 300L} allocate_partially_specified_segment /usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/helpers.py:138
DEBUG neutron.context [req-6dcfb91d-2c5b-4e4f-9d81-55ba381ad232 None] Arguments dropped when creating context: {u'project_name': None, u'tenant': None} __init__ /usr/lib/python2.7/dist-packages/neutron/context.py:83
ERROR neutron.api.v2.resource [req-34103ce8-b6d0-459b-9707-a24e369cf9de None] create failed
TRACE neutron.api.v2.resource Traceback (most recent call last):
TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/api/v2/resource.py", line 87, in resource
TRACE neutron.api.v2.resource result = method(request=request, **args)
TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/api/v2/base.py", line 448, in create
TRACE neutron.api.v2.resource obj = obj_creator(request.context, **kwargs)
TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py", line 497, in create_network
TRACE neutron.api.v2.resource tenant_id)
TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/managers.py", line 160, in create_network_segments
TRACE neutron.api.v2.resource segment = self.allocate_tenant_segment(session)
TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/managers.py", line 189, in allocate_tenant_segment
TRACE neutron.api.v2.resource segment = driver.obj.allocate_tenant_segment(session)
TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/type_tunnel.py", line 115, in allocate_tenant_segment
TRACE neutron.api.v2.resource alloc = self.allocate_partially_specified_segment(session)
TRACE neutron.api.v2.resource File "/usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/helpers.py", line 143, in allocate_partially_specified_segment
TRACE neutron.api.v2.resource raise exc.NoNetworkFoundInMaximumAllowedAttempts()
TRACE neutron.api.v2.resource NoNetworkFoundInMaximumAllowedAttempts: Unable to create the network. No available network found in maximum allowed attempts.
TRACE neutron.api.v2.resource

Additional conditions: a multi-server deployment with MySQL as the database backend.

Changed in neutron:
assignee: nobody → Eugene Nikanorov (enikanorov)
importance: Undecided → High
tags: added: ml2

Fix proposed to branch: master
Review: https://review.openstack.org/129288

Changed in neutron:
status: New → In Progress
tags: added: juno-backport-potential
Jay Pipes (jaypipes) wrote :

Is this using MySQL Galera as the backend database server? And if so, is the Galera setup using only a single writer node?

Eugene Nikanorov (enikanorov) wrote :

Yes, it's with Galera, with a single writer node.
But that doesn't matter: the issue would be the same with a single MySQL backend.

Oleg Bondarev (obondarev) wrote :

It seems the following patch reveals the problem: https://review.openstack.org/#/c/140493/

Fix proposed to branch: master
Review: https://review.openstack.org/141453

Changed in neutron:
assignee: Eugene Nikanorov (enikanorov) → Ed Bak (ed-bak2)
Changed in neutron:
assignee: Ed Bak (ed-bak2) → Eugene Nikanorov (enikanorov)
Changed in neutron:
assignee: Eugene Nikanorov (enikanorov) → Russell Bryant (russellb)
Changed in neutron:
assignee: Russell Bryant (russellb) → Eugene Nikanorov (enikanorov)

Reviewed: https://review.openstack.org/129288
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=6617f8fccc8d99520a87cd84a598c4f9a1a43761
Submitter: Jenkins
Branch: master

commit 6617f8fccc8d99520a87cd84a598c4f9a1a43761
Author: Eugene Nikanorov <email address hidden>
Date: Mon Nov 17 11:00:49 2014 +0400

    Change transaction isolation so retry logic could work properly

    Lower isolation level from REPEATABLE READ to READ COMMITTED for
    transaction that is used to create a network.
    This allows retry logic to see changes done in other connections
    while doing the same query.
    Perform that only for mysql db backend.

    Change-Id: I6b9d9212c37fe028566e0df4a3dfa51f284ce6e9
    Closes-Bug: #1382064
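
As a rough illustration of why the isolation level matters for the retry logic, the toy model below mimics a transaction that re-reads the free tunnel ids after another server has committed an allocation. It is pure Python, not Neutron code, and `ToyDB`, `ToyTransaction` and `second_read_after_competitor` are hypothetical names. It models only the visibility rule described in the commit message, not real MySQL behaviour.

```python
class ToyDB:
    """Toy stand-in for committed database state (assumption, not a real DB)."""

    def __init__(self, free_ids):
        self.free_ids = set(free_ids)


class ToyTransaction:
    """Toy transaction that reads free ids under a given isolation level."""

    def __init__(self, db, isolation):
        self.db = db
        self.isolation = isolation
        self._snapshot = None

    def read_free_ids(self):
        if self.isolation == "READ COMMITTED":
            # Every read sees the latest committed state.
            return sorted(self.db.free_ids)
        # REPEATABLE READ: the first read fixes a snapshot that is reused.
        if self._snapshot is None:
            self._snapshot = sorted(self.db.free_ids)
        return list(self._snapshot)


def second_read_after_competitor(isolation):
    """Return the id offered on a retry after a competitor took id 300."""
    db = ToyDB({300, 301})
    txn = ToyTransaction(db, isolation)
    txn.read_free_ids()        # first attempt sees 300 as free
    db.free_ids.discard(300)   # another server commits its allocation of 300
    return txn.read_free_ids()[0]
```

In this model, a retry under "REPEATABLE READ" is offered 300 again even though it is already taken, while a retry under "READ COMMITTED" is offered 301.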

Changed in neutron:
status: In Progress → Fix Committed

Change abandoned by Ryan Tidwell (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/148083
Reason: Cherry-picked change reverted on master

YAMAMOTO Takashi (yamamoto) wrote :

Copied from the review at https://review.openstack.org/#/c/129288/

enikanorov
12-16 19:51
Patch Set 8:
OK, I'll add this to the bug.
My current understanding is the following:
when REPEATABLE READ is used, each distinct query in the transaction creates a snapshot on the DB backend side, and that snapshot is used while the query runs and when the same query is issued again in that transaction. When READ COMMITTED is used, each fetch reads the table directly, which, IMO, increases contention and leads to much more frequent deadlocks. The previous version of the patch (which set the isolation level globally for each connection) demonstrated this in the gates. But that's only my guess at the technical reasons; I may be wrong here.
A context manager is a nice suggestion, thanks.

Fix proposed to branch: master
Review: https://review.openstack.org/149261

Change abandoned by enikanorov (<email address hidden>) on branch: master
Review: https://review.openstack.org/148339

Thierry Carrez (ttx) on 2015-02-05
Changed in neutron:
milestone: none → kilo-2
status: Fix Committed → Fix Released
Changed in neutron:
status: Fix Released → In Progress
milestone: kilo-2 → kilo-3

Change abandoned by Ed Bak (<email address hidden>) on branch: master
Review: https://review.openstack.org/141453
Reason: Not needed any longer. https://review.openstack.org/#/c/129288 solves this problem.

Changed in neutron:
assignee: Eugene Nikanorov (enikanorov) → Assaf Muller (amuller)

Reviewed: https://review.openstack.org/149261
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5dbb34b56fc42d9c68bf6647910a437a2ad6b29e
Submitter: Jenkins
Branch: master

commit 5dbb34b56fc42d9c68bf6647910a437a2ad6b29e
Author: Eugene Nikanorov <email address hidden>
Date: Thu Jan 22 15:54:29 2015 +0300

    Refactor retry mechanism used in some DB operations

    Use oslo_db helper that will allow to restart the whole
    transaction in case it needs a certain operation to be repeated.
    This is a workaround for the REPEATABLE READ problem where
    retrying logic will not work because queries inside a transaction
    will not see updates made by other transactions.
    So, run every attempt in a separate transaction.

    Change-Id: I68f9ae8019879725df58f5da2c83bb699a548255
    Closes-Bug: #1382064
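
The idea of running every attempt in a separate transaction can be sketched as follows. This is a minimal sketch and not the oslo_db helper or the Neutron code: it uses the standard-library `sqlite3` module, and `RetryRequest`, `retry_in_new_transaction`, `allocate_segment`, the `gre_allocations` table layout and the `before_claim` hook are all assumptions made for illustration.

```python
import sqlite3


class RetryRequest(Exception):
    """Hypothetical signal that the whole transaction should be rerun."""


def retry_in_new_transaction(max_retries):
    """Rerun the decorated function in a fresh transaction on RetryRequest."""
    def decorator(func):
        def wrapper(conn, *args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    # "with conn" commits on success and rolls back on error,
                    # so each attempt starts from a clean transaction.
                    with conn:
                        return func(conn, *args, **kwargs)
                except RetryRequest:
                    if attempt == max_retries:
                        raise
        return wrapper
    return decorator


SELECT_FREE = ("SELECT gre_id FROM gre_allocations "
               "WHERE allocated = 0 ORDER BY gre_id LIMIT 1")
CLAIM = ("UPDATE gre_allocations SET allocated = 1 "
         "WHERE gre_id = ? AND allocated = 0")


@retry_in_new_transaction(max_retries=10)
def allocate_segment(conn, before_claim=None):
    """Pick a free id and claim it; ask for a retry if someone else won."""
    row = conn.execute(SELECT_FREE).fetchone()
    if row is None:
        raise RuntimeError("no free segment ids left")
    if before_claim is not None:
        before_claim(row[0])  # test hook: lets a competitor act in between
    if conn.execute(CLAIM, (row[0],)).rowcount == 0:
        raise RetryRequest()  # the id was taken between SELECT and UPDATE
    return row[0]
```

Because the retry loop sits outside the transaction boundary, a failed claim discards the whole attempt, and the next attempt issues its SELECT in a new transaction rather than reusing whatever the failed one saw.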

Changed in neutron:
status: In Progress → Fix Committed
Changed in neutron:
assignee: Assaf Muller (amuller) → Eugene Nikanorov (enikanorov)
Thierry Carrez (ttx) on 2015-03-19
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2015-04-30
Changed in neutron:
milestone: kilo-3 → 2015.1.0

Reviewed: https://review.openstack.org/182901
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1d9fd2aec00cb85034e5a23cc1beac33c74e0110
Submitter: Jenkins
Branch: master

commit 1d9fd2aec00cb85034e5a23cc1beac33c74e0110
Author: Eugene Nikanorov <email address hidden>
Date: Mon May 11 01:34:35 2015 +0400

    Randomize tunnel id query to avoid contention

    When networks are created rapidly, neutron-servers compete
    for segmentation ids which creates too much contention and
    may lead to inability to choose available id in hardcoded amount
    of attempts (11)
    Randomize tunnel id selection so that condition is not hit.

    Change-Id: I7068f90fe4927e6e693f8a62cb704213b2da2920
    Related-Bug: #1382064
    Closes-Bug: #1454434
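
A hedged sketch of the randomization idea: instead of every server trying the lowest free id, each one picks at random from a small window of free ids, so concurrent requests rarely collide on the same row. The function name `pick_segment`, the `window` parameter and its default of 10 are assumptions for illustration, not the values or code used in the actual patch.

```python
import random


def pick_segment(free_ids, window=10, rng=random):
    """Pick a random id among the first `window` free ids (hypothetical helper).

    Returns None when no ids are free.
    """
    candidates = sorted(free_ids)[:window]
    if not candidates:
        return None
    return rng.choice(candidates)


def count_collisions(picks):
    """Number of picks that duplicate an id already chosen by another picker."""
    return len(picks) - len(set(picks))
```

With a deterministic "lowest free id" strategy, five simultaneous requests all choose the same id, which is four collisions. With `pick_segment`, the picks are spread over the window, so collisions become much less likely, though still possible.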

Reviewed: https://review.openstack.org/188994
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9bc323316a27676859280c9ee413a791c386ac64
Submitter: Jenkins
Branch: stable/kilo

commit 9bc323316a27676859280c9ee413a791c386ac64
Author: Eugene Nikanorov <email address hidden>
Date: Mon May 11 01:34:35 2015 +0400

    Randomize tunnel id query to avoid contention

    When networks are created rapidly, neutron-servers compete
    for segmentation ids which creates too much contention and
    may lead to inability to choose available id in hardcoded amount
    of attempts (11)
    Randomize tunnel id selection so that condition is not hit.

    Change-Id: I7068f90fe4927e6e693f8a62cb704213b2da2920
    Related-Bug: #1382064
    Closes-Bug: #1454434
    (cherry picked from commit 1d9fd2aec00cb85034e5a23cc1beac33c74e0110)

tags: added: in-stable-kilo

Change abandoned by stephen-ma (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/195792

Reviewed: https://review.openstack.org/195797
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4cd1b58c8c4ae2a9da31afc1a87647003d6ac128
Submitter: Jenkins
Branch: stable/juno

commit 4cd1b58c8c4ae2a9da31afc1a87647003d6ac128
Author: Eugene Nikanorov <email address hidden>
Date: Thu Jan 22 15:54:29 2015 +0300

    Refactor retry mechanism used in some DB operations

    Use oslo_db helper that will allow to restart the whole
    transaction in case it needs a certain operation to be repeated.
    This is a workaround for the REPEATABLE READ problem where
    retrying logic will not work because queries inside a transaction
    will not see updates made by other transactions.
    So, run every attempt in a separate transaction.

    Conflicts:
            neutron/plugins/ml2/drivers/helpers.py
            neutron/plugins/ml2/plugin.py
            neutron/tests/unit/ml2/test_ml2_plugin.py

    (cherry picked from commit 5dbb34b56fc42d9c68bf6647910a437a2ad6b29e)
    Change-Id: I68f9ae8019879725df58f5da2c83bb699a548255
    Closes-Bug: #1382064

tags: added: in-stable-juno

Reviewed: https://review.openstack.org/195792
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=07d3d4401f34a19b23738296c189122eeef9150f
Submitter: Jenkins
Branch: stable/juno

commit 07d3d4401f34a19b23738296c189122eeef9150f
Author: Eugene Nikanorov <email address hidden>
Date: Mon May 11 01:34:35 2015 +0400

    Randomize tunnel id query to avoid contention

    When networks are created rapidly, neutron-servers compete
    for segmentation ids which creates too much contention and
    may lead to inability to choose available id in hardcoded amount
    of attempts (11)
    Randomize tunnel id selection so that condition is not hit.

    (cherry picked from commit 1d9fd2aec00cb85034e5a23cc1beac33c74e0110)
    Conflicts:
            neutron/plugins/ml2/drivers/helpers.py

    Related-Bug: #1382064
    Closes-Bug: #1454434
    Change-Id: I7068f90fe4927e6e693f8a62cb704213b2da2920
