NoNetworkFoundInMaximumAllowedAttempts with multiple API workers

Bug #1410854 reported by Attila Fazekas on 2015-01-14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
neutron
Undecided
Attila Fazekas

Bug Description

When neutron is configured as below, a regular devstack Tempest job fails to create several networks during the run.

iniset /etc/neutron/neutron.conf DEFAULT api_workers 4

http://logs.openstack.org/82/140482/2/check/check-tempest-dsvm-neutron-full/95aea86/logs/screen-q-svc.txt.gz?#_2015-01-14_13_56_07_268

2015-01-14 13:56:07.267 2814 WARNING neutron.plugins.ml2.drivers.helpers [req-f6402b6d-de49-4675-a766-b45a6bc99061 None] Allocate vxlan segment from pool failed after 10 failed attempts
2015-01-14 13:56:07.268 2814 ERROR neutron.api.v2.resource [req-f6402b6d-de49-4675-a766-b45a6bc99061 None] create failed
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource Traceback (most recent call last):
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/api/v2/resource.py", line 83, in resource
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource result = method(request=request, **args)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/api/v2/base.py", line 451, in create
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource obj = obj_creator(request.context, **kwargs)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/plugins/ml2/plugin.py", line 502, in create_network
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource tenant_id)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/plugins/ml2/managers.py", line 161, in create_network_segments
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource segment = self.allocate_tenant_segment(session)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/plugins/ml2/managers.py", line 190, in allocate_tenant_segment
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource segment = driver.obj.allocate_tenant_segment(session)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/type_tunnel.py", line 150, in allocate_tenant_segment
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource alloc = self.allocate_partially_specified_segment(session)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/helpers.py", line 144, in allocate_partially_specified_segment
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource raise exc.NoNetworkFoundInMaximumAllowedAttempts()
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource NoNetworkFoundInMaximumAllowedAttempts: Unable to create the network. No available network found in maximum allowed attempts.
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource

'vxlan_vni': 1008L is successfully allocated on behalf of pid=2813 (req-f3866173-7766-46fc-9dea-e5387be7190d).

pid=2814 (req-f6402b6d-de49-4675-a766-b45a6bc99061) tries to allocate the same VNI 10 times without success.

description: updated
Attila Fazekas (afazekas) wrote :

The query at https://github.com/openstack/neutron/blob/3f44c9e27874511f05e2338c10e836361776ed88/neutron/plugins/ml2/drivers/helpers.py#L113 is issued on the same database snapshot each time, therefore it will keep selecting the same value (REPEATABLE READ).

One option is to create a new session for each SELECT, outside the current session/transaction.

(IMHO the VNI allocation should not be given up unless the SELECT finds no free VNI at all. I would suggest more than 10 tries; otherwise allocation failures can be triggered by higher load and higher worker counts.)
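The allocation loop referenced in the traceback can be sketched in pure Python (a simplified, illustrative stand-in using the stdlib sqlite3 module, not the actual Neutron code; table layout and retry cap are assumptions): each attempt re-reads a free VNI and then tries a compare-and-swap UPDATE, and the exception fires once the retry cap is hit. The bug is that under REPEATABLE READ the re-read inside one transaction keeps seeing the same snapshot, so all 10 attempts pick the same already-taken VNI.

```python
import sqlite3

DB_MAX_RETRIES = 10  # the cap behind NoNetworkFoundInMaximumAllowedAttempts

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE vxlan_allocations '
             '(vxlan_vni INTEGER PRIMARY KEY, allocated INTEGER NOT NULL DEFAULT 0)')
conn.executemany('INSERT INTO vxlan_allocations (vxlan_vni) VALUES (?)',
                 [(v,) for v in range(1000, 1010)])

def allocate_vni(conn):
    for _ in range(DB_MAX_RETRIES):
        row = conn.execute('SELECT vxlan_vni FROM vxlan_allocations '
                           'WHERE allocated = 0 LIMIT 1').fetchone()
        if row is None:
            return None  # pool exhausted: no free VNI left at all
        # Compare-and-swap: the UPDATE only matches if the row is still free.
        cur = conn.execute('UPDATE vxlan_allocations SET allocated = 1 '
                           'WHERE vxlan_vni = ? AND allocated = 0', (row[0],))
        if cur.rowcount == 1:
            conn.commit()
            return row[0]
        conn.rollback()  # another worker won the race; re-read and retry
    raise RuntimeError('NoNetworkFoundInMaximumAllowedAttempts')

vni = allocate_vni(conn)
```

In this single-connection demo the compare-and-swap always wins on the first try; with multiple workers on a REPEATABLE READ backend, the re-read returns the same stale row every iteration, which is exactly the reported failure.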

Attila Fazekas (afazekas) wrote :

Consider using http://docs.sqlalchemy.org/en/rel_0_9/orm/query.html#sqlalchemy.orm.query.Query.with_for_update with the SELECT; it allows the first reader to win the update.
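A minimal sketch of that suggestion (the schema is a simplified stand-in for Neutron's ml2_vxlan_allocations table, and the function is hypothetical; `with_for_update()` is a no-op on the SQLite backend used here for runnability, but emits `SELECT ... FOR UPDATE` on MySQL/PostgreSQL):

```python
from sqlalchemy import Boolean, Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class VxlanAllocation(Base):
    # Simplified stand-in for Neutron's ml2_vxlan_allocations table.
    __tablename__ = 'vxlan_allocations'
    vxlan_vni = Column(Integer, primary_key=True)
    allocated = Column(Boolean, nullable=False, default=False)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([VxlanAllocation(vxlan_vni=v) for v in range(1000, 1010)])
session.commit()

def allocate_tenant_vni(session):
    """Pick a free VNI; the row lock lets the first reader win the update."""
    alloc = (session.query(VxlanAllocation)
             .filter_by(allocated=False)
             .with_for_update()   # the suggested SELECT ... FOR UPDATE
             .first())
    if alloc is None:
        return None              # pool genuinely exhausted
    alloc.allocated = True
    session.commit()             # releases the row lock
    return alloc.vxlan_vni

vni = allocate_tenant_vni(session)
```

With the row locked, a second worker's SELECT blocks until the first commits, then sees `allocated = True` and moves on to a different row instead of retrying the same one.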

Attila Fazekas (afazekas) wrote :

Both the 'FOR UPDATE' and the separate-session approach MAY cause a DBDeadlock on bulk allocation, where neutron would need to restart the transaction at a higher level, but neutron is not prepared for this.

Consider the following condition on bulk allocation, when at least two processes try to allocate at least two segments each in one transaction, while a third thread releases a segment:
 1. process1 locks segment A
 2. process3 releases segment B (commits)
 3. process2 locks segment B (process2 might pick another segment)
 4. process1 attempts to lock segment B (will wait)
 5. process2 attempts to lock segment A (deadlock)

One of the processes will succeed in allocating both segments; the others' transactions will be rolled back because of the deadlock.

The FOR UPDATE approach would make the `select + try update` loop unnecessary.

It may be simpler to 'iterate' over the first _full_ result set; this is probably how the method was originally expected to work.

Attila Fazekas (afazekas) wrote :

The `iterate over the result set` approach is only expected to work without deadlock on bulk allocation if all allocators use the same allocation order, which cannot be guaranteed, especially without an 'ORDER BY'.

For now I would suggest the 'FOR UPDATE' way, and not pretending we never have to deal with DBDeadlock.
The 'FOR UPDATE' way is expected to solve this issue, which has a much higher chance of happening than any kind of deadlock.
The code should be checked later for whether it needs to repeat the transaction in rare cases, and the issue addressed if needed.
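If that rare deadlock case does need handling later, a generic retry wrapper around the whole transaction is one common shape for it (a hedged sketch: `DBDeadlock` here is a stand-in for the driver's real deadlock exception, and none of these names exist in Neutron):

```python
import random
import time

class DBDeadlock(Exception):
    """Stand-in for the database driver's deadlock error."""

def retry_on_deadlock(fn, max_retries=3):
    """Re-run a whole transaction callable when the DB reports a deadlock."""
    for attempt in range(max_retries):
        try:
            return fn()
        except DBDeadlock:
            if attempt == max_retries - 1:
                raise
            # Jittered backoff so the retrying workers do not collide again.
            time.sleep(random.uniform(0, 0.01 * 2 ** attempt))

# Demo: a transaction that deadlocks twice, then succeeds.
calls = {'n': 0}

def flaky_allocate():
    calls['n'] += 1
    if calls['n'] < 3:
        raise DBDeadlock()
    return 'segment-A'

result = retry_on_deadlock(flaky_allocate)
```

The key design point is that the retry must wrap the entire transaction, not just the single locked statement, since the whole unit of work was rolled back.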

Fix proposed to branch: master
Review: https://review.openstack.org/147540

Changed in neutron:
assignee: nobody → Attila Fazekas (afazekas)
status: New → In Progress

Change abandoned by afazekas (<email address hidden>) on branch: master
Review: https://review.openstack.org/147540
Reason: marun: the patch uses the mentioned method.

Change abandoned by afazekas (<email address hidden>) on branch: master
Review: https://review.openstack.org/147540

Tanvir (tanvir-kekan) wrote :

When can we expect to get this fix in?
