NoNetworkFoundInMaximumAllowedAttempts with multiple API workers

Bug #1410854 reported by Attila Fazekas on 2015-01-14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
neutron
Undecided
Attila Fazekas

Bug Description

When neutron is configured as below, a regular devstack Tempest job fails to create several networks during the run.

iniset /etc/neutron/neutron.conf DEFAULT api_workers 4

http://logs.openstack.org/82/140482/2/check/check-tempest-dsvm-neutron-full/95aea86/logs/screen-q-svc.txt.gz?#_2015-01-14_13_56_07_268

2015-01-14 13:56:07.267 2814 WARNING neutron.plugins.ml2.drivers.helpers [req-f6402b6d-de49-4675-a766-b45a6bc99061 None] Allocate vxlan segment from pool failed after 10 failed attempts
2015-01-14 13:56:07.268 2814 ERROR neutron.api.v2.resource [req-f6402b6d-de49-4675-a766-b45a6bc99061 None] create failed
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource Traceback (most recent call last):
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/api/v2/resource.py", line 83, in resource
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource result = method(request=request, **args)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/api/v2/base.py", line 451, in create
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource obj = obj_creator(request.context, **kwargs)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/plugins/ml2/plugin.py", line 502, in create_network
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource tenant_id)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/plugins/ml2/managers.py", line 161, in create_network_segments
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource segment = self.allocate_tenant_segment(session)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/plugins/ml2/managers.py", line 190, in allocate_tenant_segment
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource segment = driver.obj.allocate_tenant_segment(session)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/type_tunnel.py", line 150, in allocate_tenant_segment
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource alloc = self.allocate_partially_specified_segment(session)
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/plugins/ml2/drivers/helpers.py", line 144, in allocate_partially_specified_segment
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource raise exc.NoNetworkFoundInMaximumAllowedAttempts()
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource NoNetworkFoundInMaximumAllowedAttempts: Unable to create the network. No available network found in maximum allowed attempts.
2015-01-14 13:56:07.268 2814 TRACE neutron.api.v2.resource

'vxlan_vni': 1008L is successfully allocated on behalf of pid=2813 (req-f3866173-7766-46fc-9dea-e5387be7190d).

pid=2814 (req-f6402b6d-de49-4675-a766-b45a6bc99061) tries to allocate the same VNI 10 times without success.

description: updated
Attila Fazekas (afazekas) wrote :

The query at https://github.com/openstack/neutron/blob/3f44c9e27874511f05e2338c10e836361776ed88/neutron/plugins/ml2/drivers/helpers.py#L113 is issued on the same database snapshot each time, therefore it will keep selecting the same value (REPEATABLE READ).

One option is to create a new session for each SELECT, outside the current session/transaction.

(IMHO the VNI allocation should not be given up unless the SELECT finds no free VNI at all. I would suggest more than 10 tries; otherwise allocation failures can be triggered by higher load and higher worker counts.)
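The allocation loop referenced in the traceback can be sketched in pure Python (a simplified, illustrative stand-in using the stdlib sqlite3 module, not the actual Neutron code; table layout and retry cap are assumptions): each attempt re-reads a free VNI and then tries a compare-and-swap UPDATE, and the exception fires once the retry cap is hit. The bug is that under REPEATABLE READ the re-read inside one transaction keeps seeing the same snapshot, so all 10 attempts pick the same already-taken VNI.

```python
import sqlite3

DB_MAX_RETRIES = 10  # the cap behind NoNetworkFoundInMaximumAllowedAttempts

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE vxlan_allocations '
             '(vxlan_vni INTEGER PRIMARY KEY, allocated INTEGER NOT NULL DEFAULT 0)')
conn.executemany('INSERT INTO vxlan_allocations (vxlan_vni) VALUES (?)',
                 [(v,) for v in range(1000, 1010)])

def allocate_vni(conn):
    for _ in range(DB_MAX_RETRIES):
        row = conn.execute('SELECT vxlan_vni FROM vxlan_allocations '
                           'WHERE allocated = 0 LIMIT 1').fetchone()
        if row is None:
            return None  # pool exhausted: no free VNI left at all
        # Compare-and-swap: the UPDATE only matches if the row is still free.
        cur = conn.execute('UPDATE vxlan_allocations SET allocated = 1 '
                           'WHERE vxlan_vni = ? AND allocated = 0', (row[0],))
        if cur.rowcount == 1:
            conn.commit()
            return row[0]
        conn.rollback()  # another worker won the race; re-read and retry
    raise RuntimeError('NoNetworkFoundInMaximumAllowedAttempts')

vni = allocate_vni(conn)
```

In this single-connection demo the compare-and-swap always wins on the first try; with multiple workers on a REPEATABLE READ backend, the re-read returns the same stale row every iteration, which is exactly the reported failure.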

Attila Fazekas (afazekas) wrote :

Consider using http://docs.sqlalchemy.org/en/rel_0_9/orm/query.html#sqlalchemy.orm.query.Query.with_for_update with the SELECT; it allows the first reader to win the update.
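A minimal sketch of that suggestion (the schema is a simplified stand-in for Neutron's ml2_vxlan_allocations table, and the function is hypothetical; `with_for_update()` is a no-op on the SQLite backend used here for runnability, but emits `SELECT ... FOR UPDATE` on MySQL/PostgreSQL):

```python
from sqlalchemy import Boolean, Column, Integer, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class VxlanAllocation(Base):
    # Simplified stand-in for Neutron's ml2_vxlan_allocations table.
    __tablename__ = 'vxlan_allocations'
    vxlan_vni = Column(Integer, primary_key=True)
    allocated = Column(Boolean, nullable=False, default=False)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([VxlanAllocation(vxlan_vni=v) for v in range(1000, 1010)])
session.commit()

def allocate_tenant_vni(session):
    """Pick a free VNI; the row lock lets the first reader win the update."""
    alloc = (session.query(VxlanAllocation)
             .filter_by(allocated=False)
             .with_for_update()   # the suggested SELECT ... FOR UPDATE
             .first())
    if alloc is None:
        return None              # pool genuinely exhausted
    alloc.allocated = True
    session.commit()             # releases the row lock
    return alloc.vxlan_vni

vni = allocate_tenant_vni(session)
```

With the row locked, a second worker's SELECT blocks until the first commits, then sees `allocated = True` and moves on to a different row instead of retrying the same one.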

Attila Fazekas (afazekas) wrote :

Both the 'FOR UPDATE' and the separate-session approach MAY cause a DBDeadlock on bulk allocation, where neutron would need to restart the transaction at a higher level, but neutron is not prepared for this.

Consider the following condition on bulk allocation, when at least two processes try to allocate at least two segments each in one transaction, while a third thread releases a segment:
 1. process1 locks segment A
 2. process3 releases segment B (commits)
 3. process2 locks segment B (process2 might pick another segment)
 4. process1 attempts to lock segment B (will wait)
 5. process2 attempts to lock segment A (deadlock)

One of the processes will succeed in allocating both segments; the others' transactions will be rolled back because of the deadlock.

The FOR UPDATE approach would make the `select + try update` loop unnecessary.

It may be simpler to 'iterate' over the first _full_ result set; this is probably how the method was originally expected to work.

Attila Fazekas (afazekas) wrote :

The `iterate over the result set` approach is only expected to work without deadlock on bulk allocation if all allocators use the same allocation order, which cannot be guaranteed, especially without an 'ORDER BY'.

For now I would suggest the 'FOR UPDATE' way, and not pretending we never have to deal with DBDeadlock.
The 'FOR UPDATE' way is expected to solve this issue, which has a much higher chance of happening than any kind of deadlock.
The code should be checked later for whether it needs to repeat the transaction in rare cases, and the issue addressed if needed.
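If that rare deadlock case does need handling later, a generic retry wrapper around the whole transaction is one common shape for it (a hedged sketch: `DBDeadlock` here is a stand-in for the driver's real deadlock exception, and none of these names exist in Neutron):

```python
import random
import time

class DBDeadlock(Exception):
    """Stand-in for the database driver's deadlock error."""

def retry_on_deadlock(fn, max_retries=3):
    """Re-run a whole transaction callable when the DB reports a deadlock."""
    for attempt in range(max_retries):
        try:
            return fn()
        except DBDeadlock:
            if attempt == max_retries - 1:
                raise
            # Jittered backoff so the retrying workers do not collide again.
            time.sleep(random.uniform(0, 0.01 * 2 ** attempt))

# Demo: a transaction that deadlocks twice, then succeeds.
calls = {'n': 0}

def flaky_allocate():
    calls['n'] += 1
    if calls['n'] < 3:
        raise DBDeadlock()
    return 'segment-A'

result = retry_on_deadlock(flaky_allocate)
```

The key design point is that the retry must wrap the entire transaction, not just the single locked statement, since the whole unit of work was rolled back.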

Fix proposed to branch: master
Review: https://review.openstack.org/147540

Changed in neutron:
assignee: nobody → Attila Fazekas (afazekas)
status: New → In Progress

Change abandoned by afazekas (<email address hidden>) on branch: master
Review: https://review.openstack.org/147540
Reason: marun: the patch uses the mentioned method.

Change abandoned by afazekas (<email address hidden>) on branch: master
Review: https://review.openstack.org/147540

Tanvir (tanvir-kekan) wrote :

When can we expect to get this fix in?
