dhcp agent race with port update and network create

Bug #1659919 reported by Kevin Benton
This bug affects 1 person
Affects   Status         Importance   Assigned to    Milestone
neutron   Fix Released   High         Kevin Benton

Bug Description

Root cause analysis of the race condition seen in http://logs.openstack.org/01/410501/5/gate/gate-tempest-dsvm-neutron-full-ubuntu-xenial/3f7b603/:

* The port for the VM did not go to ACTIVE because the DHCP agent never notified the server that it had set up the DHCP reservation.
* The DHCP agent did not set up the reservation because the network was not yet in its cache when it received the port update, even though it had already started wiring the network in another thread.
* The port did not exist on the server when the network setup thread asked the server for all ports on the network, so the wiring thread missed it as well.

The fix is for port_update to acquire the network lock even if the network does not appear to be in the cache.
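
The race and the proposed fix can be illustrated with a small, self-contained sketch. This is not the neutron code: the class `DhcpAgentSketch`, its methods, and the `simulate` helper are hypothetical names invented for illustration, and plain threads stand in for the agent's coroutines. The only point it tries to show is the ordering of "check the cache" versus "take the per-network lock".

```python
import threading
from collections import defaultdict


class DhcpAgentSketch:
    """Toy model of the agent: a network cache plus one lock per network ID."""

    def __init__(self):
        self.cache = {}  # network_id -> set of port IDs with a reservation
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def _net_lock(self, network_id):
        # Hand out the same lock object for the same network ID.
        with self._locks_guard:
            return self._locks[network_id]

    def network_create(self, network_id, server_ports, started, may_finish):
        with self._net_lock(network_id):
            ports = set(server_ports)  # snapshot of the ports the server knows
            started.set()              # wiring is now in progress
            may_finish.wait()          # stand-in for slow wiring work
            self.cache[network_id] = ports

    def port_update_racy(self, network_id, port_id):
        # Checks the cache BEFORE locking: a network that is still being
        # wired looks like an unknown network, so the update is dropped.
        ports = self.cache.get(network_id)
        if ports is None:
            return False
        with self._net_lock(network_id):
            ports.add(port_id)
        return True

    def port_update_fixed(self, network_id, port_id):
        # Takes the lock first, so it waits for any in-progress wiring
        # and only then decides whether the network is known.
        with self._net_lock(network_id):
            ports = self.cache.get(network_id)
            if ports is None:
                return False
            ports.add(port_id)
            return True


def simulate(fixed):
    """Replay the interleaving from the bug; return (handled, cached ports)."""
    agent = DhcpAgentSketch()
    started, may_finish = threading.Event(), threading.Event()
    result = {}

    # The server has no ports yet when the wiring thread takes its snapshot.
    wiring = threading.Thread(
        target=agent.network_create,
        args=("net-1", [], started, may_finish))
    wiring.start()
    started.wait()

    # The port update arrives while the network is still being wired.
    update = agent.port_update_fixed if fixed else agent.port_update_racy
    updater = threading.Thread(
        target=lambda: result.update(handled=update("net-1", "port-1")))
    updater.start()
    updater.join(timeout=0.5)  # racy variant returns here; fixed one blocks
    may_finish.set()           # let the wiring thread finish
    wiring.join()
    updater.join()
    return result["handled"], agent.cache["net-1"]
```

In this sketch, `simulate(fixed=False)` drops the update and leaves the network cached without the port, which mirrors the missing reservation described above. `simulate(fixed=True)` blocks until wiring completes and then records the port.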

Changed in neutron:
assignee: nobody → Kevin Benton (kevinbenton)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/426339

Changed in neutron:
status: New → In Progress
tags: added: ocata-rc-potential
Changed in neutron:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/426339
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=38de22bf2d4c0879a84db4fbc9fa030f181affc0
Submitter: Jenkins
Branch: master

commit 38de22bf2d4c0879a84db4fbc9fa030f181affc0
Author: Kevin Benton <email address hidden>
Date: Fri Jan 27 10:35:44 2017 -0800

    Always acquire network.id lock in dhcp port update

    Looking at the cache before acquiring a lock may cause the
    agent to mistakenly think the network doesn't exist when it
    is actually being wired in parallel.

    Always acquiring the network-based semaphore will ensure that
    the network isn't currently being setup in another coroutine.

    Closes-Bug: #1659919
    Change-Id: I99ae71e3c5b1cd91dca3f6c80b04d2ecb79de64f

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 10.0.0.0rc1

This issue was fixed in the openstack/neutron 10.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/452623

tags: removed: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/newton)

Reviewed: https://review.openstack.org/452623
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=37de2f6c4c831a8f4515eee9938996936dd2776c
Submitter: Jenkins
Branch: stable/newton

commit 37de2f6c4c831a8f4515eee9938996936dd2776c
Author: Kevin Benton <email address hidden>
Date: Fri Nov 18 04:25:32 2016 -0700

    Lock in DHCP agent based on network_id

    All cache operations and dnsmasq process operations
    are scoped to a network ID so we can always safely
    perform concurrent actions on different network IDs.
    This patch adjusts the DHCP agent to lock based on
    network ID rather than having a global lock for every
    operation.

    sync_state calls are still protected with a reader/writer
    lock to ensure that when sync_state needs to run, all
    other operations are blocked.

    Related-Bug: #1548190
    Change-Id: I56010dc801d82be56f12e834c5164316872c2f8b
    (cherry picked from commit d1930cefd27448eefc373a229a26f8da25581983)

    Squashed this commit since tests fail otherwise:

    Always acquire network.id lock in dhcp port update

    Looking at the cache before acquiring a lock may cause the
    agent to mistakenly think the network doesn't exist when it
    is actually being wired in parallel.

    Always acquiring the network-based semaphore will ensure that
    the network isn't currently being setup in another coroutine.

    Closes-Bug: #1659919
    Change-Id: I99ae71e3c5b1cd91dca3f6c80b04d2ecb79de64f
    (cherry picked from commit 38de22bf2d4c0879a84db4fbc9fa030f181affc0)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.4.0

This issue was fixed in the openstack/neutron 9.4.0 release.
