Race condition puts ovs agent in resync

Bug #1499488 reported by Paul Ward
This bug affects 1 person
Affects   Status          Importance   Assigned to   Milestone
neutron   Fix Released    High         Paul Ward
Nominated for Liberty by Matt Riedemann

Bug Description

The following code is from neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent.OVSNeutronAgent.treat_devices_added_or_updated():

        devices_details_list = (
            self.plugin_rpc.get_devices_details_list_and_failed_devices(
                self.context,
                devices,
                self.agent_id,
                self.conf.host))
        if devices_details_list.get('failed_devices'):
            #TODO(rossella_s) handle better the resync in next patches,
            # this is just to preserve the current behavior
            raise DeviceListRetrievalError(devices=devices)

        devices = devices_details_list.get('devices')
        vif_by_id = self.int_br.get_vifs_by_ids(
            [vif['device'] for vif in devices])

The race window lies between get_devices_details_list_and_failed_devices() and get_vifs_by_ids(). If a VM is deleted in that window, its OVS port goes away and get_vifs_by_ids() raises an exception. The exception propagates to the handler in rpc_loop, which puts the agent into resync, so the next rpc_loop iteration rescans ALL ports. On a highly scaled system, this resync can take many minutes, during which all new plug requests time out.
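The sequence can be illustrated with a toy model. This is a minimal sketch and not Neutron code: FakeBridge, PortNotFound and rpc_loop_iteration are hypothetical stand-ins, assuming a lookup that raises when a port is missing and a loop that requests a full resync on any exception.

```python
class PortNotFound(Exception):
    """Hypothetical error raised when a port is missing from the bridge."""


class FakeBridge:
    """Toy stand-in for the integration bridge (not the Neutron class)."""

    def __init__(self, ports):
        self.ports = dict(ports)  # device id -> port name

    def get_vifs_by_ids(self, device_ids):
        # Strict lookup: a single missing port aborts the whole call.
        result = {}
        for dev in device_ids:
            if dev not in self.ports:
                raise PortNotFound(dev)
            result[dev] = self.ports[dev]
        return result


def rpc_loop_iteration(bridge, devices, delete_between=None):
    """Return True if this iteration ends with a resync requested."""
    try:
        # Step 1: the RPC call has returned details for `devices`.
        details = [{'device': d} for d in devices]
        # A VM is deleted here, in the window between the two calls.
        if delete_between is not None:
            bridge.ports.pop(delete_between, None)
        # Step 2: correlate the devices with VIF ports on the bridge.
        bridge.get_vifs_by_ids([d['device'] for d in details])
        return False
    except Exception:
        # Any exception means the next iteration rescans every port.
        return True
```

In this model, one port deleted inside the window is enough to flip the whole iteration into resync, even though every other device was fine.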

get_vifs_by_ids() was added under this patch: https://review.openstack.org/#/c/186734/

The missing port raises an exception because the new get_vifs_by_ids() method does not pass if_exists=True when it calls get_ports_attributes(). A grep within that file shows that every other call to get_ports_attributes() passes if_exists=True.

I believe the fix is simply to pass if_exists=True in get_vifs_by_ids().
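The effect of such a flag can be sketched as follows. This is an illustration under stated assumptions, not the actual ovs_lib implementation: the function bodies, the dict-based port table and the 'iface-id' key are hypothetical, and mapping a missing device to None is a choice made for this example.

```python
def get_ports_attributes(ports, names, if_exists=False):
    """Hypothetical lookup over a dict of port rows.

    With if_exists=False a missing port raises KeyError; with
    if_exists=True missing ports are silently skipped.
    """
    rows = []
    for name in names:
        if name in ports:
            rows.append(ports[name])
        elif not if_exists:
            raise KeyError(name)
    return rows


def get_vifs_by_ids(ports, device_ids):
    """Tolerant correlation: ports deleted in the meantime map to None."""
    rows = get_ports_attributes(ports, device_ids, if_exists=True)
    found = {row['iface-id']: row for row in rows}
    return {dev: found.get(dev) for dev in device_ids}
```

With the tolerant lookup, a port that disappeared between the two calls yields an empty entry for that device instead of an exception, so the caller can skip it without forcing a resync.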

Paul Ward (wpward)
Changed in neutron:
assignee: nobody → Paul Ward (wpward)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/227517

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → High
status: In Progress → Confirmed
Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/227517
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0a300a2277a583fe28b00db2571982928c752554
Submitter: Jenkins
Branch: master

commit 0a300a2277a583fe28b00db2571982928c752554
Author: Paul Ward <email address hidden>
Date: Thu Sep 24 14:52:28 2015 -0500

    Better tolerate deleted OVS ports in OVS agent

    This change will not force a resync in the case where a virtual machine is
    deleted, and therefore its OVS port deleted, in between the time an RPC
    call was made to get the devices and where we make the call to correlate
    those devices to vif ports.

    Change-Id: Ie55eb69ad7ee177f0cf8ee8fc7fc585fbd0d4a22
    Closes-Bug: #1499488

Changed in neutron:
status: In Progress → Fix Committed
Paul Ward (wpward)
tags: added: liberty-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/239507

Paul Ward (wpward)
tags: removed: liberty-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/239507
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0007ec3d614ff68aae23a1b07f6a27a57ef74efb
Submitter: Jenkins
Branch: stable/liberty

commit 0007ec3d614ff68aae23a1b07f6a27a57ef74efb
Author: Paul Ward <email address hidden>
Date: Thu Sep 24 14:52:28 2015 -0500

    Better tolerate deleted OVS ports in OVS agent

    This change will not force a resync in the case where a virtual machine is
    deleted, and therefore its OVS port deleted, in between the time an RPC
    call was made to get the devices and where we make the call to correlate
    those devices to vif ports.

    Change-Id: Ie55eb69ad7ee177f0cf8ee8fc7fc585fbd0d4a22
    Closes-Bug: #1499488
    (cherry picked from commit 0a300a2277a583fe28b00db2571982928c752554)

tags: added: in-stable-liberty
Thierry Carrez (ttx) wrote : Fix included in openstack/neutron 8.0.0.0b1

This issue was fixed in the openstack/neutron 8.0.0.0b1 development milestone.

Changed in neutron:
status: Fix Committed → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 7.0.1

This issue was fixed in the openstack/neutron 7.0.1 release.
