Possible race condition when port unplugged from ovs

Bug #1992109 reported by Arnaud Morin
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Arnaud Morin

Bug Description

A race condition can occur when nova unplugs a port from the integration bridge (for example, when shelving an instance).

On such an action, Open vSwitch sends two events to neutron:

2022-10-07 01:00:31.790 214734 DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["650d47f8-05c2-41ed-aaa2-701432203f49","old",null,47,null],["","new","tap216e7eb3-bc",-1,["map",[["attached-mac","fa:16:3e:54:bc:55"],["iface-id","216e7eb3-bc4e-44d4-a743-d628e9187924"],["iface-status","active"],["vm-id","e28807f6-826a-49b4-84ee-223be559885e"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /opt/openstack/neutron/lib/python3.6/site-packages/neutron/agent/common/async_process.py:262
2022-10-07 01:00:32.179 214734 DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: {"data":[["650d47f8-05c2-41ed-aaa2-701432203f49","delete","tap216e7eb3-bc",-1,["map",[["attached-mac","fa:16:3e:54:bc:55"],["iface-id","216e7eb3-bc4e-44d4-a743-d628e9187924"],["iface-status","active"],["vm-id","e28807f6-826a-49b4-84ee-223be559885e"]]]]],"headings":["row","action","name","ofport","external_ids"]} _read_stdout /opt/openstack/neutron/lib/python3.6/site-packages/neutron/agent/common/async_process.py:262
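The two log lines are easier to compare once the `ovsdb-client monitor --format=json` payloads are decoded. The following is a minimal sketch, not neutron code: the helper names `parse_monitor_output` and `classify` are hypothetical, and the `external_ids` maps are trimmed to the `iface-id` entry for brevity. It shows that the first update is an "old"/"new" pair where ofport goes from 47 to -1, and only the second one is the "delete".

```python
import json

# Payloads from the two log lines above (external_ids trimmed to iface-id).
FIRST = ('{"data":[["650d47f8-05c2-41ed-aaa2-701432203f49","old",null,47,null],'
         '["","new","tap216e7eb3-bc",-1,["map",[["iface-id","216e7eb3-bc4e-44d4-a743-d628e9187924"]]]]],'
         '"headings":["row","action","name","ofport","external_ids"]}')
SECOND = ('{"data":[["650d47f8-05c2-41ed-aaa2-701432203f49","delete","tap216e7eb3-bc",-1,'
          '["map",[["iface-id","216e7eb3-bc4e-44d4-a743-d628e9187924"]]]]],'
          '"headings":["row","action","name","ofport","external_ids"]}')


def parse_monitor_output(line):
    """Turn one JSON update from ovsdb-client monitor into a list of row dicts."""
    doc = json.loads(line)
    headings = doc["headings"]
    rows = []
    for values in doc["data"]:
        row = dict(zip(headings, values))
        if row["action"] == "new" and not row["row"] and rows:
            # As in the log, the "new" half of an update pair has an empty
            # row UUID; it refers to the row named by the preceding "old" entry.
            row["row"] = rows[-1]["row"]
        rows.append(row)
    return rows


def classify(rows):
    """Summarise an update as (action, interface name, ofport) tuples."""
    return [(r["action"], r["name"], r["ofport"]) for r in rows]


print(classify(parse_monitor_output(FIRST)))
print(classify(parse_monitor_output(SECOND)))
```

Decoded this way, the first update yields `("old", None, 47)` followed by `("new", "tap216e7eb3-bc", -1)`, and the second yields `("delete", "tap216e7eb3-bc", -1)`. If only the first update has been processed, the agent sees an interface that still exists with ofport -1.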

Now suppose the second event is delayed. The neutron agent iteration will then treat the first event as a port update and will:
- check the ofport: it is -1
- put the port in skipped_devices
- set the port DOWN in the DB through an RPC call
- remove the port from the "current" ports
- do nothing else, so the port is still configured: OpenFlow rules, etc. stay in place

Then, on the next iteration, when the "delete" event is received, the agent will:
- try to figure out whether this port is configured by looking in "current"
- find that it is not there, so it does nothing
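The two iterations above can be replayed with a toy model. This is a hypothetical, heavily simplified sketch (the `ToyAgent` class and its fields are illustrative only, not neutron's OVS agent). It mirrors just the listed steps to show why the delayed "delete" finds nothing to clean up.

```python
class ToyAgent:
    """Hypothetical model of the agent's port bookkeeping (not neutron code)."""

    def __init__(self, ports):
        self.current = dict(ports)    # port name -> ofport the agent tracks
        self.flows = set(ports)       # ports whose OpenFlow rules are installed
        self.skipped_devices = set()
        self.down_in_db = set()

    def handle_update(self, name, ofport):
        if ofport == -1:
            # Steps from the report: skip the device, mark it DOWN over RPC,
            # drop it from "current" and do nothing else (flows stay).
            self.skipped_devices.add(name)
            self.down_in_db.add(name)
            self.current.pop(name, None)
            return
        self.current[name] = ofport

    def handle_delete(self, name):
        # The delete path only cleans up ports it still finds in "current".
        if name not in self.current:
            return
        del self.current[name]
        self.flows.discard(name)


agent = ToyAgent({"tap216e7eb3-bc": 47})
agent.handle_update("tap216e7eb3-bc", -1)  # iteration N: only the old/new event
agent.handle_delete("tap216e7eb3-bc")      # iteration N+1: delayed delete event
print(agent.flows)  # the port's OpenFlow rules are left over
```

In this model the port is reported DOWN and forgotten after the first iteration, so the delete handler returns early and `flows` still contains the tap interface. That is the leftover state described in the next paragraph.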

As a result, the port stays configured on the compute node, and some OpenFlow rules are left over.

Note that I am running neutron Stein with openvswitch 2.11.4.

I also checked that the two events are received on neutron Victoria with Open vSwitch 2.15.0.

Note also that the race condition is very rare and difficult to reproduce, because the port must already be removed from br-int while still being present in the OVS DB with ofport=-1.

Arnaud Morin (arnaud-morin) wrote :

This can also happen on port detach (I reproduced it by attaching and detaching a port in a loop).

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/860649

Changed in neutron:
status: New → In Progress
Lucas Alvares Gomes (lucasagomes) wrote :

Hi,

Thanks, Arnaud, for reporting the problem and submitting a patch. I am assigning this LP bug to you since you are already working on it.

Changed in neutron:
importance: Undecided → Medium
assignee: nobody → Arnaud Morin (arnaud-morin)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/860649
Committed: https://opendev.org/openstack/neutron/commit/f22aa5dfddae9bd31fbb4fca8aed176e9913d34a
Submitter: "Zuul (22348)"
Branch: master

commit f22aa5dfddae9bd31fbb4fca8aed176e9913d34a
Author: Arnaud Morin <email address hidden>
Date: Fri Oct 7 11:05:17 2022 +0200

    Discard port with ofport -1 in _get_ofport_moves

    When libvirt (nova) detaches a port from an OVS bridge, two events are sent:
    * one event with 2 actions "old" and "new": a change of ofport (from a
      regular value to -1)
    * a second event with action "delete"

    If, for some reason, the second event is delayed, the rpc_loop iteration
    will consider this port as "updated" instead of "deleted".
    But, because ofport == -1, the port update will be discarded, and the
    port will finally be removed from port_info["current"].

    As a result, on the next iteration, the deletion won't be performed.

    Most of the time, we end up with some leftovers (like OpenFlow rules,
    etc.)

    The purpose of this patch is very simple: when looping over ports in
    _get_ofport_moves, we discard the ports that have ofport == -1, so
    the port will not be considered as updated and the next iteration will
    be able to delete it correctly.
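The idea of the fix can be illustrated with a small sketch. This is a hypothetical stand-in for `_get_ofport_moves`: the function name `get_ofport_moves`, its arguments, and the `tapexample` port are illustrative assumptions, not the code merged in the commit above. It only shows the filtering rule (ignore ofport == -1), after which the delayed "delete" still finds the port in "current".

```python
def get_ofport_moves(current, new_ofports):
    """Hypothetical stand-in for _get_ofport_moves (not the neutron code).

    current: port name -> ofport the agent currently tracks
    new_ofports: port name -> ofport reported by the latest OVSDB update
    Returns the names of ports whose ofport changed and that should be
    handled as updated.
    """
    moved = set()
    for name, ofport in new_ofports.items():
        if ofport == -1:
            # The fix: an interface being unplugged reports ofport -1, so it
            # is not treated as updated and stays in "current" for the
            # following "delete" event.
            continue
        if name in current and current[name] != ofport:
            moved.add(name)
    return moved


current = {"tap216e7eb3-bc": 47, "tapexample": 12}
flows = set(current)  # ports whose OpenFlow rules are installed

# Iteration N: only the old/new event for the unplugged port has arrived.
moved = get_ofport_moves(current, {"tap216e7eb3-bc": -1, "tapexample": 13})

# Iteration N+1: the delayed "delete" now finds the port and cleans it up.
if "tap216e7eb3-bc" in current:
    del current["tap216e7eb3-bc"]
    flows.discard("tap216e7eb3-bc")

print(moved, flows)
```

Here only `tapexample` (a genuine ofport change) is reported as moved. The unplugged tap interface is left untouched until its "delete" event removes both the tracking entry and its rules.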

    Closes-Bug: #1992109

    Change-Id: Ib4a7183867e1b21810b6915a475a234278bf884c
    Signed-off-by: Arnaud Morin <email address hidden>

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 22.0.0.0rc1

This issue was fixed in the openstack/neutron 22.0.0.0rc1 release candidate.
