OVS Neutron agent is marking port as dead before they are deleted

Bug #1493414 reported by Artur Korzeniewski
This bug affects 3 people
Affects   Status         Importance   Assigned to       Milestone
neutron   Fix Released   Undecided    Ramu Ramamurthy

Bug Description

This happens on Liberty-3.

When clearing the router's gateway port or deleting a tenant network interface from the router, the OVS agent marks the port as dead instead of treating it as removed (security group removal and port_unbound).

This leaves stale OVS flows in br-int, and it may affect the port_unbound() logic in ovs_neutron_agent.py.

In one rpc_loop iteration, ovs_neutron_agent processes the deleted port via the process_deleted_ports() method and marks the qg- port as dead (adding an OVS flow rule that drops its traffic). In a later iteration, it processes the removed port via the treat_devices_removed() method.
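The interleaving can be modeled in a few lines. This is a toy sketch, not agent code: FakeIntBridge and the simplified process_deleted_ports() and treat_devices_removed() below are hypothetical stand-ins that only borrow the names used above. It shows how a drop flow installed in one iteration outlives an interface that is removed before the next.

```python
# Toy model of the rpc_loop race; not neutron code.


class FakeIntBridge:
    """Hypothetical stand-in for br-int: tracks ports and drop flows only."""

    def __init__(self, ports):
        self.ports = dict(ports)   # ofport -> iface-id of attached interfaces
        self.drop_flows = set()    # ofports with an "in_port=N actions=drop" flow

    def drop_port(self, ofport):
        self.drop_flows.add(ofport)

    def remove_interface(self, ofport):
        # The interface leaves OVSDB; flows that reference it are untouched.
        self.ports.pop(ofport, None)


def process_deleted_ports(br, ofport):
    # Iteration 1: the port_delete RPC arrives while the interface still
    # exists, so the port is marked dead and a drop flow is installed.
    if ofport in br.ports:
        br.drop_port(ofport)


def treat_devices_removed(br, ofport):
    # Iteration 2: ovsdb reports the interface as deleted. Modeled as a
    # no-op: in the buggy flow nothing here removes the earlier drop flow.
    return None


br = FakeIntBridge({49: "1c749258-74fb-498b-9a08-1fec6725a1cf"})
process_deleted_ports(br, 49)   # rpc_loop iteration 1
br.remove_interface(49)         # ovsdb "delete" event for qg-1c749258-74
treat_devices_removed(br, 49)   # rpc_loop iteration 2

stale = br.drop_flows - set(br.ports)
print(sorted(stale))  # [49]
```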

In the first iteration, the port deletion is triggered by the port_delete() method:
2015-09-04 14:16:20.337 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-e43234b1-633b-404d-92d0-0f844dadb586 admin 0f6c0469ea6e4d95a27782c46021243a] port_delete message processed for port 1c749258-74fb-498b-9a08-1fec6725a1cf from (pid=136030) port_delete /opt/openstack/neutron/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:410

In the second iteration, the device removal is triggered by ovsdb:
2015-09-04 14:16:20.848 DEBUG neutron.agent.linux.ovsdb_monitor [-] Output received from ovsdb monitor: {"data":[["bab86f35-d004-4df6-95c2-0f7432338edb","delete","qg-1c749258-74",49,["map",[["attached-mac","fa:16:3e:99:37:68"],["iface-id","1c749258-74fb-498b-9a08-1fec6725a1cf"],["iface-status","active"]]]]],"headings":["row","action","name","ofport","external_ids"]}
 from (pid=136030) _read_stdout /opt/openstack/neutron/neutron/agent/linux/ovsdb_monitor.py:50
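The ovsdb monitor update above is a JSON object whose "data" rows line up with "headings", and whose external_ids column uses OVSDB's ["map", [[key, value], ...]] encoding. Below is a minimal sketch for decoding such a line, assuming the exact shape shown in the log. parse_monitor_line is a hypothetical helper, not neutron's ovsdb_monitor API.

```python
import json


def parse_monitor_line(line):
    """Decode one JSON update (shaped like the log above) into a list of dicts.

    Each row is zipped with "headings"; an OVSDB map value
    (["map", [[k, v], ...]]) in external_ids becomes a plain dict.
    """
    msg = json.loads(line)
    events = []
    for row in msg["data"]:
        event = dict(zip(msg["headings"], row))
        ext = event.get("external_ids")
        if isinstance(ext, list) and ext and ext[0] == "map":
            event["external_ids"] = dict(ext[1])
        events.append(event)
    return events


# The update copied from the log above.
LINE = ('{"data":[["bab86f35-d004-4df6-95c2-0f7432338edb","delete",'
        '"qg-1c749258-74",49,["map",[["attached-mac","fa:16:3e:99:37:68"],'
        '["iface-id","1c749258-74fb-498b-9a08-1fec6725a1cf"],'
        '["iface-status","active"]]]]],'
        '"headings":["row","action","name","ofport","external_ids"]}')

event = parse_monitor_line(LINE)[0]
print(event["action"], event["name"], event["ofport"])  # delete qg-1c749258-74 49
```

The iface-id in external_ids (1c749258-...) is the same port UUID that the port_delete message refers to. That is how the two iterations end up acting on the same port.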

Log from ovs neutron agent:
http://paste.openstack.org/show/445479/

Steps to reproduce:
1. Create router
2. Add tenant network interface to the router
3. Launch a VM
4. Add external network gateway to created router
5. Check the br-int for current port numbers
6. Remove external network gateway
7. Check the br-int for dead port flows (removed port qg-)
8. Remove the network interface from tenant network
9. Check the br-int for dead port flows.

Repeat steps 4-9 a few times to see whether dead port flows appear in br-int.
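One way to check br-int for dead port flows is to compare the drop flows in `ovs-ofctl dump-flows br-int` output against the ofports that still exist. This is a minimal sketch under some assumptions: ports are printed as numeric in_port values, and dead-port flows match `in_port=N ... actions=drop` (the exact table and priority are not checked). The sample dump below is made up for illustration and was not captured from this bug. The set of live ofports is passed in, and how it is gathered is left out.

```python
import re

# Assumed shape of a dead-port flow: "...,in_port=49 actions=drop".
DROP_FLOW = re.compile(r"\bin_port=(\d+)\b.*\bactions=drop\b")


def find_dead_port_flows(dump_flows_text, live_ofports):
    """Return ofports that have a drop flow but no longer exist on br-int."""
    stale = set()
    for line in dump_flows_text.splitlines():
        match = DROP_FLOW.search(line)
        if match and int(match.group(1)) not in live_ofports:
            stale.add(int(match.group(1)))
    return stale


# Illustrative (not real) dump-flows output.
SAMPLE = """\
 cookie=0x0, duration=120.3s, table=0, n_packets=0, n_bytes=0, idle_age=120, priority=2,in_port=49 actions=drop
 cookie=0x0, duration=300.1s, table=0, n_packets=10, n_bytes=840, idle_age=5, priority=0 actions=NORMAL
 cookie=0x0, duration=90.0s, table=0, n_packets=0, n_bytes=0, idle_age=90, priority=2,in_port=7 actions=drop
"""

print(find_dead_port_flows(SAMPLE, live_ofports={7, 12}))  # {49}
```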

This affects legacy, DVR and HA routers.

John Schwarz (jschwarz) wrote :

I can confirm that this reproduced for me.

Changed in neutron:
status: New → Confirmed
Nandini (nandini-tata)
Changed in neutron:
assignee: nobody → Nandini (nandini-tata)
Ramu Ramamurthy (ramu-ramamurthy) wrote :

We can reproduce this.

Changed in neutron:
assignee: Nandini (nandini-tata) → Ramu Ramamurthy (ramu-ramamurthy)
Ramu Ramamurthy (ramu-ramamurthy) wrote :

Also seen on Kilo.

Ramu Ramamurthy (ramu-ramamurthy) wrote :

Fix to be sent for review shortly...

When the port is set to dead, there is a race in which the port may be deleted before the "drop" flow is added for it, so the drop flow becomes stale and refers to an invalid port. This is handled by cleaning up stale flows.

A further cleanup in port_dead() checks that cur_tag is valid before marking the port dead, because in some cases cur_tag is None or [].
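The cur_tag guard can be sketched as follows. FakeBridge, its methods, and DEAD_VLAN_TAG = 4095 are assumptions for illustration, and neutron's real port_dead() and bridge class differ in detail. The sketch only shows the ordering: read the tag first, and if the port row is already gone (None) or untagged ([]), return without installing a drop flow.

```python
DEAD_VLAN_TAG = 4095  # assumed value of the "dead" VLAN tag


class FakeBridge:
    """Hypothetical stand-in for the br-int wrapper; not neutron's class."""

    def __init__(self, tags):
        self.tags = dict(tags)   # port name -> value of the Port "tag" column
        self.drop_flows = set()  # ofports with a drop flow

    def db_get_val(self, table, record, column):
        # Mimics the behavior described in the bug: None if the port
        # row was deleted concurrently.
        return self.tags.get(record)

    def set_tag(self, port_name, tag):
        self.tags[port_name] = tag

    def drop_port(self, in_port):
        self.drop_flows.add(in_port)


def port_dead(br, port_name, ofport):
    """Mark a port dead unless it has already disappeared or has no tag."""
    cur_tag = br.db_get_val("Port", port_name, "tag")
    if cur_tag in (None, []):
        # Port already deleted (None) or untagged ([]): adding a drop flow
        # now would only leave a stale rule behind.
        return False
    if cur_tag != DEAD_VLAN_TAG:
        br.set_tag(port_name, DEAD_VLAN_TAG)
        br.drop_port(in_port=ofport)
    return True


live = FakeBridge({"qg-1c749258-74": 5})
gone = FakeBridge({})
print(port_dead(live, "qg-1c749258-74", 49), live.drop_flows)  # True {49}
print(port_dead(gone, "qg-1c749258-74", 49), gone.drop_flows)  # False set()
```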

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/247844

Changed in neutron:
status: Confirmed → In Progress
Ramu Ramamurthy (ramu-ramamurthy) wrote :

The above is the first of several patches to fix this.

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/248908

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/247844
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=76cc53c611a639d915761d1fa3c879cda2ba3502
Submitter: Jenkins
Branch: master

commit 76cc53c611a639d915761d1fa3c879cda2ba3502
Author: Ramu Ramamurthy <email address hidden>
Date: Thu Nov 19 18:43:19 2015 -0500

    In port_dead, handle case when port already deleted

    db_get_val can return None if the port got deleted concurrently.
    In this case there is no need to mark it dead and add drop flow for it.

    Change-Id: I5ef9665770df3a9bbaf79049b219fadd73e20309
    Partial-Bug: #1493414

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/259509

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/259509
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=fa9fba2ab60df681e26f71e8fbaa0c6376ed967c
Submitter: Jenkins
Branch: stable/liberty

commit fa9fba2ab60df681e26f71e8fbaa0c6376ed967c
Author: Ramu Ramamurthy <email address hidden>
Date: Thu Nov 19 18:43:19 2015 -0500

    In port_dead, handle case when port already deleted

    db_get_val can return None if the port got deleted concurrently.
    In this case there is no need to mark it dead and add drop flow for it.

    Change-Id: I5ef9665770df3a9bbaf79049b219fadd73e20309
    Partial-Bug: #1493414
    (cherry picked from commit 76cc53c611a639d915761d1fa3c879cda2ba3502)

tags: added: in-stable-liberty
Jian Wen (wenjianhn)
Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/248908
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5289d9494984b7c95407ad2f9b761b2e647953b2
Submitter: Jenkins
Branch: master

commit 5289d9494984b7c95407ad2f9b761b2e647953b2
Author: Ramu Ramamurthy <email address hidden>
Date: Mon Nov 23 15:21:46 2015 -0500

    Remove stale ofport drop-rule upon port-delete

    When a port is deleted, that port is set to a dead-vlan, and
    an ofport drop-flow is added in port_dead().

    The ofport drop-flow gets removed only in some cases
    in _bind_devices() - depending on the timing of the
    concurrent port-deletion. In other cases, the drop-flow
    never gets removed, and such garbage drop-flow rules
    accumulate forever until the ovs-agent restarts.

    The fix is to use the function update_stale_ofport_rules, which
    solves this problem of tracking stale ofport flows
    in deleted ports but currently applies only to
    prevent_arp_spoofing.

    Change-Id: I0d1dbe3918cc7d7b3d0cdc49d7b6ff85f9b02a17
    Closes-Bug: #1493414
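The idea behind tracking stale ofports can be sketched as a per-iteration diff. Remember each port's ofport from the previous rpc_loop iteration, then delete flows for any ofport whose port disappeared or changed number. StaleOfportTracker and its delete_flows callback are hypothetical, and this is not the actual update_stale_ofport_rules() implementation.

```python
class StaleOfportTracker:
    """Hypothetical per-iteration diff of port name -> ofport mappings."""

    def __init__(self, delete_flows):
        self.delete_flows = delete_flows  # callback: delete_flows(in_port=N)
        self.prev = {}                    # port name -> ofport, last iteration

    def update(self, current):
        current = dict(current)
        # An ofport is stale if its port vanished or now has another ofport.
        stale = {ofport for name, ofport in self.prev.items()
                 if current.get(name) != ofport}
        for ofport in sorted(stale):
            self.delete_flows(in_port=ofport)
        self.prev = current
        return stale


removed = []
tracker = StaleOfportTracker(delete_flows=lambda in_port: removed.append(in_port))
tracker.update({"tap-a": 7, "qg-1c749258-74": 49})  # first iteration: nothing stale
tracker.update({"tap-a": 7})                        # qg- port has been deleted
print(removed)  # [49]
```

Because the cleanup runs on every iteration regardless of when the drop flow was added, the order of port_delete and the ovsdb event no longer matters. One caveat: if OVS reuses a freed ofport for a new port between two iterations, this naive diff would also delete the new port's flows. The sketch does not handle that case.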

Changed in neutron:
status: Fix Committed → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/neutron 9.0.0.0b1

This issue was fixed in the openstack/neutron 9.0.0.0b1 development milestone.

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential
tags: removed: in-stable-liberty