fullstack: ovs-agents remove trunk bridges that don't belong to them

Bug #1687709 reported by Jakub Libosvar
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Slawek Kaplonski
Milestone: (none)

Bug Description

If multiple ovs-agent instances run on a single node, the trunk code in every agent reacts to all ovsdb events. That means that when a port is removed from a trunk bridge, all agents attempt to clean up the resources around that trunk bridge. This leads to a race condition in which a foreign ovs-agent removes the trunk bridge before the agent that owns it does. Because foreign ovs-agents use different integration bridge names, they do not properly clean up the patch ports on the br-int side.

Example of failure: http://logs.openstack.org/30/453330/5/check/gate-neutron-dsvm-fullstack-ubuntu-xenial/b70b50c/testr_results.html.gz

e-r-q: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22still%20has%20following%20ports%20while%20some%20of%20them%20are%20patch%20ports%20for%20trunk%20that%20were%20supposed%20to%20be%20removed%5C%22%20AND%20tags%3Aconsole

33 hits in 7 days

Changed in neutron:
status: New → Confirmed
importance: Undecided → High
Ihar Hrachyshka (ihar-hrachyshka) wrote :

Still happening. I think Jakub's work to isolate ovs for each fullstack machine may be of help here.

Thomas Morin (tmmorin-orange) wrote :

Discussed today with Jakub: one possibility would be to have the OVS agent monitor only its own br-int.

ovsdb-client monitor-cond can be used to receive only the events that match certain conditions.

This can't be done with a per-bridge condition on the Interface table, but can be done on the Bridge table:

ovsdb-client monitor-cond '[["name","==","br-int"]]' Bridge name,ports

# ovsdb-client monitor-cond '[["name","==","br-int"]]' Bridge name,ports
row action name ports
------------------------------------ ------- ------ ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
a86186e3-7c17-423e-9292-f2a18022ca89 initial br-int [244019ea-ff2e-43ee-91dc-57c8b3aff0d7, 3443782a-f2a3-4abc-9c67-b07f5e97f45c, 430bc324-2eb1-47cd-bd15-f37b1502431f, 6df07278-6b54-4def-a3b4-250108986f35, 7677f9a0-09d9-403f-9041-3436834b95c3, 88ae506a-ddbb-46b8-9875-cf4f248b1845, da14307e-b635-4efb-bf33-f96f6ba38f19, e4378cc4-b65d-434c-8134-335cfcfa90b2]
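
The command quoted above can also be assembled programmatically. A minimal sketch, assuming a hypothetical helper named build_monitor_cond_cmd (it is not part of neutron); it only reproduces the argument layout shown in this comment and does not run anything:

```python
import json


def build_monitor_cond_cmd(bridge_name, columns=("name", "ports")):
    """Return an argv list equivalent to the command quoted above.

    Hypothetical helper for illustration only; not a neutron API.
    """
    # The condition is a list of [column, function, value] clauses.
    # Compact separators make the string match the hand-written example.
    condition = json.dumps([["name", "==", bridge_name]],
                           separators=(",", ":"))
    return ["ovsdb-client", "monitor-cond", condition,
            "Bridge", ",".join(columns)]


cmd = build_monitor_cond_cmd("br-int")
print(" ".join(cmd))
```

Passing the result as a list to a process launcher avoids the shell quoting that the single quotes take care of on the command line.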

This gives events on added/removed ports, and information on port number, interface name, and external_ids could be gathered with an additional OVSDB request on the Interface table.
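
To illustrate how added and removed ports could be derived from successive values of the ports column, here is a small sketch. The helper names parse_ports and diff_ports are hypothetical, and the second input row is made up for the example (it reuses UUIDs from the output above); the real update format may differ.

```python
import re

# Matches a canonical 8-4-4-4-12 lowercase hexadecimal UUID.
UUID_RE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def parse_ports(ports_column):
    """Extract the set of port UUIDs from a 'ports' cell such as '[a, b]'."""
    return set(UUID_RE.findall(ports_column))


def diff_ports(previous, current):
    """Return (added, removed) port UUID sets between two updates."""
    return current - previous, previous - current


# Illustrative input only: two snapshots of the same bridge row.
before = parse_ports("[244019ea-ff2e-43ee-91dc-57c8b3aff0d7, "
                     "3443782a-f2a3-4abc-9c67-b07f5e97f45c]")
after = parse_ports("[3443782a-f2a3-4abc-9c67-b07f5e97f45c, "
                    "430bc324-2eb1-47cd-bd15-f37b1502431f]")
added, removed = diff_ports(before, after)
```

Each UUID in added or removed would then need the additional Interface table request mentioned above to obtain the port number, interface name, and external_ids.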

The drawback is that this touches production code to solve a fullstack issue. On the other hand, even in production, it may make sense to filter events so that an agent only sees ports coming and going on specific bridges.

Thomas Morin (tmmorin-orange) wrote :

After our discussion today, let me formulate one possibility for having each agent receive only the events it needs: add a parameter to SimpleInterfaceMonitor, only_bridge, that takes a bridge name and lets the caller receive events only about that bridge. It would default to None (receive events about all bridges, which is the current behavior).

OVS neutron agent would have a config option (defaulting to the old behavior) to filter to receive only events for the integration bridge, and would pass the integration bridge name to SimpleInterfaceMonitor (via get_polling_manager). Fullstack tests would set this option to true.
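
The proposal can be sketched as follows. This is not neutron code: FilteredInterfaceMonitor is a hypothetical stand-in for SimpleInterfaceMonitor, make_monitor stands in for the get_polling_manager plumbing, and the sketch assumes each event already carries the name of its bridge.

```python
class FilteredInterfaceMonitor:
    """Hypothetical stand-in for SimpleInterfaceMonitor with only_bridge."""

    def __init__(self, only_bridge=None):
        # None keeps the current behavior: events for all bridges.
        self.only_bridge = only_bridge

    def filter_events(self, events):
        if self.only_bridge is None:
            return list(events)
        return [e for e in events if e["bridge"] == self.only_bridge]


def make_monitor(integration_bridge, filter_by_bridge=False):
    """Build a monitor; filter_by_bridge models the proposed config option."""
    only_bridge = integration_bridge if filter_by_bridge else None
    return FilteredInterfaceMonitor(only_bridge=only_bridge)


# Illustrative events; the bridge and port names are made up.
events = [
    {"name": "tap-1", "bridge": "br-int-agent1"},
    {"name": "tap-2", "bridge": "br-int-agent2"},
]
default_view = make_monitor("br-int-agent1").filter_events(events)
filtered_view = make_monitor("br-int-agent1",
                             filter_by_bridge=True).filter_events(events)
```

With the option left at its default the monitor behaves as before, which is what keeps the change backward compatible.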

I see two possible ways to implement the filtering itself:
A - have ovsdb-client do the filtering (with monitor-cond) as proposed above, and for each event an additional ovsdb query will be needed to retrieve port number, interface name, and external_ids
B - keep the same ovsdb-client call that receives all events, but then for each event do an additional ovsdb call to identify the bridge for the port and drop the event if need be

A has the drawback of increasing load for production code, while B would only increase load for fullstack tests (or for production deployments that have a reason to filter).
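
Option B can be sketched like this. The function filter_events_by_bridge and the injected port_to_bridge callable are hypothetical; in a real agent the lookup would be an extra ovsdb call, which is replaced here by a dictionary so the example is self-contained.

```python
def filter_events_by_bridge(events, only_bridge, port_to_bridge):
    """Keep only the events whose port belongs to only_bridge.

    port_to_bridge is a callable mapping a port name to its bridge name
    (or None if unknown). It is injected so the cost of the lookup is
    explicit and the function can be exercised without OVS.
    """
    kept = []
    for event in events:
        # One extra lookup per event: this is the added cost of option B.
        if port_to_bridge(event["name"]) == only_bridge:
            kept.append(event)
    return kept


# Fake topology standing in for the ovsdb lookup (names are made up).
topology = {"tap-a": "br-int-agent1", "tap-b": "br-int-agent2"}
lookups = []


def fake_port_to_bridge(port_name):
    lookups.append(port_name)
    return topology.get(port_name)


kept = filter_events_by_bridge(
    [{"name": "tap-a"}, {"name": "tap-b"}],
    "br-int-agent1",
    fake_port_to_bridge,
)
```

The lookups list grows by one entry per event, which makes the per-event overhead described above visible.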

Thomas Morin (tmmorin-orange) wrote :

Discussed during PTG:
- the agent needs to monitor more than only br-int for trunk ports
- idea of adding a prefix to trunk bridges to work around the issue
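
A rough sketch of the prefix idea follows. The naming scheme, the helper names, and the prefix values are all invented for illustration; the thread does not specify how such a prefix would be chosen, and real bridge names are also subject to interface name length limits that this sketch ignores.

```python
def trunk_bridge_name(agent_prefix, trunk_id):
    """Build a trunk bridge name carrying a per-agent prefix.

    Hypothetical scheme: '<agent_prefix>-tbr-<first 8 chars of trunk id>'.
    """
    return "%s-tbr-%s" % (agent_prefix, trunk_id[:8])


def owns_trunk_bridge(bridge_name, agent_prefix):
    """Return True if the bridge name carries this agent's prefix."""
    return bridge_name.startswith(agent_prefix + "-tbr-")


name = trunk_bridge_name("a1", "0f4c2d9e-1111-2222-3333-444455556666")
mine = owns_trunk_bridge(name, "a1")
foreign = owns_trunk_bridge(name, "a2")
```

An agent that checks ownership before cleaning up would then ignore events for trunk bridges created by another agent on the same node.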

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/504186

Changed in neutron:
assignee: Jakub Libosvar (libosvar) → Armando Migliaccio (armando-migliaccio)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/514586

Changed in neutron:
assignee: Armando Migliaccio (armando-migliaccio) → Slawek Kaplonski (slaweq)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/504186

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/517598

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/517598
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=806cf71eb5af34359f8bc0c6b740752e53cf854f
Submitter: Zuul
Branch: master

commit 806cf71eb5af34359f8bc0c6b740752e53cf854f
Author: Sławek Kapłoński <email address hidden>
Date: Fri Nov 3 09:59:04 2017 +0000

    Fullstack: init trunk agent's driver only when necessary

    The trunk driver does not need to be initialized when the "trunk"
    service plugin is not enabled.
    In production environments the L2 agent cannot rely on the
    "service_plugins" config option, so this driver is always
    initialized.
    This causes problems in fullstack tests because there is a race
    condition between the different ovs agents that consume events
    from the Open vSwitch monitor.
    In fullstack tests, however, we can assume that the agent's and
    server's configs are in sync, so the trunk driver can be initialized
    only if the "trunk" service plugin is enabled on the server side.

    Change-Id: I3ad8d6e7b8f103867ee277078d03f3a01c20ac0d
    Closes-Bug: #1687709
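
The decision described in the commit message can be summarized in a few lines. This is a simplified sketch and not the merged patch: the function name should_init_trunk_driver and the config_in_sync flag are hypothetical.

```python
def should_init_trunk_driver(service_plugins, config_in_sync=False):
    """Decide whether the agent should initialize the trunk driver.

    service_plugins: plugin names as seen in the agent's config.
    config_in_sync: True only when the agent's config is known to mirror
        the server's, as assumed for fullstack tests.
    """
    if not config_in_sync:
        # Production: the agent cannot trust service_plugins, so the
        # driver is always initialized.
        return True
    return "trunk" in service_plugins


production = should_init_trunk_driver([])
fullstack_without_trunk = should_init_trunk_driver(["router"],
                                                   config_in_sync=True)
fullstack_with_trunk = should_init_trunk_driver(["router", "trunk"],
                                                config_in_sync=True)
```

Agents in fullstack tests that do not enable the "trunk" plugin would then never subscribe to trunk bridge events, so they cannot race with the agent that owns the bridge.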

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/514586
Reason: Another patch was just merged: https://review.openstack.org/#/c/517598/

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.0.0b2

This issue was fixed in the openstack/neutron 12.0.0.0b2 development milestone.

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/584575

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/584576

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/ocata)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/584576
Reason: It would require at least 2 or 3 different patches to be backported to Ocata. I don't think it's worth doing now.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/584575
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9b399af547842fc231f0016cc5ddb8d93ac3b100
Submitter: Zuul
Branch: stable/pike

commit 9b399af547842fc231f0016cc5ddb8d93ac3b100
Author: Sławek Kapłoński <email address hidden>
Date: Fri Nov 3 09:59:04 2017 +0000

    Fullstack: init trunk agent's driver only when necessary

    The trunk driver does not need to be initialized when the "trunk"
    service plugin is not enabled.
    In production environments the L2 agent cannot rely on the
    "service_plugins" config option, so this driver is always
    initialized.
    This causes problems in fullstack tests because there is a race
    condition between the different ovs agents that consume events
    from the Open vSwitch monitor.
    In fullstack tests, however, we can assume that the agent's and
    server's configs are in sync, so the trunk driver can be initialized
    only if the "trunk" service plugin is enabled on the server side.

    Change-Id: I3ad8d6e7b8f103867ee277078d03f3a01c20ac0d
    Closes-Bug: #1687709
    (cherry picked from commit 806cf71eb5af34359f8bc0c6b740752e53cf854f)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.7

This issue was fixed in the openstack/neutron 11.0.7 release.
