neutron-openvswitch-agent interface monitor does not work if ovsdb-client generates warnings (ovs 2.10)

Bug #1788865 reported by Bernard Cafarelli
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Medium
Bernard Cafarelli

Bug Description

This was found while testing with ovs 2.10

openvswitch has all drivers built-in, and unfortunately Mellanox needs extra libs, so the driver can't be initialized if you miss those libs. A visible result if the system does not have these extra libs is a warning on stderr when calling ovsdb-client.
A typical call as done by OvsdbMonitor is [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]
It will result in
PMD: net_mlx5: cannot load glue library: libibverbs.so.1: cannot open shared object file: No such file or directory # on stderr
[...]
{"data":[...]} # Proper JSON output on stdout

But OvsdbMonitor is an AsyncProcess(die_on_error=True), so if any stderr ouput is found, the process is killed. With the libibverbs warning, that basically gives a non-working agent

There are possible workarounds and fixes on the ovs side of course, but the agent should be more robust to this kind of events (stderr is not always fatal).

Initial fix ideas:
* Disable die_on_error in OvsdbMonitor, update sub-classes process_events() to filter out non JSON output. Log error lines in debug or similar (as it may be quite verbose in this ovs 2.10 warning case). This is a short-term fix, but we may miss actual errors, and slower reactions to them (until we hit timeout)
* Update the OvsdbMonitor/AsyncProcess logic to check process return code. This allows to ignore/log in a low level stderr output and rely on process reporting success. But it is a bigger change and is still vulnerable to CLI changes
* Use native ovsdb implementation. No more subprocess and vulnerability to CLI changes, but a bit longer-term solution.

Original downstream bug with some additional info and workarounds: https://bugzilla.redhat.com/show_bug.cgi?id=1619387

Hongbin Lu (hongbin.lu)
Changed in neutron:
status: New → Confirmed
tags: added: ovs
Revision history for this message
Hongbin Lu (hongbin.lu) wrote :

As an initial screening, this bug is confirmed, but needs Neutron Lieutenants to further verify its validity.

Revision history for this message
Bernard Cafarelli (bcafarel) wrote :

Further testing: this does not happen with vanilla openvswitch, you need to build it with dpdk parts (as we do in our packaging).

An easier reproducer is to move ovsdb-client to ovsdb-client.orig, and use a wrapper script in its place:
#!/bin/sh

BIN=$(basename "$0")
(>&2 echo "Output some warning")
"${BIN}".orig "${@}"

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/596717

Changed in neutron:
assignee: nobody → Bernard Cafarelli (bcafarel)
status: Confirmed → In Progress
Hongbin Lu (hongbin.lu)
Changed in neutron:
importance: Undecided → Medium
Revision history for this message
Bernard Cafarelli (bcafarel) wrote :

The suggested fix is just to fix the ovsdb-client output, it does look like we can switch to native for ovsdb monitor commands but I guess this should be another bug/enhancement

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/596717
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f6d98a747b03e4da5109b2ee0e3c1bd7e88aee49
Submitter: Zuul
Branch: master

commit f6d98a747b03e4da5109b2ee0e3c1bd7e88aee49
Author: Bernard Cafarelli <email address hidden>
Date: Mon Aug 27 14:37:15 2018 +0200

    ovsdb monitor: do not die on ovsdb-client stderr output

    That process may generate stderr output (ovs 2.10 with dpdk support will
    log about missing optional libraries for example), in which case the
    agent will loop forever respawning the ovsdb-client processes.

    AsyncProcess already handles processes exiting uncleanly, and logs
    stderr output with log_output=True (which is the case for OvsdbMonitor).

    As the monitors work on stdout output, disabling die_on_error is enough
    to make them work with this behaviour.

    Change-Id: I8f2e5b93b9c16f9b288046911b5aeb4938845233
    Closes-Bug: #1788865

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/603046

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/603047

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/603048

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/603047
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c704d0d9fab34b05e48afee7718c30c2d42d730c
Submitter: Zuul
Branch: stable/queens

commit c704d0d9fab34b05e48afee7718c30c2d42d730c
Author: Bernard Cafarelli <email address hidden>
Date: Mon Aug 27 14:37:15 2018 +0200

    ovsdb monitor: do not die on ovsdb-client stderr output

    That process may generate stderr output (ovs 2.10 with dpdk support will
    log about missing optional libraries for example), in which case the
    agent will loop forever respawning the ovsdb-client processes.

    AsyncProcess already handles processes exiting uncleanly, and logs
    stderr output with log_output=True (which is the case for OvsdbMonitor).

    As the monitors work on stdout output, disabling die_on_error is enough
    to make them work with this behaviour.

    Change-Id: I8f2e5b93b9c16f9b288046911b5aeb4938845233
    Closes-Bug: #1788865
    (cherry picked from commit f6d98a747b03e4da5109b2ee0e3c1bd7e88aee49)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/603046
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5bd182ecf6d0d365b4eb3c86608d285918b96793
Submitter: Zuul
Branch: stable/rocky

commit 5bd182ecf6d0d365b4eb3c86608d285918b96793
Author: Bernard Cafarelli <email address hidden>
Date: Mon Aug 27 14:37:15 2018 +0200

    ovsdb monitor: do not die on ovsdb-client stderr output

    That process may generate stderr output (ovs 2.10 with dpdk support will
    log about missing optional libraries for example), in which case the
    agent will loop forever respawning the ovsdb-client processes.

    AsyncProcess already handles processes exiting uncleanly, and logs
    stderr output with log_output=True (which is the case for OvsdbMonitor).

    As the monitors work on stdout output, disabling die_on_error is enough
    to make them work with this behaviour.

    Change-Id: I8f2e5b93b9c16f9b288046911b5aeb4938845233
    Closes-Bug: #1788865
    (cherry picked from commit f6d98a747b03e4da5109b2ee0e3c1bd7e88aee49)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/603048
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=584338713f47316857b0e3027e315662ae5df181
Submitter: Zuul
Branch: stable/pike

commit 584338713f47316857b0e3027e315662ae5df181
Author: Bernard Cafarelli <email address hidden>
Date: Mon Aug 27 14:37:15 2018 +0200

    ovsdb monitor: do not die on ovsdb-client stderr output

    That process may generate stderr output (ovs 2.10 with dpdk support will
    log about missing optional libraries for example), in which case the
    agent will loop forever respawning the ovsdb-client processes.

    AsyncProcess already handles processes exiting uncleanly, and logs
    stderr output with log_output=True (which is the case for OvsdbMonitor).

    As the monitors work on stdout output, disabling die_on_error is enough
    to make them work with this behaviour.

    Change-Id: I8f2e5b93b9c16f9b288046911b5aeb4938845233
    Closes-Bug: #1788865
    (cherry picked from commit f6d98a747b03e4da5109b2ee0e3c1bd7e88aee49)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.2

This issue was fixed in the openstack/neutron 13.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.6

This issue was fixed in the openstack/neutron 11.0.6 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.5

This issue was fixed in the openstack/neutron 12.0.5 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0b1

This issue was fixed in the openstack/neutron 14.0.0.0b1 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.