[SRU] "Interface monitor is not active" can be observed at ovs-agent start

Bug #1584647 reported by Hong Hui Xiao
This bug affects 3 people
Affects                  Status         Importance   Assigned to     Milestone
Ubuntu Cloud Archive     Invalid        Undecided    Unassigned
  Mitaka                 Fix Released   High         Unassigned
neutron                  Fix Released   Undecided    Hong Hui Xiao
neutron (Ubuntu)         Invalid        Undecided    Unassigned
  (Declined for Artful by Corey Bryant)
  Xenial                 Fix Released   High         Unassigned

Bug Description

[Impact]

Requesting a backport to Mitaka. The fix landed in Newton, but we are seeing this issue in Mitaka clouds, where some compute nodes fail to have their flows added to br-tun following a restart of openvswitch-switch.

[Test Case]

* Deploy OpenStack Mitaka with one compute host
* Create an instance on an overlay network (GRE)
* Make a note of flows added to br-tun (ovs-ofctl dump-flows br-tun)
* systemctl restart openvswitch-switch
* Check that flows are re-added to br-tun (compare with previous output)
* Ensure you do not see "Interface monitor is not active" in /var/log/neutron/neutron-openvswitch-agent
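
The comparison and log check in the steps above could be scripted roughly as follows. This is only a sketch: the helper names are hypothetical, and it assumes the dumps are in the usual `ovs-ofctl dump-flows` text format, where every flow line carries a `table=` field and where cookie, duration, counters and ages are the only fields expected to differ between two otherwise identical dumps.

```python
import re

# Fields assumed to change between dumps or across an agent restart.
# This list is an assumption; adjust it for your OVS version.
VOLATILE = re.compile(
    r"\b(?:cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*"
)


def normalize_flows(dump: str) -> set:
    """Return the flows in a dump with the volatile fields stripped."""
    flows = set()
    for line in dump.splitlines():
        line = line.strip()
        # Skip blank lines and the reply header, which has no table= field.
        if not line or "table=" not in line:
            continue
        flows.add(VOLATILE.sub("", line).strip())
    return flows


def missing_flows(before: str, after: str) -> set:
    """Flows present before the restart but absent afterwards."""
    return normalize_flows(before) - normalize_flows(after)


def monitor_error_seen(log_text: str) -> bool:
    """True if the agent log contains the error from this bug."""
    return "Interface monitor is not active" in log_text
```

Save the output of the dump command before and after the restart, pass both strings to `missing_flows`, and expect an empty set. Pass the contents of the agent log to `monitor_error_seen` and expect `False`.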

NOTE: the root cause of this issue is that the ovsdb monitor async process started by neutron-openvswitch-agent takes too long to start and is not yet active when the rpc_loop first polls for updates. This scenario is hard to simulate, so it is difficult to know whether it occurred and was resolved by this patch. Nevertheless, this patch is small and is known to have resolved the issue in newer versions of OpenStack.

[Regression Potential]

I can't think how this patch could cause a regression. The only possible difference could be that the rpc_loop might take longer to update flows on ovs restart but that in itself would indicate a wider system issue beyond the neutron service that would not constitute a regression.

---------------

I noticed this error message in the neutron-ovs-agent log when starting neutron-openvswitch-agent:

ERROR neutron.agent.linux.ovsdb_monitor [req-a7c7a398-a13b-490e-adf8-c5afb24b4b9c None None] Interface monitor is not active.

ovs-agent starts ovsdb_monitor at [1] and first uses it at [2]. There is no guarantee that ovsdb_monitor is ready at [2], so the error can appear when neutron-openvswitch-agent starts.

We should make the start block until the process is active, and only then use it. Otherwise, using ovsdb_monitor is meaningless.

[1] https://github.com/openstack/neutron/blob/6da27a78f42db00c91a747861eafde7edc6f1fa7/neutron/agent/linux/polling.py#L35

[2] https://github.com/openstack/neutron/blob/6da27a78f42db00c91a747861eafde7edc6f1fa7/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L1994
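
To illustrate the race and the proposed remedy, here is a minimal, self-contained sketch. It is not the neutron code: `SlowMonitor`, `startup_delay` and the `block`/`timeout` parameters are hypothetical names, and a thread with a sleep stands in for the slow-starting monitor process.

```python
import threading
import time


class SlowMonitor:
    """Hypothetical stand-in for an asynchronously started monitor."""

    def __init__(self, startup_delay: float) -> None:
        self._startup_delay = startup_delay
        self._active = threading.Event()

    def is_active(self) -> bool:
        return self._active.is_set()

    def _run(self) -> None:
        time.sleep(self._startup_delay)  # simulate slow process startup
        self._active.set()

    def start(self, block: bool = False, timeout: float = 5.0) -> None:
        threading.Thread(target=self._run, daemon=True).start()
        # With block=True, do not return until the monitor is active.
        if block and not self._active.wait(timeout):
            raise TimeoutError("monitor did not become active in time")


# Racy pattern: start() returns immediately, so the first poll can
# find the monitor inactive.
racy = SlowMonitor(startup_delay=0.2)
racy.start()
racy_ready = racy.is_active()

# Blocking pattern: the caller only proceeds once the monitor is active.
fixed = SlowMonitor(startup_delay=0.2)
fixed.start(block=True)
fixed_ready = fixed.is_active()
```

The timeout keeps a monitor that never comes up from hanging the caller forever. Whether to raise, retry or fall back to polling in that case is a design choice this sketch leaves open.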

Hong Hui Xiao (xiaohhui)
Changed in neutron:
assignee: nobody → Hong Hui Xiao (xiaohhui)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/319788

Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/319788
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=95ff46722d195eb894e79b08c5a6eb13082cc799
Submitter: Jenkins
Branch: master

commit 95ff46722d195eb894e79b08c5a6eb13082cc799
Author: Hong Hui Xiao <email address hidden>
Date: Mon May 23 08:33:55 2016 +0000

    Wait for ovsdb_monitor to be active before use it

    There is a race between ovsdb_monitor becoming active and using
    ovsdb_monitor. Sometimes, code [1] will be hit at ovs-agent startup.

    The fix here will block the start of ovsdb_monitor, so that the
    following code to use the ovsdb_monitor will have ovsdb_monitor be
    active.

    [1] https://goo.gl/RJX4I5
    Closes-bug: #1584647

    Change-Id: I893a3b250339006f50aa003686fb95d7f2465edc

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.0.0.0b3

This issue was fixed in the openstack/neutron 9.0.0.0b3 development milestone.

description: updated
summary: - "Interface monitor is not active" can be observed at ovs-agent start
+ [SRU] "Interface monitor is not active" can be observed at ovs-agent
+ start
tags: added: sts sts-sru-needed
description: updated
Changed in neutron (Ubuntu):
status: New → Invalid
Changed in cloud-archive:
status: New → Invalid
Changed in neutron (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → High
Edward Hope-Morley (hopem) wrote :
Corey Bryant (corey.bryant) wrote :

Thanks Edward. I've uploaded your change to the xenial unapproved queue where it is awaiting SRU team review. Note that it is combined with LP: #1752838.

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Does this patch have backport potential?

Edward Hope-Morley (hopem) wrote :

@swaminathan-vasudevan Mitaka is EOL in upstream OpenStack, so it can't be backported there, but it is being backported to the Ubuntu Mitaka packages.

Łukasz Zemczak (sil2100) wrote :

I will be conditionally accepting this into xenial-proposed without artful having the fix for LP: #1752838 (as it's blocked by the previous neutron SRU). Normally we would wait, but seeing that it will migrate sooner or later anyway, I'd prefer to accept this and wait for the situation to clear up in -proposed instead.

Changed in neutron (Ubuntu Xenial):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-xenial
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Hong, or anyone else affected,

Accepted neutron into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:8.4.0-0ubuntu7.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Edward Hope-Morley (hopem) wrote :

Verified using test case from description and lgtm.

tags: added: verification-done-xenial
removed: verification-needed-xenial
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 2:8.4.0-0ubuntu7.3

---------------
neutron (2:8.4.0-0ubuntu7.3) xenial; urgency=medium

  [ Edward Hope-Morley ]
  * d/p/Wait-for-ovsdb_monitor-to-be-active-before-use-it.patch: backport
    fix to ensure ovsdb mon blocks on start until it is ready (LP: #1584647)

  [ Seyeong Kim ]
  * d/neutron-openvswitch-agent.service.in,
    d/neutron-openvswitch-agent.neutron-ovs-cleanup.service.in:
    Ensure neutron-ovs-cleanup runs after openvswitch-switch and
    neutron-openvswitch-agent runs after neutron-ovs-cleanup (LP: #1752838).

 -- Corey Bryant <email address hidden> Wed, 28 Mar 2018 11:42:05 -0400

Changed in neutron (Ubuntu Xenial):
status: Fix Committed → Fix Released
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Corey Bryant (corey.bryant) wrote : Please test proposed package

Hello Hong, or anyone else affected,

Accepted neutron into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed
Edward Hope-Morley (hopem) wrote :

Apologies for the delay. Finally got this verified and lgtm.

tags: added: verification-mitaka-done
removed: verification-mitaka-needed
Corey Bryant (corey.bryant) wrote : Update Released

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package neutron - 2:8.4.0-0ubuntu7.3~cloud0
---------------

 neutron (2:8.4.0-0ubuntu7.3~cloud0) trusty-mitaka; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:8.4.0-0ubuntu7.3) xenial; urgency=medium
 .
   [ Edward Hope-Morley ]
   * d/p/Wait-for-ovsdb_monitor-to-be-active-before-use-it.patch: backport
     fix to ensure ovsdb mon blocks on start until it is ready (LP: #1584647)
 .
   [ Seyeong Kim ]
   * d/neutron-openvswitch-agent.service.in,
     d/neutron-openvswitch-agent.neutron-ovs-cleanup.service.in:
     Ensure neutron-ovs-cleanup runs after openvswitch-switch and
     neutron-openvswitch-agent runs after neutron-ovs-cleanup (LP: #1752838).
