ovn-controller: Disable ofctrl probe by default

Bug #1899369 reported by Frode Nordahl on 2020-10-11
This bug affects 1 person
Affects                                              Importance   Assigned to
Ubuntu Cloud Archive (status tracked in Victoria)
    Ussuri                                           Undecided    Unassigned
    Victoria                                         Undecided    Unassigned
ovn (Ubuntu) (status tracked in Hirsute)
    Focal                                            High         Unassigned
    Groovy                                           High         Unassigned
    Hirsute                                          High         Unassigned

Bug Description

[Impact]
Restarting the service or host, or upgrading the ovn-host package, may render a host participating in an OVN network unusable, because the ovn-controller process fails to complete programming of the flows on the local Open vSwitch switch.

[Test Case]
The issue was discovered while migrating a 3-node OpenStack cloud with 1000 instances deployed in our test lab. One possible test case is to repeat that setup.

[Regression Potential]
None. The change of behavior was introduced upstream in [0] and later reverted in [1]. An inactivity probe on a Unix socket connection is unnecessary.

[Original Bug Report]
A change [0] made prior to the release of OVN v20.03.0 changed the default inactivity probe interval for the ofctrl connection to 5 seconds. Because this connection is normally a Unix socket, the previous default was to have no inactivity probe at all.

On a busy system, an inactivity probe interval of 5 seconds is not long enough for the OVN Controller to finish programming the switch.
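
The failure mode can be pictured with a deliberately simplified model of an inactivity probe. This is a sketch, not the OVS `rconn` implementation: `probe_outcome` and its parameters are hypothetical, and the model assumes that an interval of 0 disables probing and that the connection is dropped when the busy peer takes longer than the interval to answer a probe.

```python
from enum import Enum


class ProbeOutcome(Enum):
    DISABLED = "disabled"      # no inactivity probe is sent at all
    OK = "ok"                  # peer answered within the interval
    DISCONNECT = "disconnect"  # peer too busy to answer in time


def probe_outcome(interval_s: float, response_delay_s: float) -> ProbeOutcome:
    """Simplified (hypothetical) model of an inactivity probe.

    interval_s: probe interval in seconds; 0 is assumed to disable probing.
    response_delay_s: how long the busy peer takes to answer the probe.
    """
    if interval_s < 0:
        raise ValueError("interval must be non-negative")
    if interval_s == 0:
        return ProbeOutcome.DISABLED
    if response_delay_s > interval_s:
        # Analogous to "no response to inactivity probe after N seconds,
        # disconnecting": the connection is torn down mid-programming.
        return ProbeOutcome.DISCONNECT
    return ProbeOutcome.OK


if __name__ == "__main__":
    # Illustrative only: a switch busy for 8 s while flows are installed.
    for interval in (5, 0):
        print(interval, probe_outcome(interval, 8).value)
```

With a 5 second interval the model disconnects a peer that is busy for 8 seconds, while with probing disabled the slow peer is simply left to finish.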

The change of behavior was corrected in [1], and I think it would be beneficial for Ubuntu to backport this fix to the OVN package rather than have charms and/or end users work around the issue by manually configuring the timeout through the `external-ids:ovn-openflow-probe-interval` key in the Open_vSwitch table.
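
For reference, the manual workaround amounts to setting that key with `ovs-vsctl`, and removing it again once a fixed package is installed. The sketch below only builds the argument lists (the helper names are hypothetical) so they can be inspected before being run; it assumes `ovs-vsctl` is on the host's PATH and that a value of 0 disables the probe.

```python
import shlex
import subprocess

KEY = "ovn-openflow-probe-interval"


def set_probe_interval_command(seconds: int) -> list[str]:
    """Hypothetical helper: ovs-vsctl argv that sets the ofctrl probe interval.

    seconds=0 is assumed to disable the inactivity probe.
    """
    if seconds < 0:
        raise ValueError("probe interval must be non-negative")
    return ["ovs-vsctl", "set", "Open_vSwitch", ".",
            f"external-ids:{KEY}={seconds}"]


def remove_probe_interval_command() -> list[str]:
    """Hypothetical helper: ovs-vsctl argv that drops the workaround key."""
    return ["ovs-vsctl", "remove", "Open_vSwitch", ".", "external_ids", KEY]


def run(argv: list[str]) -> None:
    # Requires ovs-vsctl and sufficient privileges on the host.
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(shlex.join(set_probe_interval_command(0)))
    print(shlex.join(remove_probe_interval_command()))
```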

The symptom of this problem is that an OVN controller is either unable to complete the initial programming of a switch on a host with many ports and flows, or loses updates on an otherwise functional system. The following is printed in the log:

2020-10-11T18:56:09.355Z|30186|rconn|ERR|unix:/var/run/openvswitch/br-int.mgmt: no response to inactivity probe after 5 seconds, disconnecting
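
To check a host for this symptom, the log can be scanned for that message. A minimal sketch, assuming the line format shown above (timestamp, sequence number, module, level, then the message); the pattern and function name are illustrative and not part of OVN.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches lines like the rconn error quoted in this report (assumed format).
PROBE_TIMEOUT = re.compile(
    r"^(?P<ts>[^|]+)\|\d+\|rconn\|ERR\|(?P<target>\S+): "
    r"no response to inactivity probe after (?P<secs>\d+) seconds, disconnecting"
)


class ProbeTimeout(NamedTuple):
    timestamp: str
    target: str
    seconds: int


def find_probe_timeouts(lines: Iterable[str]) -> Iterator[ProbeTimeout]:
    """Yield one ProbeTimeout for every log line that matches the pattern."""
    for line in lines:
        m = PROBE_TIMEOUT.match(line)
        if m:
            yield ProbeTimeout(m["ts"], m["target"], int(m["secs"]))
```

Feeding it the lines of the ovn-controller log yields one record per disconnect, including the probe interval that was in effect.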

0: https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589
1: https://github.com/ovn-org/ovn/commit/b8af8549396e62d6523be18e104352e334825783

Frode Nordahl (fnordahl) on 2020-10-11
description: updated
Frode Nordahl (fnordahl) wrote :

The fix is in 20.09 and has already been backported upstream to 20.06 and 20.03:

branch-20.03: https://github.com/ovn-org/ovn/commit/028d6db38ff56018ba40b3abb3da94ba7a724ffa
branch-20.06: https://github.com/ovn-org/ovn/commit/8cd56feadbc8644ece036784f78dd9be289c9fe9

Changed in ovn (Ubuntu):
status: New → Triaged
importance: Undecided → High
James Page (james-page) on 2020-11-09
description: updated
Changed in ovn (Ubuntu Hirsute):
status: Triaged → Fix Released
Changed in ovn (Ubuntu Groovy):
status: New → Triaged
Changed in ovn (Ubuntu Focal):
status: New → Triaged
importance: Undecided → High
Changed in ovn (Ubuntu Groovy):
importance: Undecided → High
Frode Nordahl (fnordahl) on 2020-11-10
description: updated

Hello Frode, or anyone else affected,

Accepted ovn into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ovn/20.06.2-0ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ovn (Ubuntu Groovy):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-groovy
Changed in ovn (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed-focal
Brian Murray (brian-murray) wrote :

Hello Frode, or anyone else affected,

Accepted ovn into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ovn/20.03.1-0ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Frode Nordahl (fnordahl) wrote :

Verification done for Focal. I removed the workaround (the `external-ids:ovn-openflow-probe-interval` key in the Open_vSwitch table), installed the proposed package on a heavily loaded lab cluster, and confirmed continued operation with the new package.

tags: added: verification-done-focal
removed: verification-needed-focal

The verification of the Stable Release Update for ovn has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ovn - 20.03.1-0ubuntu1.1

---------------
ovn (20.03.1-0ubuntu1.1) focal; urgency=medium

  * d/p/ovn-controller-ofctrl-probe-interval.patch: Cherry pick
    fix to disable ofctrl probe by default (LP: #1899369).

 -- Frode Nordahl <email address hidden> Fri, 06 Nov 2020 08:21:03 +0000

Changed in ovn (Ubuntu Focal):
status: Fix Committed → Fix Released
Frode Nordahl (fnordahl) wrote :

I have completed verification on Groovy too.

tags: added: verification-done verification-done-groovy
removed: verification-needed verification-needed-groovy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ovn - 20.06.2-0ubuntu1.1

---------------
ovn (20.06.2-0ubuntu1.1) groovy; urgency=medium

  * d/p/ovn-controller-ofctrl-probe-interval.patch: Cherry pick
    fix to disable ofctrl probe by default (LP: #1899369).

 -- Frode Nordahl <email address hidden> Fri, 06 Nov 2020 08:21:03 +0000

Changed in ovn (Ubuntu Groovy):
status: Fix Committed → Fix Released
James Page (james-page) on 2020-12-02
Changed in cloud-archive:
status: Invalid → Fix Committed
Corey Bryant (corey.bryant) wrote :

The verification of the Stable Release Update for ovn has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package ovn - 20.03.1-0ubuntu1.1~cloud0
---------------

 ovn (20.03.1-0ubuntu1.1~cloud0) bionic-ussuri; urgency=medium
 .
   * New upstream release for the Ubuntu Cloud Archive.
 .
 ovn (20.03.1-0ubuntu1.1) focal; urgency=medium
 .
   * d/p/ovn-controller-ofctrl-probe-interval.patch: Cherry pick
     fix to disable ofctrl probe by default (LP: #1899369).
 .
 ovn (20.03.1-0ubuntu1) focal; urgency=medium
 .
   * New upstream point release (LP: #1897248).
