Clustered OVN database is not upgraded on package upgrade

Bug #1907081 reported by Frode Nordahl on 2020-12-07
This bug affects 1 person
Affects             Importance  Assigned to
charm-ovn-central   Medium      Frode Nordahl
ovn (Ubuntu)        Status tracked in Hirsute
  Focal             Medium      Frode Nordahl
  Groovy            Medium      Frode Nordahl
  Hirsute           Medium      Unassigned

Bug Description

[Impact]
On upgrade of the OVN packages it may be necessary to upgrade the schema of the Northbound and Southbound databases.

Failure to do so may lead to loss of connectivity between participating nodes as the software components will attempt to make use of columns that are not available in the database.

The upstream init script has performed this upgrade automatically by default since inception, for both local and clustered setups. However, as discussed below, recent changes have inadvertently dropped this behavior for clustered databases.

[Test Case]

Non-clustered scenario as reference test:
Install the ovn-central package in a container using the in-release Focal package and start the database and ovn-northd services.

Upgrade the container to the OVN packages from in-release Groovy. Observe that the package performs the database upgrade and that the ovn-northd service subsequently does not complain about missing columns in the database.

Clustered scenario:
Install the ovn-central charm and its necessary dependencies across three containers. Perform the package upgrade as outlined above and compare how the in-release and proposed packages behave.

[Regression Potential]
As we are restoring the intended behavior the regression potential is minimal.

[Original Bug Report]
In the systemd service we make use of the `ovn-ctl` script's `run_nb_ovsdb` and `run_sb_ovsdb` sub-commands introduced in [0]. These sub-commands fit nicely with systemd's expectation that modern daemons no longer detach and run in the background.

However, the change in [0] has the side effect of disabling automatic upgrading of clustered databases. Previously this would have been done on every startup [1].

A recent commit to master [2] addresses this by using the presence of `--db-*-cluster-local-addr` combined with the absence of `--db-*-cluster-remote-addr` to determine whether the upgrade should be run.
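
As a rough illustration of the rule described above, here is a minimal sketch in shell. It is not the upstream `ovn-ctl` code; the function name `should_upgrade_cluster_db` and its two positional arguments are hypothetical and exist only to show the decision.

```shell
# Hypothetical helper, NOT the upstream ovn-ctl implementation.
# $1: value given to --db-*-cluster-local-addr  (empty if not set)
# $2: value given to --db-*-cluster-remote-addr (empty if not set)
# Returns 0 when the schema upgrade should run on this node, 1 otherwise.
should_upgrade_cluster_db() {
    local_addr=$1
    remote_addr=$2
    # Run the upgrade only where a local address is configured and no
    # remote address is, i.e. on the node that initialized the cluster.
    if [ -n "$local_addr" ] && [ -z "$remote_addr" ]; then
        return 0
    fi
    return 1
}
```

For example, `should_upgrade_cluster_db 10.0.0.1 ""` succeeds, while passing any non-empty second argument makes it fail. The addresses here are placeholders.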

We should backport [2] to our supported OVN packages to prepare for supporting upgrades that require database schema changes. We may also need to change the behavior of the ovn-central charm to not set the `--db-*-cluster-remote-addr` argument on the leader unit.

0: https://github.com/ovn-org/ovn/commit/6444059b5f9444ce06634794d275257f945a6ce5
1: https://github.com/ovn-org/ovn/blob/5c2d311b8b7b4d5c3a619de72be6a433aa4c44db/utilities/ovn-ctl#L312-L314
2: https://github.com/ovn-org/ovn/commit/67e2f386cc838d0b0f9b4b5da7fe611e1113b70c


Frode Nordahl (fnordahl) on 2020-12-07
Changed in ovn (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in charm-ovn-central:
status: New → Triaged
importance: Undecided → Medium
Frode Nordahl (fnordahl) on 2020-12-07
Changed in charm-ovn-central:
status: Triaged → In Progress
assignee: nobody → Frode Nordahl (fnordahl)
summary: - OVN database is not upgraded on package upgrade
+ Clustered OVN database is not upgraded on package upgrade
Frode Nordahl (fnordahl) on 2020-12-07
Changed in ovn (Ubuntu):
assignee: nobody → Frode Nordahl (fnordahl)
Frode Nordahl (fnordahl) on 2020-12-11
Changed in ovn (Ubuntu):
status: Triaged → In Progress
James Page (james-page) on 2021-01-12
description: updated
Frode Nordahl (fnordahl) on 2021-01-12
description: updated
Frode Nordahl (fnordahl) on 2021-01-12
description: updated
Frode Nordahl (fnordahl) wrote :
Changed in charm-ovn-central:
status: In Progress → Fix Committed
milestone: none → 21.01
Changed in ovn (Ubuntu Hirsute):
status: In Progress → Fix Released
assignee: Frode Nordahl (fnordahl) → nobody
Changed in ovn (Ubuntu Groovy):
status: New → In Progress
Changed in ovn (Ubuntu Focal):
status: New → In Progress
Changed in ovn (Ubuntu Groovy):
importance: Undecided → Medium
Changed in ovn (Ubuntu Focal):
importance: Undecided → Medium
Changed in ovn (Ubuntu Groovy):
assignee: nobody → Frode Nordahl (fnordahl)
Changed in ovn (Ubuntu Focal):
assignee: nobody → Frode Nordahl (fnordahl)

Hello Frode, or anyone else affected,

Accepted ovn into groovy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ovn/20.06.2-0ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-groovy to verification-done-groovy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-groovy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ovn (Ubuntu Groovy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-groovy
Brian Murray (brian-murray) wrote :

Hello Frode, or anyone else affected,

Accepted ovn into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ovn/20.03.1-0ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ovn (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed-focal
David Ames (thedac) on 2021-02-10
Changed in charm-ovn-central:
status: Fix Committed → Fix Released
Frode Nordahl (fnordahl) wrote :

On the Focal OVN central unit used to initialize the cluster (which means the `--db-*-cluster-remote-addr` parameters are set to blank):
# grep remote-addr= /etc/default/ovn-central
    --db-nb-cluster-remote-addr= \
    --db-sb-cluster-remote-addr= \
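
The same check can be scripted. The sketch below assumes the options appear in the file in the `--db-*-cluster-remote-addr=VALUE` form shown above, with an empty value followed by whitespace, a line-continuation backslash, or end of line. The function name `remote_addr_is_blank` is hypothetical.

```shell
# Hypothetical helper: succeeds when the given defaults file contains at
# least one --db-*-cluster-remote-addr option and all of them are blank.
# $1: path to the file to inspect, e.g. /etc/default/ovn-central
remote_addr_is_blank() {
    file=$1
    # At least one remote-addr option must be present in the file.
    grep -q -- '--db-[a-z]*-cluster-remote-addr=' "$file" || return 1
    # None of them may be followed directly by a value character.
    ! grep -Eq -- '--db-[a-z]+-cluster-remote-addr=[^[:space:]\\]' "$file"
}
```

Usage: `remote_addr_is_blank /etc/default/ovn-central && echo "this unit initialized the cluster"`.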

I can then install the packages from -proposed:
# apt -y install ovn-common ovn-central

I can then confirm that the cluster is still healthy:
# ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound
cd89
Name: OVN_Southbound
Cluster ID: d623 (d62321da-8009-4798-bd0a-171e5678a790)
Server ID: cd89 (cd8983cc-e19a-45ee-9247-34e130927eb6)
Address: ssl:10.247.39.51:6644
Status: cluster member
Role: follower
Term: 5
Leader: b941
Vote: unknown

Election timer: 4000
Log: [2, 7533]
Entries not yet committed: 0
Entries not yet applied: 0
Connections: ->b941 ->8c5f <-b941 <-8c5f
Servers:
    cd89 (cd89 at ssl:10.247.39.51:6644) (self)
    b941 (b941 at ssl:10.247.39.127:6644)
    8c5f (8c5f at ssl:10.247.39.149:6644)
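
To check individual fields of this output from a script, a small sketch follows. It assumes the `Key: value` layout shown above; the function name `cluster_status_field` is hypothetical.

```shell
# Hypothetical helper: read cluster/status output on stdin and print the
# value of the first "Key: value" line whose key matches $1 exactly.
cluster_status_field() {
    key=$1
    awk -v key="$key" -F': ' '$1 == key { print $2; exit }'
}
```

Usage: `ovn-appctl -t /var/run/ovn/ovnsb_db.ctl cluster/status OVN_Southbound | cluster_status_field Role` prints the role of the local server.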

Since the Focal package currently does not contain any schema changes, we cannot further verify that the upgrade procedure actually works on Focal. We will do that as part of validating Groovy.

tags: added: verification-done-focal
removed: verification-needed-focal
Frode Nordahl (fnordahl) wrote :

With the system used to validate Focal in #4 I performed an upgrade to Groovy. We can then immediately verify that an unchanged Groovy package exhibits the problem:
2021-02-15T17:08:14.584Z|00014|ovsdb_idl|WARN|Forwarding_Group table in OVN_Northbound database lacks external_ids column (database needs upgrade?)
2021-02-15T17:08:14.584Z|00015|ovsdb_idl|WARN|Load_Balancer table in OVN_Northbound database lacks selection_fields column (database needs upgrade?)
2021-02-15T17:08:14.584Z|00016|ovsdb_idl|WARN|Logical_Router_Policy table in OVN_Northbound database lacks external_ids column (database needs upgrade?)
2021-02-15T17:08:14.584Z|00017|ovsdb_idl|WARN|Logical_Router_Port table in OVN_Northbound database lacks ipv6_prefix column (database needs upgrade?)
2021-02-15T17:08:14.584Z|00018|ovsdb_idl|WARN|NAT table in OVN_Northbound database lacks external_port_range column (database needs upgrade?)
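
Warnings like these can also be picked out of a log file with a short script. The sketch below assumes the message format shown above. The function names are hypothetical, and the log path in the usage note is an assumption since it is not stated in this report.

```shell
# Hypothetical helpers for spotting schema warnings in an OVN log file.
# $1: path to the log file to scan.

# Print the number of "database needs upgrade?" warnings (0 if none).
count_schema_warnings() {
    grep -cF 'database needs upgrade?' "$1" || true
}

# Print one "DATABASE TABLE.COLUMN" line per missing-column warning.
list_missing_columns() {
    sed -n 's/.*|\([A-Za-z_]*\) table in \([A-Za-z_]*\) database lacks \([A-Za-z0-9_]*\) column (database needs upgrade?).*/\2 \1.\3/p' "$1"
}
```

Usage, assuming ovn-northd logs to `/var/log/ovn/ovn-northd.log`: `count_schema_warnings /var/log/ovn/ovn-northd.log`. A non-zero count after a package upgrade suggests the schema upgrade did not run.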

After installing the Groovy package from -proposed we can confirm the schema warnings are gone:
2021-02-15T17:11:32.828Z|00010|reconnect|INFO|ssl:10.247.39.51:6641: connected
2021-02-15T17:11:32.829Z|00011|ovsdb_idl|INFO|ssl:10.247.39.51:6641: clustered database server is not cluster leader; trying another server
2021-02-15T17:11:32.830Z|00012|ovsdb_idl|INFO|ssl:10.247.39.149:16642: clustered database server is not cluster leader; trying another server
2021-02-15T17:11:32.830Z|00013|reconnect|INFO|ssl:10.247.39.51:6641: connection attempt timed out
2021-02-15T17:11:32.830Z|00014|reconnect|INFO|ssl:10.247.39.149:16642: connection attempt timed out
2021-02-15T17:11:32.830Z|00015|reconnect|INFO|ssl:10.247.39.127:6641: connecting...
2021-02-15T17:11:32.830Z|00016|reconnect|INFO|ssl:10.247.39.127:16642: connecting...
2021-02-15T17:11:32.834Z|00017|reconnect|INFO|ssl:10.247.39.127:6641: connected
2021-02-15T17:11:32.836Z|00018|reconnect|INFO|ssl:10.247.39.127:16642: connected
2021-02-15T17:11:32.836Z|00019|ovn_northd|INFO|ovn-northd lock lost. This ovn-northd instance is now on standby.
2021-02-15T17:11:32.839Z|00020|ovn_northd|INFO|ovn-northd lock acquired. This ovn-northd instance is now active.

tags: added: verification-done verification-done-groovy
removed: verification-needed verification-needed-groovy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ovn - 20.06.2-0ubuntu1.2

---------------
ovn (20.06.2-0ubuntu1.2) groovy; urgency=medium

  * d/p/ovn-ctl-cluster-db-upgrades.patch: Cherry pick fix for upgrading
    database schema of clustered databases on package upgrade (LP: #1907081)
  * d/p/ovn-ofctrl-predictable-resolution-conflicting-flow-actions-*: Cherry
    pick fixes for predictable resolution for conflicting flow actions.
    (LP: #1906922)

 -- Frode Nordahl <email address hidden> Tue, 12 Jan 2021 11:45:12 +0000

Changed in ovn (Ubuntu Groovy):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for ovn has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ovn - 20.03.1-0ubuntu1.2

---------------
ovn (20.03.1-0ubuntu1.2) focal; urgency=medium

  * d/p/ovn-northd-revert-manage-arp-process-locally-dvr.patch: Cherry pick
    fix for incorrect ARP processing with DVR enabled (LP: #1905933).
  * d/p/ovn-ctl-cluster-db-upgrades.patch: Cherry pick fix for upgrading
    database schema of clustered databases on package upgrade (LP: #1907081)
  * d/p/ovn-ofctrl-predictable-resolution-conflicting-flow-actions-*: Cherry
    pick fixes for predictable resolution for conflicting flow actions.
    (LP: #1906922)

 -- Frode Nordahl <email address hidden> Tue, 12 Jan 2021 11:47:18 +0000

Changed in ovn (Ubuntu Focal):
status: Fix Committed → Fix Released