ovn db upgrade

Bug #2059721 reported by Vadim Kuznetsov
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
OpenStack-Ansible | Triaged | High | Unassigned |
ovn (Ubuntu) | New | Undecided | Unassigned |

Bug Description

After a distro upgrade of an openstack-ansible 2023.1 site from Ubuntu 20.04 to Ubuntu 22.04, the OVN cluster DB was not upgraded.

root@b-mgmt-neutron-ovn-northd-container-cb715707:/var/log/ovn# ovn-nbctl --version
ovn-nbctl 23.03.1
Open vSwitch Library 3.1.2
DB Schema 7.0.0

root@b-mgmt-neutron-ovn-northd-container-cb715707:/var/log/ovn# ovn-sbctl --version
ovn-sbctl 23.03.1
Open vSwitch Library 3.1.2
DB Schema 20.27.0

However, the running databases still report the old schema versions:

root@b-mgmt-neutron-ovn-northd-container-cb715707:/var/log/ovn# ovsdb-client get-schema-version unix:/var/run/ovn/ovnnb_db.sock OVN_Northbound
6.1.0
root@b-mgmt-neutron-ovn-northd-container-cb715707:/var/log/ovn# ovsdb-client get-schema-version unix:/var/run/ovn/ovnsb_db.sock OVN_Southbound
20.21.0
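
The mismatch above can also be checked mechanically. This is a minimal sketch, assuming the version strings have already been collected by hand from `ovn-nbctl --version` / `ovn-sbctl --version` ("DB Schema" line) and from `ovsdb-client get-schema-version`; the helper names `parse_version` and `schema_is_behind` are made up for illustration and are not part of OVN.

```python
def parse_version(text):
    """Turn a dotted schema version such as '20.27.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def schema_is_behind(tool_schema, running_schema):
    """Return True when the running DB schema is older than the schema
    the installed tools were built against."""
    return parse_version(running_schema) < parse_version(tool_schema)


# Values copied from the outputs in this report.
print(schema_is_behind("7.0.0", "6.1.0"))      # Northbound -> True
print(schema_is_behind("20.27.0", "20.21.0"))  # Southbound -> True
```

Comparing tuples of integers rather than raw strings avoids wrong results for components with different digit counts (for example "20.9.0" versus "20.27.0").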

Restarting ovn-central did not help.

ovn-northd.log:
2024-03-28T17:28:52.521Z|00053|ovsdb_idl|WARN|OVN_Southbound database lacks Chassis_Template_Var table (database needs upgrade?)
2024-03-28T17:28:52.521Z|00054|ovsdb_idl|WARN|Load_Balancer table in OVN_Southbound database lacks datapath_group column (database needs upgrade?)
2024-03-28T17:28:52.521Z|00055|ovsdb_idl|WARN|MAC_Binding table in OVN_Southbound database lacks timestamp column (database needs upgrade?)
2024-03-28T17:28:52.521Z|00056|ovsdb_idl|WARN|OVN_Southbound database lacks Mirror table (database needs upgrade?)
2024-03-28T17:28:52.521Z|00057|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks additional_chassis column (database needs upgrade?)
2024-03-28T17:28:52.521Z|00058|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks additional_encap column (database needs upgrade?)
2024-03-28T17:28:52.521Z|00059|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks mirror_rules column (database needs upgrade?)
2024-03-28T17:28:52.521Z|00060|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks port_security column (database needs upgrade?)
2024-03-28T17:28:52.521Z|00061|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks requested_additional_chassis column (database needs upgrade?)
2024-03-28T17:28:52.521Z|00062|ovsdb_idl|WARN|OVN_Southbound database lacks Static_MAC_Binding table (database needs upgrade?)
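
To get a quick overview of what the old schema is missing, warnings like the ones above can be grouped by table. This is only a sketch: the two regular expressions are derived from the exact wording of the log lines in this report, and the function name `summarize` is hypothetical.

```python
import re
from collections import defaultdict

# Patterns derived from the ovsdb_idl warnings quoted in this report.
MISSING_COLUMN = re.compile(r"\|WARN\|(\w+) table in (\w+) database lacks (\w+) column ")
MISSING_TABLE = re.compile(r"\|WARN\|(\w+) database lacks (\w+) table ")


def summarize(lines):
    """Return (missing_tables, missing_columns_by_table) for the given log lines."""
    tables = []
    columns = defaultdict(list)
    for line in lines:
        match = MISSING_COLUMN.search(line)
        if match:
            columns[match.group(1)].append(match.group(3))
            continue
        match = MISSING_TABLE.search(line)
        if match:
            tables.append(match.group(2))
    return tables, dict(columns)


SAMPLE = [
    "2024-03-28T17:28:52.521Z|00053|ovsdb_idl|WARN|OVN_Southbound database lacks Chassis_Template_Var table (database needs upgrade?)",
    "2024-03-28T17:28:52.521Z|00054|ovsdb_idl|WARN|Load_Balancer table in OVN_Southbound database lacks datapath_group column (database needs upgrade?)",
    "2024-03-28T17:28:52.521Z|00057|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks additional_chassis column (database needs upgrade?)",
    "2024-03-28T17:28:52.521Z|00058|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks additional_encap column (database needs upgrade?)",
]

tables, columns = summarize(SAMPLE)
print(tables)
print(columns)
```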

Tags: jammy ovn ubuntu
Vadim Kuznetsov (vakuznet) wrote :

And tons of these every 5 seconds
ovn-northd.log:
2024-03-28T18:13:16.208Z|00511|ovn_northd|INFO|OVNSB commit failed, force recompute next time.

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Hey,

Thanks for reporting that.

I believe OpenStack-Ansible indeed misses a required step for a proper OVN upgrade on OS re-setup.

I will check on the issue in a timely manner.

Changed in openstack-ansible:
importance: Undecided → High
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Hi,

So, I definitely see issues in the ordering of tasks for the OVN upgrade, and a couple of missing handlers we obviously need to cover.
However, I was able to launch the OVN cluster after the upgrade.

Basically, these steps should be taken for a successful upgrade:

1. Upgrade/restart all ovn-controllers.
2. Upgrade ovn-northd. Once it's upgraded, you need to restart the ovn-northd service and ovn-central right after it. Once 2 out of 3 ovn-northd nodes are upgraded, the DB will be upgraded to the new schema.

So eventually, I updated the 1st ovn-northd container to 22.04 and got the same output as you did:
root@aio1-neutron-ovn-northd-container-519166b6:/# ovn-nbctl --version
ovn-nbctl 23.03.1
Open vSwitch Library 3.1.2
DB Schema 7.0.0
root@aio1-neutron-ovn-northd-container-519166b6:/# ovsdb-client get-schema-version unix:/var/run/ovn/ovnnb_db.sock
6.1.0
root@aio1-neutron-ovn-northd-container-519166b6:/#

Then I updated the second container to 22.04, restarted ovn-northd and ovn-central, and got the new schema on the first one right away:
root@aio1-neutron-ovn-northd-container-519166b6:/# ovsdb-client get-schema-version unix:/var/run/ovn/ovnnb_db.sock
7.0.0
root@aio1-neutron-ovn-northd-container-519166b6:/#

I will work on the role/playbooks to cover the upgrade process ordering and the nits I found.

Changed in openstack-ansible:
status: New → Triaged
Vadim Kuznetsov (vakuznet) wrote (last edit ):

I upgraded the compute nodes first (rebuilt them from scratch with the new OS, one by one, with VM live migrations) and the controllers second.
(I tried upgrading the controllers first and the computes second before you updated the distro upgrade guide. It resulted in a longer outage.)
When I had upgraded 2 mgmt nodes and shut down the 3rd one, I tried restarting ovn-central in different ways: a restart on one node followed by a restart on the second one, or a rolling stop followed by a rolling start. Nothing.
I rebuilt the 3rd controller and did the same restarts. It did not work.

Did you see any of these messages:
2024-03-28T18:13:16.208Z|00511|ovn_northd|INFO|OVNSB commit failed, force recompute next time.

Dmitriy Rabotyagov (noonedeadpunk) wrote :

No, I do not.
That is basically the whole log I got on one of the instances: https://paste.openstack.org/show/brE4aA8VFptpmMkGn8EE/

Though keep in mind that I tried that on a dummy sandbox.

Also a question: did you upgrade the controllers or re-install them? I assume you rebuilt the ovn-northd containers afterwards?

I'm just trying to find a way to reproduce that. And I do indeed see a couple of chicken-and-egg situations already, such as the ordering of compute/controller upgrades you've mentioned. Eventually, to successfully upgrade a compute you'd need at least 1 upgraded repo container for building wheels; otherwise OpenStack infrastructure could be exposed to quite a heavy load if you deployed XX computes at the same time...

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Actually, I do have `2024-03-29T11:34:33.886Z|00041|main|INFO|OVNSB commit failed, force recompute next time.` - but I get it on an ovn-controller (compute) node which has not been upgraded yet.

Also, I no longer see that message right after a service restart:
https://paste.openstack.org/show/bJDQ7Gx5e9zmzv4zN7HQ/

Vadim Kuznetsov (vakuznet) wrote (last edit ):

I reinstalled the host OS and rebuilt it from scratch, including all containers.
I reinstalled the compute nodes one by one, so as not to overload the OpenStack infrastructure.
I'm using venv_wheel_build_enable=False to get around that.
An alternative could be to reinstall and rebuild only infra (repo, galera, rabbitmq) on one controller, then rebuild the compute nodes one by one with limit, then come back and finish the first controller.
Btw, glusterfs on repo needs to be upgraded to 10 before rebuilding the first repo server.
Also, imo more services (not just keystone) do not support limit in recent releases.

Question: You said "you need to restart ovn-northd service and ovn-central right after it".
Why is "right after it" significant?

It seems to me I did restart the services "right after it".

I also restarted them after I completed the upgrade of all controllers, and the day after :)

After restarting ovn-northd I only see ovn-northd running. The 2 ovsdb-servers are not started.
According to the OVN docs, that restart is the only thing needed to finish an OVN upgrade.
But you also restarted ovn-central.
Me too. Otherwise the ovsdb-servers stay down.

Vadim Kuznetsov (vakuznet) wrote :

I'm not sure what it means, but northd_internal_version has changed in the NB DB but not in the SB DB:

ovn-nbctl get nb-global . options
{mac_prefix="a6:00:79", max_tunid="16711680", northd_internal_version="23.03.1-20.27.0-70.6", svc_monitor_mac="ea:c9:fc:91:88:7c"}

ovn-sbctl get sb-global . options
{lb_hairpin_use_ct_mark="true", mac_prefix="a6:00:79", max_tunid="16711680", northd_internal_version="22.03.3-20.21.0-62.4", svc_monitor_mac="ea:c9:fc:91:88:7c"}
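
The same comparison can be scripted. A minimal sketch, assuming the `options` maps are available as text in the form printed above; `parse_options` and `northd_versions_match` are hypothetical helper names, and the version value is treated as an opaque string because its internal format is not described in this report.

```python
import re

# Matches key="value" pairs as printed by `ovn-nbctl get nb-global . options`.
OPTION = re.compile(r'(\w+)="([^"]*)"')


def parse_options(text):
    """Parse '{k1="v1", k2="v2"}' into a dict of strings."""
    return dict(OPTION.findall(text))


def northd_versions_match(nb_options, sb_options):
    """True only if both DBs carry the same northd_internal_version."""
    key = "northd_internal_version"
    nb = parse_options(nb_options).get(key)
    sb = parse_options(sb_options).get(key)
    return nb is not None and nb == sb


# Values copied from the outputs above.
NB = '{mac_prefix="a6:00:79", max_tunid="16711680", northd_internal_version="23.03.1-20.27.0-70.6", svc_monitor_mac="ea:c9:fc:91:88:7c"}'
SB = '{lb_hairpin_use_ct_mark="true", mac_prefix="a6:00:79", max_tunid="16711680", northd_internal_version="22.03.3-20.21.0-62.4", svc_monitor_mac="ea:c9:fc:91:88:7c"}'

print(northd_versions_match(NB, SB))  # False
```

Note that the middle component of each value (20.27.0 in NB, 20.21.0 in SB) lines up with the SB schema versions reported earlier, which suggests ovn-northd could not commit its new version to the old SB database.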

Dmitriy Rabotyagov (noonedeadpunk) wrote :

Nah, I guess I just meant after it. I just spotted that the DB schema was updated only once ovn-northd -> ovn-central were restarted. For me ovn-central did not start on its own, but I cut a couple of corners while trying to reproduce the issue. Eventually, I just upgraded a host rather than re-installing it. So that could be why it worked in my test env: the database was present at the expected location during OVN startup, rather than having to be fetched and converted.

Vadim Kuznetsov (vakuznet) wrote :

Here is my explanation:
1. According to the OVN docs https://docs.ovn.org/en/latest/intro/install/ovn-upgrades.html
"The only step required after upgrading the packages is to restart ovn-northd, which automatically restarts the databases and upgrades the database schema, as well.
if you’re using a Linux distribution with systemd:
$ sudo systemctl restart ovn-northd"

This is not true in the Ubuntu case, where the unit file has:
```
ExecStart=/usr/share/ovn/scripts/ovn-ctl start_northd --ovn-manage-ovsdb=no --no-monitor $OVN_CTL_OPTS
```
With ovn-manage-ovsdb=no, it will not restart the DBs.

2. The DBs are restarted by ovn-ovsdb-server-nb.service and ovn-ovsdb-server-sb.service,
which use the run_nb_ovsdb and run_sb_ovsdb options of ovn-ctl:
```
ExecStart=/usr/share/ovn/scripts/ovn-ctl run_nb_ovsdb $OVN_CTL_OPTS
```

3. This commit fixed the clustered OVN database upgrade: https://github.com/ovn-org/ovn/commit/67e2f386cc838d0b0f9b4b5da7fe611e1113b70c
for this bug: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1907081

This is the code that is responsible for the upgrade:
https://github.com/ovn-org/ovn/blob/5c2d311b8b7b4d5c3a619de72be6a433aa4c44db/utilities/ovn-ctl#L293
```
if test X$detach = Xno && test $mode = cluster && test -z "$cluster_remote_addr" ; then
```
The first two checks are true.
The third check means upgrade_cluster (L313) will run only on the leader node, i.e. the node where --db-nb-cluster-remote-addr= and --db-sb-cluster-remote-addr= are not set.
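
To make the three-way condition easier to reason about, here is a small Python model of it. This is only a sketch that mirrors the single shell test quoted above; the function name `runs_upgrade_cluster` is made up, and nothing else from ovn-ctl is modelled.

```python
def runs_upgrade_cluster(detach, mode, cluster_remote_addr):
    """Mirror of the quoted test:
    X$detach = Xno && $mode = cluster && -z "$cluster_remote_addr"
    """
    return detach == "no" and mode == "cluster" and cluster_remote_addr == ""


# Node started without a remote address: the upgrade path is taken.
print(runs_upgrade_cluster("no", "cluster", ""))            # True
# Node configured with a remote address (placeholder IP): it is skipped.
print(runs_upgrade_cluster("no", "cluster", "192.0.2.11"))  # False
```

If every node has a remote address configured, the function returns False on all of them, which matches the behaviour described in this report: no node ever converts the schema.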

4. During rolling upgrades of controller nodes, the leader will likely move.
OVN_CTL_OPTS is generated from this template
https://github.com/openstack/openstack-ansible-os_neutron/blob/stable/2023.1/templates/ovn-northd-opts.j2
but only when a new node joins the cluster https://github.com/openstack/openstack-ansible-os_neutron/blob/stable/2023.1/tasks/providers/ovn_cluster_setup.yml#L105
so after the upgrade all 3 nodes will have --db-nb-cluster-remote-addr and --db-sb-cluster-remote-addr set to whichever node was the leader at that time.

What we need is for --db-nb-cluster-remote-addr= to be unset on the leader node, so that upgrade_cluster runs when ovn-central is restarted on that node.
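
One way to picture the desired outcome is a helper that builds the cluster options per node and leaves the remote address out on the current leader. This is a hypothetical sketch, not what the os_neutron role does today: `cluster_opts` is an invented name, the IP addresses are placeholders from the documentation range, and how the current NB and SB leaders are discovered is left out.

```python
def cluster_opts(node_addr, nb_leader_addr, sb_leader_addr):
    """Build the remote-addr part of OVN_CTL_OPTS for one node.

    The flag is omitted on the node that currently leads the given DB,
    so the upgrade check in ovn-ctl can pass there.
    """
    opts = []
    if node_addr != nb_leader_addr:
        opts.append(f"--db-nb-cluster-remote-addr={nb_leader_addr}")
    if node_addr != sb_leader_addr:
        opts.append(f"--db-sb-cluster-remote-addr={sb_leader_addr}")
    return " ".join(opts)


# Placeholder addresses; NB and SB leaders may be different nodes.
nodes = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]
for node in nodes:
    print(node, "->", cluster_opts(node, "192.0.2.11", "192.0.2.12"))
```

The NB and SB leaders are handled separately because the two Raft clusters elect their leaders independently.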
