OVN controllers dead (XXX) after zed upgrade

Bug #2059119 reported by Mark Goddard
This bug affects 1 person
Affects        Status  Importance  Assigned to  Milestone
kolla-ansible  New     Medium      Unassigned

Bug Description

Performed a major upgrade from Yoga to Zed on a cloud running Kayobe + Kolla Ansible on Ubuntu Jammy 22.04 hosts with OVN networking.

After the upgrade, metadata is broken for existing instances and new instances fail to boot.

All OVN controllers are seen as dead by Neutron (they have XXX in the Alive column of openstack network agent list), although they have a State of UP.
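
For anyone wanting to spot this state quickly after an upgrade, here is a minimal sketch that lists the hosts whose agents are not alive. It assumes the output of openstack network agent list -f json is passed in as a string, and that the JSON formatter reports Alive as a boolean (the table view shows ":-)" or "XXX" instead). The sample data and host names are made up for illustration.

```python
import json

# Hypothetical sample shaped like `openstack network agent list -f json`.
# Assumption: JSON output carries Alive as a boolean; host names are invented.
SAMPLE = json.dumps([
    {"Agent Type": "OVN Controller agent", "Host": "compute01",
     "Alive": False, "State": True},
    {"Agent Type": "OVN Controller agent", "Host": "compute02",
     "Alive": True, "State": True},
])


def dead_agents(agent_list_json):
    """Return the hosts of agents that are not reported as alive."""
    dead = []
    for agent in json.loads(agent_list_json):
        alive = agent.get("Alive")
        # Accept both the boolean form and the ":-)" string used by the table view.
        if alive is True or alive == ":-)":
            continue
        dead.append(agent.get("Host", "<unknown>"))
    return dead
```

Note that State (administrative state) is deliberately ignored here: in this report the agents were UP but still not alive.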

There are various warnings in the OVN controller logs about the Southbound database schema:

2024-03-20T12:13:02.422Z|00010|ovsdb_idl|WARN|Load_Balancer table in OVN_Southbound database lacks datapath_group column (database needs upgrade?)
2024-03-20T12:13:02.422Z|00011|ovsdb_idl|WARN|MAC_Binding table in OVN_Southbound database lacks timestamp column (database needs upgrade?)
2024-03-20T12:13:02.423Z|00012|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks additional_chassis column (database needs upgrade?)
2024-03-20T12:13:02.423Z|00013|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks additional_encap column (database needs upgrade?)
2024-03-20T12:13:02.423Z|00014|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks port_security column (database needs upgrade?)
2024-03-20T12:13:02.423Z|00015|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks requested_additional_chassis column (database needs upgrade?)

and one slightly more weird/scary one:

2024-03-20T12:13:04.126Z|00027|lflow|WARN|error parsing actions "ct_commit_nat;": Syntax error at `ct_commit_nat' expecting action.
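
The schema warnings above can be summarised per table, which makes it easier to see how far behind the served schema is. This is only a sketch: the regular expression is written against the log lines quoted in this report, and the function name is hypothetical.

```python
import re

# Matches ovsdb_idl warnings of the form quoted in this report, e.g.
# "...|ovsdb_idl|WARN|Port_Binding table in OVN_Southbound database lacks
#  port_security column (database needs upgrade?)"
MISSING_COLUMN = re.compile(
    r"\|ovsdb_idl\|WARN\|(?P<table>\w+) table in (?P<db>\w+) database "
    r"lacks (?P<column>\w+) column"
)


def missing_columns(log_lines):
    """Map each table name to the columns the IDL reported as missing."""
    found = {}
    for line in log_lines:
        match = MISSING_COLUMN.search(line)
        if match:
            found.setdefault(match["table"], []).append(match["column"])
    return found
```

Feeding it the ovn-controller log (for example via open(path) as an iterable of lines) returns a dict keyed by table; lines that are not schema warnings, such as the lflow one, are skipped.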

Restarting the OVN controller services did not help.

Restarted one Southbound DB (ovn_sb_db) container and all agents came back to life. Metadata and booting an instance now work.

Some versions:

(ovn-controller)# ovn-controller --version
ovn-controller 22.09.1
Open vSwitch Library 3.0.3
OpenFlow versions 0x6:0x6
SB DB Schema 20.25.0

(ovn-sb-db)# ovn-sbctl --version
ovn-sbctl 22.09.1
Open vSwitch Library 3.0.3
DB Schema 20.25.0

It's not clear what happened, but it seems that for some reason the SB DB had not performed or completed its DB upgrade.
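
One way to test that theory is to compare the schema version the binaries expect with the version the running database actually serves. As far as I understand, the --version outputs above report the schema the binaries were built against, so both showing 20.25.0 does not prove the server had converted. The sketch below assumes the served version is obtained separately, for example with ovsdb-client get-schema-version against the OVN_Southbound database, and passed in as a string; the function names are hypothetical.

```python
import re

# --version output of ovn-controller, as quoted in this report.
CONTROLLER_VERSION = """\
ovn-controller 22.09.1
Open vSwitch Library 3.0.3
OpenFlow versions 0x6:0x6
SB DB Schema 20.25.0
"""


def parse_version(text):
    """Turn a dotted version such as '20.25.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def expected_schema(version_output):
    """Extract the '(SB )DB Schema x.y.z' line from --version output."""
    match = re.search(r"^(?:SB )?DB Schema (\d+(?:\.\d+)*)\s*$",
                      version_output, re.MULTILINE)
    return parse_version(match.group(1)) if match else None


def schema_is_behind(version_output, served_version):
    """True if the served schema is older than the one the binary expects."""
    expected = expected_schema(version_output)
    if expected is None:
        raise ValueError("no DB Schema line found in version output")
    return parse_version(served_version) < expected
```

If schema_is_behind returns True after an upgrade, the "lacks ... column" warnings would be the expected symptom.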

Mark Goddard (mgoddard)
Changed in kolla-ansible:
importance: Undecided → Medium
Michal Nasiadka (mnasiadka) wrote :

I have no idea what could be done better in kolla-ansible - but maybe we should do some DB consistency check if possible.

Michal Nasiadka (mnasiadka) wrote :

TL;DR - use ovsdb-tool check-cluster
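
A rough sketch of how such a check could be wrapped for use after an upgrade, assuming the Southbound database is clustered and its database files are readable from where the script runs. The file paths are placeholders to be supplied by the caller, not Kolla defaults, and treating any non-zero exit status as a failed check is an assumption.

```python
import subprocess


def check_cluster_cmd(db_files):
    """Build the ovsdb-tool check-cluster command line for the given files."""
    if not db_files:
        raise ValueError("at least one database file is required")
    return ["ovsdb-tool", "check-cluster", *db_files]


def run_check_cluster(db_files):
    """Run the check; return (ok, combined output).

    Assumption: a non-zero exit status means the consistency check failed.
    """
    result = subprocess.run(check_cluster_cmd(db_files),
                            capture_output=True, text=True)
    output = (result.stdout + result.stderr).strip()
    return result.returncode == 0, output
```

The files from every cluster member would be passed together, so the databases can be compared against each other rather than checked in isolation.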
