ovn-controller crashing with "failed in flood_remove_flows_for_sb_uuid()"

Bug #1959441 reported by Stefan Lupsa
This bug affects 6 people
Affects               Status     Importance  Assigned to  Milestone
Ubuntu Cloud Archive  New        Undecided   Unassigned
ovn (Ubuntu)          Confirmed  Undecided   Unassigned

Bug Description

ovn-controller crashes and restarts under a Kolla OpenStack deployment.
Link to Red Hat bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1928012

2022-01-28T13:50:11.460Z|00076|util|EMER|controller/ofctrl.c:1198: assertion ovs_list_is_empty(&f->list_node) failed in flood_remove_flows_for_sb_uuid()
2022-01-28T13:50:13.050Z|00001|vlog|INFO|opened log file /var/log/kolla/openvswitch/ovn-controller.log
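
To check whether a node is hitting this crash, the log can be scanned for the assertion line. The following is only a minimal sketch: the field layout (timestamp|sequence|module|level|message) is inferred from the two log lines above, and the function name find_assertion_failures is hypothetical.

```python
import re

# Field layout inferred from the log lines quoted in this report:
# timestamp|sequence|module|level|message
VLOG_LINE = re.compile(
    r"^(?P<ts>[^|]+)\|(?P<seq>\d+)\|(?P<module>[^|]+)\|(?P<level>[^|]+)\|(?P<msg>.*)$"
)


def find_assertion_failures(lines, function="flood_remove_flows_for_sb_uuid"):
    """Return timestamps of EMER lines reporting a failed assertion in `function`."""
    hits = []
    for line in lines:
        m = VLOG_LINE.match(line.strip())
        if m and m["level"] == "EMER" and f"failed in {function}()" in m["msg"]:
            hits.append(m["ts"])
    return hits


sample = [
    "2022-01-28T13:50:11.460Z|00076|util|EMER|controller/ofctrl.c:1198: "
    "assertion ovs_list_is_empty(&f->list_node) failed in flood_remove_flows_for_sb_uuid()",
    "2022-01-28T13:50:13.050Z|00001|vlog|INFO|opened log file "
    "/var/log/kolla/openvswitch/ovn-controller.log",
]
crash_times = find_assertion_failures(sample)
```

In a real deployment the lines would be read from the ovn-controller log file instead of the inline sample.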

This problem appears to be fixed upstream in later OVN releases:
https://github.com/ovn-org/ovn/commit/858d1dd716db1a1e664a7c1737fd34f04fcbda5e added in v21.03.0
https://github.com/ovn-org/ovn/commit/c6c61b4e3462fb5201a61a226c2acaf6f4caf917 added in v21.06.0

Ubuntu version 20.04.3 LTS
Packages:
ovn-host 20.12.0-0ubuntu2~cloud0
ovn-common 20.12.0-0ubuntu2~cloud0

The latest candidate, 20.12.0-0ubuntu3~cloud0 (from http://ubuntu-cloud.archive.canonical.com/ubuntu focal-updates/wallaby/main amd64 Packages), does not list these patches as backported in its changelog.

Is there any plan to backport these fixes in the next update to the 20.12.0 package?

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ovn (Ubuntu):
status: New → Confirmed
Chris Valean (cvalean) wrote :

[Impact]

In an OpenStack Wallaby Kolla deployment, the OVN controller container gets restarted when this bug is hit, and it can impact all compute nodes.

The linked patches are already backported and present in the OVN v21 series.

The fix clones the hmap 'flood_remove_nodes' and iterates over the clone when flood removing nodes.
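
The general idea of iterating over a clone can be illustrated with a small Python analogy. This is not OVN code: the names flood_remove and refs are hypothetical, and a Python set stands in for the hmap. The point is only that the loop walks a snapshot, so the loop body is free to add entries to the original collection.

```python
def flood_remove(start, refs):
    """Collect every node that must be removed together with `start`.

    `refs` maps a node to the nodes that reference it and must also be
    removed (hypothetical data, not OVN's structures).
    """
    to_remove = {start}
    while True:
        # Walk a clone so the body may safely grow `to_remove`.
        snapshot = set(to_remove)
        for node in snapshot:
            to_remove.update(refs.get(node, ()))
        if len(to_remove) == len(snapshot):
            # Nothing new was added in this pass, so the set is complete.
            return to_remove


def flood_remove_unsafe(start, refs):
    """Same idea without the clone: mutates the set while iterating it."""
    to_remove = {start}
    for node in to_remove:
        to_remove.update(refs.get(node, ()))
    return to_remove


refs = {"a": ["b"], "b": ["c"]}
removed = flood_remove("a", refs)

try:
    flood_remove_unsafe("a", refs)
    unsafe_failed = False
except RuntimeError:
    # Python refuses to continue iterating a set whose size changed.
    unsafe_failed = True
```

Python reports the unsafe variant as an error; in C, mutating a container during iteration gives no such guarantee, which is why walking a clone is a common defensive pattern.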

[Test Plan]

We reproduced the bug using the steps from the test case that is part of the fix; the full steps are below.

We also verified that after swapping the OVN container to the Xena release (OVN v21), which already includes this patch, the bug is no longer reproducible with the same steps.

[Other Info]

Two patches address the same issue, in which flood removing flows crashes the OVN controller. Both patches must be backported.
https://github.com/ovn-org/ovn/commit/858d1dd716db1a1e664a7c1737fd34f04fcbda5e
https://github.com/ovn-org/ovn/commit/c6c61b4e3462fb5201a61a226c2acaf6f4caf917

The issues have also been tested in these threads:
https://bugzilla.redhat.com/show_bug.cgi?id=1929978
https://bugzilla.redhat.com/show_bug.cgi?id=1928012

Chris Valean (cvalean) wrote :

repro steps attached
