Flapping Dqlite schema upgrade test

Bug #2072658 reported by Martin Kalcok
Affects: microovn
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

The test suite `ovsdb_schema_upgrade.bats` is currently unstable and occasionally fails. This test tries to detect whether:
* MicroOVN's internal Dqlite schema needs an upgrade
* OVN's OVSDB schema needs an upgrade

and then takes a different approach in each scenario. The detection occurs at [0]. If it finds that the internal Dqlite schema needs an upgrade, the test upgrades every single node before proceeding. The highlighted call to `wait_microovn_online` is prone to failure after the last node is upgraded, because that is when the Dqlite schema upgrade kicks in, and 30 seconds may not be enough for all nodes to become "ONLINE" after that.

There is also a minor secondary issue. If the Dqlite upgrade is required, `wait_microovn_online` is called on every iteration of the loop. However, because it waits for all nodes to be "ONLINE", it can never succeed until all nodes are upgraded (i.e. on the last loop iteration). On every earlier iteration, all it does is add an "elaborate" 30-second sleep.

My suggestion would be to detect the Dqlite upgrade by checking for the presence of keywords like "UPGRADING" and "NEEDS UPGRADE", rather than the absence of "ONLINE". Then, on the last iteration only, run `wait_microovn_online` with a timeout of 60 seconds to confirm that all nodes became available after the upgrade.
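
A rough sketch of how this suggestion could look, not the actual test code. `dqlite_upgrade_pending` and `upgrade_containers` are hypothetical names, `upgrade_node` stands in for whatever the test does to upgrade a single container, and the `wait_microovn_online <container> <timeout>` argument order is assumed. The cluster status text is passed in as a plain string so the keyword check can be tried without a running cluster.

```shell
# Hypothetical helper: succeeds when the given cluster status text shows
# that a Dqlite schema upgrade is pending or in progress.
dqlite_upgrade_pending() {
    printf '%s\n' "$1" | grep -qE 'UPGRADING|NEEDS UPGRADE'
}

# Hypothetical loop: upgrade every container, and wait for the cluster to
# come back "ONLINE" only once, after the last container is upgraded.
# $1 is the status text captured by the caller; the rest are container names.
# upgrade_node is a placeholder, and the (container, timeout) signature of
# wait_microovn_online is an assumption.
upgrade_containers() {
    local status_output="$1"
    shift
    local remaining=$#
    local container
    for container in "$@"; do
        upgrade_node "$container"
        remaining=$((remaining - 1))
        # Waiting earlier cannot succeed, since not all nodes are upgraded yet.
        if [ "$remaining" -eq 0 ] && dqlite_upgrade_pending "$status_output"; then
            wait_microovn_online "$container" 60
        fi
    done
}
```

With this shape, the wait runs once with the longer timeout instead of once per node, and the decision to wait is driven by the explicit upgrade keywords rather than by a missing "ONLINE".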

[0] https://github.com/canonical/microovn/blob/a14a37d362349463bb7859825b2d3148354d9a3a/tests/test_helper/bats/ovsdb_schema_upgrade.bats#L75-L83
