Activity log for bug #2072658

Date Who What changed Old value New value Message
2024-07-10 14:24:19 Martin Kalcok bug added bug
2024-07-10 14:31:40 Martin Kalcok description

Old value:

Testsuite `ovsdb_schema_upgrade.bats` is currently unstable and occasionally fails. This test tries to detect whether:
* MicroOVN's internal dqlite schema needs upgrade
* OVN's OVSDB schema needs upgrade
and then takes a different approach in each scenario.

The detection occurs here [0], and if it detects that the internal dqlite schema needs upgrade, the test upgrades every single node before proceeding. The highlighted call to `wait_microovn_online` is prone to failure after the last node is upgraded, because that's when the dqlite schema upgrade kicks in, and 30 seconds may not be enough for all nodes to become "ONLINE" after that.

There's also a minor secondary issue. `wait_microovn_online` is called on every iteration of the loop if the dqlite upgrade is required. However, as it waits for all nodes to be "ONLINE", it can never succeed until all nodes are upgraded (i.e. on the last loop iteration). All it does is elaborate 30 second sleep on each iteration.

My suggestion would be to detect the dqlite upgrade by checking for the presence of keywords like "UPGRADING" and "NEEDS UPGRADE", rather than the absence of "ONLINE". Then on the last iteration run `wait_microovn_online` with a timeout of 60 seconds to detect that all nodes became available after the upgrade.

[0] https://github.com/canonical/microovn/blob/a14a37d362349463bb7859825b2d3148354d9a3a/tests/test_helper/bats/ovsdb_schema_upgrade.bats#L75-L83

New value:

Testsuite `ovsdb_schema_upgrade.bats` is currently unstable and occasionally fails. This test tries to detect whether:
* MicroOVN's internal dqlite schema needs upgrade
* OVN's OVSDB schema needs upgrade
and then takes a different approach in each scenario.

The detection occurs here [0], and if it detects that the internal dqlite schema needs upgrade, the test upgrades every single node before proceeding. The highlighted call to `wait_microovn_online` is prone to failure after the last node is upgraded, because that's when the dqlite schema upgrade kicks in, and 30 seconds may not be enough for all nodes to become "ONLINE" after that.

There's also a minor secondary issue. `wait_microovn_online` is called on every iteration of the loop if the dqlite upgrade is required. However, as it waits for all nodes to be "ONLINE", it can never succeed until all nodes are upgraded (i.e. on the last loop iteration). All it's doing is adding "elaborate" 30 second sleep on each iteration.

My suggestion would be to detect the dqlite upgrade by checking for the presence of keywords like "UPGRADING" and "NEEDS UPGRADE", rather than the absence of "ONLINE". Then on the last iteration run `wait_microovn_online` with a timeout of 60 seconds to detect that all nodes became available after the upgrade.

[0] https://github.com/canonical/microovn/blob/a14a37d362349463bb7859825b2d3148354d9a3a/tests/test_helper/bats/ovsdb_schema_upgrade.bats#L75-L83
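
The suggestion in the description (detect a pending dqlite upgrade by looking for the keywords "UPGRADING" and "NEEDS UPGRADE", and only wait for "ONLINE" on the last loop iteration) could be sketched roughly as follows. This is a minimal illustration, not the actual change to `ovsdb_schema_upgrade.bats`: the helper names `dqlite_upgrade_pending` and `online_wait_timeout` are hypothetical, and the cluster status text is passed in as a plain string so the sketch does not depend on any particular MicroOVN command or output format.

```shell
# Hypothetical helper (not part of the MicroOVN test suite): succeeds if the
# given cluster status text contains one of the upgrade keywords named in the
# bug description. The status text is an argument, so no MicroOVN is needed.
dqlite_upgrade_pending() {
    local status_output="$1"
    printf '%s\n' "$status_output" | grep -qE 'UPGRADING|NEEDS UPGRADE'
}

# Hypothetical helper: prints how many seconds to wait for nodes to become
# ONLINE after upgrading the node at zero-based position $1 out of $2 nodes.
# Only the last iteration waits (60 seconds, as suggested in the report);
# earlier iterations print 0 because the wait could not succeed yet.
online_wait_timeout() {
    local index="$1"
    local total="$2"
    if [ "$index" -eq "$((total - 1))" ]; then
        echo 60
    else
        echo 0
    fi
}
```

In a test loop, the idea would be to call `dqlite_upgrade_pending` once on the captured status output, and then to invoke `wait_microovn_online` only when `online_wait_timeout` reports a non-zero value. How a timeout would actually be passed to `wait_microovn_online` is not shown in the report and is left open here.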
2024-07-15 08:14:48 mj microovn: assignee mj (crypticcoder)