Comment 4 for bug 1944201

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

I reviewed the Kibana logs and other successful executions. I think the problem is in [1]: when we change the OF service, the controller needs to be re-initialized. For example, [2] shows that when we change some parameters of the controller, it needs to be restarted. Snippet: [3].

In order to prevent (not really, just mitigate) those errors, I propose adding a retry decorator to the methods that force the OF controller to be restarted. I've identified these two:
- set_fail_mode (which calls ovsdbapp's SetFailModeCommand): this command sets the bridge fail_mode.
- add_protocols: this method adds new OF protocols to the bridge (it makes sense that the OF controller needs to be re-initialized).

I'll push a patch to retry those methods when they fail with this "InvalidDatapath" exception.
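
To illustrate the idea, here is a rough sketch of such a retry decorator. It is not the actual patch: the InvalidDatapath class below is a local stand-in for the real exception, and retry_on_invalid_datapath, FakeBridge, the attempt count and the delay are all hypothetical names and values chosen only for the example.

```python
import functools
import time


class InvalidDatapath(Exception):
    """Local stand-in for the exception seen in the logs (assumption)."""


def retry_on_invalid_datapath(attempts=3, delay=0.0):
    """Retry the wrapped call while it raises InvalidDatapath.

    Hypothetical helper: 'attempts' and 'delay' are illustrative defaults.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except InvalidDatapath:
                    if attempt == attempts:
                        raise  # out of retries, surface the error
                    time.sleep(delay)  # give the controller time to restart
        return wrapper
    return decorator


class FakeBridge:
    """Toy bridge: the first call fails, as if the OF controller were
    still being re-initialized."""

    def __init__(self):
        self.calls = 0
        self.fail_mode = None

    @retry_on_invalid_datapath(attempts=3, delay=0.0)
    def set_fail_mode(self, mode):
        self.calls += 1
        if self.calls == 1:
            raise InvalidDatapath("datapath not ready")
        self.fail_mode = mode
        return mode
```

As noted above, this only mitigates the problem: if the controller has not recovered after the last attempt, the exception is still raised to the caller.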

Regards.

[1]https://github.com/openvswitch/ovs/blob/849a40ccfb9c7c6bba635b517caac4f12ab63eee/ofproto/connmgr.c#L605-L612
[2]https://e36beaa2ff297ebe7d5f-5944c3d62ed334b8cdf50b534c246731.ssl.cf5.rackcdn.com/805849/9/check/neutron-ovs-tempest-dvr-ha-multinode-full/f83fa96/compute1/logs/openvswitch/ovs-vswitchd_log.txt
[3]https://paste.opendev.org/show/809505/

P.S.: just for the record, this bug is related to:
- https://bugs.launchpad.net/neutron/+bug/1817022
- https://bugs.launchpad.net/neutron/+bug/1672610
- https://bugs.launchpad.net/neutron/+bug/1627106
- https://bugs.launchpad.net/neutron/+bug/1666731