Comment 16 for bug 1555162

Aleksandr Didenko (adidenko) wrote :

@Alexey, thanks for the snapshot. The test in the snapshot uses a custom network template in which ALL bridges are OVS bridges (provider=ovs), so the issue was much easier to reproduce and troubleshoot.

OK, so it looks like we have 2 problems here:
1) Improper configuration of OVS bridges and ports in /etc/network/interfaces.d/ifcfg-* files (L23_stored_config in fuel-library)
2) Upstart/if-pre-up configuration for the openvswitch-switch service.

Some details on each of those problems:
1) Let's take a look at the stored config for br-ex - http://paste.openstack.org/show/494246/
If you run 'ifdown br-ex' and then 'ifup br-ex', the bridge loses its connection to the physical network (OVS port enp0s4 will be missing from br-ex). This happens because enp0s4 is not configured as an OVSPort. Another problem is that both ports (enp0s4 and p_ff798dba-0) have "auto" enabled, so when 'networking' starts the system tries to bring them up directly (not under the bridge), which may of course fail because the bridges have not been created yet. Ports connected to an OVS bridge should not have the "auto" parameter (see the examples at https://github.com/openvswitch/ovs/blob/master/debian/openvswitch-switch.README.Debian). Those ports are brought up by the /etc/network/if-pre-up.d/openvswitch script (see line #7 in this paste http://paste.openstack.org/show/494249/ ).
So we need to fix our manifests to configure OVS bridges and their ports like this: http://paste.openstack.org/show/494250/
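For readers without access to the paste, here is a rough sketch of what such a stanza pair could look like. It follows the conventions from the upstream openvswitch-switch README.Debian, not the exact contents of the paste above. The address and netmask are placeholders, and only the physical port enp0s4 is shown, since the correct ovs_type for p_ff798dba-0 is not stated here.

```
# Bridge: brought up via "allow-ovs", no "auto"
allow-ovs br-ex
iface br-ex inet static
    address 192.0.2.10        # placeholder address
    netmask 255.255.255.0     # placeholder netmask
    ovs_type OVSBridge
    ovs_ports enp0s4

# Port: tied to its bridge via "allow-br-ex", no "auto"
allow-br-ex enp0s4
iface enp0s4 inet manual
    ovs_bridge br-ex
    ovs_type OVSPort
```

With the port declared this way it is brought up together with its bridge, so 'ifdown br-ex' followed by 'ifup br-ex' should re-attach enp0s4.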

2) The same /etc/network/if-pre-up.d/openvswitch script is also the first thing that brings the openvswitch-switch service up (see http://paste.openstack.org/show/494251/). The problem appears to be related to ovsdb accessibility: the script tries to configure OVS interfaces while ovsdb is not up yet (/var/run/openvswitch/db.sock is not accessible). After I added a simple wait-for-socket loop (see http://paste.openstack.org/show/494252/ ), the problem with missing OVS bridges went away. I was also able to find the 'NO SOCKET' message in the upstart networking log, so the script evidently did try to execute ovs commands before the socket was ready (which would have failed without the wait loop).
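
A wait-for-socket loop of this kind could be sketched roughly as follows. This is not the code from the paste; the function name wait_for_ovsdb, the retry count and the interval are made up for illustration. Only the socket path and the 'NO SOCKET' message come from the description above.

```shell
#!/bin/sh
# Sketch only: wait for the ovsdb-server socket before running ovs commands.
# wait_for_ovsdb, the retry count and the interval are hypothetical.

OVSDB_SOCK="${OVSDB_SOCK:-/var/run/openvswitch/db.sock}"

wait_for_ovsdb() {
    tries="${1:-30}"      # maximum number of checks
    interval="${2:-1}"    # seconds to sleep between checks
    while [ "$tries" -gt 0 ]; do
        # -S is true only if the path exists and is a socket
        if [ -S "$OVSDB_SOCK" ]; then
            return 0
        fi
        # Log each failed check so it shows up in the networking log
        echo "NO SOCKET" >&2
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

In the if-pre-up script this would be called once after the service is started and before the first ovs-vsctl invocation, for example 'wait_for_ovsdb 30 1 || exit 1'. A bounded retry count is used so that a broken ovsdb does not hang boot forever.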