We've come across a startup timing issue with neutron-openvswitch-agent and the dhcp-agents, with underlying OVS, on Havana. When we use COI.H1 to deploy our environment, the default provider for service startup ordering seems to be System V rc scripts: all the neutron and openvswitch services get startup scripts created as /etc/rc2.d/S20*. With a manual install of Havana the rc2.d scripts don't exist and startup relies solely on upstart. What we're seeing is that, most of the time, the dhcp-agent and openvswitch agents come up before openvswitch and cannot attach to the bridge properly on startup, so we're having issues with flows and with instance connectivity working properly. We've captured the error below from dhcp.log. If we remove the rc2.d/S20* startup scripts. If we restart neutron-openvswitch-agent and neutron-dhcp-agent when we see this problem, connectivity is established.
2014-01-28 18:02:15.456 13310 ERROR neutron.common.legacy [-] Skipping unknown group key: firewall_driver
2014-01-28 18:02:16.447 13310 ERROR neutron.agent.dhcp_agent [-] Unable to enable dhcp.
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent Traceback (most recent call last):
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/dist-packages/neutron/agent/dhcp_agent.py", line 126, in call_driver
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent getattr(driver, action)(**action_kwargs)
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/dhcp.py", line 167, in enable
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent reuse_existing=True)
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/dhcp.py", line 702, in setup
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent namespace=network.namespace)
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/interface.py", line 161, in plug
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent self.check_bridge_exists(bridge)
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/interface.py", line 102, in check_bridge_exists
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent raise exceptions.BridgeDoesNotExist(bridge=bridge)
2014-01-28 18:02:16.447 13310 TRACE neutron.agent.dhcp_agent BridgeDoesNotExist: Bridge br-int does not exist.
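To check the rc2.d observation on an affected node, one option is to list the services that have both a System V start link and an upstart job. The following is only a rough sketch, not something shipped with COI or neutron: the function name find_dual_managed is made up for illustration, and the default paths /etc/rc2.d and /etc/init are assumed to be the usual Ubuntu locations.

```python
import os
import re


def find_dual_managed(rc_dir="/etc/rc2.d", upstart_dir="/etc/init"):
    """Return sorted service names that have both an Snn start link in
    rc_dir and a <name>.conf job in upstart_dir.

    Hypothetical helper; the default paths are assumptions about a
    typical Ubuntu layout.
    """
    sysv = set()
    if os.path.isdir(rc_dir):
        for entry in os.listdir(rc_dir):
            # Start links look like S20neutron-dhcp-agent; K links are ignored.
            match = re.match(r"^S\d\d(.+)$", entry)
            if match:
                sysv.add(match.group(1))

    upstart = set()
    if os.path.isdir(upstart_dir):
        for entry in os.listdir(upstart_dir):
            if entry.endswith(".conf"):
                upstart.add(entry[:-len(".conf")])

    # Services present in both sets are started by both mechanisms.
    return sorted(sysv & upstart)
```

Calling find_dual_managed() with no arguments and printing the result should show whether the neutron and openvswitch services appear under both mechanisms on that node.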
Could you confirm:
* what scenario you're using for this deployment
* are you using Cisco or UCA packaging
I wasn't able to replicate this in a deployment this afternoon, but it smells like a race condition of sorts, per your analysis above. The manifests we're using here don't set the provider for the service resources explicitly, which means it falls back to whatever the system default is (Puppet's docs are confusing on this point and imply the default is two different things, so I'll have to do a bit more digging). If that's indeed the case, then this might be resolvable by either disabling the ensure completely or changing the provider to upstart for these services. Neither is super clean, so we may want to confirm that's what the problem is and/or investigate other options as well.
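As a stopgap while the provider question is sorted out, the manual restart described in the original report could be scripted so that it only runs once br-int is actually there. This is a rough sketch under stated assumptions, not a fix in the manifests: wait_for_bridge and restart_agents are hypothetical helper names, the timeout values are arbitrary, and it relies on `ovs-vsctl br-exists`, which exits 0 when the bridge exists.

```python
import subprocess
import time


def bridge_exists(bridge):
    """Return True if OVS reports the bridge; uses `ovs-vsctl br-exists`."""
    try:
        return subprocess.call(["ovs-vsctl", "br-exists", bridge]) == 0
    except OSError:
        # ovs-vsctl is not installed or not on PATH.
        return False


def wait_for_bridge(bridge="br-int", timeout=60.0, interval=1.0,
                    exists=bridge_exists, sleep=time.sleep):
    """Poll until the bridge exists or roughly `timeout` seconds have passed.

    `exists` and `sleep` are injectable so the loop can be exercised
    without a running OVS. Returns True if the bridge showed up.
    """
    waited = 0.0
    while True:
        if exists(bridge):
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


def restart_agents(agents=("neutron-openvswitch-agent", "neutron-dhcp-agent")):
    """Restart the agents named in the report via the `service` command."""
    for agent in agents:
        subprocess.call(["service", agent, "restart"])
```

Run as root late in boot, something like `if wait_for_bridge(): restart_agents()` would mimic the manual workaround. It only papers over the ordering problem, so it is no substitute for settling the provider.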