Network connectivity lost on node reboot
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | Critical | Brent Eagles |
Bug Description
A recent change to neutron (see [1]) sets the OVS fail mode to secure on all physical bridges, i.e. the bridges listed in the bridge mappings. The fail_mode is stored persistently in the OVS database, so the bridges come up in this mode after a reboot. The OVS documentation describes what the secure fail mode means in detail, but the net result is that these bridges do not pass traffic after a reboot. This hits nodes that have a single NIC bridged to an OVS bridge that neutron also uses directly (basically the default for any controller, and for compute nodes when using DVR) and that get their addresses via DHCP: on reboot they cannot obtain their IP addresses and are effectively *unplugged* from the network. The missing IP addresses cause mayhem.
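To make the failure condition concrete, here is a deliberately simplified Python model (a sketch, not OVS or neutron code). It assumes that the bridges neutron switches to secure are exactly those named in a neutron-style `physnet:bridge,...` bridge_mappings value, that fail_mode survives a reboot while agent-programmed flows do not, and it ignores controller connections and probe timers. All class and function names (`BridgeState`, `physical_bridges`, `passes_traffic`, `unreachable_after_reboot`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeState:
    """Hypothetical, simplified view of one OVS bridge right after boot."""
    name: str
    fail_mode: str  # "secure" or "standalone"; persisted in the OVS database
    agent_flows_installed: bool = False  # flows are NOT persisted across reboot


def physical_bridges(bridge_mappings: str) -> set[str]:
    """Return the bridges named in a 'physnet:bridge,...' mappings string."""
    bridges = set()
    for entry in bridge_mappings.split(","):
        entry = entry.strip()
        if entry:
            _physnet, _, bridge = entry.partition(":")
            bridges.add(bridge.strip())
    return bridges


def passes_traffic(bridge: BridgeState) -> bool:
    # Simplification: with no agent-programmed flows, a standalone bridge
    # falls back to acting as a MAC-learning switch, while a secure bridge
    # sets up no flows on its own and therefore drops everything.
    return bridge.agent_flows_installed or bridge.fail_mode == "standalone"


def unreachable_after_reboot(bridge_mappings: str, dhcp_bridges: set[str]) -> set[str]:
    """DHCP-dependent bridges that cannot get a lease before the agent runs."""
    secure = physical_bridges(bridge_mappings)  # assumed set to secure by neutron
    stuck = set()
    for name in dhcp_bridges:
        # Bridges outside the mappings keep the OVS default (standalone).
        state = BridgeState(name, "secure" if name in secure else "standalone")
        if not passes_traffic(state):
            stuck.add(name)
    return stuck
```

For a single-NIC controller with `bridge_mappings = datacentre:br-ex` whose address comes from DHCP on br-ex, `unreachable_after_reboot("datacentre:br-ex", {"br-ex"})` returns `{"br-ex"}`, matching the behaviour described in this report.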
This seems to affect RHEL 7.3 consistently but current CentOS 7.2 only sporadically. I don't know about other releases of RHEL or CentOS. AFAIK, at the moment this would affect any deployment that matches the above description and is running a version of neutron with patch [1].
[1] https:/
So far, the workaround seems to be to modify the os-net-config-generated interface configurations for OVS bridges that are in the path of essential overcloud traffic (e.g. br-ex on a single-NIC node) so that they set fail_mode to standalone on boot. This gives the bridge a chance to acquire its address via DHCP before the neutron agent starts and sets it back to the secure fail_mode.
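The workaround can be sketched as a small text transformation on a generated ifcfg file. This is an illustration only, not the change made in the reviews for this bug. It assumes RHEL/CentOS network-scripts where Open vSwitch's `ifup-ovs` runs the contents of `OVS_EXTRA` as additional `ovs-vsctl` commands, with multiple commands separated by ` -- `. The helper name `add_standalone_fail_mode` is hypothetical, and it only handles a double-quoted `OVS_EXTRA` line.

```python
import re


def add_standalone_fail_mode(ifcfg_text: str, bridge: str) -> str:
    """Return ifcfg text whose OVS_EXTRA also sets fail_mode=standalone.

    Hypothetical helper. Assumes ifup-ovs passes OVS_EXTRA to ovs-vsctl and
    that several commands are joined with ' -- '. Only a double-quoted
    OVS_EXTRA="..." line is recognised.
    """
    command = f"set bridge {bridge} fail_mode=standalone"
    lines = ifcfg_text.splitlines()
    for i, line in enumerate(lines):
        match = re.fullmatch(r'OVS_EXTRA="(.*)"', line.strip())
        if match:
            existing = match.group(1)
            if command in existing:
                return ifcfg_text  # already patched; leave the file untouched
            merged = f"{existing} -- {command}" if existing else command
            lines[i] = f'OVS_EXTRA="{merged}"'
            return "\n".join(lines) + "\n"
    # No OVS_EXTRA yet: append one carrying only the fail_mode command.
    lines.append(f'OVS_EXTRA="{command}"')
    return "\n".join(lines) + "\n"
```

Under these assumptions, bringing br-ex up at boot would also run `set bridge br-ex fail_mode=standalone`, letting DHCP complete. Once the neutron OVS agent starts, it switches the bridge back to secure, so the protection in normal operation is unchanged.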
Changed in tripleo:
status: Confirmed → In Progress
tags: added: mitaka-backport-potential newton-backport-potential
Changed in tripleo:
status: In Progress → Fix Committed
Changed in tripleo:
status: Fix Committed → Fix Released
Fixes in progress:
tht (tripleo-heat-templates) - https://review.openstack.org/#/c/395854/
os-net-config - https://review.openstack.org/#/c/395795/