OVS isn't persisting MAC addresses on OVS bridges

Bug #1329238 reported by Michael Kerrin
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Dan Prince

Bug Description

Another issue with rebooting the controller. My controller has MAC address
00:e4:c9:1b:bd:d3 and IP address 192.0.2.25.

When we boot the controller, we should see DHCP requests like these in the undercloud syslog:
Jun 11 10:44:03 undercloud-undercloud-yirqmknwweow dnsmasq-dhcp[5294]: DHCPDISCOVER(tapd7cad9be-d7) 00:e4:c9:1b:bd:d3
Jun 11 10:44:03 undercloud-undercloud-yirqmknwweow dnsmasq-dhcp[5294]: DHCPOFFER(tapd7cad9be-d7) 192.0.2.25 00:e4:c9:1b:bd:d3
Jun 11 10:44:05 undercloud-undercloud-yirqmknwweow dnsmasq-dhcp[5294]: DHCPREQUEST(tapd7cad9be-d7) 192.0.2.25 00:e4:c9:1b:bd:d3
Jun 11 10:44:05 undercloud-undercloud-yirqmknwweow dnsmasq-dhcp[5294]: DHCPACK(tapd7cad9be-d7) 192.0.2.25 00:e4:c9:1b:bd:d3 host-192-0-2-25

But when I reboot the controller, I regularly see requests like these instead:

Jun 11 10:44:31 undercloud-undercloud-yirqmknwweow dnsmasq-dhcp[5294]: DHCPDISCOVER(tapd7cad9be-d7) 192.0.2.25 4a:9a:09:ae:b8:ca no address available
Jun 11 10:44:36 undercloud-undercloud-yirqmknwweow dnsmasq-dhcp[5294]: DHCPDISCOVER(tapd7cad9be-d7) 192.0.2.25 4a:9a:09:ae:b8:ca no address available
Jun 11 10:44:48 undercloud-undercloud-yirqmknwweow dnsmasq-dhcp[5294]: DHCPDISCOVER(tapd7cad9be-d7) 192.0.2.25 4a:9a:09:ae:b8:ca no address available
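The two excerpts differ only in the client MAC address. The sketch below (helper names are hypothetical) pulls the MAC out of dnsmasq-dhcp syslog lines and checks the locally administered bit (0x02 in the first octet). 00:e4:c9:1b:bd:d3 has that bit clear, while 4a:9a:09:ae:b8:ca has it set. That is consistent with the bridge using a generated address rather than the one Neutron expects, although the bit alone does not prove where the address came from.

```python
import re

# Matches dnsmasq-dhcp lines like the ones above: message type, interface in
# parentheses, an optional IPv4 address, then the (lowercase) client MAC.
DNSMASQ_RE = re.compile(
    r"dnsmasq-dhcp\[\d+\]: (?P<msg>DHCP[A-Z]+)\((?P<iface>[^)]+)\)"
    r"(?: (?P<ip>\d+\.\d+\.\d+\.\d+))? (?P<mac>[0-9a-f]{2}(?::[0-9a-f]{2}){5})"
)


def is_locally_administered(mac: str) -> bool:
    """True if the U/L bit (0x02) of the first octet is set."""
    return bool(int(mac.split(":")[0], 16) & 0x02)


def summarize(lines):
    """Yield (message, mac, locally_administered, failed) for each dnsmasq line."""
    for line in lines:
        m = DNSMASQ_RE.search(line)
        if m:
            yield (m["msg"], m["mac"], is_locally_administered(m["mac"]),
                   "no address available" in line)


if __name__ == "__main__":
    sample = [
        "Jun 11 10:44:05 undercloud-undercloud-yirqmknwweow dnsmasq-dhcp[5294]: "
        "DHCPACK(tapd7cad9be-d7) 192.0.2.25 00:e4:c9:1b:bd:d3 host-192-0-2-25",
        "Jun 11 10:44:31 undercloud-undercloud-yirqmknwweow dnsmasq-dhcp[5294]: "
        "DHCPDISCOVER(tapd7cad9be-d7) 192.0.2.25 4a:9a:09:ae:b8:ca no address available",
    ]
    for row in summarize(sample):
        print(row)
```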

This is a known issue with Open vSwitch: http://openvswitch.org/pipermail/discuss/2014-May/014058.html references a solution for Red Hat.

I tried to configure this on the controller by updating the ensure-bridge script to add the following line to the /etc/network/interfaces file:
ovs_extra set bridge br-ex other-config:hwaddr=00:e4:c9:1b:bd:d3

Running ovs-vsctl list bridge br-ex confirms that the other_config value is set, but this doesn't work all the time.

Running that command in the network-interface upstart job, I see the following debug output in /var/log/upstart/network-interface-br-ex.log:

_uuid : df757d0b-da3e-4c37-a437-4c0eeabceeb1
controller : []
datapath_id : []
datapath_type : ""
external_ids : {}
fail_mode : []
flood_vlans : []
flow_tables : {}
ipfix : []
mirrors : []
name : br-ex
netflow : []
other_config : {hwaddr="00:e4:c9:1b:bd:d3"}
ports : [09978c2e-e0bd-44d1-8f7f-c26f81a30282, 0dcb85ee-1f1a-4e32-895c-c328915c9700, 2c9bf129-1390-4d8b-ba9b-125abd516234]
protocols : []
sflow : []
status : {}
stp_enable : false
ifup: interface eth0 already configured
Internet Systems Consortium DHCP Client 4.2.4
Copyright 2004-2012 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

parse_option_param: Bad format a
Listening on LPF/br-ex/4a:9a:09:ae:b8:ca
Sending on LPF/br-ex/4a:9a:09:ae:b8:ca
Sending on Socket/fallback
DHCPREQUEST on br-ex to 255.255.255.255 port 67
DHCPREQUEST on br-ex to 255.255.255.255 port 67
DHCPDISCOVER on br-ex to 255.255.255.255 port 67 interval 6
DHCPDISCOVER on br-ex to 255.255.255.255 port 67 interval 11
DHCPDISCOVER on br-ex to 255.255.255.255 port 67 interval 14
DHCPDISCOVER on br-ex to 255.255.255.255 port 67 interval 13
DHCPDISCOVER on br-ex to 255.255.255.255 port 67 interval 12
DHCPDISCOVER on br-ex to 255.255.255.255 port 67 interval 5
No DHCPOFFERS received.
Trying recorded lease 192.0.2.25
PING 192.0.2.1 (192.0.2.1) 56(84) bytes of data.

--- 192.0.2.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.983/1.983/1.983/0.000 ms
bound: renewal in 42327 seconds.
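The log shows the mismatch directly: other_config holds hwaddr="00:e4:c9:1b:bd:d3", yet dhclient reports "Listening on LPF/br-ex/4a:9a:09:ae:b8:ca". The following is a hypothetical diagnostic sketch (not part of TripleO) that extracts both values from a captured log and reports when they disagree. It only compares text and does not query OVS itself.

```python
import re


def configured_hwaddr(log_text: str):
    """Return other_config:hwaddr from `ovs-vsctl list bridge` output, or None."""
    for line in log_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "other_config":
            m = re.search(r'hwaddr="([0-9a-f:]{17})"', value)
            return m.group(1) if m else None
    return None


def dhclient_mac(log_text: str):
    """Return the MAC from dhclient's 'Listening on LPF/<iface>/<mac>' line, or None."""
    m = re.search(r"Listening on LPF/[^/]+/([0-9a-f:]{17})", log_text)
    return m.group(1) if m else None


def mac_mismatch(log_text: str):
    """Return (configured, actual) if both are present and differ, else None."""
    want, got = configured_hwaddr(log_text), dhclient_mac(log_text)
    return (want, got) if want and got and want != got else None
```

Run against the upstart log above, mac_mismatch would return ("00:e4:c9:1b:bd:d3", "4a:9a:09:ae:b8:ca").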

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-image-elements (master)

Fix proposed to branch: master
Review: https://review.openstack.org/99604

Changed in tripleo:
assignee: nobody → Michael Kerrin (michael-kerrin-w)
status: New → In Progress
Michael Kerrin (michael-kerrin-w) wrote :

This review is an attempt to fix this. Sadly, it doesn't work, but if anyone knows OVS and would like to help, it is a starting point.

Dan Prince (dan-prince) wrote :

If I understand correctly, the root problem here is that DHCP isn't working on reboot, and you are trying to fix that by making the OVS bridge persistent?

The OVS bridge should inherit the MAC address of the first "physical" NIC assigned to it. See here:

http://git.openstack.org/cgit/openstack/tripleo-incubator/tree/scripts/configure-vm#n11

This is standard bridge behavior and applies not only to OVS bridges but also to Linux bridges. You shouldn't have to force the bridge to get the MAC of the NIC that is assigned to it; it should just happen automatically. This is how it works on Fedora, at least (reboots work just fine for me).
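The behaviour described above can be illustrated with a simplified model. It assumes the selection rule in ovs-vswitchd (check the bridge.c of your OVS version): an explicit other_config:hwaddr wins. Otherwise the numerically lowest unicast, universally administered MAC among the bridge's other ports is used. If there is no such port, the bridge keeps its own default (generated) address. With a single physical NIC, as on this controller, the result is that NIC's MAC. The model also shows why a bridge that runs DHCP before eth0 is attached could use an address like 4a:9a:09:ae:b8:ca. This is a sketch with hypothetical names, not OVS code.

```python
from typing import Dict, List, Optional


def _octets(mac: str) -> List[int]:
    return [int(part, 16) for part in mac.split(":")]


def _eligible(mac: str) -> bool:
    """Skip multicast, locally administered and all-zero addresses."""
    octets = _octets(mac)
    is_multicast = bool(octets[0] & 0x01)
    is_local = bool(octets[0] & 0x02)
    is_zero = all(o == 0 for o in octets)
    return not (is_multicast or is_local or is_zero)


def pick_bridge_mac(bridge_name: str, default_mac: str,
                    port_macs: Dict[str, str],
                    hwaddr: Optional[str] = None) -> str:
    """Simplified model (assumption) of how the bridge-local MAC is chosen."""
    if hwaddr:  # other_config:hwaddr set explicitly, as in ensure-bridge's fix
        return hwaddr.lower()
    candidates = [mac.lower() for name, mac in port_macs.items()
                  if name != bridge_name and _eligible(mac)]
    # Lowest MAC by numeric octet comparison; fall back to the bridge's own address.
    return min(candidates, key=_octets) if candidates else default_mac.lower()
```

For example, pick_bridge_mac("br-ex", "4a:9a:09:ae:b8:ca", {}) models the window before eth0 is added, and passing {"eth0": "00:e4:c9:1b:bd:d3"} models the state after it is added.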

Steve Kowalik (stevenk)
Changed in tripleo:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/101719

Changed in tripleo:
assignee: Michael Kerrin (michael-kerrin-w) → Robert Collins (lifeless)
Changed in tripleo:
assignee: Robert Collins (lifeless) → Derek Higgins (derekh)
Changed in tripleo:
assignee: Derek Higgins (derekh) → Robert Collins (lifeless)
Dan Prince (dan-prince) wrote :

On IRC it was mentioned that this might also be due to an interaction with dhcp-all-interfaces on Ubuntu. Have we looked into that further?

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/104625

Changed in tripleo:
assignee: Robert Collins (lifeless) → Dan Prince (dan-prince)
Dan Prince (dan-prince) wrote :

I spent some time debugging this today. I think it all boils down to a simple ordering issue in /etc/network/interfaces: we want the bridges to be at the top (so they are started first). As we already start the bridges first manually, this would explain why it works on first boot but sometimes doesn't on reboot, which is handled by ifup -a.
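The proposed ordering fix can be sketched as a stable reorder of /etc/network/interfaces so that bridge stanzas come before everything else. This is not the ensure-bridge code from the review. It assumes that each interface's stanza is separated from the next by a blank line, and that bridges are marked with "ovs_type OVSBridge", as in Debian's openvswitch ifupdown integration. The function names are hypothetical.

```python
def split_stanzas(text: str):
    """Split interfaces(5) text into blank-line-separated blocks of lines."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def is_bridge(block) -> bool:
    return any(line.split() == ["ovs_type", "OVSBridge"] for line in block)


def bridges_first(text: str) -> str:
    """Reorder stanzas so bridges come first; sorted() keeps the rest in order."""
    ordered = sorted(split_stanzas(text), key=lambda b: not is_bridge(b))
    return "\n\n".join("\n".join(b) for b in ordered) + "\n"


if __name__ == "__main__":
    src = ("auto eth0\niface eth0 inet manual\n    ovs_type OVSPort\n\n"
           "auto br-ex\niface br-ex inet dhcp\n    ovs_type OVSBridge\n")
    print(bridges_first(src))
```

A stable sort is used so that, if ifup -a does follow file order, the only change in boot order is that bridges move ahead of their ports.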

Changed in tripleo:
assignee: Dan Prince (dan-prince) → Robert Collins (lifeless)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-image-elements (master)

Change abandoned by lifeless (<email address hidden>) on branch: master
Review: https://review.openstack.org/101719

Changed in tripleo:
assignee: Robert Collins (lifeless) → Dan Prince (dan-prince)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/104625
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=4467f881ad62bd521e223df48301725a5dfe6a93
Submitter: Jenkins
Branch: master

commit 4467f881ad62bd521e223df48301725a5dfe6a93
Author: Dan Prince <email address hidden>
Date: Thu Jul 3 13:41:54 2014 -0400

    ensure-bridge: bring up bridges first on Debian

    We should bring up the interfaces on boot in the same order
    with which we bring them up in this script.

    In this script we write out /etc/network/interfaces and then
    bring up the bridges first, and interfaces second. This
    allows us to bring up bridges using DHCP with attached ports
    and is working quite well on initial deployment.

    It has been reported that upon reboot (sometimes?) an Ubuntu
    system does not obtain its DHCP ip_address correctly. This
    is most likely due to the fact that ifup -a (which is
    executed at boot time via networking.conf) brings up
    the interfaces *in the order* in which they are listed in
    /etc/network/interfaces.

    As this order does not match the order in this script, this
    is most likely the cause of the race which sometimes causes
    DHCP to fail.

    Change-Id: I91628cb78565aec9764e0dcb33c1b1fe165b7c4a
    Closes-bug: #1329238

Changed in tripleo:
status: In Progress → Fix Committed
Robert Collins (lifeless) wrote :

Hmmm, I'm not at all sure this analysis is correct, since we call ifup on device detection, before ifup -a executes.

Michael Kerrin (michael-kerrin-w) wrote :

Thanks, everyone, for looking into this. Every time I went to look into it, something changed and dragged me elsewhere.

I was able to reproduce this really easily, and I am not sure what has changed, but I have not been able to reproduce the problem for a while now. Networking is coming up fine on reboots. My setup has evolved with upstream and local changes, so I can't pinpoint what made the difference.

Jay Dobies (jdob)
Changed in tripleo:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/99604
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=1709a3a1eec599a3723eecf4cca908f111d4f0d3
Submitter: Jenkins
Branch: master

commit 1709a3a1eec599a3723eecf4cca908f111d4f0d3
Author: Robert Collins <email address hidden>
Date: Mon Jun 30 14:26:01 2014 +1200

    Set the MAC address for ensure-bridge bridges

    We're seeing the bridge sometimes not glue to the right port address.
    The defined behaviour is to dynamically change the MAC based on
    attached ports and as we need DHCP to work, we should set the MAC to
    the MAC that Neutron expects.

    Co-Authored-By: Michael Kerrin <email address hidden>
    Closes-bug: #1329238
    Change-Id: I87302d468784a9c9d1703f485031016f87d57873
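The commit's diff is not shown in this report, so the following is only a sketch of the idea of pinning the bridge MAC to the address Neutron expects. The ovs_extra line is the one quoted in the bug description. The other stanza keywords (allow-ovs, ovs_type OVSBridge, ovs_ports) are assumptions based on Debian's openvswitch ifupdown integration, and bridge_stanza is a hypothetical name.

```python
import re

MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def bridge_stanza(bridge: str, mac: str, ports=()) -> str:
    """Render a hypothetical interfaces(5) stanza that pins the bridge MAC."""
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    lines = [
        f"auto {bridge}",
        f"allow-ovs {bridge}",
        f"iface {bridge} inet dhcp",
        "    ovs_type OVSBridge",
    ]
    if ports:
        lines.append("    ovs_ports " + " ".join(ports))
    # Same ovs_extra form as in the bug description: sets other_config:hwaddr.
    lines.append(f"    ovs_extra set bridge {bridge} other-config:hwaddr={mac}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bridge_stanza("br-ex", "00:e4:c9:1b:bd:d3", ["eth0"]))
```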
