OpenStack Compute (Nova)

Random mac address should start with a high number

Reported by justinsb on 2012-01-25
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Medium
Assigned to: Vish Ishaya

Bug Description

I literally stumbled upon this bug report about LXC, where the network bridge takes the MAC address of an attached interface if that address is numerically lower, which causes problems when that instance is shut down. I think it may well apply to nova?...

https://www.redhat.com/archives/libvir-list/2010-July/msg00450.html

Or

http://sourceforge.net/tracker/?func=detail&atid=826303&aid=3411497&group_id=163076
(Be sure to open up the comments!)
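To make the bridge behaviour described above concrete, here is a toy Python model. It assumes, as the linked reports describe, that a Linux bridge with no explicitly configured address adopts the numerically lowest MAC among its attached ports. `mac_to_int`, `bridge_mac` and the example addresses are hypothetical and not part of nova or libvirt.

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer so MACs compare numerically."""
    return int(mac.replace(":", ""), 16)


def bridge_mac(port_macs):
    """Toy model (assumption): the bridge takes the lowest MAC of its ports."""
    return min(port_macs, key=mac_to_int)


# Hypothetical ports: one physical NIC plus two guest tap devices.
ports = ["52:54:00:12:34:56", "02:16:3e:00:00:01", "02:16:3e:00:00:02"]
print(bridge_mac(ports))  # 02:16:3e:00:00:01 (a guest's tap device wins)

# When that guest shuts down, its tap leaves the bridge and the bridge MAC changes.
ports.remove("02:16:3e:00:00:01")
print(bridge_mac(ports))  # 02:16:3e:00:00:02
```

In this model a guest port whose first byte is 0xfe sorts above every universally administered unicast address (whose first octet is at most 0xfc), so it would never become the bridge's address.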

The workaround seems to be to generate MAC addresses with a high first byte, e.g. 0xfe.

We use 0x02, which is very likely to trigger the problem (just as 0xfe is very unlikely to!):

import random

mac = [0x02, 0x16, 0x3e,
       random.randint(0x00, 0x7f),
       random.randint(0x00, 0xff),
       random.randint(0x00, 0xff)]
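The snippet above only builds a list of integers. A self-contained sketch that turns it into a colon-separated MAC string, with the first octet as a parameter so the 0x02 vs 0xfe choice can be tried out, might look like this. `generate_mac` and its `first_octet` parameter are illustrative names, not nova's actual function.

```python
import random


def generate_mac(first_octet=0x02):
    """Return a random MAC string using the 0x16, 0x3e bytes from the snippet above.

    first_octet is a hypothetical parameter for experimenting with the
    leading byte (e.g. 0x02 vs 0xfe).
    """
    mac = [first_octet, 0x16, 0x3e,
           random.randint(0x00, 0x7f),  # fourth byte kept in 0x00-0x7f, as in the snippet
           random.randint(0x00, 0xff),
           random.randint(0x00, 0xff)]
    return ":".join("%02x" % octet for octet in mac)


print(generate_mac())      # a random address starting with "02:16:3e:"
print(generate_mac(0xfe))  # a random address starting with "fe:16:3e:"
```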

I think this could be related to Bug #908194, but I don't really understand the proposed workaround there.

Brian Waldon (bcwaldon) on 2012-01-27
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Changed in nova:
milestone: none → essex-rc1
assignee: nobody → Vish Ishaya (vishvananda)

Fix proposed to branch: master
Review: https://review.openstack.org/5111

Changed in nova:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/5111
Committed: http://github.com/openstack/nova/commit/a186df0ef557de984691d3042a21f0ba331009b4
Submitter: Jenkins
Branch: master

commit a186df0ef557de984691d3042a21f0ba331009b4
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Mar 8 15:36:31 2012 -0800

    Use a high number for our default mac addresses

     * FE has the locally administered bit set and multicast bit unset
     * fixes bug 921838

    Change-Id: Id33a06985c4150da4c5367c700f894590fdac2b9
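The commit message's claim can be checked directly. In the first octet of a MAC address, bit 0 (value 0x01) is the individual/group (multicast) bit and bit 1 (value 0x02) is the universal/local (locally administered) bit. The helper below is a hypothetical illustration, not nova code. It also lists the highest first octets that keep both properties.

```python
MULTICAST_BIT = 0x01             # I/G bit: set means a group (multicast) address
LOCALLY_ADMINISTERED_BIT = 0x02  # U/L bit: set means a locally administered address


def is_local_unicast(first_octet):
    """True if the locally administered bit is set and the multicast bit is clear."""
    return bool(first_octet & LOCALLY_ADMINISTERED_BIT) and not (first_octet & MULTICAST_BIT)


for octet in (0x02, 0xfe, 0xfc):
    print("%#04x" % octet, is_local_unicast(octet))
# 0x02 True
# 0xfe True
# 0xfc False

# Valid first octets, highest first: those whose low two bits are binary 10.
candidates = [o for o in range(0xff, -1, -1) if is_local_unicast(o)]
print(["%#04x" % o for o in candidates[:3]])  # ['0xfe', '0xfa', '0xf6']
```

By this check, 0xfc (the byte tried in the comments below) has the locally administered bit clear; the next octet below 0xfe with both properties is 0xfa.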

Changed in nova:
status: In Progress → Fix Committed
justinsb (justin-fathomdb) wrote:

I think this might need to be reverted (yes, I know this is all my fault - sorry!)

I'm trying to use FlatManager, and it simply doesn't work once this code is applied.

I was getting errors like this in dmesg:
[ 2199.836114] br100: received packet on vnet1 with own address as source address

I could ping the guest from the host, but the guest couldn't ping out because ARP resolution inside the guest was failing. From the host I could see the ARP request being made successfully, but the guest's ARP table was never updated and it kept rapidly retrying ARP resolution.

I couldn't ping the guest from other machines, I think because the guest didn't know where to send the return packets.

Reverting the change above fixes the problem immediately.

However, something very odd is happening; I suspect libvirt is doing something we're not aware of that breaks us: the MAC address of the vnetX interface starts with 0xfe, even though the guest's internal address starts with 0x02. It looks like when the two are the same, things are FUBAR.

Below are 3 machines: the first two were launched with the 0xfe patch and are broken; the last one was launched with the patch reverted and works.

(The MAC of vnet2 on the host is the same as the guest's, but with 0x02 -> 0xfe.)
Host: ifconfig
...
vnet0 Link encap:Ethernet HWaddr fe:16:3e:09:66:f7
          inet6 addr: fe80::fc16:3eff:fe09:66f7/64 Scope:Link
...
vnet1 Link encap:Ethernet HWaddr fe:16:3e:14:e1:62
          inet6 addr: fe80::fc16:3eff:fe14:e162/64 Scope:Link
...
vnet2 Link encap:Ethernet HWaddr fe:16:3e:31:8e:88
          inet6 addr: fe80::fc16:3eff:fe31:8e88/64 Scope:Link
...

(The MAC in the guest is the same as vnet2, but with 0x02)
Guest: ifconfig
eth0 Link encap:Ethernet HWaddr 02:16:3e:31:8e:88
...
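The pattern in these listings can be modelled as follows. This sketch is based only on the ifconfig output above (the host-side tap gets the guest's MAC with its first byte replaced by 0xfe), not on libvirt's source; `host_tap_mac` and `collides` are hypothetical helpers.

```python
def host_tap_mac(guest_mac):
    """Observed behaviour: host tap MAC = guest MAC with the first byte set to 0xfe."""
    octets = guest_mac.lower().split(":")
    octets[0] = "fe"
    return ":".join(octets)


def collides(guest_mac):
    """True if the guest and its host-side tap end up with the same MAC."""
    return host_tap_mac(guest_mac) == guest_mac.lower()


print(host_tap_mac("02:16:3e:31:8e:88"))  # fe:16:3e:31:8e:88, matching vnet2 above
print(collides("02:16:3e:31:8e:88"))      # False: the reverted, working case
print(collides("fe:16:3e:14:e1:62"))      # True: a 0xfe guest MAC equals its tap's MAC
print(collides("fc:16:3e:63:63:3c"))      # False
```

A collision like this would be consistent with the "received packet on vnet1 with own address as source address" message: frames from the guest carry a source MAC that the bridge also holds as the tap's own address.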

justinsb (justin-fathomdb) wrote:

On a hunch, I changed the magic byte to 0xfc. That now works. Libvirt (or whoever is tweaking the address) sets the first byte of the host-side interface to 0xfe, while the guest gets a 0xfc MAC. They're different, so all is OK. At least that's my theory.

Of course, if libvirt is tweaking the address, then we didn't need the change in the first place - sorry! Does anyone know whether libvirt does this (and if not, how the bridge address is getting set to 0xfe)?

With 0xfc:

Guest ifconfig:
eth0 Link encap:Ethernet HWaddr fc:16:3e:63:63:3c
...

Host ifconfig:
ifconfig vnet3
vnet3 Link encap:Ethernet HWaddr fe:16:3e:63:63:3c

Changed in nova:
status: Fix Committed → In Progress
justinsb (justin-fathomdb) wrote:

I did some source-code spelunking. It is libvirt that's changing the address to 0xfe.

Here's the commit where it was introduced:
http://libvirt.org/git/?p=libvirt.git;a=commit;h=6ea90b843eff95be6bcbb49a327656fc6f6445ef

(If you're looking at the latest code, and scratching your head, it's because this file was split into 3 files about a year ago and then the MAC code was rearchitected 11 days ago in c1b164d70c738b0d7de530417f49a142680fe294)

So it looks like libvirt is fixing the problem for us, even if we specify a different MAC address. Note that this causes all kinds of problems for e.g. Open vSwitch, because the MAC address we specify isn't the one that gets configured on the bridge.

I think the bridge has a problem when the guest and host MAC addresses are the same, though I don't know why this should be the case.

So I think we can simply revert this change, though this means that the original bug is still problematic.

I'm going to stay away from networking for a while I think...

Reviewed: https://review.openstack.org/5351
Committed: http://github.com/openstack/nova/commit/b684d651f540fc512ced58acd5ae2ef4d55a885c
Submitter: Jenkins
Branch: master

commit b684d651f540fc512ced58acd5ae2ef4d55a885c
Author: Vishvananda Ishaya <email address hidden>
Date: Wed Mar 14 10:34:33 2012 -0700

    Refix mac change to work around libvirt issue

     * fixes bug 921838

    Change-Id: I11278a03c4429686499b2f62c66a7f442258f5a6

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-03-20
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2012-04-05
Changed in nova:
milestone: essex-rc1 → 2012.1

Fix proposed to branch: master
Review: https://review.openstack.org/6327

Changed in nova:
assignee: Vish Ishaya (vishvananda) → Cor Cornelisse (corcornelisse)
status: Fix Released → In Progress
Vish Ishaya (vishvananda) wrote:

The fix above is actually for the related bug 975043.

Changed in nova:
status: In Progress → Fix Released
assignee: Cor Cornelisse (corcornelisse) → Vish Ishaya (vishvananda)