hairpin mode on vnet bridge ports causes false positives on IPv6 duplicate address detection
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Takashi Sogabe | 2013.1
Bug Description
Because of bug 933640 (https:/ ), hairpin mode is enabled on the vnet bridge ports, which breaks IPv6 address configuration in instances:
router advertisement received -> configures IPv6 on interface -> sends neighbour advertisement -> receives own neighbour advertisement -> removes IPv6 address from interface
From the syslog of instance 'hum': Jun 10 10:19:44 hum kernel: [ 150.028370] eth0: IPv6 duplicate address 2001:6f8:
After disabling hairpin mode on the vnet interface in use on the compute node, the error disappeared and IPv6 connectivity worked.
lucas kauffman (lucas-kauffman) wrote: #1
Manu Sporny (msporny) wrote: #2
Confirmed here as well.
IPv6 can fail to initialize in a VM running on a host machine that bridges the VM's network traffic to the host's physical Ethernet port. The issue appears for two reasons:
1. IPv6 has a duplicate address detection (DAD) feature: if a host sees packets from the same IPv6 link-local (MAC-derived) address as its own, it assumes there is another box on the same network with the same MAC address.
2. With network bridging hairpin'ing turned on, all packets are reflected back to the VMs, so any IPv6 traffic is duplicated and sent back to the VM... which means that the IPv6 duplicate address detection code activates and the IPv6 subsystem skips initialization.
The bug has to do with two separate systems stomping on each other:
1. The bridge is misconfigured. Either it is in promiscuous mode, or the virtual network interface has hairpin'ing turned on. Hairpin'ing reflects all traffic, including IPv6 Neighbor Solicitation messages, back to the sender.
2. The Linux kernel IPv6 code, upon seeing the reflected IPv6 Neighbor Solicitation message, assumes the address is in use (when it really isn't) and doesn't bring the link up all the way as a result.
Detecting the issue
-------
On the VM, run the following command to see if you have an IPv6 Duplicate Address Detection issue:
dmesg | grep duplicate
If you see a line that matches something like the following, you have the IPv6 DAD issue:
eth0: IPv6 duplicate address detected!
You can also issue the following command to see if you have any issues with your IPv6 links:
ip addr | grep tentative
If you see a line that matches something like the following, you most likely have the IPv6 DAD issue:
inet6 fe80::a816:
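For illustration, a fuller version of such a line might look like this (address invented; an interface that failed DAD usually shows the dadfailed flag alongside tentative):
inet6 fe80::f816:3eff:fe12:3456/64 scope link tentative dadfailed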
Fixing the Issue
-------
There are two potential causes of the issue, each with a corresponding fix:
1. The bridge interface is in promiscuous mode.
2. Hairpin'ing is turned on for the virtual network device, which reflects IPv6 Neighbor Advertisement messages back to the sender.
To solve #1, turn off promiscuous mode on the bridge device:
ifconfig br100 -promisc
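On systems where ifconfig is deprecated, the same can be done with iproute2 (bridge name br100 assumed from the example above):
ip link set dev br100 promisc off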
To solve #2, turn off hairpin'ing mode for the virtual network interface associated with the VM that cannot set up a valid IPv6 link. On the OpenStack compute node hosting the VM, assuming the bridge device is named 'br100' and the virtual network interface is named 'vnet0', run the following command:
echo 0 > /sys/class/
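For reference, on typical kernels the per-port hairpin flag is exposed under the port's brport directory in sysfs, and newer iproute2 can toggle it directly. A sketch assuming the vnet0 name from above (verify the path on your kernel):
# assumed sysfs layout for the bridge port flag
echo 0 > /sys/class/net/vnet0/brport/hairpin_mode
# equivalent with the iproute2 bridge tool
bridge link set dev vnet0 hairpin off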
If using OpenStack, to make the change more permanent, you can comment out the hairpin code in /usr/share/
Links
-------
* See Section "14.2.2. Neighbor discovery": http://
* Hairpin'ing feature: http://
Evan Callicoat (diopter) wrote: #3
I am the author of the hairpin_mode change and I recently had this bug brought to my attention, much to my chagrin!
I believe what is happening here is that when ICMPv6 sends certain messages (like Neighbor Solicitations in Duplicate Address Detection) it uses a multicast destination MAC address (33:33:xx:xx:xx:xx) in the ethernet frame sent to the host bridge. When the bridge receives the frame, given that it doesn't engage in IGMP snooping, it treats multicast MAC addresses just like the broadcast MAC address, and forwards the frame to all ports. With hairpin_mode enabled on the port the frame entered the bridge on, it will get copied back out that same port, resulting in the behavior seen above.
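For illustration, the mapping described here takes the low 32 bits of the IPv6 multicast group and appends them to the fixed 33:33 prefix. A shell sketch with an invented solicited-node group ff02::1:ff00:1:
# MAC for IPv6 multicast group ff02::1:ff00:1 = 33:33 + low 32 bits (ff:00:00:01)
printf '33:33:%02x:%02x:%02x:%02x\n' 0xff 0x00 0x00 0x01
# prints: 33:33:ff:00:00:01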
I believe the simplest approach to solving the problem without potentially breaking or altering any other behaviors is to add an nwfilter to libvirt which identifies this particular scenario and filters it, like this:
<filter name='no-mac-reflection' chain='ipv6'>
  <!-- drop if destination mac is v6 mcast mac addr and we sent it. -->
  <rule action='drop' direction='in'>
    <mac dstmacaddr='33:33:00:00:00:00' dstmacmask='ff:ff:00:00:00:00' srcmacaddr='$MAC'/>
  </rule>
  <!-- not doing anything with sending side ... -->
</filter>
I haven't tested this yet (so hopefully my syntax is correct there) but the idea is very simple: there's no normal scenario in which we should receive a v6 multicast frame originating from our own interface, so we can accurately identify this as a reflection to be dropped by a bridge filter.
I should have a chance to test this at some point soon and if it works, see about where to submit new nwfilters, but in the meantime it'd be great if anyone affected by this bug could try dropping in a no-mac-
Evan Callicoat (diopter) wrote: #4
Whoops, forgot to mention that nwfilter files go in /etc/libvirt/
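For completeness, a sketch of registering such a filter by hand with virsh (filename invented for illustration; the filter then has to be referenced from the domain's interface definition):
# define the filter from the XML above
virsh nwfilter-define no-mac-reflection.xml
# confirm it was registered
virsh nwfilter-list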
Changed in nova:
status: New → Confirmed
Changed in nova:
importance: Undecided → Medium
Changed in nova:
assignee: nobody → Takashi Sogabe (sogabe)
Fix proposed to branch: master
Review: https:/
Changed in nova:
status: Confirmed → In Progress
tags: added: folsom-rc-potential
Reviewed: https:/
Committed: http://
Submitter: Jenkins
Branch: master
commit 0436cbdb882b532
Author: Takashi Sogabe <email address hidden>
Date: Wed Oct 3 17:19:20 2012 +0900
handle IPv6 race condition due to hairpin mode
When using IPv6 an instance sees its own neighbour advertisement,
because of the reflective property of the hairpin mode.
Because of this the trigger-happy duplicate address detection in
the instance's kernel deconfigures the IPv6 address on the interface,
resulting in no IPv6 connectivity.
The approach of this commit is to add an nwfilter to libvirt which
identifies this particular scenario and filters it.
Change-Id: I28f9b49cee4b2a
Changed in nova:
status: In Progress → Fix Committed
tags: removed: folsom-rc-potential
Changed in nova:
milestone: none → grizzly-1
status: Fix Committed → Fix Released
Changed in nova:
milestone: grizzly-1 → 2013.1
Bernhard M. Wiedemann (ubuntubmw) wrote: #7
We hit this bug in our Essex-based SUSE Cloud 1.0 deployment which does not use IPv6, so I think the patch is incomplete.
After reading http://
Btw: did anyone manage to ping a floating IP from the VM it is assigned to? It did not work for me.
https:/ has a summary of what I debugged during the last week.
Evan Callicoat (diopter) wrote: #8
The bugzilla you've linked to is down for maintenance, so I'm going to go at this blind to what you know or have done. Bear with me!
I thought the same thing when I first hit bug 933640 in a customer's deployment, where they absolutely refused to run split-DNS. They had a cluster of cattle (not pets), where any service could be running on any instance, and services only knew about each other through their global DNS hostnames, which were mapped to their floats, further distinguished by ports.
I specialize in Linux networking and spent about three days going over the issue on a whiteboard, working with Vish and other folks, and hairpinning was the simplest and most elegant solution I came up with at that time, and I'll tell you my reasoning.
First of all, I agree with your preliminary reading. Hairpin mode in the Linux kernel was implemented as part of a larger set of features to allow Linux to be a Virtual Ethernet Port Aggregator (VEPA), which is related to VEB as you mentioned. Hairpinning is absolutely an L2 functionality, and talking to your own float is indeed an L3 problem. However, getting out to your float and back in to a service that's actually listening on the same private IP you're sourcing from, without having to rewrite the client or service, is both an L2 and L3 issue.
The L3 portion is common; we need to DNAT on the way to a float (ingress initiated), and SNAT on the way from it (egress initiated), so the client thinks it's talking to the original service, and the translated service thinks the translator is the original client. For talking to our own float, we actually need to do both, but from the "back" (VM rather than public) side of the host: DNAT towards our float, which translates to our (private) IP as the new destination, then SNAT on the way back to ourself, so it looks like the traffic actually came *from* our float.
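A minimal sketch of the two translations described here, with all addresses invented for illustration (10.0.0.5 as the VM's fixed IP, 203.0.113.7 as its float):
# DNAT: traffic sent to the floating IP is redirected to the fixed IP
iptables -t nat -A PREROUTING -d 203.0.113.7 -j DNAT --to-destination 10.0.0.5
# SNAT: replies leaving the fixed IP appear to come from the float
iptables -t nat -A POSTROUTING -s 10.0.0.5 -j SNAT --to-source 203.0.113.7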
However, this gives us an L2 issue now, namely with native Linux bridges and the bridging engine's netfilter interaction (which you can see here, from one of the netfilter devs: http://
The iptables rules in nova-network in Diablo/Essex to DNAT/SNAT floats didn't have -s restrictions, and may or may not have -i/-o restrictions depending on nova.conf flags. This turned out to be fortuitous, because it meant that I could rely on the same rules for the usual two float NAT patterns I mentioned earlier, yet hit them from the back side, without changing any iptables rules.
This left only one more minor issue, which is that the SNAT wasn't being hit on the way back to the VM, because of the iptables rule designed to -j ACCEPT fixed -> fixed VM traffic, short-circuiting before the float SNAT rules. So I had to make one minor change there; I added -m conntrack ! --ctstate DNAT (example from folsom, since esse...
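The change described here looks roughly like this (table, chain, and CIDR invented for illustration):
# accept fixed -> fixed VM traffic so it short-circuits the float SNAT rules,
# but only when the connection was NOT already DNATed via a floating IP
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -d 10.0.0.0/24 -m conntrack ! --ctstate DNAT -j ACCEPT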
Bernhard M. Wiedemann (ubuntubmw) wrote: #9
I made a patch for this bug: https:/
I can confirm this problem.