IPv6 fragmentation and MTU issue
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | | Medium | Jay Vosburgh |
Trusty | | Undecided | Unassigned |
Vivid | | Undecided | Unassigned |
Bug Description
Fragmented IPv6 packets are REJECTED by ip6tables on compute nodes. The traffic is going through an intra-VM network and the packet loss is hurting the system.
There is a patch for this issue: http://
I would like to know whether there is a bug report or an official release date for a fix.
This is pretty critical for my deployment.
Thanks in advance,
BR,
Gyula
Changed in nova: | |
status: | New → Confirmed |
Changed in neutron: | |
status: | New → Confirmed |
SecurityFun23 (securityfun23) wrote : | #1 |
Gyula Halmos (gyula-halmos) wrote : | #2 |
Hi there,
We are rebuilding our compute nodes' kernels with the patch to test the solution. Other than that, we are waiting for a proper fix, as this is a real showstopper for some of our customers: their security policies don't allow bypassing ip6tables.
BR,
Gyula
Kevin Benton (kevinbenton) wrote : | #3 |
Thanks for the report. I've been looking at the netfilter docs and it doesn't look like we can stop the re-assembly and still have the first packet processed by conntrack. Do you know if this is possible?
If so, I can submit a patch to install a rule that would allow the subsequent fragments to go by as a temporary workaround. The downside would be that arbitrary fragments could get through.
Sean M. Collins (scollins) wrote : | #4 |
I'd like to see the fix get merged into the linux kernel, inside netfilter. I don't think this is a Nova/Neutron specific thing that we can fix independently, since even the proposed fix has side-effects that can be undesirable.
SecurityFun23 (securityfun23) wrote : | #5 |
I agree that the best solution would be to have this merged into the Linux kernel. Also, I am unaware of a method to prevent reassembly while still using conntrack. Probably the only way to prevent the re-assembly would be to disable conntrack, but that would break stateful firewalling, and that wouldn't be consistent with OpenStack security groups. So not only is the kernel update the "best" solution, I think it's probably the only complete / consistent one.
As far as kernel patches go, I did want to point out that the current behavior for the IPv4 iptables re-assembly / re-fragmentation is to create entirely new fragments. For linux bridges, in both IPv4 and IPv6, I think that the more desirable behavior is to transmit the original fragments instead of re-fragmenting. The reason this behavior would be preferred is that re-fragmentation now introduces the MTU of the bridge into the picture. If the MTU of the bridge is larger or smaller than the MTU of the VM or gateway this can cause problems. Note that with OpenStack Juno the MTU of the bridges used by the OVSHybridIptabl
However, I should mention that an advantage of re-fragmentation is that you are guaranteed that the packet passed by iptables really matches what iptables thinks it does. I know there are some security attacks that take advantage of the fact that IP fragment overlaps are handled differently by different operating systems. So iptables might interpret the packet as having one meaning and the end system might interpret it as something else. Doing re-fragmentation solves this issue. However, I would suggest that IP fragment overlaps are really bad behavior, and the packet should probably just be dropped rather than trying to fix this by re-fragmenting.
This MTU issue is sort of secondary to the IPv6 fragmentation patch, BUT if someone is looking here to figure out the patch they should submit, I would like them to have this extra info in case they want to improve the patch.
Sean M. Collins (scollins) wrote : | #6 |
A fix to Netfilter was merged into Linux Kernel version 4.2:
https:/
So, in my mind this isn't really an OpenStack bug. It's a Linux bug that just so happens to be hurting OpenStack deployers that want to do IPv6 related things. I think this bug can be closed, since the real fix is "Wait until distros start deploying the fix" - which yes is a long time away - unless someone backports. Regardless, more of a kernel issue than Nova or OpenStack.
Kevin Benton (kevinbenton) wrote : | #7 |
Setting the MTU on the tap, bridge and veth pair to something high seems to fix it in my rudimentary testing.
for int in $(ifconfig | grep 1234abcd | grep -v qbr | awk '{ print $1 }'); do sudo ifconfig $int mtu 9000; done
Where "1234abcd" is the first part of the VM port's UUID.
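A variant of the same loop, as a minimal sketch: it selects interfaces from a plain list of names instead of parsing ifconfig output, whose layout differs between net-tools versions. The helper name `select_port_devs` is hypothetical, and it assumes, like the one-liner above, that the port's devices all contain the first part of the port UUID and that the qbr bridge is skipped.

```shell
#!/bin/sh
# select_port_devs PREFIX
# Reads interface names on stdin (one per line) and prints those that
# contain PREFIX, skipping qbr bridges as the one-liner above does.
# (Hypothetical helper, not part of Neutron or any distro tooling.)
select_port_devs() {
    grep -- "$1" | grep -v qbr
}

# Usage (needs root; /sys/class/net has one entry per interface):
#   ls /sys/class/net | select_port_devs 1234abcd |
#       while read -r dev; do sudo ip link set dev "$dev" mtu 9000; done
```

Keeping the selection separate from the `ip link` call makes it easy to print the matched names first and check them before changing any MTU.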
SecurityFun23 (securityfun23) wrote : | #8 |
Note that increasing the MTU is only possible if your network allows large Ethernet frames, which is often not the case. Also, it is only a partial fix, since you can't normally increase the MTU much above 9000 (the jumbo frame limit is usually near there), so larger datagrams will still be fragmented.
tags: | added: kernel-key |
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1463911
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
status: | Incomplete → Confirmed |
tags: | added: bot-stop-nagging |
tags: | added: sts |
Dave Chiluk (chiluk) wrote : | #10 |
I have confirmed that this exists with the latest trusty kernels. I have also attempted a cherry-
Changed in linux (Ubuntu): | |
assignee: | nobody → Dave Chiluk (chiluk) |
Jay Vosburgh (jvosburgh) wrote : | #11 |
I have done a backport of
commit efb6de9b4ba0092
Author: Bernhard Thaler <email address hidden>
Date: Sat May 30 15:30:16 2015 +0200
netfilter: bridge: forward IPv6 fragmented packets
to the trusty 3.13 kernel. This necessitated pulling in some bits from other patches as well. I am currently testing for regressions and will submit it for SRU if all goes well.
Changed in linux (Ubuntu): | |
assignee: | Dave Chiluk (chiluk) → Jay Vosburgh (jvosburgh) |
SecurityFun23 (securityfun23) wrote : | #12 |
Jay Vosburgh,
Have you finished your testing of the patch for Ubuntu Trusty? I would be interested in getting hold of a "beta" version of that patch even if it wasn't an official part of Ubuntu.
Thanks!
Gyula Halmos (gyula-halmos) wrote : | #13 |
Would be interested as well.
Thanks!
Jay Vosburgh (jvosburgh) wrote : | #14 |
The original patch had an error in it; I believe I've found it, and once I verify that and clean it up a bit I'll attach it to the bug.
SecurityFun23 (securityfun23) wrote : | #15 |
Jay Vosburgh,
Just wondering if you found some time to clean up that patch.
Thanks!
tags: |
added: kernel-da-key removed: kernel-key |
Jay Vosburgh (jvosburgh) wrote : | #16 |
SRU Justification:
Impact:
This bug causes issues when ip6tables modules are loaded with IPv6
fragmented packets traversing a bridge. The extant conntrack processing
will reassemble the IPv6 fragments for netfilter processing, but is
incapable of re-fragmenting these datagrams for subsequent forwarding.
This causes the fragmented IPv6 datagrams to be dropped.
Fix:
This is resolved by backporting functionality from mainline that
re-fragments the IPv6 datagrams upon bridge egress.
Testcase:
The patch commit log includes a test case; to summarize:
A bridge is configured with two ports and interfaces are attached
to these ports. A traffic source beyond one port generates fragmented
IPv6 datagrams, e.g., ping6 -s 2000, destined for a host beyond the
bridge.
With ip6tables modules unloaded, the IPv6 fragments will traverse
the bridge. Loading ip6tables, e.g., "ip6tables -t nat -L", will cause
IPv6 fragmented datagrams to be dropped on the unpatched kernel.
These datagrams are correctly forwarded with the patch applied.
Jay Vosburgh (jvosburgh) wrote : | #17 |
Jay Vosburgh (jvosburgh) wrote : | #18 |
Jay Vosburgh (jvosburgh) wrote : | #19 |
Jay Vosburgh (jvosburgh) wrote : | #20 |
Test methodology performed on 3.19 kernel with patch applied:
Host A: fd01:2222::1/64 direct connect to host C
ip addr add fd01:2222::1/64 dev eth0
Host B: fd01:2222::2/64 direct connect to host C
ip addr add fd01:2222::2/64 dev eth0
host C: direct connect interfaces for Hosts A & B bridged together:
brctl addbr testbr0
brctl addif testbr0 eth1
brctl addif testbr0 eth5
ip link set dev eth1 up
ip link set dev eth5 up
ip link set dev testbr0 up
ip addr add fd01:2222::99/64 dev testbr0
host A:
continuous ping6 to host B's address beyond the bridge, using a size large
enough to generate fragmented IPv6 datagrams for an MTU setting of 1500:
ping6 -s 4000 fd01:2222::2
host C:
load ip6tables_nat:
ip6tables -t nat -L -n
Observe on host A that ping continues uninterrupted
Inspect eth1 and eth5 interfaces on host C with tcpdump to confirm traffic passes
through the bridge
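As a rough check of why `ping6 -s 4000` produces fragments at an MTU of 1500, the arithmetic can be sketched as below. The function name `frag_count` is hypothetical. The sketch assumes a plain 40-byte IPv6 header with no other extension headers, the 8-byte ICMPv6 echo header, and an 8-byte fragment header, with every fragment except the last carrying a multiple of 8 bytes.

```shell
#!/bin/sh
# frag_count PAYLOAD MTU
# Prints how many IPv6 packets one ICMPv6 echo request with PAYLOAD data
# bytes (the ping6 -s value) needs on a link with the given MTU.
# Hypothetical helper; assumes no extension headers besides Fragment.
frag_count() {
    payload=$1
    mtu=$2
    icmp_len=$((payload + 8))            # ICMPv6 echo header is 8 bytes

    # Fits in a single packet: 40-byte IPv6 header plus the ICMPv6 message.
    if [ $((icmp_len + 40)) -le "$mtu" ]; then
        echo 1
        return 0
    fi

    # Room per fragment after the IPv6 header (40) and fragment header (8),
    # rounded down to a multiple of 8 as fragment offsets require.
    per_frag=$(( (mtu - 40 - 8) / 8 * 8 ))

    # Ceiling division over the fragmentable part.
    echo $(( (icmp_len + per_frag - 1) / per_frag ))
}

frag_count 4000 1500   # prints 3
frag_count 4000 9000   # prints 1
```

Under these assumptions the test traffic is split into three fragments per echo request at MTU 1500, while the same payload fits in one packet at MTU 9000.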
Jay Vosburgh (jvosburgh) wrote : | #21 |
Testing equivalent to comment #20 was also performed on the 3.13 and 3.16 kernels. Additionally, a customer separately validated the 3.13 and 3.16 patches in their environment.
Changed in linux (Ubuntu Trusty): | |
status: | New → Fix Committed |
Changed in linux (Ubuntu Vivid): | |
status: | New → Fix Committed |
BALAJI SRINIVASAN (balaji-vasan) wrote : | #22 |
I understand a patch is going to come for Ubuntu.
We are using the latest CentOS release.
[root@sienna net]# uname -r
3.10.0-
[root@sienna net]# cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)
Is there any information on a patch for the CentOS distro?
Is there a patch for Neutron too?
Jay Vosburgh (jvosburgh) wrote : | #23 |
Yes, the patch has been committed for the next Ubuntu kernel releases.
I have no information on a CentOS patch; you would need to file a bug against CentOS or RHEL.
No patch to Neutron is required.
Brad Figg (brad-figg) wrote : | #24 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-trusty |
tags: | added: verification-needed-vivid |
Brad Figg (brad-figg) wrote : | #25 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
Travis Parchman (travis-parchman) wrote : | #26 |
So, is this problem already resolved in Wily or is Xenial the first formal release that will not exhibit the problem?
Jay Vosburgh (jvosburgh) wrote : | #27 |
The Wily kernel (4.2) already contains the fixes for this bug.
Travis Parchman (travis-parchman) wrote : | #28 |
Awesome. Thank you most kindly for the info.
tags: |
added: verification-done-trusty verification-done-vivid removed: verification-needed-trusty verification-needed-vivid |
Launchpad Janitor (janitor) wrote : | #29 |
This bug was fixed in the package linux - 3.19.0-56.62
---------------
linux (3.19.0-56.62) vivid; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1555832
[ Florian Westphal ]
* SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
userspace
- LP: #1555338
linux (3.19.0-55.61) vivid; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1554708
[ Upstream Kernel Changes ]
* Revert "drm/radeon: call hpd_irq_event on resume"
- LP: #1554608
linux (3.19.0-54.60) vivid; urgency=low
[ Luis Henriques ]
* Release Tracking Bug
- LP: #1552337
[ Upstream Kernel Changes ]
* Revert "firmware: dmi_scan: Fix UUID endianness for SMBIOS >= 2.6"
- LP: #1551419
linux (3.19.0-53.59) vivid; urgency=low
[ Kamal Mostafa ]
* Release Tracking Bug
- LP: #1550576
[ Kamal Mostafa ]
* Merged back 3.19.0-52.58
linux (3.19.0-52.58) vivid; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1548548
[ Dan Streetman ]
* SAUCE: nbd: ratelimit error msgs after socket close
- LP: #1505564
[ Upstream Kernel Changes ]
* Revert "ACPI / LPSS: allow to use specific PM domain during ->probe()"
- LP: #1542457
* Revert "workqueue: make sure delayed work run in local cpu"
- LP: #1546320
* net: ipmr: fix static mfc/dev leaks on table destruction
- LP: #1542457
* drm/nouveau/nv46: Change mc subdev oclass from nv44 to nv4c
- LP: #1542457
* ovl: allow zero size xattr
- LP: #1542457
* ovl: use a minimal buffer in ovl_copy_xattr
- LP: #1542457
* [media] vb2: fix a regression in poll() behavior for output,streams
- LP: #1542457
* [media] gspca: ov534/topro: prevent a division by 0
- LP: #1542457
* [media] media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
- LP: #1542457
* tools lib traceevent: Fix output of %llu for 64 bit values read on 32
bit machines
- LP: #1542457
* KVM: x86: expose MSR_TSC_AUX to userspace
- LP: #1542457
* KVM: x86: correctly print #AC in traces
- LP: #1542457
* drm/radeon: call hpd_irq_event on resume
- LP: #1542457
* xhci: refuse loading if nousb is used
- LP: #1542457
* arm64: Clear out any singlestep state on a ptrace detach operation
- LP: #1542457
* time: Avoid signed overflow in timekeeping_
- LP: #1542457
* ovl: root: copy attr
- LP: #1542457
* Bluetooth: Add support of Toshiba Broadcom based devices
- LP: #1522949, #1542457
* rtlwifi: fix memory leak for USB device
- LP: #1542457
* wlcore/wl12xx: spi: fix oops on firmware load
- LP: #1542457
* ovl: check dentry positiveness in ovl_cleanup_
- LP: #1542457
* EDAC, mc_sysfs: Fix freeing bus' name
- LP: #1542457
* EDAC: Robustify workqueues destruction
- LP: #1542457
* arm64: mm: ensure that the zero page is visible to the page table
walker
- LP: #1542457
* powerpc: Make value-returning atomics fully ordered
- LP: #1542457
* powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
- LP: #1542457
* dm space map metadata: remove unused variable in brb_pop()
- LP: #1542457
* dm thi...
Changed in linux (Ubuntu Vivid): | |
status: | Fix Committed → Fix Released |
Launchpad Janitor (janitor) wrote : | #31 |
This bug was fixed in the package linux - 3.13.0-83.127
---------------
linux (3.13.0-83.127) trusty; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1555839
[ Florian Westphal ]
* SAUCE: [nf,v2] netfilter: x_tables: don't rely on well-behaving
userspace
- LP: #1555338
linux (3.13.0-82.126) trusty; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1554732
[ Upstream Kernel Changes ]
* Revert "drm/radeon: call hpd_irq_event on resume"
- LP: #1554608
* net: generic dev_disable_lro() stacked device handling
- LP: #1547680
linux (3.13.0-81.125) trusty; urgency=low
[ Luis Henriques ]
* Release Tracking Bug
- LP: #1552316
[ Upstream Kernel Changes ]
* Revert "firmware: dmi_scan: Fix UUID endianness for SMBIOS >= 2.6"
- LP: #1551419
* bcache: Fix a lockdep splat in an error path
- LP: #1551327
linux (3.13.0-80.124) trusty; urgency=low
[ Brad Figg ]
* Release Tracking Bug
- LP: #1548519
[ Andy Whitcroft ]
* [Debian] hv: hv_set_ifconfig -- convert to python3
- LP: #1506521
* [Debian] hv: hv_set_ifconfig -- switch to approved indentation
- LP: #1540586
* [Debian] hv: hv_set_ifconfig -- fix numerous parameter handling issues
- LP: #1540586
[ Dan Streetman ]
* SAUCE: nbd: ratelimit error msgs after socket close
- LP: #1505564
[ Upstream Kernel Changes ]
* Revert "workqueue: make sure delayed work run in local cpu"
- LP: #1546320
* [media] gspca: ov534/topro: prevent a division by 0
- LP: #1542497
* [media] media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
- LP: #1542497
* tools lib traceevent: Fix output of %llu for 64 bit values read on 32
bit machines
- LP: #1542497
* KVM: x86: correctly print #AC in traces
- LP: #1542497
* drm/radeon: call hpd_irq_event on resume
- LP: #1542497
* xhci: refuse loading if nousb is used
- LP: #1542497
* arm64: Clear out any singlestep state on a ptrace detach operation
- LP: #1542497
* time: Avoid signed overflow in timekeeping_
- LP: #1542497
* rtlwifi: fix memory leak for USB device
- LP: #1542497
* wlcore/wl12xx: spi: fix oops on firmware load
- LP: #1542497
* EDAC, mc_sysfs: Fix freeing bus' name
- LP: #1542497
* EDAC: Don't try to cancel workqueue when it's never setup
- LP: #1542497
* EDAC: Robustify workqueues destruction
- LP: #1542497
* powerpc: Make value-returning atomics fully ordered
- LP: #1542497
* powerpc: Make {cmp}xchg* and their atomic_ versions fully ordered
- LP: #1542497
* dm space map metadata: remove unused variable in brb_pop()
- LP: #1542497
* dm thin: fix race condition when destroying thin pool workqueue
- LP: #1542497
* futex: Drop refcount if requeue_pi() acquired the rtmutex
- LP: #1542497
* drm/radeon: clean up fujitsu quirks
- LP: #1542497
* mmc: sdio: Fix invalid vdd in voltage switch power cycle
- LP: #1542497
* mmc: sdhci: Fix sdhci_runtime_
- LP: #1542497
* udf: limit the maximum number of indirect extents in a row
- LP: #1542497
* nfs: Fix race in __update_
Changed in linux (Ubuntu Trusty): | |
status: | Fix Committed → Fix Released |
no longer affects: | nova |
Changed in neutron: | |
importance: | Undecided → Medium |
no longer affects: | neutron |
This issue is documented in more detail in the following old question: https://ask.openstack.org/en/question/43063/ipv6-fragmentationmtu-issue-on-icehouseubuntu-1404/
We have also seen this issue in our lab using Ubuntu 14.04 and RHEL 6. As far as we can tell, the proposed kernel patch has not been implemented in any of the current Linux kernel release lines (it's possible that a different patch than the one referenced in the bug report has been applied, but if that's the case the fix has not made it into the latest Ubuntu 14.04 or RHEL 6 kernels).
The underlying issue is that IPv6 fragmented packets are being re-assembled as part of the ip6tables inspection performed by the "neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver" driver. This inspection occurs on the Linux bridge layer, and it appears that once the packets have been assembled they are too big to be sent out of the bridge to the next interface. A better behavior would be to re-fragment the IPv6 packet, or to store and then send the original fragments.
This issue does not impact TCP in IPv6, since IPv6 fragments packets only at the endpoints, never in the network, and TCP will never create IP fragments. However, UDP and ICMP are both impacted by this issue. This means that IPv6 is essentially broken when the standard "neutron.agent.linux.iptables_firewall.OVSHybridIptablesFirewallDriver" driver is used. If the NOOP driver is used, or if the "net.bridge.bridge-nf-call-ip6tables = 0" option is set in /etc/sysctl.conf to disable ip6tables on bridges, then IPv6 will operate properly. However, in that case Neutron Security Groups and the default Neutron security rules will have no impact on IPv6 packets.
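To see which of these two cases a compute node is in, the current setting can be read back from procfs. This is a minimal sketch: the function name `nf_call_ip6tables_state` is hypothetical, and it assumes the sysctl is exposed at `/proc/sys/net/bridge/bridge-nf-call-ip6tables`, which is only present while the bridge netfilter code is loaded.

```shell
#!/bin/sh
# nf_call_ip6tables_state [FILE]
# Prints "enabled" if bridged IPv6 traffic is passed to ip6tables,
# "disabled" if it is not, or "unavailable" if the sysctl file cannot be
# read (for example because bridge netfilter is not loaded).
# Hypothetical helper; FILE defaults to the assumed procfs path.
nf_call_ip6tables_state() {
    f=${1:-/proc/sys/net/bridge/bridge-nf-call-ip6tables}
    if [ ! -r "$f" ]; then
        echo unavailable
        return 0
    fi
    case "$(cat "$f")" in
        0) echo disabled ;;
        *) echo enabled ;;
    esac
}

nf_call_ip6tables_state
```

A result of "disabled" means fragments are no longer reassembled for ip6tables on the bridge, and also that security group rules are not being applied to bridged IPv6 traffic.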
Possible solutions are to get a fix for this put into the Linux kernel, or to modify the "OVSHybridIptablesFirewallDriver" so that it does not trigger re-assembly (if this is even possible).