[SRU] pre-1.5 OVS has trouble with floating ips when pinging from the same box

Bug #1044318 reported by dan wendlandt on 2012-08-31
This bug affects 2 people
Affects                 Importance  Assigned to
openvswitch (Ubuntu)    Medium      Unassigned
  Precise               Medium      Unassigned

Bug Description

[Impact]
The connection tracking logic used by iptables gets confused if a packet passes through multiple Linux network namespaces on the same host. This happens because OVS does not properly clear some of the fields in the skb header, so connection tracking ignores the packet and the iptables functionality that relies on it (in particular DNAT and SNAT) does not work.

In particular, the use of OVS by OpenStack Quantum is critically affected by this bug.

[Fix]
The issue has been fixed upstream as of 1.4.3 by a minimal five-line patch that clears the appropriate metadata from the skb header. The patch has been cherry-picked and released in the current Ubuntu development release (12.10).

[Test Case]
1. Create an external network 40.0.0.0/24 for floating IPs.
2. Assign br-ex the IP 40.0.0.1.
3. From the same host, ping a VM's floating IP. The ping may appear to succeed, but the replies come from the router namespace, not the VM.
4. Network connectivity to the VM through its floating IP does not work, which reproduces the bug.
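The steps above can be scripted roughly as follows. This is a minimal Python sketch, assuming step 1 (the external network) is already done, the VM's floating IP is 40.0.0.3 (hypothetical), and the VM runs an SSH server on port 22 (assumption). Step 4 uses a real TCP connection because, as step 3 notes, a successful ping may be answered by the router namespace rather than the VM.

```python
import socket
import subprocess
import sys

BRIDGE = "br-ex"
BR_EX_CIDR = "40.0.0.1/24"   # step 2 address, from the test case
FLOATING_IP = "40.0.0.3"     # hypothetical floating IP already associated with a VM


def setup_commands(cidr: str = BR_EX_CIDR, bridge: str = BRIDGE) -> list:
    """Step 2: put the host on the external network via br-ex."""
    return [
        ["ip", "addr", "add", cidr, "dev", bridge],
        ["ip", "link", "set", bridge, "up"],
    ]


def ping_command(ip: str, count: int = 3) -> list:
    """Step 3: with the bug present, replies may come from the router namespace."""
    return ["ping", "-c", str(count), ip]


def tcp_reachable(ip: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Step 4: open a real TCP connection (assumes the VM listens on `port`)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def main(dry_run: bool = True) -> None:
    if dry_run:
        # Only print what would be run; nothing touches the network.
        for cmd in setup_commands() + [ping_command(FLOATING_IP)]:
            print(" ".join(cmd))
        return
    for cmd in setup_commands():
        subprocess.run(cmd, check=True)  # needs root; fails if the address already exists
    ping_ok = subprocess.run(ping_command(FLOATING_IP)).returncode == 0
    tcp_ok = tcp_reachable(FLOATING_IP)
    print(f"ping answered: {ping_ok}; TCP to port 22: {tcp_ok}")


if __name__ == "__main__":
    main(dry_run="--run" not in sys.argv)
```

Run it without arguments to print the commands, or with `--run` as root on the affected host. A ping that succeeds while the TCP check fails matches the symptom described in this report, although security-group rules or a VM without an SSH server can also make the TCP check fail.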

[Regression Potential]
Minimal. This is a simple patch cherry-picked from the current upstream stable release of Open vSwitch (1.4.3).

[Original Report]
Note: OVS from before 1.5, which includes the default versions shipped with 12.04 and Fedora 17, has a bug that causes it not to work correctly with floating IPs when the person contacting the floating IP is on the same box as quantum-l3-agent.

While not very likely to happen in a production setup, this is fairly common in simple development environments. For example, let's say you create an external network 40.0.0.0/24 for floating IPs. If you then assign br-ex the IP 40.0.0.1, you should be able to reach all of your VMs with floating IPs, but it won't work because of this bug. Oddly, it will often appear to work if you use ping, but in reality you are pinging the IP address in the router namespace, not the VM.

We believe the following OVS commit is required for this to work properly:

http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=commitdiff;h=53e6421bc83918ac2d00ba5516f205fa7e394140

We are looking at creating a new stable release on the 1.4.x branch to include this change and plan to work with distros to get it pulled into their packages.


dan wendlandt (danwent) on 2012-08-31
Changed in quantum:
milestone: none → folsom-rc1
assignee: nobody → dan wendlandt (danwent)
importance: Undecided → High
status: New → Confirmed
James Page (james-page) on 2012-09-03
Changed in openvswitch (Ubuntu):
importance: Undecided → Medium
dan wendlandt (danwent) wrote :

removing from milestone for quantum, as this is a purely packaging issue.

Changed in quantum:
milestone: folsom-rc1 → none
dan wendlandt (danwent) wrote :

To clarify my previous comment: this is not an issue with the packaging, it is a bug in OVS. I removed it from the quantum RC1 milestone because no code needed to be committed to quantum to fix the issue; distros simply needed to build new packages with the fix.

Again, here is the OVS change: http://openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=commitdiff;h=53e6421bc83918ac2d00ba5516f205fa7e394140

According to the OVS team, without the change, the connection tracking logic used by iptables gets confused if a packet passes through multiple Linux network namespaces on the same host. This happens because OVS does not properly clear some of the fields in the skb header, so connection tracking ignores the packet and the iptables functionality that relies on it (in particular DNAT and SNAT) does not work.

From the OVS commit message:

"It's possible that packets that are sent on internal devices (from
the OVS perspective) have already traversed the local IP stack.
After they go through the internal device, they will again travel
through the IP stack which may get confused by the presence of
existing information in the skb. The problem can be observed
when switching between namespaces. This clears out that information
to avoid problems but deliberately leaves other metadata alone.
This is to provide maximum flexibility in chaining together OVS
and other Linux components."
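The effect described in the commit message can be illustrated with a toy Python model. It is not kernel code: the `Skb` and `Namespace` classes, the single `nfct` field, the `ovs_internal_port_hop` helper, and the 40.0.0.3 to 10.0.0.3 mapping are all hypothetical. The early return for a packet that already carries conntrack state is loosely modeled on the kernel's `nf_conntrack_in()`, which treats such packets as previously seen; exactly which skb fields the real patch clears is defined by the linked commit, not by this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Skb:
    """Toy packet: only the fields this illustration needs."""
    dst_ip: str
    nfct: Optional[str] = None  # toy stand-in for the skb's conntrack reference
    mark: int = 0               # unrelated metadata that the reset leaves alone


@dataclass
class Namespace:
    name: str
    dnat: Dict[str, str] = field(default_factory=dict)  # floating IP -> fixed IP

    def conntrack_and_nat(self, skb: Skb) -> Skb:
        # A packet that already carries conntrack state is treated as
        # "previously seen" and ignored, so this namespace's DNAT never runs.
        if skb.nfct is not None:
            return skb
        skb.nfct = f"{self.name}:conn"
        if skb.dst_ip in self.dnat:
            skb.dst_ip = self.dnat[skb.dst_ip]
        return skb


def ovs_internal_port_hop(skb: Skb, reset: bool) -> Skb:
    """Toy hop out of OVS through an internal port; reset=True mimics the fix."""
    if reset:
        skb.nfct = None  # drop stale upper-layer state, keep other metadata
    return skb


def send_from_host(dst_ip: str, reset: bool) -> Skb:
    host = Namespace("host")                                      # host has no DNAT
    router = Namespace("qrouter", dnat={"40.0.0.3": "10.0.0.3"})  # hypothetical mapping
    skb = Skb(dst_ip, mark=7)
    host.conntrack_and_nat(skb)        # host stack tracks the outgoing packet
    ovs_internal_port_hop(skb, reset)  # packet crosses into the router namespace
    return router.conntrack_and_nat(skb)


if __name__ == "__main__":
    print(send_from_host("40.0.0.3", reset=False).dst_ip)  # DNAT skipped
    print(send_from_host("40.0.0.3", reset=True).dst_ip)   # DNAT applied
```

Without the reset the packet keeps the host's conntrack state, so the router namespace never translates the floating IP; with the reset it gets a fresh entry and DNAT applies, while unrelated metadata such as `mark` is preserved, in the spirit of the commit message.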

Unfortunately, this is fairly common, because the quantum l3-agent uses namespaces to implement multiple quantum "routers" on the same box and uses iptables within each namespace to perform SNAT/DNAT for floating IPs and external network access. As explained above, users are ALWAYS going to hit this bug in single-node developer installs, since they typically test VM reachability via floating IPs.
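To make the per-namespace NAT concrete, here is a hypothetical Python sketch that builds the kind of iptables commands a router namespace needs for one floating IP. The namespace name pattern `qrouter-<router id>`, the router ID, and the use of the plain PREROUTING/POSTROUTING chains are illustrative assumptions; the real l3-agent manages its own chains, so its exact rules differ. The sketch only builds command lines and does not run them.

```python
from typing import List


def floating_ip_rules(router_id: str, floating_ip: str, fixed_ip: str) -> List[List[str]]:
    """Hypothetical NAT rules for one floating IP inside a router namespace."""
    # Run iptables inside the router's namespace (name pattern assumed).
    prefix = ["ip", "netns", "exec", f"qrouter-{router_id}", "iptables", "-t", "nat", "-A"]
    return [
        # Inbound: traffic addressed to the floating IP is redirected to the VM.
        prefix + ["PREROUTING", "-d", f"{floating_ip}/32",
                  "-j", "DNAT", "--to-destination", fixed_ip],
        # Outbound: traffic from the VM leaves with the floating IP as its source.
        prefix + ["POSTROUTING", "-s", f"{fixed_ip}/32",
                  "-j", "SNAT", "--to-source", floating_ip],
    ]


if __name__ == "__main__":
    # Hypothetical router ID and addresses.
    for rule in floating_ip_rules("ROUTER_ID", "40.0.0.3", "10.0.0.3"):
        print(" ".join(rule))
```

Both rules rely on connection tracking seeing the connection inside the namespace, which is why the stale skb state described above breaks floating IPs for traffic originating on the same host.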

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openvswitch - 1.4.2+git20120612-9ubuntu3

---------------
openvswitch (1.4.2+git20120612-9ubuntu3) quantal; urgency=low

  * debian/patches/lp1044318-Reset-upper-layer-protocol-info.patch: Cherry
    picked upstream patch to avoid critical issues with SNAT/DNAT when OVS
    is chained with other Linux components. May be dropped with 1.4.3 upload.
    (LP: #1044318)
 -- Adam Gandelman <email address hidden> Fri, 07 Sep 2012 12:12:03 -0700

Changed in openvswitch (Ubuntu):
status: New → Fix Released
description: updated
summary: - pre-1.5 OVS has trouble with floating ips when pinging from the same box
+ [SRU] pre-1.5 OVS has trouble with floating ips when pinging from the same box
Changed in openvswitch (Ubuntu Precise):
status: New → Confirmed
dan wendlandt (danwent) wrote :

Thanks for the quick action on this, folks. Let me know when something is available to test (I'm not sure how the 'Fix Committed' state corresponds to availability of new packages via apt).

dan wendlandt (danwent) on 2012-09-11
no longer affects: quantum
Bryce Harrington (bryce) on 2012-09-13
description: updated
Changed in openvswitch (Ubuntu Precise):
status: Confirmed → Triaged
importance: Undecided → Medium

Hello dan, or anyone else affected,

Accepted openvswitch into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/openvswitch/1.4.0-1ubuntu1.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in openvswitch (Ubuntu Precise):
status: Triaged → Fix Committed
tags: added: verification-needed
dan wendlandt (danwent) wrote :

I have confirmed that the updated packages solve the issue: I did a devstack install on standard Precise, reproduced the issue, manually upgraded the OVS packages from the .deb files, rebooted the system to load the new kernel module, and then verified that the issue no longer occurs. Thanks!
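Verification like the above hinges on whether the installed package is at least the fixed version (1.4.0-1ubuntu1.3 for Precise, per the changelog in this report). On a Debian-based system `dpkg --compare-versions` answers this directly; the sketch below is a simplified Python reimplementation of dpkg's version ordering (epoch, then upstream version, then revision, with `~` sorting first), so prefer dpkg itself where available. The installed version can be read with `dpkg-query -W -f='${Version}' openvswitch-switch` (package name assumed). As the comment notes, the new datapath kernel module must also be loaded, for example by rebooting, before the fix takes effect.

```python
DIGITS = "0123456789"
LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
PRECISE_FIXED = "1.4.0-1ubuntu1.3"  # fixed Precise version, from this report's changelog


def _at(s: str, k: int) -> str:
    """Character at index k, or '' past the end (like C's NUL terminator)."""
    return s[k] if k < len(s) else ""


def _is_digit(c: str) -> bool:
    return c != "" and c in DIGITS


def _order(c: str) -> int:
    """dpkg-style weight of one character in a non-digit run."""
    if c == "" or _is_digit(c):
        return 0
    if c in LETTERS:
        return ord(c)        # letters sort before other symbols
    if c == "~":
        return -1            # '~' sorts before everything, even end of string
    return ord(c) + 256      # remaining symbols sort after letters


def _verrevcmp(a: str, b: str) -> int:
    """Compare upstream or revision strings by alternating non-digit/digit runs."""
    i = j = 0
    while i < len(a) or j < len(b):
        first_diff = 0
        # Non-digit run: compare character by character.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ac, bc = _order(_at(a, i)), _order(_at(b, j))
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: skip leading zeros, then compare numerically.
        while _at(a, i) == "0":
            i += 1
        while _at(b, j) == "0":
            j += 1
        while _is_digit(_at(a, i)) and _is_digit(_at(b, j)):
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if _is_digit(_at(a, i)):
            return 1         # a has the longer number
        if _is_digit(_at(b, j)):
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str):
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a: str, b: str) -> int:
    """Return -1, 0 or 1, like a three-way dpkg version comparison."""
    ea, ua, ra = _split(a)
    eb, ub, rb = _split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    r = _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
    return (r > 0) - (r < 0)


def has_fix(installed: str, fixed: str = PRECISE_FIXED) -> bool:
    return compare_versions(installed, fixed) >= 0
```

For example, `has_fix("1.4.0-1ubuntu1.2")` is false and `has_fix("1.4.0-1ubuntu1.3")` is true.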

tags: added: verification-done
removed: verification-needed

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openvswitch - 1.4.0-1ubuntu1.3

---------------
openvswitch (1.4.0-1ubuntu1.3) precise-proposed; urgency=low

   * debian/patches/lp1044318-Reset-upper-layer-protocol-info.patch: Cherry
     picked upstream patch to avoid critical issues with SNAT/DNAT when OVS
     is chained with other Linux components. (LP: #1044318)
 -- Adam Gandelman <email address hidden> Fri, 07 Sep 2012 15:38:56 -0700

Changed in openvswitch (Ubuntu Precise):
status: Fix Committed → Fix Released