
[SRU] Cloudpipe VPN instance can loose connectivity after starting openvpn

Reported by Cor Cornelisse on 2012-04-06
This bug affects 1 person
Affects                    Importance   Assigned to
OpenStack Compute (nova)   Low          Cor Cornelisse
  Essex                    Undecided    Unassigned
nova (Ubuntu)              Undecided    Unassigned
  Precise                  Undecided    Chuck Short

Bug Description

Cloudpipe openvpn tap has a high chance of not working in Essex

A fix for bug 921838 changed the MAC addresses generated by libvirt to start with a very high first octet.

A cloudpipe instance thus has a high numbered MAC address on eth0

Upon openvpn start, a tap interface will be joined to a bridge br0, together with eth0.

Because the tap MAC address is also randomly generated, and the eth0 MAC address sits at the high end of the MAC address space, chances are very high that the tap MAC address will turn out lower than the eth0 MAC address.
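
To put a rough number on "very high", here is a minimal Python sketch. It assumes, purely for illustration, that the tap MAC address is drawn uniformly from all 48-bit values; the real generator may constrain some bits, so treat the result as an estimate. The helper names `mac_to_int` and `chance_random_mac_is_lower` are hypothetical.

```python
def mac_to_int(mac):
    """Convert a MAC such as 'fa:16:3e:00:00:00' to its 48-bit integer value."""
    return int(mac.replace(":", ""), 16)


def chance_random_mac_is_lower(eth0_mac):
    """Fraction of the 48-bit address space that sorts below eth0_mac.

    Assumption: the competing MAC is uniformly random over all 2**48 values.
    """
    return mac_to_int(eth0_mac) / 2 ** 48


if __name__ == "__main__":
    # Lowest possible address with the FA:16:3E prefix mentioned in this report.
    p = chance_random_mac_is_lower("fa:16:3e:00:00:00")
    print("chance that a random MAC is lower: {:.1%}".format(p))
```

Under that assumption the chance comes out at about 97.7%, which matches the observation that the tap address almost always ends up lower.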

The Linux kernel gives a bridge the lowest MAC address among all the interfaces attached to it.

As soon as openvpn is started in the cloudpipe instance, br0 changes its MAC address to the tap MAC address, and once the ARP entries for the old MAC address expire, connectivity to the cloudpipe instance is lost.
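
The selection rule described above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this report, not kernel code; the function name `bridge_mac` and the example addresses are made up for illustration.

```python
def mac_to_int(mac):
    """Convert a colon-separated MAC address to its 48-bit integer value."""
    return int(mac.replace(":", ""), 16)


def bridge_mac(port_macs):
    """Model of the rule in this report: the bridge takes the lowest port MAC."""
    return min(port_macs, key=mac_to_int)


# Example addresses (hypothetical): eth0 uses the FA:16:3E prefix,
# the tap address is an arbitrary lower value.
eth0 = "fa:16:3e:12:34:56"
tap0 = "8e:4b:1c:9a:00:2f"

before = bridge_mac([eth0])        # only eth0 is in br0
after = bridge_mac([eth0, tap0])   # openvpn has added the tap interface

print("br0 before openvpn starts:", before)
print("br0 after openvpn starts: ", after)
```

In this model br0 starts out with the eth0 address and switches to the tap address as soon as the tap interface joins, which is the change that peers holding the old ARP entry do not see.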

Since what the kernel is doing is completely valid, I see two possible approaches:

- Have libvirt generate a low-numbered MAC address for cloudpipe instances

or

- Make sure a second MAC address is randomly generated, higher than the eth0 MAC address, and specify it in the openvpn config using lladdr

I'll write up a patch for the latter (option 2).
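
As an illustration of option 2, the following Python sketch generates a tap MAC address that always sorts above any FA:16:3E address by using the FA:17:3E prefix that the committed fix (quoted further down) describes. It is not the code from that commit; the function name `generate_tap_mac` and the way the `lladdr` line is printed are assumptions made for this example.

```python
import random


def generate_tap_mac(prefix=(0xFA, 0x17, 0x3E), rng=random):
    """Return a MAC with a fixed 3-octet prefix and 3 random octets.

    With the FA:17:3E prefix the result is always numerically higher
    than any address starting with FA:16:3E.
    """
    octets = list(prefix) + [rng.randint(0x00, 0xFF) for _ in range(3)]
    return ":".join("{:02x}".format(o) for o in octets)


if __name__ == "__main__":
    tap_mac = generate_tap_mac()
    # Line that could be placed in the openvpn config for the TAP device.
    print("lladdr {}".format(tap_mac))
```

Because the bridge keeps the lowest address among its ports, a tap address generated this way leaves br0 on the eth0 address.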

Fix proposed to branch: master
Review: https://review.openstack.org/6327

Changed in nova:
assignee: nobody → Cor Cornelisse (corcornelisse)
status: New → In Progress
Changed in nova:
importance: Undecided → Low

Reviewed: https://review.openstack.org/6327
Committed: http://github.com/openstack/nova/commit/bc9f8d4fff225da6130691de2f2eea22215a4f17
Submitter: Jenkins
Branch: master

commit bc9f8d4fff225da6130691de2f2eea22215a4f17
Author: Cor Cornelisse <email address hidden>
Date: Fri Apr 6 15:54:16 2012 +0200

    Cloudpipe tap vpn not always working

    Fixes bug 975043

    Since Essex, all instances will have an eth0 MAC address in the range
    of FA:16:3E, which is near the end of the MAC address space.

    When openvpn is started, a TAP interface is created with a randomly
    generated MAC address. Chances are high the generated MAC address is
    lower in value than the eth0 MAC address. Once the tap interface is
    added to the bridge interface, the bridge interface will no longer have
    the eth0 MAC address, but take over the TAP MAC address. This is a
    feature of the Linux kernel, whereby a bridge interface will take the
    MAC address with the lowest value amongst its interfaces. After the ARP
    entries expire, this will result in the cloudpipe instance being no
    longer reachable.

    This fix randomly generates a MAC address starting with FA:17:3E, which
    is greater than FA:16:3E, and will thus ensure the bridge will keep the
    eth0 MAC address.

    Change-Id: I0bd994b6dc7a92738ed23cd62ee42a021fd394e2

Changed in nova:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/6832
Committed: http://github.com/openstack/nova/commit/7c64de95f422add711bcdf5821310435e7be0199
Submitter: Jenkins
Branch: stable/essex

commit 7c64de95f422add711bcdf5821310435e7be0199
Author: Cor Cornelisse <email address hidden>
Date: Fri Apr 6 15:54:16 2012 +0200

    Cloudpipe tap vpn not always working

    Fixes bug 975043

    Since Essex, all instances will have an eth0 MAC address in the range
    of FA:16:3E, which is near the end of the MAC address space.

    When openvpn is started, a TAP interface is created with a randomly
    generated MAC address. Chances are high the generated MAC address is
    lower in value than the eth0 MAC address. Once the tap interface is
    added to the bridge interface, the bridge interface will no longer have
    the eth0 MAC address, but take over the TAP MAC address. This is a
    feature of the Linux kernel, whereby a bridge interface will take the
    MAC address with the lowest value amongst its interfaces. After the ARP
    entries expire, this will result in the cloudpipe instance being no
    longer reachable.

    This fix randomly generates a MAC address starting with FA:17:3E, which
    is greater than FA:16:3E, and will thus ensure the bridge will keep the
    eth0 MAC address.

    Change-Id: I0bd994b6dc7a92738ed23cd62ee42a021fd394e2

tags: added: in-stable-essex
Devin Carlen (devcamcar) on 2012-05-22
Changed in nova:
milestone: none → folsom-1
Thierry Carrez (ttx) on 2012-05-23
Changed in nova:
status: Fix Committed → Fix Released
Chuck Short (zulcss) on 2012-05-30
Changed in nova (Ubuntu):
status: New → In Progress
Changed in nova (Ubuntu Precise):
status: New → In Progress
Chuck Short (zulcss) on 2012-06-05
summary: - Cloudpipe VPN instance can loose connectivity after starting openvpn
+ [SRU] Cloudpipe VPN instance can loose connectivity after starting
+ openvpn
Chuck Short (zulcss) wrote :

** Impact **

Since Essex, all instances will have an eth0 MAC address in the range of FA:16:3E, which is near the end of the MAC address space.

When openvpn is started, a TAP interface is created with a randomly generated MAC address. Chances are high the generated MAC address is lower in value than the eth0 MAC address. Once the tap interface is added to the bridge interface, the bridge interface will no longer have the eth0 MAC address, but take over the TAP MAC address. This is a feature of the Linux kernel, whereby a bridge interface will take the MAC address with the lowest value amongst its interfaces. After the ARP entries expire, this will result in the cloudpipe instance being no longer reachable.

This fix randomly generates a MAC address starting with FA:17:3E, which is greater than FA:16:3E, and will thus ensure the bridge will keep the eth0 MAC address.

** Development Fix **

This has been addressed in https://review.openstack.org/6327 and is already fixed in quantal.

** Stable Fix **

This has been addressed in the stable/essex tree in https://review.openstack.org/6832

** Regression Potential **

Minimal; it is a hardly used feature in OpenStack on Ubuntu.

Chuck Short (zulcss) on 2012-06-08
Changed in nova (Ubuntu Precise):
assignee: nobody → Chuck Short (zulcss)
milestone: none → ubuntu-12.04.1
Chuck Short (zulcss) wrote :

** Test Case **

This is not supported out of the box on Precise.

Hello Cor, or anyone else affected,

Accepted nova into precise-proposed. The package will build now and be available in a few hours. Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users. If this package fixes the bug for you, please change the bug tag from verification-needed to verification-done. If it does not, change the tag to verification-failed. In either case, details of your testing will help us make a better decision. Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Adam Gandelman (gandelman-a) wrote :

Please find the attached Jenkins job results from the Ubuntu Server Team's CI
infrastructure. As part of the verification process for this bug, Nova has
been deployed and configured across multiple nodes using precise-proposed as
an installation source. After successful bring-up and configuration of the
cluster, a number of exercises and smoke tests have been invoked to ensure the
updated package did not introduce any regressions. A number of test iterations
were carried out to catch any possible transient errors.

Note the list of installed packages at the top and bottom of the report.

For records of upstream test coverage of this update, please see the
Jenkins links in the comments of the relevant upstream code-review:

https://review.openstack.org/6832

As per the provisional Micro Release Exception granted to this package by
the Technical Board, we hope this contributes toward verification of this
update.

Dave Walker (davewalker) on 2012-07-03
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 2012.1+stable~20120612-3ee026e-0ubuntu1

---------------
nova (2012.1+stable~20120612-3ee026e-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot. (LP: #1010473)
  * Dropped, superseded by new snapshot:
    - debian/patches/upstream/0001-fix-bug-where-nova-ignores-glance-host-in-imageref.patch
    - debian/patches/upstream/0002-Stop-libvirt-test-from-deleting-instances-dir.patch
    - debian/patches/upstream/0003-Allow-unprivileged-RADOS-users-to-access-rbd-volumes.patch
    - debian/patches/upstream/0004-Fixed-bug-962840-added-a-test-case.patch
    - debian/patches/upstream/0005-Populate-image-properties-with-project_id-again.patch
    - debian/patches/upstream/0006-Use-project_id-in-ec2.cloud._format_image.patc
    - debian/patches/CVE-2012-2101.patch
    - debian/patches/CVE-2012-2654.patch
  * Resynchronize with stable/essex:
    - 3ee026e Only invoke .lower() on non-None protocols. (LP: #1010514)
    - f0a9f47 Create a utf8 version of the dns_domains table. (LP: #993663)
    - 84a43e1 Report memory correctly on Xen. (LP: #997014)
    - 8c72924 Add libvirt get_console_output tests: pty and file. (LP: #990237)
    - 4e423cd Fix Multi_Scheduler to process host capabilities. (LP: #1000403)
    - 4aea7f1 Nail pep8 dependencies to 1.0.1
    - 2b3bbc4 handle updated qemu-img info output. (LP: #1000261)
    - 2d7d51c Fix type of snapshot_id column to match db. (LP: #962615)
    - ec70c69 Generate a Changelog for Nova
    - e5e890f Fix nova.tests.test_nova_rootwrap on Fedora 17. (LP: #992916)
    - 9e9a554 Ec2 handle strings with "0x" (LP: #983206)
    - 26dc6b7 QuantumManager will start dnsmasq during startup. Fixes (LP: #977759)
    - 7028d66 Introduced flag base_dir_name. (LP: #973194)
    - 76b525a Get unit tests functional in OS X.
    - facb936 Update KillFilter to handle 'deleted' exe's. (LP: #967931)
    - 1209af4 Checks if value is string or not before decode. (LP: #952176)
    - 1209af4 Fix timeout in EC2 CloudController.create_image(). (LP: #989764)
    - 108e74b Re-add console_log from console_console_output(). (LP: #987335)
    - 48a0768 Don't leak RPC connections on timeouts or other exceptions. (LP: #968843)
    - 7c64de9 Cloudpipe tap vpn not always working. (LP: #975043)
    - 5ab5051 add libvirt_inject_key flag fix (LP: #971640)
    - 6c68ef5 Xen: Pass session to destroy_vdi. (LP: #988615)
    - 015744e Delete fixed_ips when network is deleted. (LP: #754900)
  * Add debian/scripts/changelog.sh to help generate the changelog.
  * Add debian/nova-common.docs:
    - Include changelog and README.rst
  * debian/rules: Generate a tarball from git snapshot.
  * debian/patches/fix-pep8-errors.patch: Fix pep8 errors due to pep8 upstream
    migration.
 -- Chuck Short <email address hidden> Tue, 05 Jun 2012 09:50:59 -0400

Changed in nova (Ubuntu Precise):
status: Fix Committed → Fix Released
Chuck Short (zulcss) on 2012-07-31
Changed in nova (Ubuntu):
status: In Progress → Fix Released
Thierry Carrez (ttx) on 2012-09-27
Changed in nova:
milestone: folsom-1 → 2012.2