[SRU] Cloud Images do not bring up networking w/ certain virtual NICs due to device naming rules

Bug #1510345 reported by Laurence Rowe
This bug affects 4 people
Affects                  Status         Importance  Assigned to  Milestone
cloud-images             Fix Released   Critical    Unassigned
cloud-init (Ubuntu)      Fix Released   High        Scott Moser
  Xenial                 Fix Released   High        Scott Moser
livecd-rootfs (Ubuntu)   Fix Released   Medium      Unassigned
  Wily                   Fix Released   Medium      Unassigned
  Xenial                 Fix Released   Medium      Unassigned

Bug Description

SRU Justification

[IMPACT] Cloud images produced by livecd-rootfs are not accessible when presented with certain NICs, such as the ixgbevf NIC used on HVM instances in AWS.

[CAUSE] Changes to default device naming in 15.10 cause some devices to be named at boot time in a way that is not predictable, i.e. instead of "eth0" being the first NIC, "ens3" might be used.

[FIX] Boot instances with "net.ifnames=0". This change reverts to the old device naming conventions. As a fix, this is the most appropriate since the cloud images configure the first NIC for DHCP.
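
For illustration only, a minimal sketch of how "net.ifnames=0" could be appended to the kernel command line on a GRUB-based system. This is not the livecd-rootfs change itself: the helper name add_kernel_arg, the use of GRUB_CMDLINE_LINUX_DEFAULT, and the default file path are assumptions, and "sed -i" assumes GNU sed.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel argument to GRUB_CMDLINE_LINUX_DEFAULT.
# $1 = argument to add (e.g. net.ifnames=0)
# $2 = grub defaults file (assumed default: /etc/default/grub)
add_kernel_arg() {
    arg="$1"
    file="${2:-/etc/default/grub}"
    key='GRUB_CMDLINE_LINUX_DEFAULT'
    # Fail if the variable is not set in the file at all.
    grep -q "^${key}=\"" "$file" || return 1
    # Nothing to do if the argument is already on that line.
    if grep "^${key}=\"" "$file" | grep -qF -- "$arg"; then
        return 0
    fi
    # Append the argument just inside the closing quote.
    sed -i "s|^\(${key}=\".*\)\"|\1 ${arg}\"|" "$file"
}
```

On a real system this would be followed by "sudo update-grub" and a reboot, e.g. "sudo sh -c '. ./helper.sh; add_kernel_arg net.ifnames=0'", where helper.sh is a hypothetical file holding the function above.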

[TEST CASE1]:
- Build image from -proposed
- Boot image in KVM, e.g.:
  $ qemu-system-x86_64 \
   -smp 2 -m 1024 -machine accel=kvm \
   -drive file=build.img,if=virtio,bus=0,cache=unsafe,unit=0,snapshot=on \
   -net nic,model=rtl8139
- Confirm that image has "eth0"
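
The last step of the test case above could be scripted inside the booted guest roughly as follows. This is a sketch, not part of the original test plan: the function name has_iface is hypothetical, and the optional second argument exists only so the check can be exercised against a directory other than /sys/class/net.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a network interface with the given name exists.
# $1 = interface name (e.g. eth0)
# $2 = sysfs network directory (assumed default: /sys/class/net)
has_iface() {
    name="$1"
    sysfs="${2:-/sys/class/net}"
    # Each network interface appears as an entry under /sys/class/net.
    [ -e "${sysfs}/${name}" ]
}
```

Usage in the guest: has_iface eth0 && echo "eth0 present" || echo "eth0 missing"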

[TEST CASE2]:
- Build image from -proposed
- Publish image to AWS as HVM w/ SRIOV enabled
- Confirm that instance boots and is accessible via SSH

[ORIGINAL REPORT]

I've made several attempts to launch c4.xlarge and c4.8xlarge instances using Ubuntu 15.10 Wily but am unable to ping the instance after it has started running. The console shows that the instance reachability check failed.

I am able to successfully launch c4.xlarge instances using Ubuntu 14.04 and t2.large instances using Ubuntu 15.10.

I've tried with both of these instance AMIs:

ubuntu/images/hvm-ssd/ubuntu-wily-15.10-amd64-server-20151021 - ami-225ebd11
ubuntu/images-testing/hvm-ssd/ubuntu-wily-daily-amd64-server-20151026 - ami-ea20cdd9

Might there be a problem with the Ubuntu Kernel in 15.10 for the c4 instances?

Looking at the system log it seems that the network never comes up:

[ 140.699509] cloud-init[1469]: 2015-10-26 20:45:49,887 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: request error [('Connection aborted.', OSError(101, 'Network is unreachable'))]

Thread at AWS forums: https://forums.aws.amazon.com/thread.jspa?threadID=218656

Laurence Rowe (lrowe) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1510345/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Scott Moser (smoser)
Changed in ubuntu:
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Ben Howard (utlemming)
tags: added: cloud-images
Robert C Jennings (rcj)
affects: ubuntu → ubuntu-on-ec2
Changed in ubuntu-on-ec2:
assignee: Ben Howard (utlemming) → nobody
importance: Critical → Undecided
status: Confirmed → New
Ben Howard (darkmuggle-deactivatedaccount) wrote :

This is fallout from the persistent device naming udev fun.

Ben Howard (darkmuggle-deactivatedaccount) wrote :

I've uploaded a fix to our build PPA and will get a test image out ASAP. The builder PPA is a couple of hours out.

Changed in ubuntu-on-ec2:
assignee: nobody → Ben Howard (utlemming)
importance: Undecided → Critical
status: New → Confirmed
Ben Howard (darkmuggle-deactivatedaccount) wrote :
Ben Howard (darkmuggle-deactivatedaccount) wrote :

Once fix is confirmed I'll submit livecd-rootfs for SRU.

Scott Moser (smoser) wrote :

I think this is the right fix for the moment.
It does have potential fallout, though, as nics are going to change names for people between one image and the next (new instances). and that ripples through to maas installs also.

that said, those instances would have had serious issues, so it seems the right fix. just wish we'd done it earlier.

in 16.04, i hope to fix this better in cloud-init, where something smarter will configure initial networking, and would have handled ens3 being the only device.

Ben Howard (darkmuggle-deactivatedaccount) wrote :

Confirmed fix via KVM instances [1, 2]. Image is working its way through automated daily publication and testing. I'll manually confirm it before release. I booted a prior version using an "rtl8139" NIC and got ens3, whereas the new build yielded eth0.

ETA is likely EOD US Mountain time.

[1] http://cloud-images.ubuntu.com/wily/20151029/wily-server-cloudimg-amd64-disk1.img
[2]
$ qemu-system-x86_64 \
   -smp 2 \
   -m 1024 \
   -machine accel=kvm \
   -drive file=wily-server-cloudimg-amd64-disk1.img,if=virtio,bus=0,cache=unsafe,unit=0,snapshot=on \
   -cdrom seed.iso \
   -net nic,model=rtl8139 \
   -net user

Changed in ubuntu-on-ec2:
status: Confirmed → In Progress
Changed in livecd-rootfs (Ubuntu):
assignee: nobody → Ben Howard (utlemming)
status: New → In Progress
importance: Undecided → Medium
Ben Howard (darkmuggle-deactivatedaccount) wrote :

Submitted merge proposal for livecd-rootfs and uploaded proposed SRU.

description: updated
Ben Howard (darkmuggle-deactivatedaccount) wrote :
Changed in livecd-rootfs (Ubuntu Wily):
assignee: nobody → Ben Howard (utlemming)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package livecd-rootfs - 2.350

---------------
livecd-rootfs (2.350) xenial; urgency=medium

  [ Oliver Grawert ]
  * bump UID for tss user in snappy, else it matches dnsmasq and bad things
    happen
  * fix handling of writable files in /etc/default for snappy

  [ Ben Howard ]
  * Cloud Images: disable new NIC naming convention (LP: #1510345).

 -- Colin Watson <email address hidden> Thu, 29 Oct 2015 14:25:28 +0000

Changed in livecd-rootfs (Ubuntu Xenial):
status: In Progress → Fix Released
Ben Howard (darkmuggle-deactivatedaccount) wrote :

I've made "ami-614c3c0b" (HVM instance store) public early for validation. I've confirmed that the NIC driven by ixgbevf is properly identified as eth0. It's going to be a few more hours as the publication continues.

ubuntu@ip-10-0-129-209:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 15.10
Release: 15.10
Codename: wily

ubuntu@ip-10-0-129-209:~$ cat /etc/cloud/build.info
build_name: server
serial: 20151029

ubuntu@ip-10-0-129-209:~$ dmesg | grep ixgbe
[ 1.639387] ixgbevf: Intel(R) 10 Gigabit PCI Express Virtual Function Network Driver - version 2.12.1-k
[ 1.641374] ixgbevf: Copyright (c) 2009 - 2012 Intel Corporation.
[ 1.660704] ixgbevf 0000:00:03.0: 12:57:b0:c0:b3:15
[ 1.661828] ixgbevf 0000:00:03.0: MAC: 1
[ 1.662849] ixgbevf 0000:00:03.0: Intel(R) 82599 Virtual Function
[ 11.984847] ixgbevf 0000:00:03.0: NIC Link is Up 10 Gbps

ubuntu@ip-10-0-129-209:~$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr 12:57:b0:c0:b3:15
          inet addr:10.0.129.209 Bcast:10.0.129.255 Mask:255.255.255.0
          inet6 addr: fe80::1057:b0ff:fec0:b315/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:9001 Metric:1
          RX packets:61828 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2853 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:92201765 (92.2 MB) TX bytes:251326 (251.3 KB)

ubuntu@ip-10-0-129-209:~$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.2.0-16-generic root=LABEL=cloudimg-rootfs net.ifnames=0
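
A check like the "cat /proc/cmdline" step above can be automated with a small helper. This is a sketch under assumptions: the function name cmdline_has is hypothetical, and the optional file argument exists only so the check can be run against a saved copy of the command line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line contains the exact token.
# $1 = token to look for (e.g. net.ifnames=0)
# $2 = file holding the command line (assumed default: /proc/cmdline)
cmdline_has() {
    arg="$1"
    file="${2:-/proc/cmdline}"
    # Put each space-separated token on its own line, then match whole lines only,
    # so "net.ifnames=0" does not match "net.ifnames=01" or "xnet.ifnames=0".
    tr -s ' ' '\n' < "$file" | grep -qxF -- "$arg"
}
```

Usage: cmdline_has net.ifnames=0 && echo "old-style NIC naming requested"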

summary: - Network is unreachable Wily ec2 c4 instance family
+ [SRU] Cloud Images do not bring up networking w/ certain virtual NICs
+ due to device naming rules
Ben Howard (darkmuggle-deactivatedaccount) wrote :

Dailies for Ubuntu Cloud Images have finished publishing at [1].

Confirmed that c4.xlarge instances on AWS get networking; tested on GCE and Azure builds as well. Preliminary automated testing results look promising, so I've started the promotion process. Pending the results of testing (ETA ~2 hours or so), this build will be promoted to a release.

[1] http://cloud-images.ubuntu.com/wily/20151029/

Changed in livecd-rootfs (Ubuntu Wily):
status: New → In Progress
importance: Undecided → Medium
Ben Howard (darkmuggle-deactivatedaccount) wrote :

New 15.10 images have been released to [1]. I've updated the AWS forum as well.

The SRU aspect is bringing the fix into livecd-rootfs which is used to build the images; this package is not in the image itself. For the time being Cloud Images are being built using a PPA until the fix lands in -updates.

Changed in ubuntu-on-ec2:
status: In Progress → Fix Released
Colin Watson (cjwatson) wrote : Please test proposed package

Hello Laurence, or anyone else affected,

Accepted livecd-rootfs into wily-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/livecd-rootfs/2.349.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in livecd-rootfs (Ubuntu Wily):
status: In Progress → Fix Committed
tags: added: verification-needed
Ben Howard (darkmuggle-deactivatedaccount) wrote :

Verified. Good for release.

tags: added: verification-done
removed: verification-needed
Martin Pitt (pitti) wrote :

Setting net.ifnames=0 in livecd-rootfs was a quick hack for wily, but this is really inconvenient. This means that the cloud images now stop being usable for non-cloud workloads (or even for cloud workloads with multiple NICs).

The proper fix would be for cloud-init to generate a network config on first boot, for the first /sys/class/e* interface, instead of shipping a hardcoded /etc/network/interfaces.d/eth0.cfg. Can we please do this for xenial?
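
A rough sketch of the idea proposed above: pick the first e* interface and render a DHCP stanza for it instead of hardcoding eth0. This is not cloud-init's implementation. It assumes the interfaces are listed under /sys/class/net, that "first" means first in shell glob order, and that the stanza uses the ifupdown /etc/network/interfaces format; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the first interface matching e*.
# $1 = sysfs network directory (assumed default: /sys/class/net)
first_ether_iface() {
    sysfs="${1:-/sys/class/net}"
    for path in "$sysfs"/e*; do
        # If nothing matches, the glob stays literal and this test skips it.
        [ -e "$path" ] || continue
        basename "$path"
        return 0
    done
    return 1
}

# Hypothetical helper: print an ifupdown DHCP stanza for the given interface.
render_dhcp_stanza() {
    iface="$1"
    printf 'auto %s\niface %s inet dhcp\n' "$iface" "$iface"
}
```

Usage: iface=$(first_ether_iface) && render_dhcp_stanza "$iface". On the failing c4 instances this would select ens3 rather than the missing eth0.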

no longer affects: cloud-init (Ubuntu Wily)
Changed in cloud-init (Ubuntu Xenial):
status: New → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package livecd-rootfs - 2.349.1

---------------
livecd-rootfs (2.349.1) wily; urgency=medium

  * cloud-images: revert to old device naming (LP: #1510345).

 -- Ben Howard <email address hidden> Thu, 29 Oct 2015 11:29:01 -0500

Changed in livecd-rootfs (Ubuntu Wily):
status: Fix Committed → Fix Released
Steve Langasek (vorlon) wrote : Update Released

The verification of the Stable Release Update for livecd-rootfs has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

tags: added: patch
Martin Pitt (pitti)
tags: added: rls-x-incoming
Scott Moser (smoser)
Changed in cloud-init (Ubuntu Xenial):
importance: Undecided → High
assignee: nobody → Scott Moser (smoser)
Will Buckner (willbuckner) wrote :

Pardon me if I'm missing something, but I just launched a new c4.xlarge instance using AMI ami-47723b2d from https://cloud-images.ubuntu.com/wily/20151209/ .

The result was identical to Laurence Rowe's original report. Same log output, same interface names, no eth0. This bug report indicates that this was fixed and released over a month ago, but the current cloud images are still broken. Am I using the wrong image, or is there more information I should provide to try to resolve this? Wily is completely unusable to me on EC2 currently.

Paul Buonopane (zenexer) wrote :
Mian (hou-m) wrote :

Some of the newest Amazon EC2 Published AMIs at [1], for example, ubuntu-wily-15.10-amd64-server-20151219 AMIs: ami-38796759 (us-west-2) & ami-94683ffe (us-east-1) only successfully boot for instance types that don't have enhanced networking enabled (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html).

So attempting to launch a c4.xlarge, for example, will fail or hang with the following logged:

[ 63.181996] cloud-init[932]: ci-info: +++++++++++++++++++++++++++++++++++++++Net device info+++++++++++++++++++++++++++++++++++++++
[ 63.182482] cloud-init[932]: ci-info: +--------+-------+------------------------------+---------------+-------+-------------------+
[ 63.182886] cloud-init[932]: ci-info: | Device |   Up  |           Address            |      Mask     | Scope |     Hw-Address    |
[ 63.183281] cloud-init[932]: ci-info: +--------+-------+------------------------------+---------------+-------+-------------------+
[ 63.183664] cloud-init[932]: ci-info: |   lo   |  True |          127.0.0.1           |   255.0.0.0   |   .   |         .         |
[ 63.184172] cloud-init[932]: ci-info: |   lo   |  True |           ::1/128            |       .       |  host |         .         |
[ 63.184482] cloud-init[932]: ci-info: | lxcbr0 |  True |           10.0.3.1           | 255.255.255.0 |   .   | 5a:27:38:06:7c:a3 |
[ 63.184871] cloud-init[932]: ci-info: | lxcbr0 |  True | fe80::5827:38ff:fe06:7ca3/64 |       .       |  link | 5a:27:38:06:7c:a3 |
[ 63.185163] cloud-init[932]: ci-info: |  ens3  | False |              .               |       .       |   .   | 0a:f9:a1:59:f3:1b |
[ 63.185335] cloud-init[932]: ci-info: +--------+-------+------------------------------+---------------+-------+-------------------+
[ 63.185502] cloud-init[932]: ci-info: +++++++++++++++++++++++++++Route IPv4 info+++++++++++++++++++++++++++
[ 63.186068] cloud-init[932]: ci-info: +-------+-------------+---------+---------------+-----------+-------+
[ 63.186556] cloud-init[932]: ci-info: | Route | Destination | Gateway |    Genmask    | Interface | Flags |
[ 63.186997] cloud-init[932]: ci-info: +-------+-------------+---------+---------------+-----------+-------+
[ 63.187380] cloud-init[932]: ci-info: |   0   |   10.0.3.0  | 0.0.0.0 | 255.255.255.0 |   lxcbr0  |   U   |
[ 63.187770] cloud-init[932]: ci-info: +-------+-------------+---------+---------------+-----------+-------+
[ 63.188186] cloud-init[932]: 2015-12-24 04:57:59,899 - url_helper.py[WARNING]: Calling 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [0/120s]: request error [('Connection aborted.', OSError(101, 'Network is unreachable'))]

[1] http://cloud-images.ubuntu.com/releases/wily/release/

Andrew Lau (alau) wrote :

Do we have any updates for fixing these AMIs for instance types that have enhanced networking enabled?

Andrew Lau (alau) wrote :

OK it looks like the bug was fixed for the hvm:ebs-ssd (General Purpose SSD) AMI builds and not hvm:ebs (standard/magnetic EBS).

For example, in ubuntu-wily-15.10-amd64-server-20160105.1 for us-east-1:

* ami-bfb5ead5 ( hvm:ebs ) - FAILS
* ami-15b7e87f ( hvm:ebs-ssd ) - WORKS

for enhanced networking instance launches.

Andrew Lau (alau) wrote :

Looks like I was premature in my last reply.
ami-bfb5ead5 just came up. I'll have to double check this bug report with one of our customers.

Scott Moser (smoser) wrote :

this is fix-released from cloud-init perspective in xenial.
If you disagree, please state why and open the bug. cloud-init now renders .rules files and configures ENI per the datasource's provided network config or a fallback network config.

no longer affects: cloud-init
Changed in cloud-init (Ubuntu):
status: Triaged → Fix Released
Changed in cloud-init (Ubuntu Xenial):
status: Triaged → Fix Released
Mathew Hodson (mhodson)
affects: ubuntu-on-ec2 → cloud-images