cloud-init does not use interfaces.d in trusty

Bug #1315501 reported by David Moreau Simard on 2014-05-02
This bug affects 40 people
Affects              Importance  Assigned to
cloud-init           High        Unassigned
cloud-init (Ubuntu)  High        Unassigned

Bug Description

Hi,

Reference/context: https://ask.openstack.org/en/question/28297/cloud-init-nonet-waiting-and-fails/

The trusty image provided by http://cloud-images.ubuntu.com/trusty/ contains an eth0 interface configured as dhcp in /etc/network/interfaces.d/eth0.cfg.
When I boot this image in an Openstack non-dhcp networking environment, cloud-init configures the static IP provided by Neutron directly in /etc/network/interfaces (not interfaces.d).

This means I now have two eth0 devices configured, in two different files.
Booting 20 VMs with the same image yields around 50-60% of VMs that are not reachable by network.

Soft rebooting a VM in this state, or running "ifdown eth0 && ifup eth0", makes it respond to ping.

I removed the eth0 interface file /etc/network/interfaces.d/eth0.cfg from the image, booted another round of VMs, and all of them worked fine.
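
To illustrate the conflict described above, here is a minimal sketch that scans ifupdown-style configuration text and reports any interface defined in more than one file. It is not part of cloud-init; the function name `find_duplicate_ifaces` and the sample address 10.0.0.5 are made up for illustration, and the file contents are passed in as strings so the check can be tried without touching a real system.

```python
import re
from collections import defaultdict

# Matches ifupdown stanza headers of the form "iface <name> <family> <method>".
IFACE_RE = re.compile(r"^\s*iface\s+(\S+)\s+(\S+)\s+(\S+)")


def find_duplicate_ifaces(files):
    """Return {interface: [(path, method), ...]} for interfaces that are
    defined in more than one file.

    `files` maps a path to that file's contents (hypothetical helper,
    not a cloud-init API).
    """
    seen = defaultdict(list)
    for path, text in files.items():
        for line in text.splitlines():
            match = IFACE_RE.match(line)
            if match:
                name, _family, method = match.groups()
                seen[name].append((path, method))
    return {
        name: defs
        for name, defs in seen.items()
        if len({path for path, _method in defs}) > 1
    }


# Sample data modelled on the report; the static address is invented.
sample = {
    "/etc/network/interfaces": (
        "auto lo\n"
        "iface lo inet loopback\n"
        "auto eth0\n"
        "iface eth0 inet static\n"
        "    address 10.0.0.5\n"
        "    netmask 255.255.255.0\n"
    ),
    "/etc/network/interfaces.d/eth0.cfg": (
        "auto eth0\n"
        "iface eth0 inet dhcp\n"
    ),
}
duplicates = find_duplicate_ifaces(sample)
for name, defs in duplicates.items():
    print(name, defs)
```

On the sample data only eth0 is reported, once as static and once as dhcp, which is the situation the affected VMs boot into.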

Now, I see three possible outcomes:
- If eth0 is present in /etc/network/interfaces.d, cloud-init configures/re-configures that interface
- If eth0 is present in /etc/network/interfaces.d, cloud-init deletes it and configures /etc/network/interfaces
- Ubuntu cloud images ship without eth0 being configured by default

description: updated
Scott Moser (smoser) on 2014-05-13
Changed in cloud-init:
status: New → Confirmed
importance: Undecided → High
Changed in cloud-init (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Scott Moser (smoser) wrote :

In your 'ask' comment at
  https://ask.openstack.org/en/question/28297/cloud-init-nonet-waiting-and-fails/

/etc/network/interfaces shows:
  # Injected by Nova on instance boot

If that entry was "injected" by nova, then there really isn't much, if anything, that cloud-init can do about this.
This is a good example of why host "injection" is inherently flawed. The fix for your problem would then be to have nova realize that 'eth0' was already configured and remove the entry under /etc/network/interfaces.d/. That is clearly brittle, and it requires updating your hypervisor/cloud, which is quite unreasonable.

If cloud-init reads network-interfaces from the config drive, then it should handle eth0 correctly (i.e., we need to fix that).

David Moreau Simard (dmsimard) wrote :

Hi Scott,

You're correct, we use config drive in our implementation (using_config_drive=True for nova-compute).
The template used by nova is located here: https://github.com/openstack/nova/blob/master/nova/virt/interfaces.template

I also see your point that Nova could also implement a fix for this. I will check with them and refer to this bug.

David Moreau Simard (dmsimard) wrote :

Adding the openstack nova bug for cross reference: https://bugs.launchpad.net/nova/+bug/1319117

Scott Moser (smoser) wrote :

config drive != injection.

if nova "injected" the file (placed it in /etc/network/interfaces) then there isn't anything I can do. Essentially, "nova broke your image".

If nova placed the /etc/network/interfaces file on to a config drive, and cloud-init read it and wrote /etc/network/interfaces then cloud-init was at fault.

Both paths are possible. I generally believe the first one to be simply wrong and don't care if it is broken, the answer is "don't inject files into an image".

We'll fix the second path in cloud-init.

Zoltan Arnold Nagy (zoltan) wrote :

Any ETA on a fix for cloud-init?

Tobias (tobik) wrote :

Any update on that issue? Is there anybody who can suggest a workaround?
Thank you!

Any workaround or ETA please?

Vikram Hosakote (vhosakot) wrote :

Is there any workaround for this bug?

eth0 does not get its IP and the VM cannot be pinged or SSH'ed into.

cloud-init-nonet[13.74]: waiting 120 seconds for network device
cloud-init-nonet[133.74]: gave up waiting for a network device.
Cloud-init v. 0.7.5 running 'init' at Mon, 29 Dec 2014 17:33:38 +0000. Up 133.89 seconds.
ci-info: +++++++++++++++++++++++Net device info++++++++++++++++++++++++
ci-info: +--------+-------+-----------+-----------+-------------------+
ci-info: | Device |   Up  |  Address  |    Mask   |     Hw-Address    |
ci-info: +--------+-------+-----------+-----------+-------------------+
ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |         .         |
ci-info: |  eth0  |  True |     .     |     .     | fa:16:3e:e9:35:31 |
ci-info: +--------+-------+-----------+-----------+-------------------+
ci-info: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Route info failed!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
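
When checking many console logs for this symptom, a small parser can help. The following is a rough sketch that reads the "Net device info" table shown above and lists devices that are up but have no address. The function name `devices_without_address` is hypothetical, and the sketch assumes the input contains only the device table, as in this excerpt; other ci-info tables would need to be filtered out first.

```python
def devices_without_address(log_text):
    """Return names of devices that are Up but show "." as their address.

    Assumes `log_text` holds only the ci-info device table (hypothetical
    helper, not part of cloud-init).
    """
    rows = []
    for line in log_text.splitlines():
        # Data and header rows start with "ci-info: |"; borders use "+".
        if not line.startswith("ci-info: |"):
            continue
        body = line[len("ci-info:"):].strip().strip("|")
        rows.append([cell.strip() for cell in body.split("|")])
    if not rows:
        return []
    header, *entries = rows
    missing = []
    for cells in entries:
        row = dict(zip(header, cells))
        if row.get("Up") == "True" and row.get("Address") == ".":
            missing.append(row["Device"])
    return missing


console_log = """\
ci-info: +--------+-------+-----------+-----------+-------------------+
ci-info: | Device | Up | Address | Mask | Hw-Address |
ci-info: +--------+-------+-----------+-----------+-------------------+
ci-info: | lo | True | 127.0.0.1 | 255.0.0.0 | . |
ci-info: | eth0 | True | . | . | fa:16:3e:e9:35:31 |
ci-info: +--------+-------+-----------+-----------+-------------------+
"""
unaddressed = devices_without_address(console_log)
print(unaddressed)
```

For the log above this flags eth0, which is up but never received an address.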

I got the same error on a new installation with Ubuntu 14.04.

Any ETA on a fix for cloud-init?

Joshua Holmes (ethode) wrote :

I've tried 12.04 and have the same issue

David Moreau Simard (dmsimard) wrote :

Lots of people are asking for workarounds, so I figured I would post mine. It involves deleting the eth0 file from /etc/network/interfaces.d prior to using the image. It goes a bit like this:

# Install qemu-utils
apt-get install qemu-utils

# Mount the Ubuntu cloud image
modprobe nbd max_part=63
qemu-nbd -c /dev/nbd0 trusty-server-cloudimg-amd64-disk1.img
mount /dev/nbd0p1 /mnt/
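# Remove the default DHCP definition for eth0
rm /mnt/etc/network/interfaces.d/eth0.cfg
umount /mnt
# Disconnect the image from the nbd device
qemu-nbd -d /dev/nbd0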

# Upload and use the edited image
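
If this workaround is scripted, the removal step can be made tolerant of images that no longer ship the file. Below is a minimal sketch, assuming the image is already mounted as in the commands above; the helper name `remove_default_eth0_cfg` is invented for illustration, and the demo runs against a temporary directory rather than a real mount point.

```python
import tempfile
from pathlib import Path

# Path of the default DHCP definition inside the image, per the bug description.
ETH0_CFG = Path("etc/network/interfaces.d/eth0.cfg")


def remove_default_eth0_cfg(mount_root):
    """Delete eth0.cfg under `mount_root` if present.

    Returns True if the file was removed and False if it was not there
    (hypothetical helper for illustration).
    """
    target = Path(mount_root) / ETH0_CFG
    if target.is_file():
        target.unlink()
        return True
    return False


# Demo on a throwaway directory standing in for the mounted image (/mnt).
with tempfile.TemporaryDirectory() as root:
    cfg = Path(root) / ETH0_CFG
    cfg.parent.mkdir(parents=True)
    cfg.write_text("auto eth0\niface eth0 inet dhcp\n")
    first_run = remove_default_eth0_cfg(root)   # file exists, gets removed
    second_run = remove_default_eth0_cfg(root)  # nothing left to remove

print(first_run, second_run)
```

Against a real image the call would be `remove_default_eth0_cfg("/mnt")`, run as root between the mount and umount steps.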

Bobby Yakovich (bgyako) wrote :

That would require access to instance, how do you get into instance?

David Moreau Simard (dmsimard) wrote :

@Bobby

My workaround is applied to the image the instance is spawned with. You must modify the image, upload that modified image and you will be able to successfully boot VMs without this issue.

If you do not have access to do that, the only other way I would see would be to spawn a VM using the image with this problem but pass user-data to cloud-init to grant you console access, a bit like this:

#cloud-config
users:
  - name: root
    lock-passwd: false

chpasswd:
  list: |
    root:root
  expire: false

This will allow you to login as root (with password 'root') in the VM console where you'll be able to either delete the extra eth0 file or diagnose the problem from there.

mahmoh (mahmoh) wrote :

@smoser, hi any outlook on the fix for this? Thanks.

Scott Moser (smoser) wrote :

marking fix-committed in cloud-init as revno 1225.
fix released in yakkety images and will make it back to xenial via sru.

Changed in cloud-init:
status: Confirmed → Fix Committed
Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Released
Debdiptaghosh (debdipta1078) wrote :

@Scott

In which build is this issue fixed?
I have found the same issue in
https://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64-disk1.img

Scott Moser (smoser) wrote :

This is fixed in cloud-init 0.7.7

Changed in cloud-init:
status: Fix Committed → Fix Released
Roy Zuo (roylez) wrote :

@Scott,

Could you please backport the fix to the trusty image? The trusty image is still widely used, and this bug will keep affecting many people until it is fixed in the trusty cloud image.

Scott Moser (smoser) wrote :

Roy,
I'm attaching a suggested patch for trusty. I've not tested it, and my only confidence in it is that the changes are down a very particular path (where openstack provides network config).

If you're interested in pushing this a bit further, please grab that patch and build and give it a try.

This bug was originally reported for Trusty, and my organization is experiencing it with Trusty. I would note that 14.04 LTS still has two years left in its support cycle. Do you plan on releasing a fix for this?

Scott Moser (smoser) wrote :

@Alex,
sorry for the slow reply.

Servicing 14.04 could still be done, but doing so would really require some effort from the person who wants the bug fixes in.

I had intended in comment 19 to point at my branch https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+ref/bug/1315501-trusty-openstack-interfaces but failed to do so.

Testing that would improve the potential of this getting backported to 14.04.
