precise daily image fails (release works)

Bug #1582410 reported by LaMont Jones
This bug affects 4 people
Affects          Status        Importance  Assigned to  Milestone
curtin           Fix Released  Medium      Unassigned
maas-images      Fix Released  High        Unassigned
curtin (Ubuntu)  Fix Released  Medium      Unassigned
  Trusty         Fix Released  High        Unassigned

Bug Description

Testing maas 2.0, I told it to deploy precise, and it failed. We eventually isolated it to "release image works consistently, daily image fails consistently."

The installed-but-not-quite-deployed host has ssh but no host keys, and waits for 120 seconds in cloud-init-nonet before telling me the (correct) network config. Below is the network portion of the curtin config for the vm.

network:
  config:
  - id: ens3
    mac_address: 40:98:e4:1f:66:8f
    mtu: 1500
    name: ens3
    subnets:
    - address: 172.18.0.16/22
      dns_nameservers: []
      gateway: 172.18.0.1
      type: static
    type: physical
  - address: 192.168.133.18
    search:
    - maas.wfg-office
    type: nameserver
  version: 1
network_commands:
  builtin:
  - curtin
  - net-meta
  - custom

Revision history for this message
Ryan Harper (raharper) wrote :

To confirm, can you indicate the release date of your precise daily? My recent check shows 20160509 as the latest precise release image, with images going back to 20160502; do any of those images work, or do they all fail?

Ryan Harper (raharper) wrote :

It looks like the precise daily image includes the following:

/etc/network# cat interfaces
auto lo
iface lo inet loopback
    dns-nameservers 192.168.133.18
    dns-search maas.wfg-office

auto ens3
iface ens3 inet static
    gateway 172.18.0.1
    address 172.18.0.16/22
    mtu 1500

source /etc/network/interfaces.d/*.cfg

And still in the daily image we also have this:

# cat eth0.cfg
# The primary network interface
auto eth0
iface eth0 inet dhcp

eth0 is the interface that doesn't come up, which causes cloud-init to hang waiting for it.
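The mismatch between the two files can be checked mechanically. Below is a minimal Python sketch, assuming both files are available as strings. The helper names `auto_interfaces` and `effective_auto_interfaces` are hypothetical and are not part of curtin or cloud-init, and the parsing only covers `auto` lines and glob-style `source` lines, not the full ifupdown syntax.

```python
import fnmatch


def auto_interfaces(text):
    """Return interface names declared on 'auto' lines, in file order."""
    names = []
    for line in text.splitlines():
        words = line.split()
        if words and words[0] == "auto":
            names.extend(words[1:])
    return names


def effective_auto_interfaces(main_text, extra_files):
    """Names that would be brought up at boot.

    extra_files maps absolute paths to file contents. A file only counts
    if a 'source' line in main_text matches its path.
    """
    names = auto_interfaces(main_text)
    patterns = []
    for line in main_text.splitlines():
        words = line.split()
        if len(words) > 1 and words[0] == "source":
            patterns.append(words[1])
    for path, text in sorted(extra_files.items()):
        if any(fnmatch.fnmatch(path, pat) for pat in patterns):
            names.extend(auto_interfaces(text))
    return names


MAIN = """\
auto lo
iface lo inet loopback
    dns-nameservers 192.168.133.18
    dns-search maas.wfg-office

auto ens3
iface ens3 inet static
    gateway 172.18.0.1
    address 172.18.0.16/22
    mtu 1500

source /etc/network/interfaces.d/*.cfg
"""

ETH0_CFG = """\
# The primary network interface
auto eth0
iface eth0 inet dhcp
"""

extra = {"/etc/network/interfaces.d/eth0.cfg": ETH0_CFG}
configured = {"lo", "ens3"}  # names present in the curtin network config
stray = [n for n in effective_auto_interfaces(MAIN, extra)
         if n not in configured]
print(stray)
```

With the files as shown, the only stray entry is eth0. If the `source` line is dropped from the main file, the extra file is never read and the list comes back empty.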

I believe cloud-init has some logic to detect this file and remove it; let me check whether that's the case.

As a workaround, you can tell maas to name the interface 'eth0' instead of 'ens3'; this will prevent cloud-init from waiting for a second nic.

Ryan Harper (raharper) wrote :

So in precise, we have cloud-init 0.6.3-0ubuntu1.25, which does not appear to include the logic to remove this file (this is in newer cloud-init which supports writing out networking config).

Oddly, the release image includes cloud-init 0.7.0, which is newer but not new enough to support the network config; that support is also what removes the legacy /etc/network/interfaces.d/eth0.cfg when a network config is supplied.

I suspect that in the deployment with the release image, the network config used has the nic name set to eth0.

This can be fixed in a few ways; we can decide which one's best here.

1. The maas image for precise (release and daily) can skip inclusion of this file; this would be a change on the image build side. This would likely break older maas deployments where maas isn't modeling networking nor using a curtin new enough to generate the network config.

2. newer maas can inject a curthooks entry to remove /etc/network/interfaces.d/eth0.cfg if it exists and matches the known configuration

Basically a terse version of cloud-init's _maybe_remove_legacy_eth0().

3. curtin can run this _maybe_remove_legacy_eth0() function if it knows it's writing a network configuration.

I think _3_ is the best option; but it requires updating curtin. _2_ works without changing curtin, but then requires a maas update. _1_ I don't think is ideal here since it may break older maas installs.

Separately, it's quite curious that the *daily* image includes an older cloud-init than the release image. We need to file a separate bug for that.

Changed in curtin:
importance: Undecided → Medium
status: New → Triaged
Scott Moser (smoser) wrote :

OK. So the lack of 0.7.0 cloud-init in precise is a regression in the maas-image build process and we need to fix that. The change in /etc/network/interfaces is similar and I'll take a look.

The precise maas images build with a ppa https://launchpad.net/~maas-maintainers/+archive/ubuntu/maas-ephemeral-images that has the newer version of cloud-init, so that is where that comes from. You can see in the description there what all is there and why it is there (cloud-init and maas-enlist at this point).

Scott Moser (smoser)
Changed in maas-images:
status: New → Confirmed
importance: Undecided → High
status: Confirmed → In Progress
Scott Moser (smoser)
Changed in maas-images:
status: In Progress → Fix Committed
Scott Moser (smoser) wrote :

maas images for precise newer than 20160517 will have cloud-init from the ppa again.

Now, in reply to comment 3.

> So in precise, we have cloud-init 0.6.3-0ubuntu1.25, which does not appear to
> include the logic to remove this file (this is in newer cloud-init which
> supports writing out networking config).

Even then, when curtin disables cloud-init networking, it disables cloud-init's
removal of this file as well.

> Oddly, the release image, includes cloud-init 0.7.0; which is newer but not
> new enough to support the network config which also will remove this legacy
> /etc/network/interfaces.d/eth0.cfg if supplied with a network config.

See
https://launchpad.net/~maas-maintainers/+archive/ubuntu/maas-ephemeral-images
for what and why, but basically precise was released before maas, and maas
support was somewhat bolted on via this archive.

> I suspect that the deployment with release; the network config used has the
> nic name set to eth0.

Or maas used a curtin < revision 382, which would not write 'source *.cfg'
and would thus ignore the eth0.cfg that was in the image. I can't come up with
a way that 2 identical deploys would otherwise behave differently on this.

> This can be fixed in a few ways; we can decide which one's best here.
>
> 1. The maas image for precise (release and daily) can skip inclusion of this
> file; this would be a change on the image build side. This would likely break
> older maas deployments where maas isn't modeling networking nor using a
> curtin new enough to generate the network config.
>
> 2. newer maas can inject a curthooks entry to remove
> /etc/network/interfaces.d/eth0.cfg if it exists and matches the known
> configuration
>
> Basically a terse version of cloud-init's _maybe_remove_legacy_eth0().
>
> 3. curtin can run this _maybe_remove_legacy_eth0() function if it knows it's
> writing a network configuration.
>
> I think _3_ is the best option; but it requires updating curtin. _2_ works
> without changing curtin, but then requires a maas update. _1_ I don't think
> is ideal here since it may break older maas installs.

I think 3 is the best option. It's safe to remove a file named eth0.cfg
that has known build-output content. That is what cloud-init does. If
the user intended to place an 'eth0.cfg' file and have it respected, they
have to do one of:
 a.) name it differently (local-eth0.cfg)
 b.) add a trailing blank line.

cloud-init's _maybe_remove_legacy_eth0 will only remove the file if
its contents with lines starting with '#' removed is exactly:
 auto eth0
 iface eth0 inet dhcp

so just adding
 <space># asdf
is enough to keep the file from being removed.

It's not the prettiest workaround, but 'a' will allow effectively the same thing, and this is not an expected path.
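The check described above can be sketched in a few lines of Python. This is an illustration written from the description in this comment, not a copy of cloud-init's or curtin's code; the names `is_legacy_eth0_cfg` and `maybe_remove_legacy_eth0`, and the `target` argument, are hypothetical.

```python
import os

# Known build-output content once lines starting with '#' are dropped.
KNOWN_LEGACY_LINES = ["auto eth0", "iface eth0 inet dhcp"]


def is_legacy_eth0_cfg(contents):
    """True if contents, minus lines starting with '#', is exactly the
    known build output."""
    lines = [line for line in contents.splitlines()
             if not line.startswith("#")]
    return lines == KNOWN_LEGACY_LINES


def maybe_remove_legacy_eth0(
        target, relpath="etc/network/interfaces.d/eth0.cfg"):
    """Remove the legacy file under target if its content is the known
    build output. Returns True if the file was removed."""
    path = os.path.join(target, relpath)
    if not os.path.exists(path):
        return False
    with open(path) as fp:
        contents = fp.read()
    if not is_legacy_eth0_cfg(contents):
        return False  # user-modified content is left alone
    os.unlink(path)
    return True


stock = "# The primary network interface\nauto eth0\niface eth0 inet dhcp\n"
print(is_legacy_eth0_cfg(stock))                # stock image file
print(is_legacy_eth0_cfg(stock + "\n"))         # trailing blank line added
print(is_legacy_eth0_cfg(stock + " # asdf\n"))  # indented comment added
```

Only the first call reports a match. The trailing blank line and the indented comment both survive the '#' filter, so the comparison fails and the file would be kept.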

Scott Moser (smoser) wrote :

The next precise image (anything named newer than 20160518) will have the new cloud-init version in it again. I've requested a build, and sometime in the next 8 hours or so we should get a 20160519 of precise.

Scott Moser (smoser) wrote :

$ s=20160519.1
$ wget http://images.maas.io/ephemeral-v2/daily/precise/amd64/$s/root-image.gz -O - | tee precise-root-image-$s.gz | zcat > precise-root-image-$s
$ sudo mount-image-callback "precise-root-image-$s" --read-only -- \
    chroot _MOUNTPOINT_ sh -c 'cat /etc/cloud/build.info; dpkg-query --show cloud-init'
build_name: server
serial: 20160519.1
cloud-init 0.7.0-0ubuntu2

So the maas image build is now fixed to get cloud-init correctly from the ppa.
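The version check can also be scripted. Below is a rough Python sketch, assuming the `dpkg-query --show` line has the form `name version` and that the upstream part of the version is purely dotted integers with no epoch. That holds for the two cloud-init versions mentioned in this bug, but not for Debian versions in general. `upstream_version` is a hypothetical helper.

```python
def upstream_version(dpkg_line):
    """Parse a line like 'cloud-init 0.7.0-0ubuntu2' into (0, 7, 0)."""
    _name, version = dpkg_line.split()
    upstream = version.rsplit("-", 1)[0]  # drop the Debian revision
    return tuple(int(part) for part in upstream.split("."))


daily_before = "cloud-init 0.6.3-0ubuntu1.25"
daily_after = "cloud-init 0.7.0-0ubuntu2"

print(upstream_version(daily_before) >= (0, 7, 0))
print(upstream_version(daily_after) >= (0, 7, 0))
```

Only the second line passes the check, which matches the 20160519.1 image picking up cloud-init from the ppa again.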

Scott Moser (smoser) wrote :

I'm marking this fix-released for maas-images as we have a released image with it (as described in the bug) and the latest daily has it.

Changed in maas-images:
status: Fix Committed → Fix Released
Scott Moser (smoser) wrote :

fix-committed in trunk at revno 389

Changed in curtin:
status: Triaged → Fix Committed
Scott Moser (smoser) wrote :

Locally, with my maas at 1.9.1+bzr4543-0ubuntu1~trusty1, I installed the precise image with serial 20160519.1.

Scott Moser (smoser) wrote :

As listed in bug 1588706, revno 389 needs to get pushed so that the eth0.cfg that is in images does not cause havoc.

Scott Moser (smoser)
Changed in curtin (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr389-0ubuntu1

---------------
curtin (0.1.0~bzr389-0ubuntu1) yakkety; urgency=medium

  * New upstream snapshot.
    * Detect and remove legacy /etc/network/interfaces.d/eth0.cfg from
      target (LP: #1582410)

 -- Scott Moser <email address hidden> Fri, 03 Jun 2016 09:34:17 -0400

Changed in curtin (Ubuntu):
status: Confirmed → Fix Released
Hrvoje (hrvoje-habjanic) wrote :

Hi.

Would it be possible to push this to Trusty as well? I'm a "victim" of this bug too.

Regards,

H.

Chris J Arges (arges) wrote : Please test proposed package

Hello LaMont, or anyone else affected,

Accepted curtin into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr389-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
tags: added: 4010
Scott Moser (smoser) wrote :

This is fixed in trusty sru at
https://bugs.launchpad.net/ubuntu/+source/curtin/+bug/1577872
It is currently in proposed.

Changed in curtin (Ubuntu Trusty):
importance: Undecided → High
status: New → Fix Committed
Christian Ehrhardt (paelzer) wrote :

This was released, clearing from backlog.

Changed in curtin (Ubuntu Trusty):
status: Fix Committed → Fix Released
Changed in curtin:
status: Fix Committed → Fix Released