precise daily image fails (release works)

Bug #1582410 reported by LaMont Jones on 2016-05-16
This bug affects 4 people
Affects              Importance  Assigned to
curtin               Medium      Unassigned
maas-images          High        Unassigned
curtin (Ubuntu)      Medium      Unassigned
  Trusty             High        Unassigned

Bug Description

Testing maas 2.0, I told it to deploy precise, and it failed. We eventually isolated it to "release works consistently, daily image fails consistently."

The installed-but-not-quite-deployed host has ssh but no host keys, and it waits 120 seconds in cloud-init-nonet before reporting the (correct) network config. Below is the network portion of the curtin config for the VM.

network:
  config:
  - id: ens3
    mac_address: 40:98:e4:1f:66:8f
    mtu: 1500
    name: ens3
    subnets:
    - address: 172.18.0.16/22
      dns_nameservers: []
      gateway: 172.18.0.1
      type: static
    type: physical
  - address: 192.168.133.18
    search:
    - maas.wfg-office
    type: nameserver
  version: 1
network_commands:
  builtin:
  - curtin
  - net-meta
  - custom
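To make the mismatch concrete, here is a minimal Python sketch that walks the version 1 config above (transcribed by hand into a dict so no YAML library is needed) and lists the physical interfaces it configures. The function name `physical_interface_names` is hypothetical, not part of curtin or MAAS.

```python
# The "network" section of the curtin config above, transcribed as a dict.
NETWORK_CONFIG = {
    "version": 1,
    "config": [
        {
            "id": "ens3",
            "type": "physical",
            "name": "ens3",
            "mac_address": "40:98:e4:1f:66:8f",
            "mtu": 1500,
            "subnets": [
                {
                    "type": "static",
                    "address": "172.18.0.16/22",
                    "gateway": "172.18.0.1",
                    "dns_nameservers": [],
                }
            ],
        },
        {
            "type": "nameserver",
            "address": "192.168.133.18",
            "search": ["maas.wfg-office"],
        },
    ],
}


def physical_interface_names(network_config):
    """Return the names of all 'physical' entries in a v1 network config."""
    return [
        entry["name"]
        for entry in network_config["config"]
        if entry["type"] == "physical"
    ]


if __name__ == "__main__":
    names = physical_interface_names(NETWORK_CONFIG)
    print(names)                             # ['ens3']
    print("eth0 configured:", "eth0" in names)  # eth0 configured: False
```

The only NIC this config describes is `ens3`; nothing in it refers to `eth0`.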


Ryan Harper (raharper) wrote :

To confirm, can you indicate the release date of your precise daily? My recent checking shows a 20160509 as the latest precise release image, and goes back to 20160502; do any of those images work or all fail?

Ryan Harper (raharper) wrote :

It looks like the precise daily image includes the following:

/etc/network# cat interfaces
auto lo
iface lo inet loopback
    dns-nameservers 192.168.133.18
    dns-search maas.wfg-office

auto ens3
iface ens3 inet static
    gateway 172.18.0.1
    address 172.18.0.16/22
    mtu 1500

source /etc/network/interfaces.d/*.cfg

And still in the daily image we also have this:

# cat eth0.cfg
# The primary network interface
auto eth0
iface eth0 inet dhcp

Which is the interface that doesn't come up. This causes cloud-init to hang waiting for eth0.
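The extra file matters because of the `source` line at the end of the interfaces file: every matching `*.cfg` file is pulled in, so the image's `auto eth0` stanza is treated like any other interface to bring up. The Python sketch below is a simplified model of that expansion (only `auto` and `source` lines are handled, and a dict stands in for the filesystem); `auto_interfaces` is a hypothetical helper, not part of ifupdown or cloud-init.

```python
import fnmatch


def auto_interfaces(files, path="/etc/network/interfaces"):
    """Return interface names from 'auto' lines, following 'source' globs.

    Simplified model: `files` maps absolute paths to file contents and
    stands in for the filesystem; only 'auto' and 'source' are handled.
    """
    names = []
    for line in files.get(path, "").splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        if words[0] == "auto":
            names.extend(words[1:])
        elif words[0] == "source" and len(words) > 1:
            # Expand the glob against the known paths, in sorted order.
            for candidate in sorted(files):
                if fnmatch.fnmatchcase(candidate, words[1]):
                    names.extend(auto_interfaces(files, candidate))
    return names


INTERFACES = """\
auto lo
iface lo inet loopback
    dns-nameservers 192.168.133.18
    dns-search maas.wfg-office

auto ens3
iface ens3 inet static
    gateway 172.18.0.1
    address 172.18.0.16/22
    mtu 1500

source /etc/network/interfaces.d/*.cfg
"""

ETH0_CFG = """\
# The primary network interface
auto eth0
iface eth0 inet dhcp
"""

files = {
    "/etc/network/interfaces": INTERFACES,
    "/etc/network/interfaces.d/eth0.cfg": ETH0_CFG,
}
present = {"lo", "ens3"}  # devices that actually exist on the VM
wanted = auto_interfaces(files)
print(wanted)                                   # ['lo', 'ens3', 'eth0']
print([n for n in wanted if n not in present])  # ['eth0']
```

Removing the `source` line from `INTERFACES` yields `['lo', 'ens3']`, i.e. the stale `eth0.cfg` is only a problem when the rendered interfaces file sources `interfaces.d`.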

cloud-init I believe has some logic to detect this file and remove it; let me see if that's the case.

As a workaround, if you can tell maas to name the interface 'eth0' instead of 'ens3', cloud-init will not wait for a second nic.

Ryan Harper (raharper) wrote :

So in precise, we have cloud-init 0.6.3-0ubuntu1.25, which does not appear to include the logic to remove this file (this is in newer cloud-init which supports writing out networking config).

Oddly, the release image includes cloud-init 0.7.0, which is newer, but still not new enough to support network config (the feature that also removes this legacy /etc/network/interfaces.d/eth0.cfg when a network config is supplied).

I suspect that in the deployment with the release image, the network config used had the nic name set to eth0.

This can be fixed in a few ways; we can decide which one's best here.

1. The maas image for precise (release and daily) can skip inclusion of this file; this would be a change on the image build side. This would likely break older maas deployments where maas isn't modeling networking nor using a curtin new enough to generate the network config.

2. newer maas can inject a curthooks entry to remove /etc/network/interfaces.d/eth0.cfg if it exists and matches the known configuration

Basically a terse version of cloud-init's _maybe_remove_legacy_eth0().

3. curtin can run this _maybe_remove_legacy_eth0() function if it knows it's writing a network configuration.

I think _3_ is the best option; but it requires updating curtin. _2_ works without changing curtin, but then requires a maas update. _1_ I don't think is ideal here since it may break older maas installs.
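Option 3 could look roughly like the following Python sketch: the cleanup only runs when curtin is writing a network configuration, and it operates on the target filesystem being installed rather than on the host. The function name, its parameters and the exact matching rule are assumptions for illustration; this is not curtin's actual code, only a sketch modelled on the behaviour described for cloud-init's `_maybe_remove_legacy_eth0()`.

```python
import os
import tempfile

# Non-comment content of the legacy file shipped in the images.
LEGACY_ETH0_LINES = ["auto eth0", "iface eth0 inet dhcp"]


def maybe_remove_legacy_eth0(target, writing_network_config):
    """Hypothetical option-3 hook: delete <target>/etc/network/interfaces.d/eth0.cfg
    only when network config is being written and the file has stock content.

    Returns True if the file was removed.
    """
    if not writing_network_config:
        return False  # leave the image alone; older setups may rely on the file
    path = os.path.join(target, "etc", "network", "interfaces.d", "eth0.cfg")
    if not os.path.isfile(path):
        return False
    with open(path) as f:
        lines = [line.strip() for line in f.read().splitlines()
                 if not line.startswith("#")]
    if lines != LEGACY_ETH0_LINES:
        return False  # user-modified file: keep it
    os.remove(path)
    return True


# Usage against a throwaway directory standing in for the install target.
with tempfile.TemporaryDirectory() as target:
    cfg_dir = os.path.join(target, "etc", "network", "interfaces.d")
    os.makedirs(cfg_dir)
    with open(os.path.join(cfg_dir, "eth0.cfg"), "w") as f:
        f.write("# The primary network interface\nauto eth0\niface eth0 inet dhcp\n")
    print(maybe_remove_legacy_eth0(target, writing_network_config=False))  # False
    print(maybe_remove_legacy_eth0(target, writing_network_config=True))   # True
```

Gating on "curtin is writing network config" is what keeps this from hurting the older maas setups that option 1 would break.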

Separately, it's quite curious that the *daily* image includes older cloud-init than the released version. We need to file a separate bug for that.

Changed in curtin:
importance: Undecided → Medium
status: New → Triaged
Scott Moser (smoser) wrote :

OK, so the lack of cloud-init 0.7.0 in precise is a regression in the maas-image build process, and we need to fix that. The change in /etc/network/interfaces is similar; I'll take a look.

The precise maas images are built with a PPA (https://launchpad.net/~maas-maintainers/+archive/ubuntu/maas-ephemeral-images) that has the newer version of cloud-init, so that is where it comes from. The PPA description lists what it contains and why (cloud-init and maas-enlist at this point).

Scott Moser (smoser) on 2016-05-17
Changed in maas-images:
status: New → Confirmed
importance: Undecided → High
status: Confirmed → In Progress
Scott Moser (smoser) on 2016-05-17
Changed in maas-images:
status: In Progress → Fix Committed
Scott Moser (smoser) wrote :

maas images for precise newer than 20160517 will have cloud-init from the ppa again.

Now, in reply to comment 3.

> So in precise, we have cloud-init 0.6.3-0ubuntu1.25, which does not appear to
> include the logic to remove this file (this is in newer cloud-init which
> supports writing out networking config).

Even then, when curtin disables cloud-init networking, it disables cloud-init's
removal of this file as well.

> Oddly, the release image, includes cloud-init 0.7.0; which is newer but not
> new enough to support the network config which also will remove this legacy
> /etc/network/interfaces.d/eth0.cfg if supplied with a network config.

See
https://launchpad.net/~maas-maintainers/+archive/ubuntu/maas-ephemeral-images
for what and why; basically, precise was released before maas, and maas support
was somewhat bolted on via this archive.

> I suspect that the deployment with release; the network config used has the
> nic name set to eth0.

Or maas used a curtin older than revision 382, which would not write
'source *.cfg' and would thus ignore the eth0.cfg in the image. I can't come
up with any other way that two identical deploys would behave differently here.

> This can be fixed in a few ways; we can decide which one's best here.
>
> 1. The maas image for precise (release and daily) can skip inclusion of this
> file; this would be a change on the image build side. This would likely break
> older maas deployments where maas isn't modeling networking nor using a
> curtin new enough to generate the network config.
>
> 2. newer maas can inject a curthooks entry to remove
> /etc/network/interfaces.d/eth0.cfg if it exists and matches the known
> configuration
>
> Basically a terse version of cloud-init's _maybe_remove_legacy_eth0().
>
> 3. curtin can run this _maybe_remove_legacy_eth0() function if it knows it's
> writing a network configuration.
>
> I think _3_ is the best option; but it requires updating curtin. _2_ works
> without changing curtin, but then requires a maas update. _1_ I don't think
> is ideal here since it may break older maas installs.

I think 3 is the best option. It's safe to remove a file named eth0.cfg
that has known build-output content; that is what cloud-init does. If
the user intended to place an 'eth0.cfg' file and have it respected, they
have to do one of the following:
 a.) name it differently (local-eth0.cfg)
 b.) add a trailing blank line.

cloud-init's _maybe_remove_legacy_eth0 will only remove the file if
its contents, with lines starting with '#' removed, are exactly:
 auto eth0
 iface eth0 inet dhcp

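so just adding a line such as
 <space># asdf
is also enough to keep the file (the leading space means the line does not start with '#', so it is not dropped before the comparison).

It's not the prettiest workaround, but 'a' allows effectively the same thing, and this is not an expected path.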
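A small Python model of the matching rule described above shows why each workaround keeps a user's file. `is_legacy_eth0` is a hypothetical name; the rule (drop lines starting with '#', strip the rest, compare to the two known lines) follows the description in this comment and is an approximation, not a copy of cloud-init's implementation.

```python
KNOWN = ["auto eth0", "iface eth0 inet dhcp"]


def is_legacy_eth0(contents):
    """True if contents, minus lines starting with '#', is exactly KNOWN."""
    lines = [line.strip() for line in contents.splitlines()
             if not line.startswith("#")]
    return lines == KNOWN


stock = "# The primary network interface\nauto eth0\niface eth0 inet dhcp\n"
trailing_blank = stock + "\n"           # workaround b.): extra empty line
indented_comment = stock + " # asdf\n"  # leading space: not dropped as a comment
print(is_legacy_eth0(stock))             # True  -> file would be removed
print(is_legacy_eth0(trailing_blank))    # False -> file kept
print(is_legacy_eth0(indented_comment))  # False -> file kept
```

Workaround a.) never reaches this check at all, since only a file named eth0.cfg is considered.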

Scott Moser (smoser) wrote :

The next precise image (anything named newer than 20160518) will have the new cloud-init version in it again. I've requested a build, and sometime in the next 8 hours or so we should get a 20160519 of precise.

Scott Moser (smoser) wrote :

$ s=20160519.1
$ wget http://images.maas.io/ephemeral-v2/daily/precise/amd64/$s/root-image.gz -O - | tee precise-root-image-$s.gz | zcat > precise-root-image-$s
$ sudo mount-image-callback "precise-root-image-$s" --read-only -- \
    chroot _MOUNTPOINT_ sh -c 'cat /etc/cloud/build.info; dpkg-query --show cloud-init'
build_name: server
serial: 20160519.1
cloud-init 0.7.0-0ubuntu2

So the maas image build is now fixed to get cloud-init correctly from the ppa.
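When checking other images the same way, a small helper can turn the `build.info` output above into a yes/no answer. This Python sketch assumes `build.info` consists of `key: value` lines, as in the output shown, and uses the cutoff from the earlier comment that images named newer than 20160518 carry the PPA cloud-init; both helper names are hypothetical. The serial is only a heuristic; the `dpkg-query` line remains the direct evidence.

```python
def parse_build_info(text):
    """Parse 'key: value' lines, as printed from /etc/cloud/build.info."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def has_ppa_cloud_init(serial, cutoff=20160518):
    """True if a serial like '20160519.1' is from a build dated after cutoff."""
    return int(serial.split(".")[0]) > cutoff


info = parse_build_info("build_name: server\nserial: 20160519.1\n")
print(info["serial"], has_ppa_cloud_init(info["serial"]))  # 20160519.1 True
```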

Scott Moser (smoser) wrote :

I'm marking this Fix Released for maas-images, as we have a released image with it (as described in the bug) and the latest daily has it.

Changed in maas-images:
status: Fix Committed → Fix Released
Scott Moser (smoser) wrote :

Fix committed in trunk at revno 389.

Changed in curtin:
status: Triaged → Fix Committed
Scott Moser (smoser) wrote :

Locally, with my maas at 1.9.1+bzr4543-0ubuntu1~trusty1, I installed the precise image 20160519.1.

Scott Moser (smoser) wrote :

As noted in bug 1588706, revno 389 needs to get pushed so that the eth0.cfg in the images does not cause havoc.

Scott Moser (smoser) on 2016-06-03
Changed in curtin (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package curtin - 0.1.0~bzr389-0ubuntu1

---------------
curtin (0.1.0~bzr389-0ubuntu1) yakkety; urgency=medium

  * New upstream snapshot.
    * Detect and remove legacy /etc/network/interfaces.d/eth0.cfg from
      target (LP: #1582410)

 -- Scott Moser <email address hidden> Fri, 03 Jun 2016 09:34:17 -0400

Changed in curtin (Ubuntu):
status: Confirmed → Fix Released
Hrvoje (hrvoje-habjanic) wrote :

Hi.

Would it be possible to push this to Trusty as well? I'm a "victim" of this bug too.

Regards,

H.

Hello LaMont, or anyone else affected,

Accepted curtin into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/curtin/0.1.0~bzr389-0ubuntu1~16.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-needed
tags: added: 4010
Scott Moser (smoser) wrote :

This is fixed in trusty sru at
https://bugs.launchpad.net/ubuntu/+source/curtin/+bug/1577872
It is currently in proposed.

Changed in curtin (Ubuntu Trusty):
importance: Undecided → High
status: New → Fix Committed

This was released, clearing from backlog.

Changed in curtin (Ubuntu Trusty):
status: Fix Committed → Fix Released
Changed in curtin:
status: Fix Committed → Fix Released