Cloud-init fails if iso9660 filesystem on non-cdrom path in 20171211 image.

Bug #1737704 reported by Christian Ehrhardt on 2017-12-12
This bug affects 2 people
Affects              Status        Importance  Assigned to  Milestone
cloud-init           Fix Released  High        Scott Moser
cloud-init (Ubuntu)  Fix Released  High        Scott Moser

Bug Description

During OVF datasource checks, ds-identify attempted to ignore non-cdrom iso9660 filesystems. The debug 'skip' message it logs for such a filesystem references an undeclared variable, resulting in the following failure:

_shwrap: d: parameter not set.
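
The failure mode can be reproduced in isolation. The sketch below is a minimal illustration, not the actual ds-identify code: it assumes the script runs with `set -u` (which the "parameter not set" message suggests), and the `debug` helper, the two function names, and the `/dev/sr[0-9]*` cdrom pattern are hypothetical stand-ins.

```shell
#!/bin/sh
# Hypothetical stand-in for a debug logger; not the real ds-identify helper.
debug() { echo "debug: $*" >&2; }

# Buggy pattern: the skip message expands $d, which was never assigned.
# Under `set -u` a non-interactive shell aborts on that expansion.
# The function body is a subshell so the abort stays contained.
log_skip_buggy() (
    set -u
    unset d                          # make sure no inherited $d hides the bug
    dev="$1"
    case "$dev" in
        /dev/sr[0-9]*) : ;;          # assumed cdrom naming: nothing to skip
        *) debug "skipping iso dev $d" ;;
    esac
)

# Fixed pattern: the message uses the variable that was actually assigned.
log_skip_fixed() (
    set -u
    dev="$1"
    case "$dev" in
        /dev/sr[0-9]*) : ;;
        *) debug "skipping iso dev $dev" ;;
    esac
)
```

With these definitions, `log_skip_buggy /dev/vdb` aborts with a non-zero status, while `log_skip_buggy /dev/sr0` succeeds because the faulty branch is never reached. That matches the report that attaching the seed as a cdrom works.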

==== Original description ===

Hi,
I had the last daily image working fine:
$ uvt-simplestreams-libvirt query
release=bionic arch=amd64 label=daily (20171129.1)

But today after a sync I got this image:
$ uvt-simplestreams-libvirt query
release=bionic arch=amd64 label=daily (20171211)

The latter fails to boot correctly: networking does not come up, and cloud-init in general does not seem to work.

In the guest console I see it hanging on the usual "A start job is running for Wait for ..."
It breaks after some time giving up on networking.
"See 'systemctl status systemd-networkd-wait-online.service' for details."
The host confirmed that: the guest did not get an IP from dnsmasq.

Note: I was able to trigger this on a Xenial host as well as a Bionic host. The latest Artful image also works well on all of these, so I think it is safe to assume that this depends only on the guest image.

I have taken full bootup console logs of both cases.

20171129.1 (good): http://paste.ubuntu.com/26169044/
20171211 (bad): http://paste.ubuntu.com/26169046/

There was one more thing that perplexed me - I usually provide --password=ubunut to uvt-kvm.
That adds a snippet to the cloud-init data to set the password of the ubuntu user.
Connecting via "virsh console" I can't log in on the bad guest which made me assume that cloud-init didn't run at all in the bad case.

And in fact the full logs confirm that: in the bad case there is no sign of cloud-init at all.

Also my bionic containers today saw a cloud-init update - maybe it really is broken in the current daily image?
OTOH the changelog of cloud-init didn't suggest a change that could explain this.


summary: - Cloud-init seems not run on today's bionic images
+ Cloud-init seems not run on today's bionic images (20171211)

Since I can't log into the guest in the bad case due to the lack of proper init, I copied and mounted both daily images to check how they start fresh.

I see different cloud-init versions:
good: 17.1-41-g76243487-0ubuntu1
bad: 17.1-53-ga5dc0f42-0ubuntu1

I see that the status and clean commands got added, but neither of these should be the kill switch.
I checked the more likely candidates among the config and/or systemd files, but they all have the same md5.

Going slightly wider on what actually changed: http://paste.ubuntu.com/26169161/

The only change left that IMHO could cause this is the one in ds-identify.

Now while I can't really use the new image, I can check what ds-identify left there on its first init (if anything).

But that seems equal; the only thing that differs in:
$ md5sum $(find /etc/cloud/ -type f | sort | xargs)
is "/etc/cloud/build.info" which refers to the different image date.
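
This kind of comparison between two mounted images can be scripted. The following is a minimal sketch, assuming both images are mounted at hypothetical mount points such as /mnt/good and /mnt/bad and that file names contain no newlines or backslashes; `list_changed` is an illustrative helper name, not an existing tool.

```shell
#!/bin/sh
# list_changed ROOT_A ROOT_B SUBDIR
# Print files under SUBDIR (relative to each root) whose md5 differs
# between the two roots, or that exist in only one of them.
list_changed() {
    root_a="$1"; root_b="$2"; sub="$3"
    {
        (cd "$root_a" && find "$sub" -type f -exec md5sum {} +)
        (cd "$root_b" && find "$sub" -type f -exec md5sum {} +)
    } | sort | uniq -u | cut -c35- | sort -u
    # uniq -u keeps only "hash  path" lines that appear once, i.e. files
    # that are not identical in both roots; cut strips the 32-character
    # hash and the two separator characters, leaving the path.
}
```

For example, `list_changed /mnt/good /mnt/bad etc/cloud` would print only `etc/cloud/build.info` if that is the sole file that differs.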

I fail to see why it didn't run so far :-/

Dan Watkins (daniel-thewatkins) wrote :

The testing we perform before a daily makes it out into the world includes some basic cloud-init validation (basically "touch /some/file" in user-data, and checking that happened after boot), so it isn't failing in all cases.

Diffing the two manifests gives a pretty substantial set of changes:

new: {'python3-debconf': '1.5.65', 'libnss-systemd:amd64': '235-3ubuntu2', 'linux-headers-4.13.0-17-generic': '4.13.0-17.20', 'libharfbuzz0b:amd64': '1.7.2-1', 'libicu-le-hb0:amd64': '1.0.3+git161113-4', 'libntfs-3g88': '1:2017.3.23-2', 'linux-headers-4.13.0-17': '4.13.0-17.20', 'libgraphite2-3:amd64': '1.3.10-8'}
removed: {'linux-headers-4.13.0-16-generic': '4.13.0-16.19', 'libntfs-3g872': '1:2016.2.22AR.2-2', 'linux-headers-4.13.0-16': '4.13.0-16.19'}
changed: ['apport', 'bsdutils', 'busybox-initramfs', 'busybox-static', 'byobu', 'cloud-init', 'cpio', 'debconf', 'debconf-i18n', 'fdisk', 'gcc-7-base:amd64', 'gdisk', 'grub-common', 'grub-legacy-ec2', 'grub-pc', 'grub-pc-bin', 'grub2-common', 'iproute2', 'less', 'libassuan0:amd64', 'libblkid1:amd64', 'libcap2-bin', 'libcap2:amd64', 'libexpat1:amd64', 'libfdisk1:amd64', 'libgcc1:amd64', 'libicu60:amd64', 'libmount1:amd64', 'libpam-cap:amd64', 'libpam-systemd:amd64', 'libpcre3:amd64', 'libperl5.26:amd64', 'libpsl5:amd64', 'libpython3.6-minimal:amd64', 'libpython3.6-stdlib:amd64', 'libpython3.6:amd64', 'libsmartcols1:amd64', 'libssl1.0.0:amd64', 'libstdc++6:amd64', 'libsystemd0:amd64', 'libudev1:amd64', 'libuuid1:amd64', 'linux-headers-generic', 'linux-headers-virtual', 'linux-image-4.13.0-17-generic', 'linux-image-virtual', 'linux-virtual', 'man-db', 'mount', 'nano', 'netcat-openbsd', 'ntfs-3g', 'openssl', 'perl', 'perl-base', 'perl-modules-5.26', 'python3-apport', 'python3-gi', 'python3-problem-report', 'python3.6', 'python3.6-minimal', 'snapd', 'sosreport', 'sudo', 'systemd', 'systemd-sysv', 'udev', 'util-linux', 'uuid-runtime', 'xauth']

Could you try bisecting with the interim serials to see if we can get a slightly smaller list of packages to consider?
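
A manifest comparison like the one above can be sketched in a few lines. This is only an illustration: it assumes each manifest holds one tab-separated "package<TAB>version" pair per line and that the old manifest is not empty, and `manifest_diff` is a hypothetical helper name.

```shell
#!/bin/sh
# manifest_diff OLD NEW
# Report packages that are new, removed, or changed between two manifests.
# Assumed format: one "package<TAB>version" pair per line.
manifest_diff() {
    awk -F'\t' '
        NR == FNR { old[$1] = $2; next }      # first file: remember old versions
        { seen[$1] = 1 }                      # second file: note every package
        !($1 in old) { print "new: " $1 " " $2; next }
        old[$1] != $2 { print "changed: " $1 " " old[$1] " -> " $2 }
        END {
            for (p in old)
                if (!(p in seen)) print "removed: " p " " old[p]
        }
    ' "$1" "$2" | sort
}
```

Running it on each pair of consecutive serials would narrow the list of candidate packages in the way the bisect request describes.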

smoser was able to reproduce this (without uvt, btw) and found images up to 20171208 working.
The manifest diff then is much smaller.

Essentially:
+cloud-init 17.1-53-ga5dc0f42-0ubuntu1
+grub-legacy-ec2 17.1-53-ga5dc0f42-0ubuntu1
+libassuan0:amd64 2.5.1-1

Of those only the first seems related.

See: http://paste.ubuntu.com/26170495/

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Scott Moser (smoser) wrote :

The fix for bug 1731868 left an interim change in ds-identify referencing an undeclared variable.

That code path would be hit if you attached an ISO filesystem to a device other than a cdrom.
For example:

qemu-system-x86_64 -enable-kvm \
  -device virtio-net-pci,netdev=net00 \
  -netdev type=user,id=net00 \
  -drive file=disk.img,id=disk00,if=none,format=qcow2,index=0 \
  -device virtio-blk,drive=disk00,serial=disk.img \
  -drive file=my-seed.img,id=disk01,if=none,format=raw,index=1 \
  -device virtio-blk,drive=disk01,serial=my-seed.img \
  -m 768 -nographic

It would work fine if you attached the iso filesystem as a cdrom.
(-cdrom my-seed.img)

no longer affects: cloud-images
Changed in cloud-init:
status: New → In Progress
Changed in cloud-init (Ubuntu):
status: Confirmed → In Progress
Changed in cloud-init:
importance: Undecided → High
Changed in cloud-init (Ubuntu):
importance: Undecided → High
Changed in cloud-init:
assignee: nobody → Scott Moser (smoser)
Changed in cloud-init (Ubuntu):
assignee: nobody → Scott Moser (smoser)
Scott Moser (smoser) on 2017-12-12
summary: - Cloud-init seems not run on today's bionic images (20171211)
+ Cloud-init fails if iso9660 filesystem on non-cdrom path in 20171211
+ image.
Scott Moser (smoser) wrote :

I have a fix:
 http://paste.ubuntu.com/26170810/
When Launchpad git returns I will push it and get it uploaded.

Chad Smith (chad.smith) on 2017-12-12
description: updated
Chad Smith (chad.smith) on 2017-12-12
Changed in cloud-init:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cloud-init - 17.1-60-ga30a3bb5-0ubuntu1

---------------
cloud-init (17.1-60-ga30a3bb5-0ubuntu1) bionic; urgency=medium

  * New upstream snapshot.
    - ds-identify: failure in NoCloud due to unset variable usage.
      (LP: #1737704)
    - tests: fix collect_console when not implemented [Joshua Powers]

 -- Chad Smith <email address hidden> Tue, 12 Dec 2017 12:03:08 -0700

Changed in cloud-init (Ubuntu):
status: In Progress → Fix Released

This bug is believed to be fixed in cloud-init in 1705804. If this is still a problem for you, please make a comment and set the state back to New

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released