booting cloud image without initramfs broken

Bug #1377308 reported by Scott Moser on 2014-10-03
This bug affects 3 people

Affects                Status   Importance   Assigned to   Milestone
cloud-init                      High         Unassigned
cloud-init (Ubuntu)             High         Unassigned
  Trusty                        Medium       Unassigned

Bug Description

Booting without an initramfs was broken by the cloud-init change for
bug 1353008 (http://pad.lv/1353008).

This affects ARM guests that are booted without a bootloader that would
load the kernel and initramfs.

There are 3 workarounds:
a.) remove the offending code
  sudo mount-image-callback ubuntu.img -- \
     sh -c 'f="$MOUNTPOINT/etc/init/cloud-init-local.conf";
            sed -e "/^start on/s/ and mounted .*//" -i.dist $f &&
            diff -u $f.dist $f'

b.) register and boot with an initramfs
  This is done by
   i.) getting the initramfs out of the image:
     sudo mount-image-callback ubuntu.img -- \
       sh -c 'cp $MOUNTPOINT/boot/initrd* . && chmod ugo+r initrd*'
   ii.) upload the initramfs to glance
     glance image-create --name=ubuntu-ramdisk --public \
        --container-format ari --disk-format ari < initrd*
     record the ramdisk ID.
   iii.) register with --property ramdisk_id=$RAMDISK_ID
     Normally, for "ami"-style images on ARM, users had been
     uploading with --property kernel_id=<kernel_id>.
     Now you need to upload with:
       glance image-create --name="$NAME" \
          --public --container-format ami --disk-format ami \
          --property "kernel_id=$KERNEL_ID" \
           --property "ramdisk_id=$RAMDISK_ID" < ubuntu.img

c.) register the kernel command line to include 'rw':
     glance image-create .... --property kernel_args="root=/dev/vda rw"
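For workaround (c), the kernel_args value needs 'rw', and it is simplest if it does not also carry 'ro'. Below is a minimal sketch, assuming you start from an existing command line string; `ensure_rw` is a hypothetical helper, not part of glance or cloud-init.

```shell
#!/bin/sh
# Hypothetical helper for workaround (c): print the given kernel command line
# with any bare "ro" token removed and "rw" appended if it is not present.
# The body runs in a subshell "( ... )" so "set -f" (no globbing while the
# command line is split into words) does not leak into the caller.
ensure_rw() (
    set -f
    out=""
    has_rw=no
    for tok in $1; do
        case "$tok" in
            ro) continue ;;        # drop "ro" so it cannot conflict with "rw"
            rw) has_rw=yes ;;
        esac
        out="${out:+$out }$tok"
    done
    [ "$has_rw" = yes ] || out="${out:+$out }rw"
    printf '%s\n' "$out"
)
```

For example, `ensure_rw "root=/dev/vda ro"` prints `root=/dev/vda rw`, which is the kernel_args value used in (c).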

Related bugs:
 * bug 1031065: cloud-init-nonet runs 'start networking' explicitly
 * bug 643289: [mountall] idmapd does not starts to work after system reboot
 * bug 1353008: [cloud-init] MAAS Provider: LXC did not get DHCP address, stuck in "pending"

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: cloud-init 0.7.5-0ubuntu1.2
ProcVersionSignature: User Name 3.13.0-36.63-generic 3.13.11.6
Uname: Linux 3.13.0-36-generic aarch64
ApportVersion: 2.14.1-0ubuntu3.4
Architecture: arm64
Date: Thu Jan 1 00:02:09 1970
Ec2AMI: ami-00000007
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.5GB
Ec2Kernel: aki-00000005
Ec2Ramdisk: ari-00000003
PackageArchitecture: all
ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: cloud-init
UpgradeStatus: No upgrade log present (probably fresh install)
mtime.conffile..etc.init.cloud.init.local.conf: 2014-10-03T19:49:16.813801

Changed in cloud-init (Ubuntu):
status: New → Triaged
Changed in cloud-init:
status: New → Triaged
importance: Undecided → High
Changed in cloud-init (Ubuntu):
importance: Undecided → High
Scott Moser (smoser) wrote :

The offending change was this, in /etc/init/cloud-init-local.conf:
-start on mounted MOUNTPOINT=/ and mounted MOUNTPOINT=/run
+start on mounted MOUNTPOINT=/
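Workaround (a) removes the '/run' condition from this stanza. To check what its sed expression does without mounting an image, here is a small sketch that runs the same expression over a sample line; `strip_run_dependency` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: apply the same sed expression used in workaround (a)
# to text read from stdin, without touching any image.
strip_run_dependency() {
    sed -e "/^start on/s/ and mounted .*//"
}

# Sample stanza containing the /run dependency.
printf '%s\n' 'start on mounted MOUNTPOINT=/ and mounted MOUNTPOINT=/run' |
    strip_run_dependency
# prints: start on mounted MOUNTPOINT=/
```

Lines that do not start with "start on" pass through unchanged.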

Scott Moser (smoser) on 2014-10-03
description: updated
Scott Moser (smoser) wrote :

I just noticed from reading one of the related bugs that booting an image without 'ro' on the kernel command line might also be a requirement to trigger this. I'd like to see a kernel booted with 'ro' on the command line.
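To see which case a particular boot falls into, the kernel command line can be inspected from inside the guest. A sketch, assuming a POSIX shell; `root_mode` is hypothetical and only looks for the bare 'ro' and 'rw' tokens.

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel command line (default:
# /proc/cmdline) asks for the root filesystem "ro" or "rw".
# If both appear, the last one wins, which is how the kernel treats them.
# Prints "unset" if neither token is present.
root_mode() (
    set -f                      # no globbing while splitting the command line
    mode=unset
    for tok in ${1:-$(cat /proc/cmdline)}; do
        case "$tok" in
            ro|rw) mode=$tok ;;
        esac
    done
    printf '%s\n' "$mode"
)
```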

That said, I tried to reproduce this with a daily image of trusty (20141003) and could not.
That attempt looked like this:

$ tgz_url=http://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-amd64.tar.gz
$ tgz=${tgz_url##*/}
$ wget -O "$tgz" "${tgz_url}"

$ mkdir dist
$ tar -C dist -Sxvzf "$tgz"
$ dist_disk=$(echo dist/*.img)
$ kernel=$(echo dist/*vmlinuz*)

$ cat > user-data <<EOF
#cloud-config
password: passw0rd
chpasswd: { expire: False }
ssh_pwauth: True
EOF
$ echo "instance-id: $(uuidgen || echo i-abcdefg)" > meta-data
$ cloud-localds seed.img user-data meta-data

$ qemu-img create -f qcow2 -b "$dist_disk" disk.img
$ qemu-system-x86_64 -enable-kvm -net nic -net user,hostfwd=tcp::2222-:22 \
    -drive file=disk.img,if=virtio -drive file=seed.img,if=virtio \
    -kernel "${kernel}" -append "root=LABEL=cloudimg-rootfs ro" -curses

I could not cause a hang here, with or without 'ro'.

Scott Moser (smoser) wrote :

To clarify the above: I could not recreate the error on amd64, but it most certainly *does* fail on arm64 (AArch64).

Raghuram Kota (rkota) on 2014-10-09
tags: added: hs-arm64
tags: added: hs-moonshot
tags: added: hs-moonshot-maas-juju
removed: hs-moonshot
Scott Moser (smoser) wrote :

OK, so I can reproduce this on both arm64 and ppc64el;
on ppc64el, with both trusty and utopic.
Adding '-initrd <extracted-initramfs>' fixes the problem.
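If the initramfs has already been extracted (for example with the mount-image-callback command from workaround (b)), the extra QEMU argument can be added only when such a file exists. This sketch assumes the file name contains 'initrd'; `find_initrd` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the first regular file whose name contains
# "initrd" in the given directory (default: the current directory, where
# workaround (b) copies it). Returns non-zero if none is found.
find_initrd() {
    for f in "${1:-.}"/*initrd*; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Usage idea: add "-initrd <file>" to the QEMU command only when one exists.
#   set --
#   if initrd=$(find_initrd); then set -- -initrd "$initrd"; fi
#   $qemu ... -kernel "$kernel" "$@" -append "root=/dev/vda ro" ...
```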

Here's an improved copy/paste to recreate it. It does *not* fail with arch=amd64.

arch=ppc64el
rel=trusty
tgz_url=http://cloud-images.ubuntu.com/$rel/current/${rel}-server-cloudimg-${arch}.tar.gz
tgz=${tgz_url##*/}
qemu=qemu-system-${arch}
[ "$arch" = "amd64" ] && qemu="qemu-system-x86_64"
[ "$arch" = "ppc64el" ] && qemu="qemu-system-ppc64"

[ -f "$tgz" ] || { wget "${tgz_url}" -O "$tgz.part" && mv "$tgz.part" "$tgz"; }

mkdir -p dist
( cd dist && ls *$rel*$arch*.img 2>/dev/null ) || tar -C dist -Sxvzf "$tgz"
dist_disk=$(echo dist/*$rel*$arch*.img)
kernel=$(echo dist/*$rel*$arch*vmlinu?*)

cat > user-data <<EOF
#cloud-config
password: passw0rd
chpasswd: { expire: False }
ssh_pwauth: True
EOF
echo "instance-id: $(uuidgen || echo i-abcdefg)" > meta-data
cloud-localds seed.img user-data meta-data

qemu-img create -f qcow2 -b "$dist_disk" disk.img

# on intel:
$qemu -enable-kvm \
   -net nic -net user,hostfwd=tcp::2222-:22 \
   -drive file=disk.img,if=virtio -drive file=seed.img,if=virtio \
   -kernel "${kernel}" -append "root=/dev/vda ro" -curses

# on ppc64el
$qemu -m 1G -enable-kvm -machine pseries,usb=off -device spapr-vscsi \
   -device spapr-vlan,netdev=net00 -netdev type=user,id=net00 \
   -drive file=disk.img,if=virtio -drive file=seed.img,if=virtio \
   -kernel "$kernel" -append "root=/dev/vda console=hvc0 ro --verbose" \
   -display none -nographic
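The script above only special-cases amd64 and ppc64el when picking the emulator, while arm64 is one of the affected architectures. A sketch of the same mapping as a case statement, with arm64 added (its emulator binary is qemu-system-aarch64); `qemu_for_arch` is a hypothetical name, and booting arm64 will also need machine and CPU options not shown here.

```shell
#!/bin/sh
# Hypothetical helper: map an Ubuntu architecture name to the QEMU system
# emulator binary. Names without a special case fall through unchanged.
qemu_for_arch() {
    case "$1" in
        amd64)   echo qemu-system-x86_64 ;;
        ppc64el) echo qemu-system-ppc64 ;;
        arm64)   echo qemu-system-aarch64 ;;
        armhf)   echo qemu-system-arm ;;
        *)       echo "qemu-system-$1" ;;
    esac
}

# In the script above this could replace the three qemu= lines:
#   qemu=$(qemu_for_arch "$arch")
```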

Scott Moser (smoser) wrote :

A bit more info: on ppc64el at least, booting with 'rw' as a kernel parameter fixes this. That's less than ideal, but it adds an interesting piece of information. Previously I assumed we were hung with / mounted rw but /run not yet mounted. It is the other way around, though: '/run' gets mounted and we're blocked on /.
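If a shell can be obtained in the hung guest, the state described above can be checked against the mount table. A sketch, assuming awk is available; `mount_state` is hypothetical and reads /proc/mounts unless given another file.

```shell
#!/bin/sh
# Hypothetical diagnostic: from a mounts table (default /proc/mounts), report
# whether / is mounted read-write and whether /run is mounted at all.
# Fields are: device mountpoint fstype options dump pass. If / appears more
# than once (e.g. a "rootfs" entry first), the last entry is the one reported.
mount_state() {
    awk '
        $2 == "/"    { root = ($4 ~ /(^|,)rw(,|$)/) ? "rw" : "ro" }
        $2 == "/run" { run = "mounted" }
        END {
            printf "/=%s /run=%s\n", (root ? root : "missing"), (run ? run : "missing")
        }
    ' "${1:-/proc/mounts}"
}
```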

description: updated
Scott Moser (smoser) wrote :

Comment 3 above is ordered wrong. The change that caused this is:
/etc/init/cloud-init-local.conf
+start on mounted MOUNTPOINT=/
+start on mounted MOUNTPOINT=/ and mounted MOUNTPOINT=/run

The reason for the change was that cloud-init-local needs to write to both / and /run. Previously it was using /run without declaring the need for it. Also, cloud-init generally wants 'mounted' to block things.

Scott Moser (smoser) wrote :

Bah. The correct diff is:
/etc/init/cloud-init-local.conf
-start on mounted MOUNTPOINT=/
+start on mounted MOUNTPOINT=/ and mounted MOUNTPOINT=/run
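With the new stanza, the job waits for both 'mounted' events. The following is a simplified model of that 'and' condition only; it does not model upstart's event blocking, and `first_start_after` is a hypothetical name.

```shell
#!/bin/sh
# Simplified model (not upstart itself): read "mounted MOUNTPOINT=..." events,
# one per line, and report the event after which a job with
#   start on mounted MOUNTPOINT=/ and mounted MOUNTPOINT=/run
# would be started. Both events must have been seen; order does not matter.
first_start_after() {
    seen_root=no
    seen_run=no
    n=0
    while IFS= read -r ev; do
        n=$((n + 1))
        case "$ev" in
            "mounted MOUNTPOINT=/")    seen_root=yes ;;
            "mounted MOUNTPOINT=/run") seen_run=yes ;;
        esac
        if [ "$seen_root" = yes ] && [ "$seen_run" = yes ]; then
            echo "start after event $n"
            return 0
        fi
    done
    echo "never started"
    return 1
}
```

Feeding it only "mounted MOUNTPOINT=/run" prints "never started": in that situation cloud-init-local would not start at all.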

tags: added: hs-arm64-maas-juju
Scott Moser (smoser) on 2016-04-09
Changed in cloud-init (Ubuntu Trusty):
status: New → Triaged
importance: Undecided → Medium