sr0 not available at generator timeframe causes cloud-init.target not run

Bug #1940791 reported by Adam Chasen
This bug affects 4 people
Affects       Status      Importance  Assigned to  Milestone
cloud-images  Incomplete  Undecided   Unassigned
cloud-init    Expired     Undecided   Unassigned

Bug Description

Focal image cloud-init generator reports:
'cloud-init is enabled but no datasource found, disabling'

This looks to be related to ds-identify not finding the CD-ROM drive on first run (and caching that result). Not sure why /dev/sr0 would not be available early enough.

cat /run/cloud-init/ds-identify.log
...
ISO9660_DEVS=
...
No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[up 1.20s] returning 1
root@ubuntu:~# /usr/lib/cloud-init/ds-identify --force
[up 200.71s] ds-identify --force
...
ISO9660_DEVS=/dev/sr0=cidata
...
Found single datasource: NoCloud
[up 200.79s] returning 0
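
To compare the two runs above, a small helper can pull the ISO9660_DEVS value out of a ds-identify log. This is only a sketch: the function names `iso9660_devs` and `saw_cidata` are made up for illustration, and it assumes the log line has the `ISO9660_DEVS=...` form shown in this report.

```shell
# Hypothetical helpers, not part of cloud-init.
# Print the value of ISO9660_DEVS recorded in a ds-identify log.
# An empty value means no ISO9660 filesystem was visible when ds-identify ran.
iso9660_devs() {
    # $1: path to a ds-identify log (defaults to the path quoted in this report)
    log="${1:-/run/cloud-init/ds-identify.log}"
    sed -n 's/^ISO9660_DEVS=//p' "$log"
}

# Succeed if the log lists a device labelled "cidata" (the label seen on sr0 here).
saw_cidata() {
    case "$(iso9660_devs "$1")" in
        *=cidata*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Called with no argument, `saw_cidata` reads /run/cloud-init/ds-identify.log; an empty ISO9660_DEVS value makes it return non-zero.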

Booting https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64-disk-kvm.img as of Aug 22, 2021 in KVM (created with virt-install and libvirt) along with cloud-config ISO

$ cat /tmp/cloud
#cloud-config
hostname: proxy1
$ cloud-localds /tmp/test.iso /tmp/cloud

cloud-init.target never reached and network doesn't come up (default behavior for cloud-init is eth0 DHCP). If I manually start `systemctl start cloud-init.target` then I get what I expected, but by then it is "too late" and I also have to kick systemd-networkd.

cloud-init starts up as expected with the same environment when using Bionic (https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img)

The focal image never touches cloud-init.target. Note that there is no reverse dependency in focal.

root@ubuntu:~# systemctl list-dependencies --reverse cloud-init.target
cloud-init.target

Both images have default target of "graphical.target"

There is mention of a "generator" and "detection" in the cloud-init docs. https://cloudinit.readthedocs.io/en/latest/topics/boot.html

The generator appears to be what is adding the "wants" of cloud-init.target to multi-user.target
from /lib/systemd/system-generators/cloud-init-generator:
    local target_name="multi-user.target" gen_d="$early_d"
    local link_path="$gen_d/${target_name}.wants/${CLOUD_TARGET_NAME}"
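
Based on that excerpt and the generator logs, whether the generator enabled cloud-init can be checked by looking for the symlink it creates. A minimal sketch, assuming the early generator directory and target names that appear in the logs in this report; the function name is hypothetical:

```shell
# Hypothetical helper, not part of cloud-init.
# Succeed if the generator left its "wants" symlink for cloud-init.target.
cloud_init_enabled_by_generator() {
    # $1: generator directory (defaults to the early directory from the log)
    gen_d="${1:-/run/systemd/generator.early}"
    link_path="$gen_d/multi-user.target.wants/cloud-init.target"
    # -L is true for a symlink even if its target does not exist.
    [ -L "$link_path" ]
}
```

On the Bionic VM this should succeed, and on the affected Focal VM it should fail, matching the "already disabled" message in the Focal generator log.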

Bionic:
root@proxy1:~# systemctl get-default
graphical.target
root@proxy1:~#
UNIT LOAD ACTIVE SUB DESCRIPTION
basic.target loaded active active Basic System
cloud-config.target loaded active active Cloud-config availability
cloud-init.target loaded active active Cloud-init target
...
root@proxy1:~# systemctl list-dependencies --reverse cloud-init.target
cloud-init.target
● └─multi-user.target
● └─graphical.target
root@proxy1:/etc/systemd/system# cat /run/cloud-init/cloud-init-generator.log
/lib/systemd/system-generators/cloud-init-generator normal=/run/systemd/generator early=/run/systemd/generator.early late=/run/systemd/generator.late
kernel command line (/proc/cmdline): BOOT_IMAGE=/boot/vmlinuz-4.15.0-154-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
kernel_cmdline found unset
etc_file found unset
default found enabled
checking for datasource
ds-identify rc=0
ds-identify _RET=found
enabled via /run/systemd/generator.early/multi-user.target.wants/cloud-init.target -> /lib/systemd/system/cloud-init.target

Focal:
root@ubuntu:~# systemctl get-default
graphical.target
root@ubuntu:~# systemctl list-units --type=target --all
  UNIT LOAD ACTIVE SUB >
  basic.target loaded active activ>
  blockdev@dev-disk-by\x2dlabel-cloudimg\x2drootfs.target loaded inactive dead >
  blockdev@dev-disk-by\x2dlabel-UEFI.target loaded inactive dead >
  <email address hidden> loaded inactive dead >
  <email address hidden> loaded inactive dead >
  <email address hidden> loaded inactive dead >
  <email address hidden> loaded inactive dead >
  cloud-config.target loaded inactive dead >
  cloud-init.target loaded inactive dead >
root@ubuntu:~# systemctl list-unit-files
...
cloud-config.service enabled enabled
cloud-final.service enabled enabled
cloud-init-local.service enabled enabled
cloud-init.service enabled enabled
...
root@ubuntu:~# systemctl list-dependencies --reverse cloud-init.target
cloud-init.target
root@ubuntu:~# systemctl list-dependencies cloud-init.target
cloud-init.target
● ├─cloud-config.service
● ├─cloud-final.service
● ├─cloud-init-local.service
● └─cloud-init.service

root@ubuntu:~# cat /run/cloud-init/cloud-init-generator.log
/usr/lib/systemd/system-generators/cloud-init-generator normal=/run/systemd/generator early=/run/systemd/generator.early late=/run/systemd/generator.late
kernel command line (/proc/cmdline): BOOT_IMAGE=/boot/vmlinuz-5.4.0-1045-kvm root=PARTUUID=14530a28-f129-4b51-a64e-c64075fae7c7 ro console=tty1 console=ttyS0 panic=-1
kernel_cmdline found unset
etc_file found unset
default found enabled
checking for datasource
ds-identify rc=1
ds-identify _RET=notfound
cloud-init is enabled but no datasource found, disabling
already disabled: no change needed [no /run/systemd/generator.early/multi-user.target.wants/cloud-init.target]

Additional Resources:
Possibly same issue https://bugzilla.redhat.com/show_bug.cgi?id=1820540

Adam Chasen (achasen)
description: updated
John Chittum (jchittum) wrote :

Could you provide exact reproduction steps with virt-install and libvirt. I am attempting to reproduce locally with setups we normally use for testing, and am unable to:

1. downloaded image from https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64-disk-kvm.img
2. created a simple cloud-init yaml file:

#cloud-config
password: <INSERT PASSWORD HERE>
chpasswd: { expire: False }
ssh_pwauth: True
ssh_import_id: jchittum
sudo: ALL=(ALL) NOPASSWD:ALL

3. using `cloud-localds` from `cloud-image-utils`, made an ISO of the cloud-config
cloud-localds cloud_init_with_pass.iso cloud-init.yaml

4. used qemu to test the image:

qemu-system-x86_64 \
  -cpu host -machine type=q35,accel=kvm -m 2048 \
  -nographic \
  -snapshot \
  -netdev id=net00,type=user,hostfwd=tcp::2222-:22 \
  -device virtio-net-pci,netdev=net00 \
  -drive if=virtio,format=qcow2,file=focal-server-cloudimg-amd64-disk-kvm.img \
  -drive if=virtio,format=raw,file=cloud_init_with_pass.iso

This qemu command sets the accel to kvm, and I had no issues. I'm guessing that the drive setup is very different though.

From my working knowledge of libvirt and cloud-init, you do need to mount the cloud-init image in a specific place, and I don't think there would be an issue, generally, with the kvm image not getting sr0 up fast enough. `qemu` is mounting to the same place in that command.

Could you provide the libvirt XML definition and exact reproduction steps for us to dig a little deeper?

John Chittum (jchittum)
Changed in cloud-images:
status: New → Incomplete
Changed in cloud-init:
status: New → Incomplete
Chad Smith (chad.smith) wrote :

I also haven't been able to reproduce on focal. That makes me think there may be a systemd unit ordering cycle in the image or config that produced this issue.

on focal I see the reverse deps on latest daily images:

root@dev-ff:~# systemctl list-dependencies --reverse cloud-init.target
cloud-init.target
● └─multi-user.target
● └─graphical.target
root@dev-ff:~# lsb_release -sc
focal

A guess in the dark would be to check `journalctl -b 0` and look for "ordering cycle" related messages too.
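
That check could be scripted roughly as follows. This is a sketch only: the function name is hypothetical, and the match is a plain case-insensitive search for the phrase "ordering cycle".

```shell
# Hypothetical helper: filter journal text on stdin for ordering-cycle messages.
# Exit status follows grep: 0 if at least one line matched, 1 if none did.
find_ordering_cycles() {
    grep -i 'ordering cycle'
}
```

Usage would be something like `journalctl -b 0 --no-pager | find_ordering_cycles`; no output and a non-zero exit status suggest that systemd did not report a cycle for that boot.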

Adam Chasen (achasen) wrote :

Able to reproduce with a VM created with:

```
virt-install --connect qemu:///session \
--name cloudinit-test \
--memory 2048 \
--disk /home/achasen/tmp/focal.img,device=disk,bus=virtio \
--os-type linux \
--os-variant ubuntu20.04 \
--virt-type kvm \
--graphics none \
--network bridge=virbr0,model=virtio \
--import \
--disk /tmp/test.iso,device=cdrom,bus=sata
```

/run/cloud-init/cloud-init-generator.log indicates the generator ran at around 0.69s:

```
No ds found [mode=search, notfound=disabled]. Disabled cloud-init [1]
[up 0.69s] returning 1
```

journalctl shows things like "Starting Network Service" before sr0 appears in the log (which makes me think sr0 is delayed). I didn't find anything in the journalctl output related to the generator.

```
[ 1.857890] ubuntu systemd[1]: Starting Network Service...
...
[ 2.364350] ubuntu kernel: ata4: SATA link down (SStatus 0 SControl 300)
[ 2.364426] ubuntu kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl>
[ 2.364539] ubuntu kernel: ata3: SATA link down (SStatus 0 SControl 300)
[ 2.364609] ubuntu kernel: ata5: SATA link down (SStatus 0 SControl 300)
[ 2.364642] ubuntu kernel: ata1.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 2.364643] ubuntu kernel: ata1.00: applying bridge limits
[ 2.364884] ubuntu kernel: ata1.00: configured for UDMA/100
[ 2.365032] ubuntu kernel: scsi 0:0:0:0: CD-ROM QEMU QEMU DVD>
[ 2.365242] ubuntu kernel: sr 0:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa>
[ 2.365250] ubuntu kernel: cdrom: Uniform CD-ROM driver Revision: 3.20
[ 2.379293] ubuntu kernel: sr 0:0:0:0: Attached scsi CD-ROM sr0
[ 2.416795] ubuntu systemd[1]: Finished udev Wait for Complete Device Initia>
[ 2.417385] ubuntu systemd[1]: Starting Device-Mapper Multipath Device Contr>
```
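
The ordering can be checked mechanically by pulling the sr0 attach time out of the journal and comparing it with the uptime printed by ds-identify. This is a rough sketch: the function names are hypothetical, and it assumes journal lines in the `[ 2.379293] host kernel: ...` form shown above (which looks like `journalctl -o short-monotonic` output, though that is a guess).

```shell
# Hypothetical helpers for comparing the two timestamps.
# Print the uptime (seconds) at which the kernel attached sr0,
# reading journal lines with a leading "[ seconds]" stamp from stdin.
sr0_attach_time() {
    sed -n 's/^\[ *\([0-9.]*\)].*Attached scsi CD-ROM sr0.*/\1/p' | head -n 1
}

# Succeed if ds-identify ($1, uptime in seconds) ran before sr0 attached ($2).
ran_before_attach() {
    awk -v gen="$1" -v dev="$2" 'BEGIN { exit !(gen + 0 < dev + 0) }'
}
```

With the values from this comment, `ran_before_attach 0.69 2.379293` succeeds, which is consistent with ds-identify running before the CD-ROM existed.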

Vincent Saelzler (vincentsaelzler) wrote :

I have the same issue with the Azure/Hyper-V Image. Running on local Windows desktop, using Hyper-V as the hypervisor.

Steps to reproduce:

1. Download and extract https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64-azure.vhd.zip. Save disk image as 20.04-cloud.vhd.

2. Create my-seed.iso file almost exactly as described in cloud-init documentation. Only small tweak is saving as ISO instead of IMG. https://cloudinit.readthedocs.io/en/latest/topics/debugging.html

$ cat > user-data <<EOF
  #cloud-config
  password: passw0rd
  chpasswd: { expire: False }
  EOF
$ cloud-localds my-seed.iso user-data

3. Create new VM using Hyper-V GUI
- Virtual Hard Disk Image = 20.04-cloud.vhd
- Virtual DVD Drive Image = my-seed.iso

=> After starting the VM, I cannot log in.

Possibly helpful note: When using the standard (non-cloud) installer, this file seems to prevent the VM from using an ISO attached to the system: /etc/cloud/cloud.cfg.d/99-installer.cfg

It saves the user details that I manually entered during the install process, and critically, explicitly sets the data source to none.

$ cat /run/cloud-init/ds-identify.log
/etc/cloud/cloud.cfg.d/99-installer.cfg set datasource_list: [None]

After deleting the file, the ISO was recognized (and PW of "passw0rd" for ubuntu user worked)

$ cat /run/cloud-init/ds-identify.log
/etc/cloud/cloud.cfg.d/90_dpkg.cfg set datasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, Hetzner, IBMCloud, Oracle, Exoscale, RbxCloud, UpCloud, Vultr, None ]

I do not know how to get debug output from the cloud image, because I cannot log in as any user! If someone can explain how to do that, I would be happy to provide more output from the cloud image VM.

Gauthier Jolly (gjolly) wrote :

Hi Vincent,

Thank you for your comment. What you are seeing with the Azure cloud-images is not related to the current issue.

Azure VHDs you can find on c-i.u.c are the same images we publish on Azure Cloud. Those are configured with a single Cloud-Init datasource (Azure) to make the image boot faster. While it is possible to boot those images locally on hyper-v, you will end up with a VM that is not fully functional.

If you look carefully at the bug description, you will see that @achasen uses KVM images (not Azure images) that should work out of the box on KVM.

Launchpad Janitor (janitor) wrote :

[Expired for cloud-init because there has been no activity for 60 days.]

Changed in cloud-init:
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for cloud-images because there has been no activity for 60 days.]

Changed in cloud-images:
status: Incomplete → Expired
Paride Legovini (paride)
Changed in cloud-images:
status: Expired → Incomplete
Changed in cloud-init:
status: Expired → Incomplete
Chad Smith (chad.smith) wrote :

I apologize for the expiry on this bug. It slipped through the cracks because it was set to Incomplete status, which eventually expires if not set back to New.

cloud-init is not included in your boot target because ds-identify, run by the generator, does not yet see /dev/sr0 with a cidata label, apparently because the kernel module is loaded later.

Cloud-init can tell you on focal that it's disabled due to the generator-time failure to find a matching datasource.

root@focal:~# cloud-init status --long
status: disabled
detail:
Cloud-init disabled by cloud-init-generator

I am able to reproduce the original error with the following steps and as Adam suggested:
$ sudo virt-install --connect qemu:///session --name cloudinit-test --memory 2048 --disk /home/csmith/src/cloud-init/focal-server-cloudimg-amd64-disk-kvm.img,device=disk,bus=virtio --os-type linux --os-variant ubuntu20.04 --virt-type kvm --graphics none --network bridge=virbr0,model=virtio --import --disk "/tmp/test.iso,device=cdrom,bus=sata"

On Focal, the timestamp of /run/cloud-init/ds-identify.log, which is written when cloud-init's generator runs, is earlier than the time at which journalctl -b 0 shows /dev/sr0 being attached, due to the later kernel module load.

from journalctl:

   Feb 24 21:56:28 ubuntu kernel: sr 0:0:0:0: Attached scsi CD-ROM sr0

root@focal:~# ls -ltr --full-time /dev/disk/by-label/ /run/cloud-init/ds-identify.log
# Generator time 21:56:27
-rw-r--r-- 1 root root 1504 2022-02-24 21:56:27.241872017 +0000 /run/cloud-init/ds-identify.log

# /dev/sr0 not available until 1 second later
/dev/disk/by-label/:
total 0
lrwxrwxrwx 1 root root 10 2022-02-24 21:56:28.173872017 +0000 cloudimg-rootfs -> ../../vda1
lrwxrwxrwx 1 root root 9 2022-02-24 21:56:28.441872017 +0000 cidata -> ../../sr0
lrwxrwxrwx 1 root root 11 2022-02-24 21:56:28.581872017 +0000 UEFI -> ../../vda15

This needs a bit more investigation, and can probably be worked around by adding the virt-install argument `--sysinfo system.serial='ds=nocloud'`, which will force ds-identify to detect NoCloud regardless of the presence of /dev/sr0. Since the device will be up before NoCloud.get_data is run, this will avoid the race.
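
Put together with the reproduction command, the workaround might look like the sketch below. It is untested here: the wrapper function and the `DRY_RUN` switch are hypothetical conveniences, and the remaining arguments are copied from the virt-install invocation earlier in this report.

```shell
# Hypothetical wrapper: the virt-install call from this report plus the
# suggested --sysinfo argument. Set DRY_RUN=1 to print the command instead
# of running it.
# $1: path to the focal disk image, $2: path to the cloud-localds seed ISO.
virt_install_nocloud() {
    image="$1"
    seed="$2"
    ${DRY_RUN:+echo} virt-install --connect qemu:///session \
        --name cloudinit-test \
        --memory 2048 \
        --disk "$image,device=disk,bus=virtio" \
        --os-type linux \
        --os-variant ubuntu20.04 \
        --virt-type kvm \
        --graphics none \
        --network bridge=virbr0,model=virtio \
        --import \
        --disk "$seed,device=cdrom,bus=sata" \
        --sysinfo system.serial='ds=nocloud'
}
```

For example, `DRY_RUN=1 virt_install_nocloud focal-server-cloudimg-amd64-disk-kvm.img /tmp/test.iso` prints the full command for review before creating the VM.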

Changed in cloud-init:
status: Incomplete → Triaged
James Falcon (falcojr) wrote :

A duplicate bug, https://bugs.launchpad.net/bugs/1961832 , provides some additional context and consistent reproduction steps.

summary: - sr0 not available causes cloud-init.target not run on focal cloud image
+ sr0 not available at generator timeframe causes cloud-init.target not
+ run
James Falcon (falcojr) wrote :
Changed in cloud-init:
status: Triaged → Expired