New GCP dailies are failing startup-script tests because the network is not fully set up when startup scripts run. The failure can be reproduced as follows:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
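The diff above can be reduced to a single check: the snapshot taken by the startup script lacks the header that cloud-init writes into /etc/apt/sources.list. A minimal sketch of such a check follows. The function name `snapshot_predates_cloud_init` is hypothetical, and matching on the header text is an assumption based only on the diff output shown here.

```shell
#!/bin/bash
# Hypothetical helper (not part of any existing test suite).
# Succeeds (exit 0) when the given snapshot of sources.list does NOT contain
# the cloud-init header, i.e. it was copied before cloud-init rewrote the file.
# Returns 2 if the snapshot cannot be read.
snapshot_predates_cloud_init() {
    local snapshot="$1"
    [ -r "${snapshot}" ] || return 2
    # Assumption: the header line seen in the diff marks a cloud-init-written file.
    ! grep -q 'written by cloud-init' "${snapshot}"
}
```

On an affected instance, `snapshot_predates_cloud_init /tmp/startup-sources.list` would be expected to succeed, since the copy was taken before cloud-init rewrote the file.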
Earlier images (such as ubuntu-2204-jammy-v20240307 in ubuntu-os-cloud) do not show this behaviour. The regression comes from a change in ubuntu-pro 31 (see https://github.com/canonical/ubuntu-pro-client/blob/dfe1f1ed4678c50240d4e251f41d33bb4034135e/debian/changelog#L40 for details) that removes a systemd ordering on cloud-config.service. A side effect of this change was the removal of cloud-config.service (and ubuntu-advantage.service) from the systemd critical chain of google-startup-scripts.service.
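Whether a given image still orders google-startup-scripts.service after cloud-config.service can be checked from the unit's `After=` property. The following is a rough sketch only. The helper name `orders_after` is hypothetical, and it assumes `systemctl show -p After --value` prints a space-separated list of unit names.

```shell
#!/bin/bash
# Hypothetical helper: succeed if UNIT appears as a whole word in the
# space-separated AFTER_LIST (as printed by `systemctl show -p After --value`).
orders_after() {
    local after_list="$1" unit="$2"
    case " ${after_list} " in
        *" ${unit} "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on an instance (not run here):
#   orders_after "$(systemctl show -p After --value google-startup-scripts.service)" \
#       cloud-config.service && echo "ordered after cloud-config.service"
```

Note that this only inspects direct `After=` dependencies. On v20240307 the ordering was indirect (via multi-user.target and ubuntu-advantage.service), so `systemd-analyze critical-chain` remains the more complete view.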
On v20240307 (startup scripts execute correctly):
catred@startup-test-control:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +18.262s
└─multi-user.target @28.480s
  └─ubuntu-advantage.service @28.480s
    └─cloud-config.service @27.372s +1.095s
      └─snapd.seeded.service @20.048s +7.312s
        └─snapd.service @12.469s +7.555s
          └─basic.target @11.558s
            └─sockets.target @11.540s
              └─snap.lxd.daemon.unix.socket @24.376s
                └─sysinit.target @10.825s
                  └─cloud-init.service @8.432s +2.267s
                    └─systemd-networkd-wait-online.service @6.467s +1.935s
                      └─systemd-networkd.service @6.347s +112ms
                        └─network-pre.target @6.328s
                          └─cloud-init-local.service @4.309s +2.006s
                            └─systemd-remount-fs.service @1.829s +68ms
                              └─systemd-fsck-root.service @1.587s +160ms
                                └─systemd-journald.socket @1.292s
                                  └─system.slice @1.068s
                                    └─-.slice @1.068s
On v20240314 (startup scripts fail):
catred@startup-test:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" characte>
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +260ms
└─multi-user.target @29.237s
  └─chrony.service @30.240s +56ms
    └─basic.target @13.364s
      └─sockets.target @13.225s
        └─snap.lxd.user-daemon.unix.socket @26.765s
          └─sysinit.target @12.550s
            └─cloud-init.service @7.933s +4.503s
              └─systemd-networkd-wait-online.service @6.741s +1.171s
                └─systemd-networkd.service @6.593s +124ms
                  └─network-pre.target @6.573s
                    └─cloud-init-local.service @4.478s +2.083s
                      └─systemd-remount-fs.service @1.717s +64ms
                        └─systemd-fsck-root.service @1.510s +95ms
                          └─systemd-journald.socket @1.193s
                            └─-.mount @974ms
                              └─-.slice @974ms
This can be fixed by adding an explicit `After=cloud-config.service` to the google-startup-scripts.service file, which enforces the correct ordering between google-startup-scripts and cloud-init.
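For local testing, the same ordering can be tried without editing the packaged unit file by using a systemd drop-in. This is a sketch under stated assumptions, not the proposed packaging fix: the function name `write_ordering_dropin` and the drop-in file name `10-after-cloud-config.conf` are hypothetical, and the directory argument exists only so the helper can be exercised outside /etc.

```shell
#!/bin/bash
# Hypothetical helper: write a drop-in that orders google-startup-scripts.service
# after cloud-config.service. ROOT defaults to the standard systemd admin
# directory; pass another directory to try it out safely.
write_ordering_dropin() {
    local root="${1:-/etc/systemd/system}"
    local dir="${root}/google-startup-scripts.service.d"
    mkdir -p "${dir}" || return 1
    # The [Unit] section carries ordering directives such as After=.
    printf '[Unit]\nAfter=cloud-config.service\n' \
        > "${dir}/10-after-cloud-config.conf"
}

# Usage on an instance (as root, not run here):
#   write_ordering_dropin
#   systemctl daemon-reload
#   systemd-analyze critical-chain google-startup-scripts.service
```

After a reboot with the drop-in in place, cloud-config.service would be expected to reappear in the critical chain, as in the v20240307 output.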