2024-03-14 19:51:37 |
Catherine Redfield |
bug |
|
|
added bug |
2024-03-14 20:08:37 |
Andrew Cloke |
bug |
|
|
added subscriber Andrew Cloke |
2024-03-14 21:03:40 |
Launchpad Janitor |
google-guest-agent (Ubuntu): status |
New |
Confirmed |
|
2024-03-14 21:03:52 |
Chloé Smith |
bug |
|
|
added subscriber Chloé Smith |
2024-03-15 09:28:32 |
Philip Roche |
description |
New GCP dailies are failing startup-script tests, due to network not being fully set up when startup scripts are run. The failure can be reproduced as follows:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
Earlier images (such as ubuntu-2204-jammy-v20240307 in ubuntu-os-cloud) do not show this behaviour. The regression comes from a change in ubuntu-pro 31 (see https://github.com/canonical/ubuntu-pro-client/blob/dfe1f1ed4678c50240d4e251f41d33bb4034135e/debian/changelog#L40 for details) that removes a systemd ordering on cloud-config.service. A side effect of this change was the removal of cloud-config.service (and ubuntu-advantage.service) from systemd's critical chain for google-startup-scripts.service.
On v20240307 (startup scripts execute correctly):
catred@startup-test-control:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +18.262s
└─multi-user.target @28.480s
└─ubuntu-advantage.service @28.480s
└─cloud-config.service @27.372s +1.095s
└─snapd.seeded.service @20.048s +7.312s
└─snapd.service @12.469s +7.555s
└─basic.target @11.558s
└─sockets.target @11.540s
└─snap.lxd.daemon.unix.socket @24.376s
└─sysinit.target @10.825s
└─cloud-init.service @8.432s +2.267s
└─systemd-networkd-wait-online.service @6.467s +1.935s
└─systemd-networkd.service @6.347s +112ms
└─network-pre.target @6.328s
└─cloud-init-local.service @4.309s +2.006s
└─systemd-remount-fs.service @1.829s +68ms
└─systemd-fsck-root.service @1.587s +160ms
└─systemd-journald.socket @1.292s
└─system.slice @1.068s
└─-.slice @1.068s
On v20240314 (startup scripts fail):
catred@startup-test:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" characte>
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +260ms
└─multi-user.target @29.237s
└─chrony.service @30.240s +56ms
└─basic.target @13.364s
└─sockets.target @13.225s
└─snap.lxd.user-daemon.unix.socket @26.765s
└─sysinit.target @12.550s
└─cloud-init.service @7.933s +4.503s
└─systemd-networkd-wait-online.service @6.741s +1.171s
└─systemd-networkd.service @6.593s +124ms
└─network-pre.target @6.573s
└─cloud-init-local.service @4.478s +2.083s
└─systemd-remount-fs.service @1.717s +64ms
└─systemd-fsck-root.service @1.510s +95ms
└─systemd-journald.socket @1.193s
└─-.mount @974ms
└─-.slice @974ms
This can be fixed by adding an explicit `After=cloud-config.service` to the google-startup-scripts.service file, which enforces the correct ordering between google-startup-scripts and cloud-init. |
New GCP dailies are failing startup-script tests because configuration via cloud-init (apt sources, for example) has not fully completed when startup scripts are run. The failure can be reproduced as follows:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
Earlier images (such as ubuntu-2204-jammy-v20240307 in ubuntu-os-cloud) do not show this behaviour. The regression comes from a change in ubuntu-pro 31 (see https://github.com/canonical/ubuntu-pro-client/blob/dfe1f1ed4678c50240d4e251f41d33bb4034135e/debian/changelog#L40 for details) that removes a systemd ordering on cloud-config.service. A side effect of this change was the removal of cloud-config.service (and ubuntu-advantage.service) from systemd's critical chain for google-startup-scripts.service.
On v20240307 (startup scripts execute correctly):
catred@startup-test-control:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +18.262s
└─multi-user.target @28.480s
└─ubuntu-advantage.service @28.480s
└─cloud-config.service @27.372s +1.095s
└─snapd.seeded.service @20.048s +7.312s
└─snapd.service @12.469s +7.555s
└─basic.target @11.558s
└─sockets.target @11.540s
└─snap.lxd.daemon.unix.socket @24.376s
└─sysinit.target @10.825s
└─cloud-init.service @8.432s +2.267s
└─systemd-networkd-wait-online.service @6.467s +1.935s
└─systemd-networkd.service @6.347s +112ms
└─network-pre.target @6.328s
└─cloud-init-local.service @4.309s +2.006s
└─systemd-remount-fs.service @1.829s +68ms
└─systemd-fsck-root.service @1.587s +160ms
└─systemd-journald.socket @1.292s
└─system.slice @1.068s
└─-.slice @1.068s
On v20240314 (startup scripts fail):
catred@startup-test:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" characte>
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +260ms
└─multi-user.target @29.237s
└─chrony.service @30.240s +56ms
└─basic.target @13.364s
└─sockets.target @13.225s
└─snap.lxd.user-daemon.unix.socket @26.765s
└─sysinit.target @12.550s
└─cloud-init.service @7.933s +4.503s
└─systemd-networkd-wait-online.service @6.741s +1.171s
└─systemd-networkd.service @6.593s +124ms
└─network-pre.target @6.573s
└─cloud-init-local.service @4.478s +2.083s
└─systemd-remount-fs.service @1.717s +64ms
└─systemd-fsck-root.service @1.510s +95ms
└─systemd-journald.socket @1.193s
└─-.mount @974ms
└─-.slice @974ms
This can be fixed by adding an explicit `After=cloud-config.service` to the google-startup-scripts.service file, which enforces the correct ordering between google-startup-scripts and cloud-init. |
|
2024-03-15 16:26:18 |
Philip Roche |
nominated for series |
|
Ubuntu Bionic |
|
2024-03-15 16:26:18 |
Philip Roche |
bug task added |
|
google-guest-agent (Ubuntu Bionic) |
|
2024-03-15 16:26:18 |
Philip Roche |
nominated for series |
|
Ubuntu Noble |
|
2024-03-15 16:26:18 |
Philip Roche |
bug task added |
|
google-guest-agent (Ubuntu Noble) |
|
2024-03-15 16:26:18 |
Philip Roche |
nominated for series |
|
Ubuntu Mantic |
|
2024-03-15 16:26:18 |
Philip Roche |
bug task added |
|
google-guest-agent (Ubuntu Mantic) |
|
2024-03-15 16:26:18 |
Philip Roche |
nominated for series |
|
Ubuntu Jammy |
|
2024-03-15 16:26:18 |
Philip Roche |
bug task added |
|
google-guest-agent (Ubuntu Jammy) |
|
2024-03-15 16:26:18 |
Philip Roche |
nominated for series |
|
Ubuntu Focal |
|
2024-03-15 16:26:18 |
Philip Roche |
bug task added |
|
google-guest-agent (Ubuntu Focal) |
|
2024-03-15 16:26:18 |
Philip Roche |
nominated for series |
|
Ubuntu Xenial |
|
2024-03-15 16:26:18 |
Philip Roche |
bug task added |
|
google-guest-agent (Ubuntu Xenial) |
|
2024-03-16 01:07:27 |
Catherine Redfield |
attachment added |
|
0006-order-startup-scripts-after-cloud-final.patch https://bugs.launchpad.net/ubuntu/+source/google-guest-agent/+bug/2057965/+attachment/5756329/+files/0006-order-startup-scripts-after-cloud-final.patch |
|
2024-03-16 01:40:46 |
Catherine Redfield |
description |
New GCP dailies are failing startup-script tests because configuration via cloud-init (apt sources, for example) has not fully completed when startup scripts are run. The failure can be reproduced as follows:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
Earlier images (such as ubuntu-2204-jammy-v20240307 in ubuntu-os-cloud) do not show this behaviour. The regression comes from a change in ubuntu-pro 31 (see https://github.com/canonical/ubuntu-pro-client/blob/dfe1f1ed4678c50240d4e251f41d33bb4034135e/debian/changelog#L40 for details) that removes a systemd ordering on cloud-config.service. A side effect of this change was the removal of cloud-config.service (and ubuntu-advantage.service) from systemd's critical chain for google-startup-scripts.service.
On v20240307 (startup scripts execute correctly):
catred@startup-test-control:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +18.262s
└─multi-user.target @28.480s
└─ubuntu-advantage.service @28.480s
└─cloud-config.service @27.372s +1.095s
└─snapd.seeded.service @20.048s +7.312s
└─snapd.service @12.469s +7.555s
└─basic.target @11.558s
└─sockets.target @11.540s
└─snap.lxd.daemon.unix.socket @24.376s
└─sysinit.target @10.825s
└─cloud-init.service @8.432s +2.267s
└─systemd-networkd-wait-online.service @6.467s +1.935s
└─systemd-networkd.service @6.347s +112ms
└─network-pre.target @6.328s
└─cloud-init-local.service @4.309s +2.006s
└─systemd-remount-fs.service @1.829s +68ms
└─systemd-fsck-root.service @1.587s +160ms
└─systemd-journald.socket @1.292s
└─system.slice @1.068s
└─-.slice @1.068s
On v20240314 (startup scripts fail):
catred@startup-test:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" characte>
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +260ms
└─multi-user.target @29.237s
└─chrony.service @30.240s +56ms
└─basic.target @13.364s
└─sockets.target @13.225s
└─snap.lxd.user-daemon.unix.socket @26.765s
└─sysinit.target @12.550s
└─cloud-init.service @7.933s +4.503s
└─systemd-networkd-wait-online.service @6.741s +1.171s
└─systemd-networkd.service @6.593s +124ms
└─network-pre.target @6.573s
└─cloud-init-local.service @4.478s +2.083s
└─systemd-remount-fs.service @1.717s +64ms
└─systemd-fsck-root.service @1.510s +95ms
└─systemd-journald.socket @1.193s
└─-.mount @974ms
└─-.slice @974ms
This can be fixed by adding an explicit `After=cloud-config.service` to the google-startup-scripts.service file, which enforces the correct ordering between google-startup-scripts and cloud-init. |
[ Impact ]
In certain situations (consistently with ubuntu-pro=31.2 and cloud-init=23.4.4), cloud-config.service has not completed before google-startup-scripts.service runs. This can cause startup scripts that rely on apt to fail, as cloud-init is responsible for reconfiguring sources.list to point at the GCE archives.
Since pro and cloud-init are backported to all older releases, this bug will affect them too.
The race condition was introduced by the removal of an ordering condition between pro and cloud-init, so adding `After=cloud-final.service` to google-startup-scripts.service should ensure that the startup scripts run correctly regardless of the ordering (or lack thereof) between other services.
[ Test Plan ]
To reproduce:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
Since this bug particularly affects first boot (once sources.list is configured with the GCE mirrors on first boot it will remain correctly configured), the best way to test that the fix works is to create an image with pro pinned at 31.2, cloud-init pinned at 23.4.4, and google-guest-agent installed from proposed. The test would be:
1. Create an instance with startup script as above
$ gcloud compute instances create startup-test --image [IMAGE_NAME] --image-project [IMAGE PROJECT] --metadata-from-file=startup-script=startup_script.sh
2. SSH into the instance and verify pro/cloud-init/google-guest-agent versions/source
> pro --version
31.2~[RELEASE]
> cloud-init --version
/usr/bin/cloud-init 23.4.4-0ubuntu0~[RELEASE]
> apt-cache policy google-guest-agent
[ensure from -proposed]
3. Verify startup script ran correctly after cloud-config.service.
> diff /tmp/startup-sources.list /etc/apt/sources.list
>
[ Where problems could occur ]
#TODO STILL
* Think about what the upload changes in the software. Imagine the change is
wrong or breaks something else: how would this show up?
* It is assumed that any SRU candidate patch is well-tested before
upload and has a low overall risk of regression, but it's important
to make the effort to think about what ''could'' happen in the
event of a regression.
* This must '''never''' be "None" or "Low", or entirely an argument as to why
your upload is low risk.
* This both shows the SRU team that the risks have been considered,
and provides guidance to testers in regression-testing the SRU.
[ Other Info ]
Original bug report retained below.
New GCP dailies are failing startup-script tests because configuration via cloud-init (apt sources, for example) has not fully completed when startup scripts are run. The failure can be reproduced as follows:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
Earlier images (such as ubuntu-2204-jammy-v20240307 in ubuntu-os-cloud) do not show this behaviour. The regression comes from a change in ubuntu-pro 31 (see https://github.com/canonical/ubuntu-pro-client/blob/dfe1f1ed4678c50240d4e251f41d33bb4034135e/debian/changelog#L40 for details) that removes a systemd ordering on cloud-config.service. A side effect of this change was the removal of cloud-config.service (and ubuntu-advantage.service) from systemd's critical chain for google-startup-scripts.service.
On v20240307 (startup scripts execute correctly):
catred@startup-test-control:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +18.262s
└─multi-user.target @28.480s
└─ubuntu-advantage.service @28.480s
└─cloud-config.service @27.372s +1.095s
└─snapd.seeded.service @20.048s +7.312s
└─snapd.service @12.469s +7.555s
└─basic.target @11.558s
└─sockets.target @11.540s
└─snap.lxd.daemon.unix.socket @24.376s
└─sysinit.target @10.825s
└─cloud-init.service @8.432s +2.267s
└─systemd-networkd-wait-online.service @6.467s +1.935s
└─systemd-networkd.service @6.347s +112ms
└─network-pre.target @6.328s
└─cloud-init-local.service @4.309s +2.006s
└─systemd-remount-fs.service @1.829s +68ms
└─systemd-fsck-root.service @1.587s +160ms
└─systemd-journald.socket @1.292s
└─system.slice @1.068s
└─-.slice @1.068s
On v20240314 (startup scripts fail):
catred@startup-test:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" characte>
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +260ms
└─multi-user.target @29.237s
└─chrony.service @30.240s +56ms
└─basic.target @13.364s
└─sockets.target @13.225s
└─snap.lxd.user-daemon.unix.socket @26.765s
└─sysinit.target @12.550s
└─cloud-init.service @7.933s +4.503s
└─systemd-networkd-wait-online.service @6.741s +1.171s
└─systemd-networkd.service @6.593s +124ms
└─network-pre.target @6.573s
└─cloud-init-local.service @4.478s +2.083s
└─systemd-remount-fs.service @1.717s +64ms
└─systemd-fsck-root.service @1.510s +95ms
└─systemd-journald.socket @1.193s
└─-.mount @974ms
└─-.slice @974ms
This can be fixed by adding an explicit `After=cloud-config.service` to the google-startup-scripts.service file, which enforces the correct ordering between google-startup-scripts and cloud-init. |
|
2024-03-16 04:17:06 |
Ubuntu Foundations Team Bug Bot |
tags |
|
patch |
|
2024-03-16 04:17:11 |
Ubuntu Foundations Team Bug Bot |
bug |
|
|
added subscriber Ubuntu Review Team |
2024-03-18 15:42:21 |
Catherine Redfield |
description |
[ Impact ]
In certain situations (consistently with ubuntu-pro=31.2 and cloud-init=23.4.4), cloud-config.service has not completed before google-startup-scripts.service runs. This can cause startup scripts that rely on apt to fail, as cloud-init is responsible for reconfiguring sources.list to point at the GCE archives.
Since pro and cloud-init are backported to all older releases, this bug will affect them too.
The race condition was introduced by the removal of an ordering condition between pro and cloud-init, so adding `After=cloud-final.service` to google-startup-scripts.service should ensure that the startup scripts run correctly regardless of the ordering (or lack thereof) between other services.
[ Test Plan ]
To reproduce:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
Since this bug particularly affects first boot (once sources.list is configured with the GCE mirrors on first boot it will remain correctly configured), the best way to test that the fix works is to create an image with pro pinned at 31.2, cloud-init pinned at 23.4.4, and google-guest-agent installed from proposed. The test would be:
1. Create an instance with startup script as above
$ gcloud compute instances create startup-test --image [IMAGE_NAME] --image-project [IMAGE PROJECT] --metadata-from-file=startup-script=startup_script.sh
2. SSH into the instance and verify pro/cloud-init/google-guest-agent versions/source
> pro --version
31.2~[RELEASE]
> cloud-init --version
/usr/bin/cloud-init 23.4.4-0ubuntu0~[RELEASE]
> apt-cache policy google-guest-agent
[ensure from -proposed]
3. Verify startup script ran correctly after cloud-config.service.
> diff /tmp/startup-sources.list /etc/apt/sources.list
>
[ Where problems could occur ]
#TODO STILL
* Think about what the upload changes in the software. Imagine the change is
wrong or breaks something else: how would this show up?
* It is assumed that any SRU candidate patch is well-tested before
upload and has a low overall risk of regression, but it's important
to make the effort to think about what ''could'' happen in the
event of a regression.
* This must '''never''' be "None" or "Low", or entirely an argument as to why
your upload is low risk.
* This both shows the SRU team that the risks have been considered,
and provides guidance to testers in regression-testing the SRU.
[ Other Info ]
Original bug report retained below.
New GCP dailies are failing startup-script tests because configuration via cloud-init (apt sources, for example) has not fully completed when startup scripts are run. The failure can be reproduced as follows:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
Earlier images (such as ubuntu-2204-jammy-v20240307 in ubuntu-os-cloud) do not show this behaviour. The regression comes from a change in ubuntu-pro 31 (see https://github.com/canonical/ubuntu-pro-client/blob/dfe1f1ed4678c50240d4e251f41d33bb4034135e/debian/changelog#L40 for details) that removes a systemd ordering on cloud-config.service. A side effect of this change was the removal of cloud-config.service (and ubuntu-advantage.service) from systemd's critical chain for google-startup-scripts.service.
On v20240307 (startup scripts execute correctly):
catred@startup-test-control:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +18.262s
└─multi-user.target @28.480s
└─ubuntu-advantage.service @28.480s
└─cloud-config.service @27.372s +1.095s
└─snapd.seeded.service @20.048s +7.312s
└─snapd.service @12.469s +7.555s
└─basic.target @11.558s
└─sockets.target @11.540s
└─snap.lxd.daemon.unix.socket @24.376s
└─sysinit.target @10.825s
└─cloud-init.service @8.432s +2.267s
└─systemd-networkd-wait-online.service @6.467s +1.935s
└─systemd-networkd.service @6.347s +112ms
└─network-pre.target @6.328s
└─cloud-init-local.service @4.309s +2.006s
└─systemd-remount-fs.service @1.829s +68ms
└─systemd-fsck-root.service @1.587s +160ms
└─systemd-journald.socket @1.292s
└─system.slice @1.068s
└─-.slice @1.068s
On v20240314 (startup scripts fail):
catred@startup-test:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" characte>
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +260ms
└─multi-user.target @29.237s
└─chrony.service @30.240s +56ms
└─basic.target @13.364s
└─sockets.target @13.225s
└─snap.lxd.user-daemon.unix.socket @26.765s
└─sysinit.target @12.550s
└─cloud-init.service @7.933s +4.503s
└─systemd-networkd-wait-online.service @6.741s +1.171s
└─systemd-networkd.service @6.593s +124ms
└─network-pre.target @6.573s
└─cloud-init-local.service @4.478s +2.083s
└─systemd-remount-fs.service @1.717s +64ms
└─systemd-fsck-root.service @1.510s +95ms
└─systemd-journald.socket @1.193s
└─-.mount @974ms
└─-.slice @974ms
This can be fixed by adding an explicit `After=cloud-config.service` to the google-startup-scripts.service file, which enforces the correct ordering between google-startup-scripts and cloud-init. |
[ Impact ]
In certain situations (consistently with ubuntu-pro=31.2 and cloud-init=23.4.4), cloud-config.service has not completed before google-startup-scripts.service runs. This can cause startup scripts that rely on apt to fail, as cloud-init is responsible for reconfiguring sources.list to point at the GCE archives.
Since pro and cloud-init are backported to all older releases, this bug will affect them too.
The race condition was introduced by the removal of an ordering condition between pro and cloud-init, so adding `After=cloud-final.service` to google-startup-scripts.service should ensure that the startup scripts run correctly regardless of the ordering (or lack thereof) between other services.
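For illustration only, the same ordering could be expressed locally as a systemd drop-in rather than by patching the packaged unit. This is a minimal sketch, not the change carried in the attached patch: the helper name `write_dropin` and the file name `10-after-cloud-final.conf` are hypothetical, and the target directory is passed in as an argument.

```shell
#!/bin/bash
# Hypothetical helper (not part of google-guest-agent): write a systemd
# drop-in that orders a unit after cloud-final.service.
# $1 is the drop-in directory, for example
# /etc/systemd/system/google-startup-scripts.service.d
write_dropin() {
    local dropin_dir="$1"
    mkdir -p "$dropin_dir"
    # After= only affects ordering; it does not pull cloud-final.service in.
    printf '[Unit]\nAfter=cloud-final.service\n' \
        > "$dropin_dir/10-after-cloud-final.conf"
}
```

On a running instance this would be called as root, followed by `systemctl daemon-reload`. Because the bug shows up on first boot, a drop-in like this would only help if it were baked into the image; the fix proposed in this bug adds the line to the packaged unit itself.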
[ Test Plan ]
To reproduce:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
Since this bug particularly affects first boot (once sources.list is configured with the GCE mirrors on first boot it will remain correctly configured), the best way to test that the fix works is to create an image with pro pinned at 31.2, cloud-init pinned at 23.4.4, and google-guest-agent installed from proposed. The test would be:
1. Create an instance with startup script as above
$ gcloud compute instances create startup-test --image [IMAGE_NAME] --image-project [IMAGE PROJECT] --metadata-from-file=startup-script=startup_script.sh
2. SSH into the instance and verify pro/cloud-init/google-guest-agent versions/source
> pro --version
31.2~[RELEASE]
> cloud-init --version
/usr/bin/cloud-init 23.4.4-0ubuntu0~[RELEASE]
> apt-cache policy google-guest-agent
[ensure from -proposed]
3. Verify startup script ran correctly after cloud-config.service.
> diff /tmp/startup-sources.list /etc/apt/sources.list
>
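As an optional extra check alongside the diff in step 3, the ordering itself can be inspected. The sketch below assumes the fix shows up as `cloud-final.service` in the unit's `After=` property; the helper name `has_ordering` is hypothetical and simply looks for one unit name in a space-separated list.

```shell
#!/bin/bash
# Hypothetical helper: succeed if unit "$2" appears in the space-separated
# list "$1" (the value of a unit's After= property).
has_ordering() {
    local after_list="$1" wanted="$2" unit
    for unit in $after_list; do
        if [ "$unit" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Usage on an instance (systemctl prints "After=unit1 unit2 ..."):
#   after=$(systemctl show -p After google-startup-scripts.service)
#   has_ordering "${after#After=}" cloud-final.service && echo "ordered"
```

An empty diff remains the functional test; this check only confirms that the installed unit carries the ordering.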
[ Where problems could occur ]
Since this introduces a new ordering constraint, it will likely have performance impacts (google-startup-scripts will run later). This seems preferable to breaking a subset of startup scripts in some situations; it is not uncommon to use startup scripts to install packages so it's important for the mirrors to be correctly configured.
[ Other Info ]
Original bug report retained below.
New GCP dailies are failing startup-script tests because configuration performed by cloud-init (apt sources, for example) has not completed when startup scripts run. The failure can be reproduced as follows:
Using startup_script.sh:
#!/bin/bash
cp /etc/apt/sources.list /tmp/startup-sources.list
$ gcloud compute instances create startup-test --image daily-ubuntu-2204-jammy-v20240314 --image-project ubuntu-os-cloud-devel --metadata-from-file=startup-script=startup_script.sh
[...]
$ ssh [INSTANCE IP]
> diff /tmp/startup-sources.list /etc/apt/sources.list
0a1,8
> ## Note, this file is written by cloud-init on first boot of an instance
> ## modifications made here will not survive a re-bundle.
> ## if you wish to make changes you can:
> ## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg
> ## or do the same in user-data
> ## b.) add sources in /etc/apt/sources.list.d
> ## c.) make changes to template file /etc/cloud/templates/sources.list.tmpl
>
3,4c11,12
< deb http://archive.ubuntu.com/ubuntu/ jammy main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy main restricted
---
> deb http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
> # deb-src http://us-central1.gce.archive.ubuntu.com/ubuntu/ jammy main restricted
8,9c16,17
< deb http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
< # deb-src http://archive.ubuntu.com/ubuntu/ jammy-updates main restricted
---
[...]
Earlier images (such as ubuntu-2204-jammy-v20240307 in ubuntu-os-cloud) do not show this behaviour. The difference is due to a change in ubuntu-pro 31 (see https://github.com/canonical/ubuntu-pro-client/blob/dfe1f1ed4678c50240d4e251f41d33bb4034135e/debian/changelog#L40 for details) that removes a systemd ordering on cloud-config.service. A side effect of this change was the removal of cloud-config.service (and ubuntu-advantage.service) from systemd's critical chain.
On v20240307 (startup scripts execute correctly):
catred@startup-test-control:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +18.262s
└─multi-user.target @28.480s
└─ubuntu-advantage.service @28.480s
└─cloud-config.service @27.372s +1.095s
└─snapd.seeded.service @20.048s +7.312s
└─snapd.service @12.469s +7.555s
└─basic.target @11.558s
└─sockets.target @11.540s
└─snap.lxd.daemon.unix.socket @24.376s
└─sysinit.target @10.825s
└─cloud-init.service @8.432s +2.267s
└─systemd-networkd-wait-online.service @6.467s +1.935s
└─systemd-networkd.service @6.347s +112ms
└─network-pre.target @6.328s
└─cloud-init-local.service @4.309s +2.006s
└─systemd-remount-fs.service @1.829s +68ms
└─systemd-fsck-root.service @1.587s +160ms
└─systemd-journald.socket @1.292s
└─system.slice @1.068s
└─-.slice @1.068s
On v20240314 (startup scripts fail):
catred@startup-test:~$ systemd-analyze critical-chain google-startup-scripts.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
google-startup-scripts.service +260ms
└─multi-user.target @29.237s
└─chrony.service @30.240s +56ms
└─basic.target @13.364s
└─sockets.target @13.225s
└─snap.lxd.user-daemon.unix.socket @26.765s
└─sysinit.target @12.550s
└─cloud-init.service @7.933s +4.503s
└─systemd-networkd-wait-online.service @6.741s +1.171s
└─systemd-networkd.service @6.593s +124ms
└─network-pre.target @6.573s
└─cloud-init-local.service @4.478s +2.083s
└─systemd-remount-fs.service @1.717s +64ms
└─systemd-fsck-root.service @1.510s +95ms
└─systemd-journald.socket @1.193s
└─-.mount @974ms
└─-.slice @974ms
This can be fixed by adding an explicit `After=cloud-config.service` to the google-startup-scripts.service file, which enforces the correct ordering between google-startup-scripts and cloud-init. |
|
2024-07-16 18:51:14 |
Brian Murray |
google-guest-agent (Ubuntu Mantic): status |
New |
Won't Fix |
|
2024-07-24 12:41:07 |
Launchpad Janitor |
google-guest-agent (Ubuntu): status |
Confirmed |
Fix Released |
|
2024-07-26 17:59:07 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~kajiya/+git/google-guest-agent/+merge/470200 |
|
2024-07-26 18:00:17 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~kajiya/+git/google-guest-agent/+merge/470201 |
|
2024-08-09 06:57:28 |
Timo Aaltonen |
google-guest-agent (Ubuntu Noble): status |
Confirmed |
Fix Committed |
|
2024-08-09 06:57:29 |
Timo Aaltonen |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2024-08-09 06:57:32 |
Timo Aaltonen |
bug |
|
|
added subscriber SRU Verification |
2024-08-09 06:57:36 |
Timo Aaltonen |
tags |
patch |
patch verification-needed verification-needed-noble |
|
2024-08-09 06:59:55 |
Timo Aaltonen |
google-guest-agent (Ubuntu Jammy): status |
New |
Fix Committed |
|
2024-08-09 06:59:59 |
Timo Aaltonen |
tags |
patch verification-needed verification-needed-noble |
patch verification-needed verification-needed-jammy verification-needed-noble |
|
2024-08-09 07:01:50 |
Timo Aaltonen |
google-guest-agent (Ubuntu Focal): status |
New |
Fix Committed |
|
2024-08-09 07:01:58 |
Timo Aaltonen |
tags |
patch verification-needed verification-needed-jammy verification-needed-noble |
patch verification-needed verification-needed-focal verification-needed-jammy verification-needed-noble |
|
2024-08-22 19:21:27 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~kajiya/+git/google-guest-agent/+merge/471839 |
|
2024-08-22 20:11:11 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~kajiya/+git/google-guest-agent/+merge/471840 |
|
2024-08-27 14:57:36 |
Chloé Smith |
tags |
patch verification-needed verification-needed-focal verification-needed-jammy verification-needed-noble |
patch verification-done verification-done-focal verification-done-jammy verification-done-noble |
|
2024-08-28 10:12:51 |
Launchpad Janitor |
google-guest-agent (Ubuntu Noble): status |
Fix Committed |
Fix Released |
|
2024-08-28 10:12:59 |
Łukasz Zemczak |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2024-08-28 10:15:28 |
Launchpad Janitor |
google-guest-agent (Ubuntu Jammy): status |
Fix Committed |
Fix Released |
|
2024-08-28 10:18:59 |
Launchpad Janitor |
google-guest-agent (Ubuntu Focal): status |
Fix Committed |
Fix Released |
|
2024-08-29 20:14:21 |
Launchpad Janitor |
merge proposal linked |
|
https://code.launchpad.net/~kajiya/+git/google-guest-agent/+merge/472244 |
|