cloud-init seems like the most likely change to be affecting this issue.
So then I tried downgrading cloud-init to 23.2.2-0ubuntu0~20.04.1 in 20231011 using mount-image-callback. I tried uvt-kvm wait against this three times, and it consistently works.
To try to eliminate me accidentally fixing the issue due to my method itself, I tried reinstalling cloud-init 23.3.1-0ubuntu1~20.04.1 using mount-image-callback. I tried uvt-kvm wait against this three times, and it consistently fails.
Conclusion: something changed in cloud-init 23.3.1-0ubuntu1~20.04.1 to consistently cause early ssh to fail, whereas it did not before. This could be that the ssh listening socket is open before pam_nologin is deactivated whereas this didn't happen before, or because the window when it is active has grown. The former suggests a behavioural change; the latter suggests a "time to ssh" boot speed regression. So it seems appropriate to add a cloud-init regression task to this bug.
Nevertheless we should also fix this in uvt-kvm wait.
I tried uvt-kvm wait three times against the following image and it consistently works:
http:// cloud-images. ubuntu. com/daily/ server/ focal/20231003/ focal-server- cloudimg- amd64.img
I tried uvt-kvm wait three times against the following image and it consistently fails:
http:// cloud-images. ubuntu. com/releases/ focal/release- 20231011/ ubuntu- 20.04-server- cloudimg- amd64.img
Differences in package versiosn between these images are as follows:
$ diff -u0 good bad 0ubuntu0~ 20.04.1 0ubuntu1~ 20.04.1 amd64-signed 1.187.4~ 20.04.1+ 2.06-2ubuntu14. 2 amd64-signed 1.187.6~ 20.04.1+ 2.06-2ubuntu14. 4 gnutls: amd64 7.68.0-1ubuntu2.19 gnutls: amd64 7.68.0-1ubuntu2.20 headers- 5.4.0-163 5.4.0-163.180 headers- 5.4.0-163- generic 5.4.0-163.180 headers- generic 5.4.0.163.160 headers- virtual 5.4.0.163.160 image-5. 4.0-163- generic 5.4.0-163.180 image-virtual 5.4.0.163.160 modules- 5.4.0-163- generic 5.4.0-163.180 headers- 5.4.0-164 5.4.0-164.181 headers- 5.4.0-164- generic 5.4.0-164.181 headers- generic 5.4.0.164.161 headers- virtual 5.4.0.164.161 image-5. 4.0-164- generic 5.4.0-164.181 image-virtual 5.4.0.164.161 modules- 5.4.0-164- generic 5.4.0-164.181 1ubuntu5. 17 1ubuntu5. 17 1ubuntu5. 17 1ubuntu5. 17 1ubuntu5. 18 1ubuntu5. 18 1ubuntu5. 18 1ubuntu5. 18 1ubuntu5. 17 1ubuntu5. 18
--- good 2023-10-16 16:13:54.245903813 +0000
+++ bad 2023-10-16 16:14:04.801959310 +0000
@@ -30 +30 @@
-cloud-init 23.2.2-
+cloud-init 23.3.1-
@@ -43 +43 @@
-curl 7.68.0-1ubuntu2.19
+curl 7.68.0-1ubuntu2.20
@@ -101,2 +101,2 @@
-grub-efi-amd64-bin 2.06-2ubuntu14.2
-grub-efi-
+grub-efi-amd64-bin 2.06-2ubuntu14.4
+grub-efi-
@@ -175,2 +175,2 @@
-libcurl3-
-libcurl4:amd64 7.68.0-1ubuntu2.19
+libcurl3-
+libcurl4:amd64 7.68.0-1ubuntu2.20
@@ -380,8 +380,8 @@
-linux-
-linux-
-linux-
-linux-
-linux-
-linux-
-linux-
-linux-virtual 5.4.0.163.160
+linux-
+linux-
+linux-
+linux-
+linux-
+linux-
+linux-
+linux-virtual 5.4.0.164.161
@@ -579,4 +579,4 @@
-vim 2:8.1.2269-
-vim-common 2:8.1.2269-
-vim-runtime 2:8.1.2269-
-vim-tiny 2:8.1.2269-
+vim 2:8.1.2269-
+vim-common 2:8.1.2269-
+vim-runtime 2:8.1.2269-
+vim-tiny 2:8.1.2269-
@@ -589 +589 @@
-xxd 2:8.1.2269-
+xxd 2:8.1.2269-
cloud-init seems like the most likely change to be affecting this issue.
So then I tried downgrading cloud-init to 23.2.2- 0ubuntu0~ 20.04.1 in 20231011 using mount-image- callback. I tried uvt-kvm wait against this three times, and it consistently works.
To try to eliminate me accidentally fixing the issue due to my method itself, I tried reinstalling cloud-init 23.3.1- 0ubuntu1~ 20.04.1 using mount-image- callback. I tried uvt-kvm wait against this three times, and it consistently fails.
Conclusion: something changed in cloud-init 23.3.1- 0ubuntu1~ 20.04.1 to consistently cause early ssh to fail, whereas it did not before. This could be that the ssh listening socket is open before pam_nologin is deactivated whereas this didn't happen before, or because the window when it is active has grown. The former suggests a behavioural change; the latter suggests a "time to ssh" boot speed regression. So it seems appropriate to add a cloud-init regression task to this bug.
Nevertheless we should also fix this in uvt-kvm wait.