"System is booting up. Unprivileged users are not permitted to log in yet." causes wait subcommand to fail

Bug #2039441 reported by Thibf
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
uvtool
Fix Committed
Undecided
Unassigned
cloud-init (Ubuntu)
Fix Committed
Undecided
Unassigned
Focal
Fix Committed
Undecided
Unassigned
Jammy
Fix Committed
Undecided
Unassigned
Lunar
Fix Committed
Undecided
Unassigned
Mantic
Fix Committed
Undecided
Unassigned
uvtool (Ubuntu)
Fix Released
Undecided
Unassigned
Focal
Fix Released
Undecided
Agathe Porte
Jammy
Fix Released
Undecided
Agathe Porte
Lunar
Won't Fix
Undecided
Unassigned
Mantic
Fix Released
Undecided
Agathe Porte

Bug Description

[ Impact ]

The bug breaks `uvt-kvm wait` promise that the virtualized system is up and running. `uvt-kvm` naively waited for the ssh server to be up, but if the
ssh server is up before authentication (pam notably) is ready, the system is still not ready to use.
For that reason, all automation relying on it breaks and report failures blaming the underlying image instead of waiting for it to be really ready.
This fix will make `uvt-kvm wait` retry in case the login fails.

[ Test Plan ]

Use the patched uvt-kvm to start then wait for this image to come up:
http://cloud-images.ubuntu.com/releases/focal/release-20231011/ubuntu-20.04-server-cloudimg-amd64.img
This image was know to trigger this error, so verify ten times we can: start it, wait then ssh without any failure.
Verify also this with a previously working image such as http://cloud-images.ubuntu.com/daily/server/focal/20231003/focal-server-cloudimg-amd64.img

[ Where problems could occur ]

The retry mechanism could be triggered for other reasons than login error and slow down the whole process.

[ Original Bug Report ]
After doing `uvt-kvm wait`, we can expect to be able to ssh into the VMs.
That's not always the case as the ssh port can be up before PAM is setup:
`System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8).`
This means that subsequent programs can't rely on `uvt-kvm wait` to know if the system is up, which defeats the purpose of this function and drives the complexity up in highly automated environment.

Personally, I see two ways to fix the wait to handle this case:
- Change the behavior of the created VM to avoid this edge case.
- Makes `uvt-kvm wait` smarter by actually establishing a communication to check if we really can login.

The last option seems less intrusive but will make the library more complex.
I'm not convinced that would be a reasonable default or would be better as an option to `uvt-kvm wait`.

Related branches

Revision history for this message
Robie Basak (racb) wrote :

Thank you for the report. I have seen the same thing happening in CI over the past few days. It seems to be some change in behaviour, since that CI was working before. Do you see the same? Any idea what changed, or is it just the change in an outcome of a race condition?

The entire point of `uvt-kvm wait` is to wrap the messy business so the caller doesn't have to worry about it, so I think we do need to adjust it to handle this scenario correctly. Perhaps the easiest way would be a string match on the output in case of failure, and keep retrying if there is a match (with a timeout in case the condition persists).

Changed in uvtool:
status: New → Triaged
summary: - System is booting up prevent ssh at startup
+ "System is booting up. Unprivileged users are not permitted to log in
+ yet." causes wait subcommand to fail
Revision history for this message
Thibf (thibf) wrote :

No idea what changed, I would guess it's result of a race condition which is more likely to trigger due to unrelated changed. I don't have the issue every time, but much more.

I think we can reuse the existing connection timeout and handle this string matching error as if we failed to establish communication.

Revision history for this message
Thibf (thibf) wrote :

I just noticed there is an uvtool/wait.py and uvtool/libvirt/kvm.py:780L:ssh.
I confused both of them. Ignore my previous comment part on the timeout. I don't see any timeout here.

Revision history for this message
Robie Basak (racb) wrote :
Download full text (3.2 KiB)

I tried uvt-kvm wait three times against the following image and it consistently works:

http://cloud-images.ubuntu.com/daily/server/focal/20231003/focal-server-cloudimg-amd64.img

I tried uvt-kvm wait three times against the following image and it consistently fails:

http://cloud-images.ubuntu.com/releases/focal/release-20231011/ubuntu-20.04-server-cloudimg-amd64.img

Differences in package versiosn between these images are as follows:

$ diff -u0 good bad
--- good 2023-10-16 16:13:54.245903813 +0000
+++ bad 2023-10-16 16:14:04.801959310 +0000
@@ -30 +30 @@
-cloud-init 23.2.2-0ubuntu0~20.04.1
+cloud-init 23.3.1-0ubuntu1~20.04.1
@@ -43 +43 @@
-curl 7.68.0-1ubuntu2.19
+curl 7.68.0-1ubuntu2.20
@@ -101,2 +101,2 @@
-grub-efi-amd64-bin 2.06-2ubuntu14.2
-grub-efi-amd64-signed 1.187.4~20.04.1+2.06-2ubuntu14.2
+grub-efi-amd64-bin 2.06-2ubuntu14.4
+grub-efi-amd64-signed 1.187.6~20.04.1+2.06-2ubuntu14.4
@@ -175,2 +175,2 @@
-libcurl3-gnutls:amd64 7.68.0-1ubuntu2.19
-libcurl4:amd64 7.68.0-1ubuntu2.19
+libcurl3-gnutls:amd64 7.68.0-1ubuntu2.20
+libcurl4:amd64 7.68.0-1ubuntu2.20
@@ -380,8 +380,8 @@
-linux-headers-5.4.0-163 5.4.0-163.180
-linux-headers-5.4.0-163-generic 5.4.0-163.180
-linux-headers-generic 5.4.0.163.160
-linux-headers-virtual 5.4.0.163.160
-linux-image-5.4.0-163-generic 5.4.0-163.180
-linux-image-virtual 5.4.0.163.160
-linux-modules-5.4.0-163-generic 5.4.0-163.180
-linux-virtual 5.4.0.163.160
+linux-headers-5.4.0-164 5.4.0-164.181
+linux-headers-5.4.0-164-generic 5.4.0-164.181
+linux-headers-generic 5.4.0.164.161
+linux-headers-virtual 5.4.0.164.161
+linux-image-5.4.0-164-generic 5.4.0-164.181
+linux-image-virtual 5.4.0.164.161
+linux-modules-5.4.0-164-generic 5.4.0-164.181
+linux-virtual 5.4.0.164.161
@@ -579,4 +579,4 @@
-vim 2:8.1.2269-1ubuntu5.17
-vim-common 2:8.1.2269-1ubuntu5.17
-vim-runtime 2:8.1.2269-1ubuntu5.17
-vim-tiny 2:8.1.2269-1ubuntu5.17
+vim 2:8.1.2269-1ubuntu5.18
+vim-common 2:8.1.2269-1ubuntu5.18
+vim-runtime 2:8.1.2269-1ubuntu5.18
+vim-tiny 2:8.1.2269-1ubuntu5.18
@@ -589 +589 @@
-xxd 2:8.1.2269-1ubuntu5.17
+xxd 2:8.1.2269-1ubuntu5.18

cloud-init seems like the most likely change to be affecting this issue.

So then I tried downgrading cloud-init to 23.2.2-0ubuntu0~20.04.1 in 20231011 using mount-image-callback. I tried uvt-kvm wait against this three times, and it consistently works.

To try to eliminate me accidentally fixing the issue due to my method itself, I tried reinstalling cloud-init 23.3.1-0ubuntu1~20.04.1 using mount-image-callback. I tried uvt-kvm wait against this three times, and it consistently fails.

Conclusion: something changed in cloud-init 23.3.1-0ubuntu1~20.04.1 to consistently cause early ssh to fail, whereas it did not before. This could be that the ssh listening socket is open before pam_nologin is deactivated whereas this didn't happen before, or because the window when it is active has grown. The former suggests a behavioural change; the latter suggests a "time to ssh" boot speed regression. So it seems appropriate to add a cloud-init regression task to this bug.

Neverthel...

Read more...

tags: added: regression-update
Revision history for this message
Robie Basak (racb) wrote :

> So it seems appropriate to add a cloud-init regression task to this bug.

To be clear, I'm not claiming that this definitely is a problem in cloud-init; just that appears it may be and warrants further investigation.

Revision history for this message
Thibf (thibf) wrote :

Looking a bit into cloud-init between 23.2.2 and 23.3.1 and found this:

```
commit b3c9b6a79c85ebc8c87908383c34b0314c2205b6
Refs: 23.2-53-gb3c9b6a7
Author: Brett Holman <email address hidden>
AuthorDate: Mon Jul 10 08:36:31 2023 -0600
Commit: GitHub <email address hidden>
CommitDate: Mon Jul 10 08:36:31 2023 -0600

    systemd: Block login until config stage completes (#2111)

    This is required to prevent login prompt prior to user provisioning.
    Performance testing reveals no regression.

    LP: #2013403
---
```

Which seems like a good culprit.

Revision history for this message
Chad Smith (chad.smith) wrote :

Yes the above comment/link is the source of this condition we now are affected on in uvt-kvm wait. Sorry, we are chatting on this internally.

I'm not certain this is truly a bug that cloud-init should fix/revert in this case as the solution was closing a different gap which incorrectly allowed logins prompts prior to cloud-init's default user being created and passwords being setup.

The change we landed in cloud-init was to shift the `Before=systemd-user-sessions.service` from ordering defined in cloud-init.service (earlier boot stage). To cloud-config.service (where ssh_import_id) is performed. Ultimately, this pushed back systemd-user-sessions.service which is the service that cleans up /run/nologin (which contains that unprivileged login rjection message). So now uvt-kvm wait is hitting a case where ssh port 22 is being brought up by ssh.service, beforce /run/nologin is removed by systemd-user-sessions.service and we get a different failure path/behavior.

Revision history for this message
Brett Holman (holmanb) wrote :

This condition was possible prior to the change in cloud-init. Cloud-init's ordering change just revealed this unhandled condition in uvtool.

From systemd-user-sessions.service(8):

> After basic system initialization is complete, it removes /run/nologin, thus permitting logins.

From uvt-kvm(1):

> Wait for a VM to become ready....

Given the purpose and scope of uvtool's `uvt-kvm wait` and systemd-user-sessions.service, this looks like a condition that uvtool should really be able to handle; when logins are not permitted the VM is not "ready".

Cloud-init made the change in b3c9b6a79c to fix a real bug caused by implicit systemd ordering assumptions proved invalid (observed on odd architectures and virtualized instruction sets).

Ultimately, this isn't cloud-init's responsibility, and uvtool should likewise be fixed to handle this condition.

That said, due to LP: #2039505, this change will temporarily be reverted until snapd is no longer an ordering dependency, since #2039505 is a serious regression.

Advise this delay be used to fix uvtool such that when cloud-init re-implements this change uvtool works seamlessly.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cloud-init (Ubuntu):
status: New → Confirmed
Revision history for this message
Robie Basak (racb) wrote :

Fixed in uvtool commit 35dc66 in the main branch. This needs Ubuntu uploads and SRUs.

Changed in uvtool:
status: Triaged → Fix Committed
Changed in uvtool (Ubuntu):
status: New → Triaged
Changed in uvtool (Ubuntu Focal):
status: New → Triaged
Changed in uvtool (Ubuntu Jammy):
status: New → Triaged
Changed in uvtool (Ubuntu Lunar):
status: New → Triaged
Changed in uvtool (Ubuntu Mantic):
status: New → Triaged
Robie Basak (racb)
Changed in uvtool (Ubuntu):
status: Triaged → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package uvtool - 0~git183-0ubuntu1

---------------
uvtool (0~git183-0ubuntu1) noble; urgency=medium

  [ Lena Voytek ]
  * Swap simplestreams test comments to allowlist for inclusive naming

  [ Andrei Gherzan ]
  * Fix "uvt-kvm list" behaviour on non-login shells (LP: #2002530):
    - Clarify comment for the list command
    - Force virsh for the qemu system daemon
    - Refactor libvirt connection usage

  [ Thibault Ferrante ]
  * Continue waiting while pam_nologin is active in the VM (LP: #2039441)

 -- Robie Basak <email address hidden> Tue, 31 Oct 2023 09:11:32 +0000

Changed in uvtool (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Alberto Contreras (aciba) wrote :

The cloud-init commits provoking this new behavior / issue were reverted as part of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/2039505/comments/1.

Changed in cloud-init (Ubuntu):
status: Confirmed → Fix Committed
Changed in cloud-init (Ubuntu Focal):
status: New → Fix Committed
Changed in cloud-init (Ubuntu Jammy):
status: New → Fix Committed
Changed in cloud-init (Ubuntu Lunar):
status: New → Fix Committed
Changed in cloud-init (Ubuntu Mantic):
status: New → Fix Committed
James Page (james-page)
Changed in uvtool (Ubuntu Mantic):
assignee: nobody → Agathe Porte (gagath)
Revision history for this message
James Page (james-page) wrote :

Marking Lunar task as Won't Fix as so close to the end of support for this series.

Changed in uvtool (Ubuntu Lunar):
status: Triaged → Won't Fix
James Page (james-page)
description: updated
Revision history for this message
Agathe Porte (gagath) wrote :

Attached debdiff for mantic SRU

description: updated
Changed in uvtool (Ubuntu Jammy):
assignee: nobody → Agathe Porte (gagath)
Changed in uvtool (Ubuntu Focal):
assignee: nobody → Agathe Porte (gagath)
Revision history for this message
Agathe Porte (gagath) wrote :

Attached debdiff for jammy SRU

Revision history for this message
Agathe Porte (gagath) wrote :

Attached debdiff for focal SRU

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "uvtool_mantic.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Revision history for this message
James Page (james-page) wrote :

Updates for mantic, jammy and focal sponsored into the UNAPPROVED queue ready for SRU team review.

Revision history for this message
Robie Basak (racb) wrote : Please test proposed package

Hello Thibf, or anyone else affected,

Accepted uvtool into mantic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/uvtool/0~git178-0ubuntu1.23.10.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-mantic to verification-done-mantic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-mantic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in uvtool (Ubuntu Mantic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-mantic
Changed in uvtool (Ubuntu Jammy):
status: Triaged → Fix Committed
tags: added: verification-needed-jammy
Revision history for this message
Robie Basak (racb) wrote :

Hello Thibf, or anyone else affected,

Accepted uvtool into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/uvtool/0~git178-0ubuntu1.22.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in uvtool (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed-focal
Revision history for this message
Robie Basak (racb) wrote :

Hello Thibf, or anyone else affected,

Accepted uvtool into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/uvtool/0~git169-0ubuntu1.20.04.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Thibf (thibf) wrote :

I tested 0~git178-0ubuntu1.23.10.1 on Mantic successfully with the failed images specified here: https://bugs.launchpad.net/uvtool/+bug/2039441/comments/4

I tested 0~git178-0ubuntu1.22.04.1 on Jammy successfully with the failed images specified here: https://bugs.launchpad.net/uvtool/+bug/2039441/comments/4

I tested 0~git169-0ubuntu1.20.04.1 on Focal successfully with the failed images specified here: https://bugs.launchpad.net/uvtool/+bug/2039441/comments/4

Every time I checked the actual failure before upgrading to the fixed version in the proposed pocket.

Agathe Porte (gagath)
tags: added: verification-done verification-done-focal verification-done-jammy verification-done-mantic
removed: verification-needed verification-needed-focal verification-needed-jammy verification-needed-mantic
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package uvtool - 0~git178-0ubuntu1.23.10.1

---------------
uvtool (0~git178-0ubuntu1.23.10.1) mantic; urgency=medium

  [ Thibault Ferrante ]
  * Continue waiting while pam_nologin is active in the VM (LP: #2039441)

 -- Agathe Porte <email address hidden> Fri, 02 Feb 2024 10:34:16 +0100

Changed in uvtool (Ubuntu Mantic):
status: Fix Committed → Fix Released
Revision history for this message
Andreas Hasenack (ahasenack) wrote : Update Released

The verification of the Stable Release Update for uvtool has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package uvtool - 0~git178-0ubuntu1.22.04.1

---------------
uvtool (0~git178-0ubuntu1.22.04.1) jammy; urgency=medium

  [ Thibault Ferrante ]
  * Continue waiting while pam_nologin is active in the VM (LP: #2039441)

 -- Agathe Porte <email address hidden> Fri, 02 Feb 2024 10:43:47 +0100

Changed in uvtool (Ubuntu Jammy):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package uvtool - 0~git169-0ubuntu1.20.04.1

---------------
uvtool (0~git169-0ubuntu1.20.04.1) focal; urgency=medium

  [ Thibault Ferrante ]
  * Continue waiting while pam_nologin is active in the VM (LP: #2039441)

 -- Agathe Porte <email address hidden> Fri, 02 Feb 2024 10:53:45 +0100

Changed in uvtool (Ubuntu Focal):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.