"System is booting up. Unprivileged users are not permitted to log in yet." causes wait subcommand to fail
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
uvtool | Fix Committed | Undecided | Unassigned |
cloud-init (Ubuntu) | Fix Committed | Undecided | Unassigned |
cloud-init (Ubuntu Focal) | Fix Committed | Undecided | Unassigned |
cloud-init (Ubuntu Jammy) | Fix Committed | Undecided | Unassigned |
cloud-init (Ubuntu Lunar) | Fix Committed | Undecided | Unassigned |
cloud-init (Ubuntu Mantic) | Fix Committed | Undecided | Unassigned |
uvtool (Ubuntu) | Fix Released | Undecided | Unassigned |
uvtool (Ubuntu Focal) | Fix Released | Undecided | Agathe Porte |
uvtool (Ubuntu Jammy) | Fix Released | Undecided | Agathe Porte |
uvtool (Ubuntu Lunar) | Won't Fix | Undecided | Unassigned |
uvtool (Ubuntu Mantic) | Fix Released | Undecided | Agathe Porte |
Bug Description
[ Impact ]
The bug breaks the promise of `uvt-kvm wait` that the virtualized system is up and running. `uvt-kvm` naively waited for the SSH server to be up, but if the SSH server comes up before authentication (notably PAM) is ready, the system is still not ready to use.
For that reason, all automation relying on it breaks and reports failures that blame the underlying image, instead of waiting for the system to be truly ready.
This fix makes `uvt-kvm wait` retry when the login fails.
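A minimal sketch of the retry idea, in Python since uvtool is written in Python. It is not the committed patch: `wait_for_login`, its `try_login` callback, and the default `timeout`/`interval` values are hypothetical. The clock and sleep functions are injectable so the loop can be exercised without a real VM.

```python
import time
from typing import Callable


def wait_for_login(
    try_login: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_login() until it succeeds or the deadline passes.

    Returns True on the first successful login, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if try_login():
            return True
        # Give up if the next attempt would start after the deadline.
        if clock() + interval > deadline:
            return False
        sleep(interval)
```

A real `try_login` might run an ssh command (for example `uvt-kvm ssh NAME true`) through `subprocess.run` and check for a zero exit status; treat that command as an assumption. Note that this version retries on any failure, which is exactly the risk listed under [ Where problems could occur ].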
[ Test Plan ]
Use the patched `uvt-kvm` to start this image and then wait for it to come up:
http://
This image was known to trigger this error, so verify ten times that we can start it, wait for it, and then ssh into it without any failure.
Also verify this with a previously working image, such as http://
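The test plan can be scripted roughly as below. This is a hypothetical helper, not part of uvtool: it assumes the subcommands `uvt-kvm create NAME`, `uvt-kvm wait NAME`, `uvt-kvm ssh NAME true` (with the trailing argument forwarded to ssh as a remote command), and `uvt-kvm destroy NAME`, and leaves image selection to the caller through the made-up `create_args` parameter. Commands go through an injectable runner so the loop can be dry-run.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the process exit status.
Runner = Callable[[Sequence[str]], int]


def subprocess_runner(argv: Sequence[str]) -> int:
    """Run a real command and return its exit status."""
    return subprocess.run(list(argv), check=False).returncode


def run_test_plan(
    run: Runner,
    name: str,
    create_args: Sequence[str] = (),
    iterations: int = 10,
) -> int:
    """Start, wait for, and ssh into a fresh VM `iterations` times.

    Returns how many rounds passed every step.
    """
    passed = 0
    for _ in range(iterations):
        steps = [
            ["uvt-kvm", "create", name, *create_args],
            ["uvt-kvm", "wait", name],
            ["uvt-kvm", "ssh", name, "true"],  # assumed: trailing args run remotely
        ]
        # all() stops at the first step that exits non-zero.
        ok = all(run(step) == 0 for step in steps)
        # Always destroy the VM so the next round starts clean.
        run(["uvt-kvm", "destroy", name])
        if ok:
            passed += 1
    return passed
```

A return value of 10 from `run_test_plan(subprocess_runner, "sru-test", create_args=[...])` would mean every round passed; anything lower points at the failing rounds.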
[ Where problems could occur ]
The retry mechanism could be triggered by errors other than the login error, which would slow down the whole process.
[ Original Bug Report ]
After running `uvt-kvm wait`, we would expect to be able to ssh into the VMs.
That's not always the case, as the SSH port can be up before PAM is set up:
`System is booting up. Unprivileged users are not permitted to log in yet. Please come back later. For technical details, see pam_nologin(8).`
This means that subsequent programs can't rely on `uvt-kvm wait` to know whether the system is up, which defeats the purpose of this function and drives up complexity in highly automated environments.
Personally, I see two ways to fix the wait to handle this case:
- Change the behavior of the created VM to avoid this edge case.
- Make `uvt-kvm wait` smarter by actually establishing a connection to check whether we really can log in.
The latter option seems less intrusive but will make the library more complex.
I'm not sure whether that would be a reasonable default, or whether it would be better as an option to `uvt-kvm wait`.
Related branches
- Robie Basak: Approve
- Brett Holman (community): Approve
Diff: 73 lines (+34/-21), 1 file modified: uvtool/libvirt/kvm.py (+34/-21)
Changed in uvtool (Ubuntu):
status: Triaged → Fix Committed
Changed in uvtool (Ubuntu Mantic):
assignee: nobody → Agathe Porte (gagath)
description: updated
tags added: verification-done verification-done-focal verification-done-jammy verification-done-mantic
tags removed: verification-needed verification-needed-focal verification-needed-jammy verification-needed-mantic
Thank you for the report. I have seen the same thing happening in CI over the past few days. It seems to be a change in behaviour, since that CI was working before. Do you see the same? Any idea what changed, or is it just a change in the outcome of a race condition?
The entire point of `uvt-kvm wait` is to wrap up the messy business so that the caller doesn't have to worry about it, so I think we do need to adjust it to handle this scenario correctly. Perhaps the easiest way would be to string-match the output on failure and keep retrying while it matches (with a timeout in case the condition persists).
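The string-match approach suggested above can be sketched as follows. It also addresses the concern under [ Where problems could occur ]: only the pam_nologin(8) message quoted in this report triggers a retry, and any other failure is reported immediately. `wait_past_nologin`, the `attempt` callback (returning the exit status and output of one login attempt), and the timeout values are hypothetical, and the sketch assumes the SSH port is already reachable.

```python
import time
from typing import Callable, Tuple

# Start of the pam_nologin(8) message quoted in this bug report.
NOLOGIN_MARKER = (
    "System is booting up. Unprivileged users are not permitted to log in yet."
)


def wait_past_nologin(
    attempt: Callable[[], Tuple[int, str]],
    timeout: float = 300.0,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry a login attempt only while it fails with the nologin message.

    `attempt` returns (exit_status, output) for one login try.
    Returns "ok" on success, "error" for any unrelated failure, and
    "timeout" if the nologin message persists past the deadline.
    """
    deadline = clock() + timeout
    while True:
        status, output = attempt()
        if status == 0:
            return "ok"
        if NOLOGIN_MARKER not in output:
            # Not the boot-time PAM refusal: fail fast instead of retrying.
            return "error"
        if clock() + interval > deadline:
            return "timeout"
        sleep(interval)
```

Matching a fixed string is brittle if the message wording ever changes, but it keeps unrelated errors (for example a rejected key) from being masked by retries.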