cloud-init not restarting ssh service after writing sshd_config
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init | Fix Released | High | James Falcon |
Bug Description
cloud-init 22.2 introduced a race condition with ssh.service since it added a systemctl call to check if service is stopped/
I believe this is the commit that introduced the issue:
https:/
I've attached the cloud-init.log and the auth.log showing the time ssh service started.
From cloud-init.log: the call to check the ssh service status happened at 22:44:43,630. By the time cloud-init wrote sshd_config at 22:44:51, the ssh service had already started. There's a strange 8s delay from systemctl that might have to do with systemd or the condition of the VM. Regardless, the race is definitely there.
2022-11-22 22:44:43,630 - subp.py[DEBUG]: Running command ['systemctl', 'status', 'ssh'] with allowed return codes [0] (shell=False, capture=True)
2022-11-22 22:44:51,116 - cc_set_
2022-11-22 22:44:51,116 - util.py[DEBUG]: Reading from /etc/ssh/
2022-11-22 22:44:51,116 - util.py[DEBUG]: Read 3546 bytes from /etc/ssh/
2022-11-22 22:44:51,116 - ssh_util.py[DEBUG]: line 55: option PasswordAuthent
2022-11-22 22:44:51,117 - util.py[DEBUG]: Writing to /etc/ssh/
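For reference, the gap between the status check and the next log line can be computed directly from the two timestamps quoted above. This is only a small sketch; the helper name `log_gap_seconds` is made up for illustration and is not part of cloud-init.

```python
from datetime import datetime

# cloud-init log timestamps use a comma before the milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def log_gap_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two cloud-init log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


# Timestamps copied from the log excerpt above.
gap = log_gap_seconds("2022-11-22 22:44:43,630", "2022-11-22 22:44:51,116")
print(f"{gap:.3f}s")  # 7.486s
```

So the "8s delay" is about 7.5 seconds between issuing `systemctl status ssh` and the next logged action.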
Here's what the customer gave us from their VM:
ssh.service - OpenBSD Secure Shell server
Loaded: loaded (/lib/systemd/
Active: active (running) since Tue 2022-11-22 22:44:43 UTC; 6 days ago
The sshd_config file changed only a few seconds later.
-rw------- 1 root root 3547 2022-11-22 22:44:51.113697898 +0000 /etc/ssh/
Thanks Anh. Comparing auth.log and the systemctl status output against cloud-init.log confirms the race.
It is strange though that we see an 8 second delay getting the response from the `systemctl status ssh` check and the immediate log line that follows that response. T
We can do one of two things:
1. Shift the point at which we run `systemctl status ssh`, so that we inspect and restart the ssh service only after we are certain the ssh config changes were written successfully.
2. Check something akin to `systemctl is-enabled ssh`. We would also need to handle socket-activated ssh on Ubuntu 22.10 and later, since `systemctl is-enabled ssh` returns 'disabled' in that state.
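To make option 2 concrete, here is a rough sketch of an enabled check that also looks at the socket unit. It is not cloud-init code: the function names are hypothetical, the unit names `ssh.service` and `ssh.socket` are assumptions about the Ubuntu layout, and it only treats the literal state `enabled` as a match, ignoring other states `systemctl is-enabled` can print.

```python
import subprocess
from typing import Callable


def systemctl_is_enabled(unit: str) -> str:
    """Return the state word printed by `systemctl is-enabled <unit>`."""
    result = subprocess.run(
        ["systemctl", "is-enabled", unit],
        capture_output=True,
        text=True,
        check=False,  # is-enabled exits non-zero for states such as 'disabled'
    )
    return result.stdout.strip()


def ssh_enabled(is_enabled: Callable[[str], str] = systemctl_is_enabled) -> bool:
    """Hypothetical check: True if ssh is enabled directly or via its socket.

    Assumes the units are named ssh.service and ssh.socket.
    """
    return any(
        is_enabled(unit) == "enabled" for unit in ("ssh.service", "ssh.socket")
    )
```

The `is_enabled` parameter exists only so the logic can be exercised without a running systemd, for example by passing a function that reports `disabled` for the service and `enabled` for the socket.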
At a minimum, I think we can reduce exposure to this particular race by moving the ssh status check to after the write/update of sshd_config.
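A minimal sketch of that ordering, with the three steps passed in as callables so it stays independent of the real cloud-init helpers. All names here are hypothetical, and this is not the actual fix.

```python
from typing import Callable


def update_sshd_config_then_restart(
    write_config: Callable[[], bool],
    ssh_is_running: Callable[[], bool],
    restart_ssh: Callable[[], None],
) -> bool:
    """Write sshd_config first, and only then decide whether to restart ssh.

    write_config returns True if the file was changed.
    Returns True if a restart was issued.
    """
    changed = write_config()
    if not changed:
        return False
    # Checking status after the write means a service that started while the
    # config was still being written is seen as running and gets restarted.
    if ssh_is_running():
        restart_ssh()
        return True
    # Not running yet: it will read the updated config whenever it starts.
    return False
```

With the check placed after the write, a service that came up in the meantime is restarted, and one that has not started yet picks up the new file on its own.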
But, ideally, we probably want cloud-init to grow awareness of whether or not ssh service is enabled, but not yet started.
This solution will likely take a bit of time and touch implementation details for both systemd and non-systemd environments via cloudinit.distros.manage_service().