ssh host keys deleted by cloud-init between sshd-keygen and sshd start

Bug #1995609 reported by Gabriel PREDA
Affects: cloud-init · Status: Invalid · Importance: Undecided · Assigned to: Unassigned

Bug Description

This happened on CentOS Stream 8.

I created an AWS instance from a snapshot of another instance.
Upon start I was unable to log in via SSH because sshd failed to start.

Upon log investigation I found out that cloud-init deleted the files from /etc/ssh/ssh_host_* between `sshd-keygen.target` and starting of OpenSSH.

I recovered the instance another way, but I dug into the logs.
Here are the log extracts:

messages:
Nov 3 08:30:38 ip-172-21-3-249 systemd[1]: Reached target sshd-keygen.target.

cloud-init.log:
2022-11-03 08:31:02,307 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ed25519_key
2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ed25519_key.pub
2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ecdsa_key
2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_ecdsa_key.pub
2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_rsa_key
2022-11-03 08:31:02,308 - util.py[DEBUG]: Attempting to remove /etc/ssh/ssh_host_rsa_key.pub

messages:
Nov 3 08:31:02 ip-172-21-3-249 systemd[1]: Starting OpenSSH server daemon...
Nov 3 08:31:03 ip-172-21-3-249 sshd[1337]: Unable to load host key: /etc/ssh/ssh_host_rsa_key
Nov 3 08:31:03 ip-172-21-3-249 sshd[1337]: Unable to load host key: /etc/ssh/ssh_host_ecdsa_key
Nov 3 08:31:03 ip-172-21-3-249 sshd[1337]: Unable to load host key: /etc/ssh/ssh_host_ed25519_key
Nov 3 08:31:03 ip-172-21-3-249 sshd[1337]: sshd: no hostkeys available -- exiting.
Nov 3 08:31:03 ip-172-21-3-249 systemd[1]: sshd.service: Main process exited, code=exited, status=1/FAILURE
Nov 3 08:31:03 ip-172-21-3-249 systemd[1]: sshd.service: Failed with result 'exit-code'.
Nov 3 08:31:03 ip-172-21-3-249 systemd[1]: Failed to start OpenSSH server daemon.

The cloud-init unit file has the right dependencies:

[root@ip-172-21-3-249 log]# more /usr/lib/systemd/system/cloud-init.service
[Unit]
Description=Initial cloud-init job (metadata service crawler)
DefaultDependencies=no
Wants=cloud-init-local.service
Wants=sshd-keygen.service
Wants=sshd.service
After=cloud-init-local.service
After=systemd-networkd-wait-online.service
After=network.service
After=NetworkManager.service
Before=network-online.target
Before=sshd-keygen.service
Before=sshd.service
Before=systemd-user-sessions.service

[Service]
Type=oneshot
ExecStart=/usr/bin/cloud-init init
RemainAfterExit=yes
TimeoutSec=0

# Output needs to appear in instance console output
StandardOutput=journal+console

[Install]
WantedBy=cloud-init.target

But I wonder if they still work for systemd template units:

[root@ip-172-21-3-249 log]# systemctl status sshd-keygen.service
Unit sshd-keygen.service could not be found.
[root@ip-172-21-3-249 log]# systemctl status sshd-keygen@.service
Failed to get properties: Unit name sshd-keygen@.service is neither a valid invocation ID nor unit name.
[root@ip-172-21-3-249 log]# systemctl status sshd-keygen@
<email address hidden> <email address hidden> <email address hidden> ««« there are 3 services, one for each key type.

I can see that keygen is disabled here because cloud-init is enabled:

[root@ip-172-21-3-249 log]# systemctl status <email address hidden>
● <email address hidden> - OpenSSH ed25519 Server Key Generation
   Loaded: loaded (/usr/lib/systemd/system/sshd-keygen@.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/sshd-keygen@.service.d
           └─disable-sshd-keygen-if-cloud-init-active.conf
   Active: inactive (dead)
Condition: start condition failed at Thu 2022-11-03 10:18:28 UTC; 3h 4min ago
           └─ ConditionPathExists=!/run/systemd/generator.early/multi-user.target.wants/cloud-init.target was not met
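For reference, the drop-in named in the status output is the mechanism that hands host-key generation over to cloud-init. Based on the failed condition printed above, its contents are essentially the following (a reconstruction from the log, not copied from the affected machine):

```
# /etc/systemd/system/sshd-keygen@.service.d/disable-sshd-keygen-if-cloud-init-active.conf
[Unit]
# Skip sshd-keygen@<type> whenever cloud-init.target is enabled,
# on the assumption that cloud-init will generate the host keys itself.
ConditionPathExists=!/run/systemd/generator.early/multi-user.target.wants/cloud-init.target
```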

How can we ensure this does not happen in the future?

Revision history for this message
Gabriel PREDA (gabriel-e-radical) wrote :

[root@ip-172-21-3-249 ~]# cloud-init --version
/usr/bin/cloud-init 22.1-5.el8

Revision history for this message
Brett Holman (holmanb) wrote :

Hi Gabriel,

Thank you for reporting.

Cloud-init should generate keys, which is why sshd-keygen is disabled when cloud-init is enabled. Can you please run `sudo cloud-init collect-logs` and upload the tarball so we can figure out why your keys are not getting generated?

Changed in cloud-init:
status: New → Incomplete
Revision history for this message
Gabriel PREDA (gabriel-e-radical) wrote :

Obfuscated cloud init logs

Revision history for this message
Gabriel PREDA (gabriel-e-radical) wrote :

Hi Brett,

Thanx for the fast reply.

I've attached obfuscated logs.

Unfortunately I'll terminate the machine today as we don't need it anymore.
If the reasons are not clear from the logs we can close the ticket.
If it happens again I'll come back.

Revision history for this message
Jay A (jatabe) wrote :

We also encountered this problem, also on a restored EC2 instance. SSH keys got deleted and never regenerated.
We noticed the problem right after upgrading to cloud-init 22.1-5.el8.

Revision history for this message
James Falcon (falcojr) wrote :

I'm looking into this, but I'm not exactly sure what's happening based on the logs, and I can't reproduce the issue on ec2. Does your user data include the `ssh_keys:` key for specifying host keys? If so, can you post here that section of user data with any sensitive data redacted?

Revision history for this message
Gabriel PREDA (gabriel-e-radical) wrote :

EC2 instances are started w/ Ansible, which injects the following `cloud-init.j2`:

»»»»»»»»»»»»»»»» start cloud-init.j2
#cloud-config
# vim: syntax=yaml
repo_update: true
repo_upgrade: all

groups:
  - {{ devops_group }}

users:
{% for i in company_devops %}
  - name: {{ i }}
    gecos: {{ company_employees[i].name }}
    primary_group: {{ devops_group }}
    groups: wheel
    no_user_group: True
    ssh_authorized_keys:
{% for key in company_employees[i].pub_keys %}
      - {{ key }}
{% endfor %}
{% endfor %}

runcmd:
  - echo 'JXdoZWVsCUFMTD0oQUxMKQlOT1BBU1NXRDogQUxMCg==' | base64 -d > /etc/sudoers.d/wheel
  - chage -E -1 -d 01/01/2017 -M 99999 root
  - rm -f /root/anaconda-ks.cfg /root/original-ks.cfg
  - /usr/bin/hostnamectl set-hostname {{ inventory_hostname }}
  - echo yes > {{ machine_is_ansible_ready_file }}

»»»»»»»»»»»»»»»» end cloud-init.j2

The initial machine is started w/ this.
From that machine a snapshot is created.
Another machine is started from the snapshot and this latter machine has this issue.


Revision history for this message
Jay A (jatabe) wrote :

These lines in cloud.cfg

ssh_deletekeys: true
ssh_genkeytypes: ['rsa', 'ecdsa', 'ed25519']

got replaced with these after the yum update:

ssh_deletekeys: 1
ssh_genkeytypes: ~

After restoring the old config (cloud.cfg.rpmnew), creating a snapshot, and then restoring that snapshot, I was able to see from the logs that the keys are regenerated, and I can SSH to the server without problems.
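The config change above explains the symptom: with `ssh_deletekeys` truthy, the existing host keys are removed, and new keys are only generated for the types listed in `ssh_genkeytypes`; a YAML `~` parses to null, so nothing gets regenerated. A simplified sketch of that interaction (not cloud-init's actual cc_ssh module):

```python
# Simplified sketch, assuming cloud-init's delete-then-generate flow:
# keys are deleted when ssh_deletekeys is truthy, then regenerated only
# for the types listed in ssh_genkeytypes.
def apply_ssh_config(existing_keys, deletekeys, genkeytypes):
    keys = set() if deletekeys else set(existing_keys)
    for ktype in (genkeytypes or []):   # YAML '~' parses to None: loop is skipped
        keys.add(f"ssh_host_{ktype}_key")
    return sorted(keys)

# Broken config from the update: everything deleted, nothing regenerated.
print(apply_ssh_config({"ssh_host_rsa_key"}, True, None))  # -> []

# Restored config: keys deleted, then regenerated for each listed type.
print(apply_ssh_config({"ssh_host_rsa_key"}, True, ["rsa", "ecdsa", "ed25519"]))
```

With `genkeytypes=None` the generate loop never runs, which matches the logs: removals at 08:31:02, then sshd failing with no host keys one second later.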

Revision history for this message
Alberto Contreras (aciba) wrote :

Thanks for the investigation work. Since it sounds like an rpm-update bug, and we cannot reproduce it on our side, I am marking the bug as invalid.

Changed in cloud-init:
status: Incomplete → Invalid
Revision history for this message
Gabriel PREDA (gabriel-e-radical) wrote :

Awesome, thanx Jay.

Revision history for this message
Alberto Contreras (aciba) wrote :

Could you, gabriel-e-radical or jatabe, please create a bug report at https://bugs.centos.org/main_page.php and link it here?

Revision history for this message
Gabriel PREDA (gabriel-e-radical) wrote :

It's fixed in 22.1-6.el8, but upon update a cloud.cfg.rpmnew is created, so one might need to merge the configs.
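To make that merge step concrete, here is a hypothetical helper (not part of cloud-init or rpm) that finds leftover `.rpmnew` files and prints a unified diff against the live config, so the fixed defaults can be merged in by hand:

```python
# Hypothetical helper: list .rpmnew files left behind by an update and
# diff each one against its live counterpart to ease manual merging.
import difflib
import pathlib


def rpmnew_diffs(confdir="/etc/cloud"):
    """Yield a unified diff for each live config that has a .rpmnew copy."""
    for new in sorted(pathlib.Path(confdir).glob("**/*.rpmnew")):
        live = new.with_suffix("")          # cloud.cfg.rpmnew -> cloud.cfg
        if not live.exists():
            continue
        yield "".join(difflib.unified_diff(
            live.read_text().splitlines(keepends=True),
            new.read_text().splitlines(keepends=True),
            fromfile=str(live), tofile=str(new)))


if __name__ == "__main__":
    for diff in rpmnew_diffs():
        print(diff)
```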
