cloud-init regenerating ssh-keys

Bug #1885527 reported by Hadmut Danisch on 2020-06-29
This bug affects 1 person
Affects: cloud-init (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

I ran some experiments with Ubuntu 20.04 virtual machines at a German cloud provider (Hetzner), which uses cloud-init to initialize machines with a basic setup such as IP configuration and ssh access.

During my installation tests I had to reboot the virtual machines several times after installing or removing packages.

Occasionally (not always) I noticed that the ssh host keys had changed and ssh complained. After accepting the new host keys (insecure!) I found that all key files in /etc/ssh had fresh modification times, i.e. they had been freshly regenerated.
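
One way to confirm such a regeneration is to record the modification times of the host key files before and after a reboot and compare them. The following is a minimal sketch, assuming the keys follow the usual OpenSSH naming `ssh_host_*_key` (plus `.pub`) under /etc/ssh; the function name `host_key_mtimes` is made up for this illustration.

```python
from datetime import datetime, timezone
from pathlib import Path


def host_key_mtimes(ssh_dir="/etc/ssh"):
    """Return {filename: mtime as UTC ISO string} for host key files in ssh_dir.

    Assumption: host keys are named ssh_host_*_key and ssh_host_*_key.pub.
    """
    result = {}
    for path in sorted(Path(ssh_dir).glob("ssh_host_*_key*")):
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        result[path.name] = mtime.isoformat(timespec="seconds")
    return result
```

Saving the returned mapping before a reboot and comparing it with a fresh call afterwards shows which key files, if any, were rewritten.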

This reminds me of a bug I reported about cloud-init some time ago, where I could not change the host name permanently because cloud-init reset it to its initial configuration at every boot (highly dangerous, because it seemed to reset passwords to their original state as well).

Although cloud-init is intended to do an initial configuration for the first boot only, it seems to remain on the system and – even worse: occasionally – change configurations.

I've never understood the purpose of cloud-init remaining active once the machine is up and running.

Hadmut Danisch (hadmut) wrote :

BTW,

docs at https://cloudinit.readthedocs.io completely fail to tell what cloud-init actually is or is supposed to do.

They do not explain that, or why, cloud-init survives the first boot and remains active for future boots, and what this is good for.

There is no warning, no hint, no information that cloud-init keeps continuously twiddling with the system.

Scott Moser (smoser) wrote :

Hi, please attach the output of 'cloud-init collect-logs' when run on a system that demonstrates the problem.

cloud-init uses the "instance-id" from the metadata service to indicate a new instance. Some things run once per instance, some things run once per boot.

Changed in cloud-init (Ubuntu):
status: New → Incomplete
Scott Moser (smoser) wrote :

After replying with the collected logs, please set the status back to 'New'.
Thanks for taking the time to file a bug.

Valery Tschopp (valery-tschopp) wrote :

We have a similar issue:

1. At first boot cloud-init generated the host ssh keys
2. The metadata service crashed and went down :(
3. At reboot, cloud-init can NOT reach the metadata service and regenerates the host ssh keys

Hadmut Danisch (hadmut) wrote :

I currently cannot provide logs, since these were only temporary test machines in a cloud that existed for only tens of minutes to test installation procedures. I will supply logs as soon as I proceed with testing and the problem occurs again.

However, I do not understand and did not find any documentation about why cloud-init even remains active after first boot.

Descriptions like https://help.ubuntu.com/community/CloudInit or https://cloudinit.readthedocs.io/en/latest/ are misleading, as they suggest that this is just about the initialization of the machine. They don't say that cloud-init remains active and keeps manipulating the system.

I found this to be a severe security issue (which I reported in an earlier bug report for 18.04) when I could not permanently change the hostname of a machine, since cloud-init was resetting it with every reboot, and the file where this was stored was hidden deep somewhere in /var. I'm afraid I cannot even change a password, since cloud-init might reset it to its initial state.

I consider it a serious flaw and security problem that cloud-init behaves very differently from what is described in the documentation.

AGAIN: Why is cloud-init still manipulating the machine *after* initialization and first boot?

Changed in cloud-init (Ubuntu):
status: Incomplete → New
Scott Moser (smoser) wrote :

"AGAIN: Why is cloud-init still manipulating the machine *after* initialization and first boot?"

Because cloud-init thinks it is a "first boot". A supported use case for cloud-init is:
 * boot instance on cloud
 * ssh in
 * install some packages, prep this instance
 * stop instance
 * snapshot disk
 * register new image from disk
 * start new instances from this image

cloud-init will recognize that these instances are new instances, and initialize them. It recognizes this by comparing the cached value of 'instance-id' versus the current value of 'instance-id'. If they have changed, then you have a new instance.

The other reason for cloud-init to "remain active" is that it offers "per-boot" things.
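
The distinction described here can be sketched roughly as follows. This is a simplified illustration of the cached versus current 'instance-id' comparison, not cloud-init's actual code; the function name `tasks_to_run` and the task names in the usage note are hypothetical.

```python
def tasks_to_run(cached_id, current_id, per_instance, per_boot):
    """Return the names of the tasks to run on this boot.

    per_boot tasks run on every boot. per_instance tasks run only when the
    current instance-id differs from the cached one, i.e. when the machine
    is considered a new instance (for example, one started from a snapshot).
    """
    new_instance = cached_id != current_id
    tasks = list(per_boot)
    if new_instance:
        tasks = list(per_instance) + tasks
    return tasks
```

For example, `tasks_to_run("i-aaa", "i-aaa", ["generate-ssh-keys"], ["boot-hook"])` yields only the per-boot task, while changing the second argument to `"i-bbb"` also yields `generate-ssh-keys`.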

Valery Tschopp (valery-tschopp) wrote :

We have no problem with cloud-init still being active on the machine after the first boot init.

My issue is:

On first instance boot, the metadata service is successfully contacted, and the initialisation succeeds (host ssh key generated, hostname set, ...)

But on any reboot, if the metadata service is DOWN or not reachable for any reason, then cloud-init regenerates the host ssh keys.

My understanding is that determining if it is a first boot, or not, is only based on the cached instance-id, compared to the data received from the metadata service. So if the metadata service is DOWN or not reachable, cloud-init will always think it is a first boot, right?

Isn't it possible to make this test more robust?
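
To make the question concrete, here is a hypothetical sketch of what a more tolerant check could look like. It is not how cloud-init behaves; it assumes an unreachable metadata service is represented as `None` and treats that as "unknown" rather than "changed".

```python
from typing import Optional


def is_new_instance(cached_id: Optional[str], current_id: Optional[str]) -> bool:
    """Decide whether this boot should be treated as a new instance.

    Hypothetical policy: if the current instance-id cannot be determined
    (current_id is None), fall back to the cached identity instead of
    assuming the instance is new.
    """
    if cached_id is None:
        return True   # nothing cached yet: a genuine first boot
    if current_id is None:
        return False  # metadata unreachable: keep the cached identity
    return cached_id != current_id
```

The trade-off is that an image cloned from a snapshot and first booted while the metadata service is unreachable would be mistaken for the original instance under this policy.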

Scott Moser (smoser) wrote :

@Valery,

Some cloud platforms provide the instance id via some non-network channel (dmi data is common). In those cases, cloud-init will check cached value versus the locally-available instance-id before looking for a network available datasource.

So, if Hetzner provides that information in some way, cloud-init can use it.
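
As a rough sketch of what "locally available" means here: on Linux, DMI data is exposed as files under /sys/class/dmi/id. Which field (if any) carries the instance id is platform-specific; the default path below is only an assumption for illustration, and `read_local_instance_id` is a made-up helper, not a cloud-init function.

```python
from pathlib import Path


def read_local_instance_id(dmi_file="/sys/class/dmi/id/product_serial"):
    """Return the stripped contents of a DMI sysfs file, or None if unavailable.

    Assumption: the platform stores its instance id in the given DMI field.
    Reading some DMI files requires root; a permission error yields None.
    """
    try:
        value = Path(dmi_file).read_text().strip()
    except OSError:
        return None
    return value or None
```

Because this needs no network access, such a value could be compared with the cached instance id even when the metadata service is down.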

If not, the only options are for the user to disable cloud-init (touch /etc/cloud/cloud-init.disabled) or set manual_cache_clean (https://bugs.launchpad.net/cloud-init/+bug/1712680/comments/11).

I'm not really sold on the "what if the metadata service is DOWN" argument. Your cloud should not have its important services just fail. If it does, things are going to break. You could make a similar argument: "What if the DNS server is down?". I'm not discounting "Design for failure", and cloud-init could definitely do better here, but we need some support from the platform (locally available instance-id) to do better without sacrificing design goals.
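
The two options can be sketched as follows. This is an illustration under assumptions: that manual_cache_clean is set as a top-level boolean in a drop-in file under /etc/cloud/cloud.cfg.d, that the drop-in file name (made up here) is acceptable, and that the script runs as root. The `root` parameter is hypothetical and exists only so the functions can be tried against a scratch directory.

```python
from pathlib import Path


def disable_cloud_init(root="/"):
    """Create the marker file, equivalent to: touch /etc/cloud/cloud-init.disabled"""
    marker = Path(root) / "etc/cloud/cloud-init.disabled"
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return marker


def set_manual_cache_clean(root="/", name="99-manual-cache-clean.cfg"):
    """Write a drop-in config that sets manual_cache_clean.

    Assumption: the file name is arbitrary; only the option name comes
    from the comment above.
    """
    cfg = Path(root) / "etc/cloud/cloud.cfg.d" / name
    cfg.parent.mkdir(parents=True, exist_ok=True)
    cfg.write_text("manual_cache_clean: true\n")
    return cfg
```

Calling either function with `root` pointing at a temporary directory shows the resulting file layout without touching the real system.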

Dan Watkins (oddbloke) wrote :

If Hetzner has (or starts to provide) a way of determining instance ID without using the network, we'd be more than happy to accept patches to use that in cloud-init. However, as it sounds like the issue here is Hetzner's internal services being unreliable, rather than a cloud-init issue, I'm going to mark this Incomplete. If you think this is unreasonable, please comment and change the status back to New.

Thanks!

Changed in cloud-init (Ubuntu):
status: New → Incomplete
Changed in cloud-init (Ubuntu):
status: Incomplete → New
Hadmut Danisch (hadmut) wrote :

Yes, I do think this is unreasonable.

It is definitely not Hetzner's task to fix Ubuntu.

Especially since that process of re-initialization of that instance ID is neither obvious nor documented.

Looking at

https://cloudinit.readthedocs.io/

I did not yet find an explanation of what is going on, and at

https://cloudinit.readthedocs.io/en/latest/topics/instancedata.html

it just says

v1.instance_id

Unique instance_id allocated by the cloud.

Examples output:

    i-<hash>

but does not give a hint that this is to be constantly provided by some internal service.

On the contrary, it says

"Cloud-init is the industry standard multi-distribution method for cross-platform cloud instance initialization. It is supported across all major public cloud providers, provisioning systems for private cloud infrastructure, and bare-metal installations."

It says "instance initialization". It does not say that it keeps modifying the living instance.

So this is undocumented behaviour, and I am increasingly wondering whether this is a backdoor.
