does not recognize cloned KVM VM as new instance

Bug #1800848 reported by Martin Steigerwald
This bug affects 1 person

Affects            Status         Importance   Assigned to   Milestone
cloud-init         Invalid        Undecided    Unassigned
cloud-init (Suse)  Fix Released   Medium

Bug Description

On a SLES 15 KVM VM on Proxmox VE, cloud-init 18.2-3.13 (from module/repo Public-Cloud-Module_15-0) fails to recognize a newly cloned VM as a new instance. The VM was cloned from a template in which cloud-init had initially been run for test purposes. The VM uses the NoCloud datasource.

As a result, the network configuration is not applied, which leads to duplicate IP addresses when the VM is started:

% tail -f /var/log/cloud-init.log | grep -v util.py
2018-10-31 13:55:27,370 - __init__.py[INFO]: /var/lib/cloud/data/previous-hostname differs from /etc/hostname, assuming user maintained hostname.
2018-10-31 13:57:37,431 - main.py[DEBUG]: No kernel command line url found.
2018-10-31 13:57:37,431 - main.py[DEBUG]: Closing stdin.
2018-10-31 13:57:37,454 - main.py[DEBUG]: Checking to see if files that we need already exist from a previous run that would allow us to stop early.
2018-10-31 13:57:37,455 - main.py[DEBUG]: Execution continuing, no previous run detected that would allow us to stop early.
2018-10-31 13:57:37,455 - handlers.py[DEBUG]: start: init-network/check-cache: attempting to read from cache [trust]
2018-10-31 13:57:37,463 - stages.py[DEBUG]: restored from cache with run check: DataSourceNoCloudNet [seed=/dev/sr0] [dsmode=net]
2018-10-31 13:57:37,464 - handlers.py[DEBUG]: finish: init-network/check-cache: SUCCESS: restored from cache with run check: DataSourceNoCloudNet [seed=/dev/sr0][dsmode=net]
2018-10-31 13:57:37,488 - stages.py[DEBUG]: previous iid found to be 414842fe12da6f1078eca77443e6ab84592299ba
2018-10-31 13:57:37,493 - main.py[DEBUG]: [net] init will now be targeting instance id: 414842fe12da6f1078eca77443e6ab84592299ba. new=False
2018-10-31 13:57:37,514 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'type': 'physical', 'name': 'eth0', 'mac_address': '76:61:cf:d5:65:b2', 'subnets': [{'type': 'static', 'address': '10.0.88.32', 'netmask': '255.255.0.0', 'gateway': '10.0.0.4'}, {'type': 'static', 'address': 'auto'}]}, {'type': 'nameserver', 'address': ['10.0.0.4'], 'search': ['qs.de']}]}
2018-10-31 13:57:37,515 - stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.sles.Distro'>
2018-10-31 13:57:37,529 - __init__.py[DEBUG]: no work necessary for renaming of [['76:61:cf:d5:65:b2', 'eth0', 'virtio_net', '0x0001']]
2018-10-31 13:57:37,530 - stages.py[DEBUG]: not a new instance. network config is not applied.

However, the network configuration obviously was changed.

Commands used for cloning (from a self-made shell script):

qm shutdown 1032 ; qm clone 1032 2200 --name slesmaster ; qm template 2200
qm clone 2200 2201 --name sles1 ; qm set 2201 --ipconfig0 ip=10.0.88.201/8,gw=10.0.0.4 ; qm start 2201
qm clone 2200 2202 --name sles2 ; qm set 2202 --ipconfig0 ip=10.0.88.202/8,gw=10.0.0.4 ; qm start 2202
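
For illustration, the per-clone commands above could be generated by a small helper like the one below. This is only a sketch: the function name `clone_commands` and its parameters are hypothetical, and it only builds the same `qm clone`, `qm set` and `qm start` command lines shown above as strings, without running anything.

```python
from typing import List


def clone_commands(template_id: int, vm_id: int, name: str,
                   ip_cidr: str, gateway: str) -> List[str]:
    """Build the qm command lines for one clone (hypothetical helper).

    Only the sub-commands and options that appear in this report are used.
    """
    return [
        f"qm clone {template_id} {vm_id} --name {name}",
        f"qm set {vm_id} --ipconfig0 ip={ip_cidr},gw={gateway}",
        f"qm start {vm_id}",
    ]


# Reproduce the two clones from the report: sles1 (2201) and sles2 (2202).
for i in (1, 2):
    for cmd in clone_commands(2200, 2200 + i, f"sles{i}",
                              f"10.0.88.{200 + i}/8", "10.0.0.4"):
        print(cmd)
```

Each clone gets its own VM ID, name and `--ipconfig0` value, so the NoCloud data presented to each clone is expected to differ from that of the template.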

Martin Steigerwald (ms-proact) wrote :

Reported upstream as: https://bugs.launchpad.net/cloud-init/+bug/1800848

Martin Steigerwald (ms-proact) wrote :

I bet this is related to this change:

apply networking only on first instance boot
https://bugs.launchpad.net/cloud-init/+bug/1571004

Another option would be a way to override this default so that cloud-init again applies the network configuration on every boot. The NoCloud datasource in Proxmox can be updated at any time, even for an existing VM. It may need a reboot of the VM, but then the new data is available.

Changed in cloud-init (Suse):
importance: Unknown → Medium
status: Unknown → Confirmed

Martin Steigerwald (ms-proact) wrote :

This does not yet appear to be fixed with:

slestemplate:~ # rpm -qa | grep cloud
cloud-init-18.4-6.124.x86_64
sle-module-public-cloud-release-15.1-65.1.x86_64
cloud-init-config-suse-18.4-6.124.x86_64

However, on a SLES 12 template with

sles12-1:~ # rpm -qa | grep cloud
cloud-init-config-suse-18.5-13.1.x86_64
cloud-init-18.5-13.1.x86_64

sles12-1:~ # cat /etc/zypp/repos.d/Cloud_Tools.repo
[Cloud_Tools]
name=Cloud:Tools (SLE_12_SP4)
enabled=1
autorefresh=1
baseurl=http://download.opensuse.org/repositories/Cloud:/Tools/SLE_12_SP4/
path=/
type=rpm-md
keeppackages=0

it works okay.

Thus I am switching the SLES 15 template to cloud-init from

http://download.opensuse.org/repositories/Cloud:/Tools/SLE_15_SP1/

and reporting back whether this works.

Martin Steigerwald (ms-proact) wrote :

Nope, this did not help. It appears that cloud-init is not even run on startup. But this would be a different bug than this one.

Martin Steigerwald (ms-proact) wrote :

I resolved Bug 1136532 - cloud-init-generator does not run cloud-init on SLES 15 SP 1 (it appears to have been a misunderstanding on my part), and applying IP addresses works correctly with

sles2:~ # rpm -qa | grep cloud-init
cloud-init-18.5-13.3.x86_64
cloud-init-config-suse-18.5-13.3.x86_64

from

http://download.opensuse.org/repositories/Cloud:/Tools/SLE_15_SP1/

I am now downgrading cloud-init to the version in the Public Cloud module of SLES 15 SP 1 to see whether it works there as well. If yes, I think this bug can be closed.

Martin Steigerwald (ms-proact) wrote :

This appears to be fixed with both cloud-init from Public Cloud module in SLES 15 SP 1

sles1:~ # rpm -qa | grep cloud
cloud-init-18.4-6.124.x86_64
sle-module-public-cloud-release-15.1-65.1.x86_64
cloud-init-config-suse-18.4-6.124.x86_64

and cloud-init from the Cloud:Tools repository (tested with SLES 15 SP 1 and SLES 12 SP 4)

sles12-1:~ # rpm -qa | grep cloud
cloud-init-config-suse-18.5-13.1.x86_64
cloud-init-18.5-13.1.x86_64

Thus closing.

Changed in cloud-init (Suse):
status: Confirmed → Fix Released

Ryan Harper (raharper) wrote :

Hi,

Thanks for reporting the issue. Currently cloud-init does not apply network-config on every boot, as you've seen; it applies it only if it's a new instance. Your scenario, where the VM is booted once and then the network-config changes but the instance-id does not, is working as designed.

I'm interested in understanding your last comment suggesting that 18.4 or 18.5 is "working"; could you attach 'cloud-init collect-logs' tarball for those cases?

Further, is there a reason the clone operation does not create a new instance-id? Cloud-init generates other things that shouldn't be cloned, like ssh host keys.

Changed in cloud-init:
status: New → Incomplete

Martin Steigerwald (ms-proact) wrote :

It might be difficult to get back the RPM package of the cloud-init version I used when I reported the bug. And the recent ones work.

The reason for the initial failure is likely that I had not enabled the cloud-init services. I thought the systemd target would take care of running the cloud-init services, and it does, but only after I enabled the necessary cloud-init services myself[1].

So this all might just have been a misunderstanding on my side.

What do you mean by instance-id? How does cloud-init determine it? If it's something that is usually done when cloning a KVM VM, I bet Proxmox VE would do it. And I could verify it.

[1] Details in this OpenSUSE non bug:
https://bugzilla.opensuse.org/show_bug.cgi?id=1136532

Martin Steigerwald (ms-proact) wrote :

Hmmm, I am confused. I posted the log entry "not a new instance", so some part of cloud-init was running. So maybe the issue was that I forgot to enable all of the cloud-init services that would do the new-instance check, or cloud-init 18.2-3.13 had a bug. I am quite sure I was not aware that there are four different services for cloud-init (cloud-init, cloud-final, cloud-config, cloud-init-local). I sometimes find it challenging to grasp the complexity behind cloud-init.

Still, downgrading to 18.2-3.13 to make sure would require me to dig out an RPM package for it. So I'd rather check whether Proxmox VE makes the necessary changes for a VM to be a new instance on cloning, which I bet it does. (Also because cloud-init works nicely with Ubuntu, Debian and CentOS VM templates.)

Ryan Harper (raharper) wrote :

Hi Martin,

Thanks for the reply.

Cloud-init will enable itself *if* it detects a DataSource, or if it is running on a particular cloud platform, see here for details[1]. It may be that you booted an image that either didn't have any user-data provided or on a platform that cloud-init doesn't recognize; in which case cloud-init won't "enable" any of the cloud-init services (cloud-local, cloud-init, cloud-modules, cloud-final). This behavior ensures that cloud-init doesn't run if it isn't needed.

W.r.t. instance-id, it is an arbitrary string that uniquely identifies an instance. It may be as simple as a short string or a name; this value is provided in the meta-data that cloud-init reads on different platforms[2].

General cloning of images without removing/changing the meta-data associated with the instance means cloud-init assumes this is a boot of the same image.

Looking at the bug you pointed at, it was using the NoCloud datasource[3], via an attached iso.

Within the iso, you'll have a 'meta-data' file which includes the instance-id: value which is used to identify the instance to cloud-init. If you clone a vm and attach the same iso file (which has the same instance-id value) then it will be as if you are just booting the same VM again (even if it was cloned).

I suspect that if the clone operation would ensure that the iso you attach has a new instance-id value in the meta-data file, then things will work as you expect.
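
To make the comparison described above concrete, here is a minimal sketch of the idea. It is not cloud-init's actual code: the helper names `read_instance_id` and `is_new_instance` are hypothetical, and the meta-data file is treated as simple `key: value` lines. The first id is the one from the log in the bug description.

```python
from typing import Optional


def read_instance_id(meta_data_text: str) -> Optional[str]:
    """Return the value of the 'instance-id:' line, or None if it is absent."""
    for line in meta_data_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "instance-id":
            return value.strip()
    return None


def is_new_instance(meta_data_text: str, previous_iid: Optional[str]) -> bool:
    """Simplified check: new instance only if the id differs from the cached one."""
    current = read_instance_id(meta_data_text)
    return current is not None and current != previous_iid


# Id cached from the first boot of the template (taken from the log above).
cached = "414842fe12da6f1078eca77443e6ab84592299ba"

same_iso = "instance-id: 414842fe12da6f1078eca77443e6ab84592299ba\n"
new_iso = "instance-id: a337d93155eaab77783b0684bd9168f4032ebc6d\n"

print(is_new_instance(same_iso, cached))  # same id: treated as the same instance
print(is_new_instance(new_iso, cached))   # different id: treated as a new instance
```

With the same ISO attached, the check reports the boot as the same instance, which matches the "new=False" line in the log.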

1. https://cloudinit.readthedocs.io/en/latest/topics/boot.html
2. https://cloudinit.readthedocs.io/en/latest/topics/instancedata.html
3. https://cloudinit.readthedocs.io/en/latest/topics/datasources/nocloud.html

Martin Steigerwald (ms-proact) wrote :

Proxmox VE gets this right:

root@ubuntu1:~# cat /mnt/zeit/meta-data
instance-id: a337d93155eaab77783b0684bd9168f4032ebc6d

root@ubuntu2:~# cat /mnt/zeit/meta-data
instance-id: 88b8507bd3c7d356a5595bc52c5751e52f8817e0

(both have /dev/sr0 with NoCloud ISO mounted)
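
A check like the one above can be sketched for any number of clones. The helper below is hypothetical (the name `duplicate_instance_ids` is an assumption) and expects the meta-data contents to have been collected per VM already; it reports any instance-id shared by more than one VM.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_instance_ids(meta_by_vm: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each instance-id used by more than one VM to the VMs that share it."""
    vms_by_iid = defaultdict(list)
    for vm, text in meta_by_vm.items():
        for line in text.splitlines():
            if line.startswith("instance-id:"):
                vms_by_iid[line.split(":", 1)[1].strip()].append(vm)
    return {iid: vms for iid, vms in vms_by_iid.items() if len(vms) > 1}


# meta-data contents as shown for the two Ubuntu clones above
collected = {
    "ubuntu1": "instance-id: a337d93155eaab77783b0684bd9168f4032ebc6d\n",
    "ubuntu2": "instance-id: 88b8507bd3c7d356a5595bc52c5751e52f8817e0\n",
}

print(duplicate_instance_ids(collected))  # empty dict: every clone has its own id
```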

As for the start at bootup:

So if cloud-init does not run on SLES 15 with the NoCloud datasource unless the cloud-init services are enabled manually, would this be a bug? If so, I'd reopen the corresponding downstream bug. However, it may be a distro preference not to enable cloud-init automatically?

Anyway, this bug can be closed, as it was about the new instance thing.

Ryan Harper (raharper) wrote :

OK, good to know.

W.r.t. SLES 15 and cloud-init: if you've attached a NoCloud datasource correctly (your logs indicate that you have), then I would say yes, this is a bug.

Now, upstream since 18.4 there have been quite a few fixes applied which may not have made it into SLES15, not 100% sure about that but:

% git log 18.4..HEAD --oneline | grep -i suse
bb0b6f1 net/sysconfig: write out SUSE-compatible IPv6 config
dfe50e3 tox: Update testenv for openSUSE Leap to 15.0
3f12012 sysconfig: On SUSE, use STARTMODE instead of ONBOOT
744c423 Add cloud-id binary to packages for SUSE
e0084a5 systemd: On SUSE ensure cloud-init.service runs before wicked
4ea64f1 update detection of openSUSE variants

Changed in cloud-init:
status: Incomplete → Invalid