cloud-init should support changing the NICs present in an instance between boots

Bug #1841582 reported by sutefun
This bug affects 2 people
Affects              Status   Importance  Assigned to  Milestone
cloud-init (Ubuntu)  Triaged  Wishlist    Unassigned

Bug Description

Hello,

I uploaded bionic-server-cloudimg-amd64.img into glance and started an instance. The instance was reachable.

Next I shut down the instance:
openstack server stop <instance-id>
detached the interface:
openstack server remove network <instance-id> <network>
and attached a new interface with the same IP as before:
openstack server add fixed ip --fixed-ip-address <IP-address> <instance-id> <network>
When I then start the instance, it is not reachable.

I can reproduce this behaviour with:

bionic-server-cloudimg-amd64.img
CentOS-7-x86_64-GenericCloud-1907.qcow2

This does not happen with:
CentOS-6-x86_64-GenericCloud-1907.qcow2
xenial-server-cloudimg-amd64-disk1.img
trusty-server-cloudimg-amd64-disk1.img
cirros-0.4.0-x86_64-disk.img

The images are unchanged.
I logged in via the local console on Ubuntu 18.04. The interface was down and I could see the following logs:

Aug 26 08:38:17 os-steb-pa1 systemd[1]: Starting Apply the settings specified in cloud-config...
Aug 26 08:38:17 os-steb-pa1 networkd-dispatcher[754]: No valid path found for iwconfig
Aug 26 08:38:17 os-steb-pa1 networkd-dispatcher[754]: No valid path found for iw

Running dhclient -v <interface> brought the interface up; the instance got an answer from DHCP and was reachable again.

I logged in via the local console on CentOS 7. The interface was also down and I could see the following logs:

Aug 26 09:05:37 os-steb-cl1 network: Bringing up interface eth0: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device eth0 has different MAC address than expected, ignoring.
Aug 26 09:05:37 os-steb-cl1 /etc/sysconfig/network-scripts/ifup-eth: Device eth0 has different MAC address than expected, ignoring.
Aug 26 09:05:37 os-steb-cl1 network: [FAILED]
Aug 26 09:05:37 os-steb-cl1 systemd: network.service: control process exited, code=exited status=1

Running dhclient -v <interface> brought the interface up; the instance got an answer from DHCP and was reachable again.

The problem is the old MAC address in:

Ubuntu 18.04: /etc/netplan/50-cloud-init.yaml

CentOS 7: /etc/sysconfig/network-scripts/ifcfg-eth0

Manually changing the MAC address in these files to the new one solves the problem, and the instances are reachable again after reboots.
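
The manual fix described above amounts to swapping the stale MAC address for the new one in the generated file. A minimal Python sketch of that substitution, assuming the file content has already been read into a string; the function name `replace_mac` and the example addresses are hypothetical and not taken from this report:

```python
import re


def replace_mac(config_text: str, old_mac: str, new_mac: str) -> str:
    """Return config_text with every occurrence of old_mac replaced by new_mac.

    Matching is case-insensitive because the same address may be written
    in upper or lower case depending on the file format.
    """
    pattern = re.compile(re.escape(old_mac), re.IGNORECASE)
    # A function is used as the replacement so new_mac is inserted literally.
    return pattern.sub(lambda _match: new_mac, config_text)


# Hypothetical ifcfg-style content; the addresses are made up for illustration.
example = "DEVICE=eth0\nHWADDR=FA:16:3E:00:00:01\nBOOTPROTO=dhcp\n"
updated = replace_mac(example, "fa:16:3e:00:00:01", "fa:16:3e:00:00:02")
print(updated)
```

The sketch only transforms text; writing the result back to the file and re-applying the network configuration would still have to be done separately.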

I don't know how the mechanism worked for the older operating systems to establish a network connection after the interface changed via openstack, but this seems to be broken with the newer operating systems.

Environment
===========================
1.
Rocky

ii nova-api 2:18.1.0-0ubuntu1~cloud0 all OpenStack Compute - API frontend
ii nova-common 2:18.1.0-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-conductor 2:18.1.0-0ubuntu1~cloud0 all OpenStack Compute - conductor service
ii nova-consoleauth 2:18.1.0-0ubuntu1~cloud0 all OpenStack Compute - Console Authenticator
ii nova-novncproxy 2:18.1.0-0ubuntu1~cloud0 all OpenStack Compute - NoVNC proxy
ii nova-placement-api 2:18.1.0-0ubuntu1~cloud0 all OpenStack Compute - placement API frontend
ii nova-scheduler 2:18.1.0-0ubuntu1~cloud0 all OpenStack Compute - virtual machine scheduler
ii python-nova 2:18.1.0-0ubuntu1~cloud0 all OpenStack Compute Python 2 libraries

2.
Libvirt + KVM

ii qemu-kvm 1:2.11+dfsg-1ubuntu7.17 amd64 QEMU Full virtualization on x86 hardware

ii libvirt-daemon 4.0.0-1ubuntu8.12 amd64 Virtualization daemon
ii libvirt-daemon-driver-storage-rbd 4.0.0-1ubuntu8.12 amd64 Virtualization daemon RBD storage driver
ii libvirt0:amd64 4.0.0-1ubuntu8.12 amd64 library for interfacing with different virtualization systems
ii python-libvirt 4.0.0-1 amd64 libvirt Python bindings

3.
Neutron with OpenVSwitch and dvr_snat

Greets

Revision history for this message
Matt Riedemann (mriedem) wrote :

This isn't really a nova problem as far as I can tell, it's a problem with the images.

sutefun (stefan-bujack)
affects: nova → cloud-images
Robert C Jennings (rcj) wrote :

I believe this behavior is governed by cloud-init. The network config with the MAC address is created by cloud-init. I know that cloud-init will reconfigure networking if it believes the image is launched as a new instance. I am not aware of the expected behavior when the primary interface is removed and replaced with another on the same instance.

Dan Watkins (oddbloke) wrote :

Hello sutefun,

Could you run `cloud-init collect-logs` on an affected instance and attach the output to this bug, please? Once you've done so, please move the cloud-init task back to New.

Thanks!

Dan

Changed in cloud-init (Ubuntu):
status: New → Incomplete
sutefun (stefan-bujack) wrote :

Hello,

good-cloud-init.tar.gz is the log for the working instance before detach/attach.
bad-cloud-init.tar.gz is the log for the non-working instance after detach/attach.

Greets

sutefun (stefan-bujack) wrote :
Changed in cloud-init (Ubuntu):
status: Incomplete → New
Dan Watkins (oddbloke) wrote :

Hi sutefun,

Thanks for the details! So the issue here is that cloud-init only generates network configuration at first boot for OpenStack instances. So it's generating configuration that is tied to the MAC of your first network interface, which you then remove. As a result, the system ends up with no configured network interfaces on the next boot.

To work around this, I would suggest either modifying your network configuration before reboot (with netplan on Ubuntu, you should be able to relax the match clause in the cloud-init file in /etc/netplan/), or attaching the new network interface before reboot, so that you can configure it exactly.
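
As an illustration of what relaxing the match clause could look like, here is a rough Python sketch that works on an already-parsed netplan config (a plain dict). The function name `relax_mac_match` and the sample config are hypothetical. It assumes the new NIC comes up under the same interface name as the old one, since without a match block netplan applies the settings to the device named by the key:

```python
import copy


def relax_mac_match(netplan_config: dict) -> dict:
    """Return a copy of a parsed netplan config without MAC-based matching.

    If removing 'macaddress' leaves the 'match' block empty, 'match' and
    'set-name' are both dropped, because netplan requires a 'match' block
    whenever 'set-name' is used.
    """
    relaxed = copy.deepcopy(netplan_config)
    ethernets = relaxed.get("network", {}).get("ethernets", {})
    for settings in ethernets.values():
        match = settings.get("match")
        if not match or "macaddress" not in match:
            continue
        del match["macaddress"]
        if not match:
            del settings["match"]
            settings.pop("set-name", None)
    return relaxed


# Hypothetical config in the shape cloud-init typically writes; made-up MAC.
sample = {
    "network": {
        "version": 2,
        "ethernets": {
            "ens3": {
                "dhcp4": True,
                "match": {"macaddress": "fa:16:3e:00:00:01"},
                "set-name": "ens3",
            }
        },
    }
}
relaxed = relax_mac_match(sample)
print(relaxed["network"]["ethernets"]["ens3"])
```

Loading and dumping the YAML file is left out on purpose, so the sketch does not depend on a YAML library being installed.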

From a longer term cloud-init POV, I see a couple of things we could do to improve this: (a) add a way for users to indicate that they would like network configuration regenerated every boot (or perhaps at just the next boot) which would override the data source setting, and/or (b) detect when the fallback configuration that was previously generated is no longer applicable (e.g. in this case, the MAC address it specifies is gone) and regenerate network configuration regardless of boot status in that situation.
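
To make option (b) more concrete, here is a minimal Python sketch of such a check. It is not cloud-init code: the helper names are hypothetical, it assumes a Linux guest where each interface exposes its MAC under /sys/class/net/<name>/address, and it finds MAC addresses in the rendered config with a simple regular expression rather than a real parser:

```python
import os
import re

# Six colon-separated hex octets; a deliberately simple pattern.
MAC_RE = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)


def macs_in_config(config_text: str) -> set:
    """Collect every MAC address mentioned in a rendered network config."""
    return {mac.lower() for mac in MAC_RE.findall(config_text)}


def present_macs(sys_class_net: str = "/sys/class/net") -> set:
    """Read the MAC address of every interface currently known to the kernel."""
    macs = set()
    for name in os.listdir(sys_class_net):
        try:
            with open(os.path.join(sys_class_net, name, "address")) as handle:
                macs.add(handle.read().strip().lower())
        except OSError:
            # Entries without a readable 'address' file are skipped.
            continue
    return macs


def missing_macs(config_text: str, present: set) -> set:
    """Return the configured MACs that no present interface has."""
    return macs_in_config(config_text) - {mac.lower() for mac in present}


# Hypothetical netplan-style content; the MAC address is made up.
config = "ethernets:\n  ens3:\n    match:\n      macaddress: FA:16:3E:00:00:01\n"
print(missing_macs(config, {"fa:16:3e:00:00:02"}))
```

A non-empty result would be the signal that the previously generated configuration no longer applies and could be regenerated. On a running instance, `missing_macs(config, present_macs())` would perform the check against the real interfaces.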

Thanks,

Dan

Chad Smith (chad.smith) wrote :

Hi sutefun,

As Dan mentioned, cloud-init applies network configuration on first boot. In Bionic and later (not Xenial/Trusty), cloud-init uses an ephemeral dhclient run to get initial network information on a detected fallback network interface, and then tries to obtain and apply the network config provided by your OpenStack's network_data.json.

From your logs, it looks like the ephemeral dhclient run is falling over and either:
  (a) getting invalid dhcp configuration for ens3 or
  (b) your metadata service is not yet up for the instance to talk to.

I'm guessing this because the first init-local (pre-network setup) stage failed to discover the OpenStackLocal datasource type:

2019-09-30 11:14:39,422 - dhcp.py[DEBUG]: Performing a dhcp discovery on ens3
...
2019-09-30 11:14:49,518 - DataSourceOpenStack.py[DEBUG]: Giving up on OpenStack md from ['http://169.254.169.254/openstack'] after 10 seconds

cloudinit.sources.InvalidMetaDataException: No active metadata service found
2019-09-30 11:14:49,535 - __init__.py[DEBUG]: Datasource DataSourceOpenStackLocal [net,ver=None] not updated for events: New instance first boot
2019-09-30 11:14:49,536 - handlers.py[DEBUG]: finish: init-local/search-OpenStackLocal: SUCCESS: no local data found from DataSourceOpenStackLocal

If the OpenStackLocal datasource had been detected, Bionic and later (not Xenial/Trusty) would set up the network from your provided network_data.json, which is at http://169.254.169.254/openstack/2018-08-27/network_data.json. So we need to be sure that the network config presented there is what the VM should have configured.
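
One way to check what the metadata service presents is to list the MAC addresses in network_data.json and compare them with the NICs in the guest. The following Python sketch is an illustration only: the helper names and the sample document are hypothetical, and it assumes the usual network_data.json layout with a "links" list whose entries carry "id" and "ethernet_mac_address":

```python
import json
import urllib.request

NETWORK_DATA_URL = "http://169.254.169.254/openstack/2018-08-27/network_data.json"


def fetch_network_data(url: str = NETWORK_DATA_URL, timeout: float = 10.0) -> dict:
    """Fetch and parse network_data.json; only works from inside the instance."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def link_macs(network_data: dict) -> dict:
    """Map each link id to its lower-cased MAC address, skipping links without one."""
    return {
        link.get("id"): link["ethernet_mac_address"].lower()
        for link in network_data.get("links", [])
        if link.get("ethernet_mac_address")
    }


# Hypothetical sample document; the id and MAC address are made up.
sample_network_data = {
    "links": [
        {"id": "tap-example", "type": "ovs", "ethernet_mac_address": "FA:16:3E:00:00:02"}
    ],
    "networks": [],
    "services": [],
}
print(link_macs(sample_network_data))
```

Inside an affected instance, `link_macs(fetch_network_data())` could then be compared against the MAC address written into the generated network configuration.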

Dan Watkins (oddbloke) wrote :

Chad, I think those are all symptoms of the underlying issue: the network configuration was generated just fine on first boot, but on a subsequent boot with a different NIC, it no longer applies.

I'm going to mark this as Triaged/Wishlist because cloud-init is behaving as intended, but we would (sometimes) like it to behave differently in this case.

summary: - Interface detach and attach does not work with CentOS 7 and Ubuntu 18.04
- cloud images
+ cloud-init should support changing the NICs present in an instance
+ between boots
Changed in cloud-init (Ubuntu):
status: New → Triaged
importance: Undecided → Wishlist
no longer affects: cloud-images