Comment 12 for bug 1899487

Frode Nordahl (fnordahl) wrote :

> Hmm, however it should be the same version of cloud-init in either Bionic or Focal tests. Is this a netplan change?

In the bionic image I have:
$ dpkg -l | egrep "(cloud-init|netplan)"
ii cloud-init 20.3-2-g371b392c-0ubuntu1~18.04.1 all Init scripts for cloud instances
ii cloud-initramfs-copymods 0.40ubuntu1.1 all copy initramfs modules into root filesystem for later use
ii cloud-initramfs-dyn-netconf 0.40ubuntu1.1 all write a network interface file in /run for BOOTIF
ii libnetplan0:amd64 0.99-0ubuntu3~18.04.3 amd64 YAML network configuration abstraction runtime library
ii netplan.io 0.99-0ubuntu3~18.04.3 amd64 YAML network configuration abstraction for various backends

In the focal image I have:
$ dpkg -l | egrep "(cloud-init|netplan)"
ii cloud-init 20.3-2-g371b392c-0ubuntu1~20.04.1 all initialization and customization tool for cloud instances
ii cloud-initramfs-copymods 0.45ubuntu1 all copy initramfs modules into root filesystem for later use
ii cloud-initramfs-dyn-netconf 0.45ubuntu1 all write a network interface file in /run for BOOTIF
ii libnetplan0:amd64 0.99-0ubuntu3~20.04.2 amd64 YAML network configuration abstraction runtime library
ii netplan.io 0.99-0ubuntu3~20.04.2 amd64 YAML network configuration abstraction for various backends

I guess it's time for me to ask a question: is it cloud-init that renders /etc/netplan/50-cloud-init.yaml? If so, where does netplan fit in, given that the difference is in how that file is rendered and not in how it is interpreted? As you can see in #10, the mtu statement is not in the file on bionic, while it is on focal.
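To make the rendering difference concrete, here is a minimal sketch that scans a rendered netplan file for `mtu:` statements. The two YAML samples are illustrative assumptions, not the actual files from #10; the interface name and the value 1500 are placeholders, and the function name `rendered_mtus` is hypothetical. It is a naive line scan rather than a real YAML parser.

```python
import re

# Illustrative samples only; NOT the actual files from #10.
# Interface name and MTU value are placeholders.
WITHOUT_MTU = """\
network:
    version: 2
    ethernets:
        ens3:
            dhcp4: true
            set-name: ens3
"""

WITH_MTU = """\
network:
    version: 2
    ethernets:
        ens3:
            dhcp4: true
            mtu: 1500
            set-name: ens3
"""


def rendered_mtus(netplan_text):
    """Return every integer MTU pinned in the given netplan text.

    Naive line scan (hypothetical helper), good enough to tell
    "mtu statement present" from "mtu statement absent".
    """
    pattern = re.compile(r"^\s*mtu:\s*(\d+)\s*$", flags=re.MULTILINE)
    return [int(value) for value in pattern.findall(netplan_text)]


if __name__ == "__main__":
    print("without mtu:", rendered_mtus(WITHOUT_MTU))
    print("with mtu:", rendered_mtus(WITH_MTU))
```

On a real instance the same check could be run against the contents of /etc/netplan/50-cloud-init.yaml.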

Since the versions appear to be the same, my guess would be that there is some internal modelling of how bionic vs. focal should be configured.
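As a quick check of the "same version" claim, the sketch below strips the per-series suffix (`~18.04.1`, `~20.04.2`, ...) from the version strings listed above and compares what is left. Only the cloud-init and netplan.io lines are reproduced; the helper name `base_versions` is hypothetical.

```python
import re

# Lines copied from the dpkg output above (subset).
BIONIC = """\
ii cloud-init 20.3-2-g371b392c-0ubuntu1~18.04.1 all Init scripts for cloud instances
ii netplan.io 0.99-0ubuntu3~18.04.3 amd64 YAML network configuration abstraction for various backends
"""

FOCAL = """\
ii cloud-init 20.3-2-g371b392c-0ubuntu1~20.04.1 all initialization and customization tool for cloud instances
ii netplan.io 0.99-0ubuntu3~20.04.2 amd64 YAML network configuration abstraction for various backends
"""


def base_versions(dpkg_output):
    """Map package name -> version with the '~<series>.<n>' suffix removed."""
    versions = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Only consider installed packages ("ii" status lines).
        if len(fields) < 3 or fields[0] != "ii":
            continue
        name, version = fields[1], fields[2]
        versions[name] = re.sub(r"~\d+\.\d+\.\d+$", "", version)
    return versions


if __name__ == "__main__":
    print(base_versions(BIONIC))
    print(base_versions(FOCAL))
    print("same base versions:", base_versions(BIONIC) == base_versions(FOCAL))
```

This only shows that the packages are built from the same upstream and Ubuntu revision; it says nothing about series-specific behaviour inside the package.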

> The controls in the OpenStack DHCP service are purely a "on/off" switch (`advertise_mtu`) about advertising MTU in the DHCP network details and not the control for what that setting is? The reason I want to make sure is because I'm always leery when there are multiple sources of possible truth that have to be sorted and if a user can bite themselves by changing an MTU value in one place but also another and which do you listen to/respect.

Previously you had the `advertise_mtu` "on/off" switch, which was removed in OpenStack Ocata (leaving it permanently "on"). In addition to that, the operator of the cloud can inject DHCP configuration into the DHCP servers. The end user consuming the cloud also has control over the MTUs of their virtual networks through the OpenStack Neutron API.

> The actual value for the MTU is a Neutron setting and in theory, should be the same then from DHCP network data or by the provided network_data.json information?

The value for the MTU is a per-virtual-network setting which is exposed to the end user of the cloud. And yes, the value set on the virtual network should be the one exposed in network_data.json. But remember that the operator of the cloud has the power to inject options directly into the DHCP server, which means the DHCP server could advertise a different MTU than the one the user has chosen for their network.
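A minimal sketch of how the two sources could be compared, assuming the usual network_data.json layout with a `links` list whose entries carry `id` and `mtu`. The sample document, the link id `tap0`, the MTU values, and the `dhcp_mtus` mapping (what a DHCP client might have received via the Interface MTU option) are all hypothetical.

```python
import json

# Hypothetical network_data.json content; ids and values are placeholders.
SAMPLE_NETWORK_DATA = json.dumps({
    "links": [
        {"id": "tap0", "type": "ovs", "mtu": 1500},
    ],
    "networks": [
        {"id": "network0", "link": "tap0", "type": "ipv4_dhcp"},
    ],
})


def metadata_mtus(network_data_json):
    """Return {link id: MTU} for every link that declares an MTU."""
    data = json.loads(network_data_json)
    return {
        link["id"]: link["mtu"]
        for link in data.get("links", [])
        if link.get("mtu") is not None
    }


def mtu_conflicts(network_data_json, dhcp_mtus):
    """Return {link id: (metadata MTU, DHCP MTU)} where the two disagree.

    dhcp_mtus is a hypothetical {link id: MTU} mapping of DHCP-advertised values.
    """
    meta = metadata_mtus(network_data_json)
    return {
        link: (meta[link], dhcp_mtu)
        for link, dhcp_mtu in dhcp_mtus.items()
        if link in meta and meta[link] != dhcp_mtu
    }


if __name__ == "__main__":
    # The operator injected a lower MTU into the DHCP server.
    print(mtu_conflicts(SAMPLE_NETWORK_DATA, {"tap0": 1450}))
```

An empty result means both sources agree; a non-empty one is exactly the situation where it matters which source the instance respects.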

If the operator of the cloud has chosen to do so, it is most likely for a very good operational reason. If the end user or operator intends to configure instances with DHCP, DHCP should be the authoritative source of truth.

> The final nail in this coffin is that the setting cloud-init is setting overrides the value for MTU that comes in via DHCP.

> Do I have that right? If so, a couple of questions then.

Yes.

> 1. It seems it would be worthwhile to see if there was some method for a refreshing cloud-init's details on the network. Ideally here, changing the network data in Neutron would be able to trigger an update on instances, though that's a tricky can of worms that could lead to broken networking on existing hosts if it goes wrong. It smells a bit like the work we want to completed next cycle around allowing hotplug of a new nic/device and be able to help drive networking config for it...just without the whole new device thing heh.

This does indeed sound interesting. A cloud operator may not have any access to or control over the instances end users run on their cloud, so having levers to control the network configuration in such instances for maintenance/migration purposes would be valuable. The alternative is forklift and endless nagging of end users to intervene manually, plus the support load that follows when everything breaks because they did not pay attention to the operator's requests in time.

> 2. Should the setting that cloud-init writes be an overwriting value in netplan? Could this be a bug that netplan is not allowing the dhcp details to be respected over what cloud-init started with?

I think that when the operator and/or end user intends to use network auto-configuration (be that DHCP or IPv6 SLAAC), that should be the authoritative source of truth for the instance. Any other path risks turning a whole estate of instances, which the cloud operator does not necessarily have access to, into doorstops whenever the network configuration changes.
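The precedence I am arguing for can be sketched as a small decision function. This is not how cloud-init or netplan currently decide; the function name and parameters are hypothetical and only encode the policy described above.

```python
from typing import Optional


def effective_mtu(
    uses_autoconfig: bool,
    advertised_mtu: Optional[int],
    metadata_mtu: Optional[int],
) -> Optional[int]:
    """Pick the MTU an interface should end up with (hypothetical policy).

    uses_autoconfig: the interface is configured via DHCP or IPv6 SLAAC.
    advertised_mtu:  MTU learned from the network (None if none advertised).
    metadata_mtu:    MTU from network_data.json (None if absent).

    A return value of None means "do not pin an MTU in the rendered config".
    """
    if uses_autoconfig:
        # Auto-configuration is authoritative: never let the static
        # metadata value override what the network advertises.
        return advertised_mtu
    # Statically configured interface: metadata is the only source.
    return metadata_mtu


if __name__ == "__main__":
    # DHCP advertises 1450 while metadata says 1500: DHCP wins.
    print(effective_mtu(True, 1450, 1500))
    # Static configuration: the metadata value is used.
    print(effective_mtu(False, None, 1500))
```

With this policy a later MTU change on the network reaches the instance at the next lease renewal, instead of being masked by a value written at first boot.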

My conclusion so far is that Bionic guests behave correctly, as detailed in #10, while Focal guests behave incorrectly, as detailed in the original bug description.