cloud-init in xenial generates bond configs that do not appear to work

Bug #1684092 reported by Dimitri John Ledkov
This bug affects 1 person
Affects               Status    Importance   Assigned to   Milestone
cloud-init (Ubuntu)   Expired   Undecided    Unassigned
  Xenial              Expired   Undecided    Unassigned

Bug Description

cloud-init 0.7.9-48-g1c795b9-0ubuntu1~16.04.1

cloud-init generated /etc/network/interfaces.d/50-cloud-init.cfg for a network configuration of two VLANs on top of a bond of two interfaces.

The generated config did not work: the bond stanza specified "bond-slaves none". There were also bond_* settings specified on the interfaces, which I believe are redundant.

After comparing the config generated by this cloud-init with that of another instance of a similar type, I removed the extra bond_* settings from the cloud-init generated config, specified "bond-slaves" on the bond stanza, and disabled the cloud-init network config via the snippet advised at the top of the file.

This made networking come up.

I will attach the generated 50-cloud-init.cfg, a diff of the applied changes, and the journal logs of both boots (broken and working).

Imho cloud-init should generate bond-slaves stanzas with the interface names.

However, further investigation suggests that changing bond-slaves from none to a list of interfaces is a red herring. Instead, the generated configuration appears to be fine as long as network config is disabled in cloud-init (e.g. echo 'network: {config: disabled}' > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg). So cloud-init generates the right eni.d drop-in file, but fails to bring these devices online correctly?! It seems like "ifup -a" is all that is needed here.
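For readers unfamiliar with the two bond styles being compared, here is a minimal sketch that renders both as ENI text. It is illustrative only: the helper name render_bond and the interface names eth0/eth1 are hypothetical, the real generated file and diff are the attachments on this bug, and bond options such as mode are left out.

```python
def render_bond(bond, slaves, list_slaves_on_bond):
    """Return ENI text for a bond in one of two ifenslave styles.

    list_slaves_on_bond=False: "bond-slaves none" on the bond and a
    "bond-master" line on each slave (the style described as generated).
    list_slaves_on_bond=True: slaves listed directly on the bond stanza
    (the style the reporter switched to by hand).
    """
    lines = []
    for slave in slaves:
        lines += ["auto %s" % slave, "iface %s inet manual" % slave]
        if not list_slaves_on_bond:
            # Each slave names its master; the bond itself lists none.
            lines.append("    bond-master %s" % bond)
        lines.append("")
    lines += ["auto %s" % bond, "iface %s inet manual" % bond]
    if list_slaves_on_bond:
        lines.append("    bond-slaves %s" % " ".join(slaves))
    else:
        lines.append("    bond-slaves none")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical interface names, for illustration only.
    print(render_bond("bond0", ["eth0", "eth1"], list_slaves_on_bond=False))
    print(render_bond("bond0", ["eth0", "eth1"], list_slaves_on_bond=True))
```

Both forms are accepted by ifenslave as far as I know, which is consistent with the suspicion above that the choice of style is not the real problem.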

Tags: patch
Dimitri John Ledkov (xnox) wrote :

However, grepping the system logs for everything bond/ens related does not appear to show anything in particular =(

There are two boots: before the reboot the instance is inaccessible via ssh; after the reboot it is accessible.

The interfaces.diff shows the difference applied to the generated interfaces file.

In addition, the network config module is disabled in cloud-init:

# cat /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
network: {config: disabled}

After the reboot, the instance is accessible via ssh...

Dimitri John Ledkov (xnox) wrote :

I wonder if the networking configuration changes are a red herring, and the 'network: {config: disabled}' setting is what makes all the interfaces come up correctly.

Dimitri John Ledkov (xnox) wrote :

Reverted the config back to what cloud-init generates, but kept cloud-init network config disabled with 'network: {config: disabled}', and everything works now.

I will try cloud-init from xenial-proposed now.

description: updated
Dimitri John Ledkov (xnox) wrote :

Actually, maybe none of this matters at all, and it's just that my reboots don't work at all?!

tags: added: patch
Dimitri John Ledkov (xnox) wrote :

Doing a nova reboot of the instance makes it come up fine, so I suspect that reboot/shutdown is simply broken.

Changed in cloud-init (Ubuntu):
status: New → Incomplete
Changed in cloud-init (Ubuntu Xenial):
status: New → Incomplete
Dimitri John Ledkov (xnox) wrote :

Ok, I'm back to hard reboots via the API rather than doing $ sudo reboot, and it seems that no, I do in fact need the changes to the bond-slaves stanza after all, even with cloud-init from xenial-proposed.

Dimitri John Ledkov (xnox) wrote :

Links are bouncing on boot.

Dimitri John Ledkov (xnox) wrote :

Logs for a new boot, with more verbose networking.service output and a RuntimeError in the cloud-init init stage.

Ryan Harper (raharper) wrote :

RuntimeError: duplicate mac found! both 'bond0.101' and 'ens9f1' have mac 'a0:36:9f:2c:df:f1'

Which cloud-init are you running (-updates or -proposed)?
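To illustrate how an error like the one quoted above can arise: a VLAN on top of a bond normally reports the bond's MAC address, and the bond typically takes its MAC from one of its slaves, so any code that builds a strict MAC-to-name table over all devices will see the same address twice. The following is a minimal sketch of that check, not cloud-init's actual code. The function name map_macs_to_names, the (name, mac, is_virtual) tuple layout, and the skip_virtual option are assumptions made for illustration.

```python
def map_macs_to_names(devices, skip_virtual=False):
    """Build a {mac: name} table, refusing duplicate MAC addresses.

    devices is an iterable of (name, mac, is_virtual) tuples. The tuple
    layout and the skip_virtual switch are hypothetical, for illustration.
    """
    by_mac = {}
    for name, mac, is_virtual in devices:
        if skip_virtual and is_virtual:
            # Bonds and VLANs inherit a MAC from an underlying device,
            # so one possible approach is to leave them out of the table.
            continue
        mac = mac.lower()
        if mac in by_mac:
            raise RuntimeError(
                "duplicate mac found! both '%s' and '%s' have mac '%s'"
                % (name, by_mac[mac], mac))
        by_mac[mac] = name
    return by_mac


if __name__ == "__main__":
    # Names and MAC taken from the error message in this report.
    devices = [
        ("ens9f1", "a0:36:9f:2c:df:f1", False),
        ("bond0.101", "a0:36:9f:2c:df:f1", True),
    ]
    print(map_macs_to_names(devices, skip_virtual=True))
```

With skip_virtual left at False, the same input raises a RuntimeError with the wording quoted above. Whether filtering virtual devices is the right fix for this bug is not established in this report.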

Ryan Harper (raharper) wrote :

Also, it's somewhat frustrating to see the physical devices appear sooner in this run than in the failed runs we've seen. That is, the runtime error is different from the bond0 timeout we got when the slaves didn't bother showing up (i.e., they somehow missed an ifup call).

Ryan Harper (raharper) wrote :

Also thinking... can we reproduce if we do the following:

1) allocate new baremetal instance
2) first boot fails to bring up network
3) reboot
4) second boot comes up (some of the time, right?)
5) login, rm -rf /var/log/cloud-init* /var/lib/cloud/* /etc/network/interfaces.d/*
6) reboot

If we're having issues with the devices becoming "up" on the switch, then we can simulate a fresh boot with the above, but long after the switch will have seen the MAC addresses from the NICs. We should see cloud-init from -proposed coming up just fine in that case.

And we may need to see about adding some up-delay, or poking the hosting provider for details w.r.t. the network switch config/NIC config, in case there's stuff going on in the background that keeps the interfaces from coming up sooner.
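The cleanup in step 5 above can be sketched as a small helper that takes a root prefix, so it can be tried against a scratch directory before being pointed at a real system. This is an assumption-laden sketch: the function name clean_cloud_init_state and the root parameter are hypothetical, and only the three path patterns come from the step itself.

```python
import glob
import os
import shutil

# Path patterns from step 5, written relative to the filesystem root.
STATE_PATTERNS = (
    "var/log/cloud-init*",
    "var/lib/cloud/*",
    "etc/network/interfaces.d/*",
)


def clean_cloud_init_state(root="/"):
    """Remove cloud-init logs, state, and generated ENI drop-ins under root.

    Returns the sorted list of paths that were removed.
    """
    removed = []
    for pattern in STATE_PATTERNS:
        for path in glob.glob(os.path.join(root, pattern)):
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.remove(path)
            removed.append(path)
    return sorted(removed)
```

Run against the real root this needs root privileges and is destructive, exactly like the rm -rf it mirrors. A reboot afterwards makes cloud-init treat the next boot as a first boot.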

Launchpad Janitor (janitor) wrote :

[Expired for cloud-init (Ubuntu Xenial) because there has been no activity for 60 days.]

Changed in cloud-init (Ubuntu Xenial):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for cloud-init (Ubuntu) because there has been no activity for 60 days.]

Changed in cloud-init (Ubuntu):
status: Incomplete → Expired