cloud-init fails to configure network interfaces running with OpenStack cloud

Bug #1857031 reported by Madhuri Kumari
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Invalid | Undecided | Unassigned |
cloud-init | Expired | Undecided | Madhuri Kumari |

Bug Description

cloud-init 18.5:

The node has 3 interfaces:
 - enp5s0f0: not connected
 - enp5s0f1: connected
 - ib0: an HFI port

The node runs CentOS 7.6.

OpenStack boots the server with two interfaces, enp5s0f1 and ib0, and the boot succeeds, but the node is not reachable. On the node, cloud-init configures the wrong interface, enp5s0f0. This is because cloud-init fails to configure network interfaces on an OpenStack cloud if any of the requested network interfaces do not exist on the node. In this case, ib0 was missing.

Please note that when I boot the server with only one interface, enp5s0f1, everything works fine and the node is reachable.
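The failure mode described above can be illustrated with a minimal Python sketch. It assumes a simplified model in which each link in `network_data.json` is matched to a local NIC by MAC address and a single unmatched link aborts the whole mapping. `map_links_strict`, the link ids and the MAC addresses are hypothetical; this is not cloud-init's actual code.

```python
def map_links_strict(links, local_macs):
    """Map each link id to a local interface name, matching on MAC address.

    links: list of dicts with "id" and "ethernet_mac_address" keys
           (the shape of the "links" list in OpenStack's network_data.json).
    local_macs: dict of lowercase MAC address -> local interface name.
    Raises ValueError if any link has no matching local NIC.
    """
    names = {}
    for link in links:
        mac = link["ethernet_mac_address"].lower()
        if mac not in local_macs:
            # A single missing NIC aborts the whole mapping, so none of the
            # requested interfaces end up configured from network_data.
            raise ValueError("no local NIC for link %s (%s)" % (link["id"], mac))
        names[link["id"]] = local_macs[mac]
    return names


# Hypothetical host: both Ethernet ports are visible, the HFI port is not.
LOCAL_MACS = {
    "aa:bb:cc:00:00:01": "enp5s0f0",
    "aa:bb:cc:00:00:02": "enp5s0f1",
}
LINKS = [
    {"id": "link-enp5s0f1", "ethernet_mac_address": "AA:BB:CC:00:00:02"},
    {"id": "link-ib0", "ethernet_mac_address": "aa:bb:cc:00:00:03"},
]
```

With only the first link, `map_links_strict(LINKS[:1], LOCAL_MACS)` returns `{'link-enp5s0f1': 'enp5s0f1'}`, mirroring the single-interface boot that works; with both links it raises `ValueError` because the ib0 link has no local match.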

Logs: http://paste.openstack.org/show/787707/
network-data and nics: http://paste.openstack.org/show/787797/ (note that enp5s0f1 is manually configured)

Madhuri Kumari (madhuri-rai07) wrote :
affects: cloud-init → nova
affects: nova → cloud-init
Scott Moser (smoser) wrote :

Hi,

The problem you're seeing here is a result of a failure to persist data between
cloud-init's local stage and network stage.

https://bugs.launchpad.net/cloud-init/+bug/1801364

Chad Smith (chad.smith)
description: updated
Chad Smith (chad.smith) wrote :

Thank you for filing this bug and helping to improve cloud-init.

From your linked logs, it looks like you have been running cloud-init 18.2 and 18.5 on CentOS. There have been a number of fixes in this area since then, including the fix for persisting instance-data.json (the bug we see in your logs here) and changes to the network rendering logic.

If possible, please try installing our latest upstream release, cloud-init 19.4, from a COPR repo that we update at:

https://copr.fedorainfracloud.org/coprs/g/cloud-init/el-testing

Once you have installed cloud-init 19.4, please run "sudo cloud-init clean --logs --reboot" to clean the system and let cloud-init run fresh, so we can see whether this error still occurs.

Changed in cloud-init:
status: New → Incomplete
Chad Smith (chad.smith) wrote :

Marking the cloud-init task Incomplete. Please set it back to New if you are able to confirm that this bug still exists on the latest cloud-init, 19.4.

Madhuri Kumari (madhuri-rai07) wrote :

@smoser, hi, I don't think that is the issue. I tried running the node with only one interface, and the network was configured just fine, even though I see a similar error in the cloud-init logs.

Logs for successful cloud-init with 1 interface only: http://paste.openstack.org/show/787763/

Madhuri Kumari (madhuri-rai07) wrote :

@chad.smith, I updated cloud-init to 19.4 and performed a clean reboot; the issue still exists. The log is attached.

Madhuri Kumari (madhuri-rai07) wrote :

Hi, I found the issue. cloud-init fails to configure network interfaces if any of the requested interfaces are missing on the node; see https://github.com/canonical/cloud-init/blob/master/cloudinit/sources/helpers/openstack.py#L677

IMO, cloud-init should continue to configure the other existing interfaces on the node and skip the non-existing ones. I have pushed a patch to fix this issue: https://github.com/canonical/cloud-init/pull/122
Thanks!
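The behavior proposed above (configure the NICs that exist and skip the rest) can be sketched in Python as follows. This is a hypothetical illustration, not the code from the linked pull request: `map_links_lenient`, `usable_networks` and the sample data are assumptions. Networks attached to a skipped link are dropped as well, so no configuration is emitted for a NIC that does not exist.

```python
import logging

LOG = logging.getLogger(__name__)


def map_links_lenient(links, local_macs):
    """Map link ids to local interface names, skipping links with no local NIC."""
    names = {}
    for link in links:
        mac = link["ethernet_mac_address"].lower()
        name = local_macs.get(mac)
        if name is None:
            # Warn and continue instead of failing the whole conversion.
            LOG.warning("skipping link %s: no local NIC with MAC %s", link["id"], mac)
            continue
        names[link["id"]] = name
    return names


def usable_networks(networks, names):
    """Keep only networks whose "link" was mapped to a local interface."""
    return [net for net in networks if net["link"] in names]


# Hypothetical sample data: the ib0 link has no matching local NIC.
LOCAL_MACS = {"aa:bb:cc:00:00:02": "enp5s0f1"}
LINKS = [
    {"id": "link-enp5s0f1", "ethernet_mac_address": "aa:bb:cc:00:00:02"},
    {"id": "link-ib0", "ethernet_mac_address": "aa:bb:cc:00:00:03"},
]
NETWORKS = [
    {"id": "network0", "link": "link-enp5s0f1", "type": "ipv4_dhcp"},
    {"id": "network1", "link": "link-ib0", "type": "ipv4_dhcp"},
]
```

Here `map_links_lenient(LINKS, LOCAL_MACS)` keeps only the enp5s0f1 link, and `usable_networks` then keeps only network0. The warning matters: silently skipping a link would hide a genuine misconfiguration, so the skip should at least be visible in the logs.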

Matt Riedemann (mriedem) wrote :

Marking invalid for nova since it sounds like this is a cloud-init issue.

Changed in nova:
status: New → Invalid
Madhuri Kumari (madhuri-rai07) wrote :

Hi Chad, Scott,

Can you confirm whether this issue is valid?

summary: - cloud-init configures wrong interface when trying to configure two
- interfaces with OpenStack cloud
+ cloud-init fails to configure network interfaces running with OpenStack
+ cloud
description: updated
Changed in cloud-init:
assignee: nobody → Madhuri Kumari (madhuri-rai07)
Ryan Harper (raharper) wrote :

@Madhuri,

Do you have cloud-init logs from the multi-NIC + InfiniBand device boot? The logs posted are somewhat confusing.

In the journal, we can see errors for "eth0" and "ib0":

Dec 20 08:55:15.447754 opa-new-4.novalocal network[3506]: Bringing up interface eth0: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device eth0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.449801 opa-new-4.novalocal /etc/sysconfig/network-scripts/ifup-eth[3691]: Device eth0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.451435 opa-new-4.novalocal network[3506]: [FAILED]
Dec 20 08:55:15.810635 opa-new-4.novalocal network[3506]: Bringing up interface ib0: ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Device ib0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.811909 opa-new-4.novalocal /etc/sysconfig/network-scripts/ifup-eth[3720]: Device ib0 does not seem to be present, delaying initialization.
Dec 20 08:55:15.813392 opa-new-4.novalocal network[3506]: [FAILED]

However, the cloud-init log shows cloud-init writing config only for enp5s0f0:

 Applying network configuration from fallback bringup=False: {'ethernets': {'enp5s0f0': {'set-name': 'enp5s0f0', 'match': {'macaddress': '00:1e:67:fe:d2:59'}, 'dhcp4': True}}, 'version': 2}

That makes me wonder if there are some existing configuration files already present in this image?
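For reference, the fallback configuration in the log line above is a version-2 network config that enables DHCPv4 on exactly one interface, matched by MAC and pinned to its name with `set-name`, so only that one NIC (here enp5s0f0) gets configured. A minimal Python sketch that builds the same structure; `fallback_v2_config` is a hypothetical helper, and how cloud-init chooses the fallback NIC is not modeled here.

```python
def fallback_v2_config(name, mac):
    """Return a version-2 network config running DHCPv4 on a single NIC."""
    return {
        "version": 2,
        "ethernets": {
            name: {
                # Match the NIC by MAC address, then pin its interface name.
                "match": {"macaddress": mac.lower()},
                "set-name": name,
                "dhcp4": True,
            },
        },
    }


# The dict from the cloud-init log line quoted above.
LOGGED = {'ethernets': {'enp5s0f0': {'set-name': 'enp5s0f0',
          'match': {'macaddress': '00:1e:67:fe:d2:59'}, 'dhcp4': True}},
          'version': 2}
```

`fallback_v2_config("enp5s0f0", "00:1e:67:fe:d2:59")` compares equal to `LOGGED`.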

Would you be able to attach the network_data.json that was supplied? You can fetch it from the running instance via:

curl -s http://169.254.169.254/openstack/latest/network_data.json
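To check which links in network_data.json have no matching NIC on the instance, a small Python sketch can compare the link MACs with the addresses under /sys/class/net. The helper names are hypothetical. One caveat: InfiniBand (IPoIB) interfaces such as ib0 report a 20-byte hardware address in sysfs rather than a 6-byte Ethernet MAC, so a plain string comparison like this one will not match them.

```python
import json
import os
import urllib.request

METADATA_URL = "http://169.254.169.254/openstack/latest/network_data.json"


def fetch_network_data(url=METADATA_URL, timeout=10):
    """Download and decode network_data.json from the metadata service."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def local_macs(sys_net="/sys/class/net"):
    """Return {mac: ifname} for interfaces listed under /sys/class/net."""
    macs = {}
    for name in sorted(os.listdir(sys_net)):
        try:
            with open(os.path.join(sys_net, name, "address")) as f:
                mac = f.read().strip().lower()
        except OSError:
            continue
        # Skip loopback and other interfaces without a real address.
        # IPoIB devices (e.g. ib0) report 20-byte addresses here, which
        # will never equal a 6-byte MAC from network_data.json.
        if mac and mac != "00:00:00:00:00:00":
            macs[mac] = name
    return macs


def missing_links(network_data, macs):
    """Return ids of links whose MAC matches no local interface."""
    return [
        link["id"]
        for link in network_data.get("links", [])
        if link.get("ethernet_mac_address", "").lower() not in macs
    ]
```

On the instance, `print(missing_links(fetch_network_data(), local_macs()))` lists the link ids that have no local counterpart.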

Also, if you could capture /etc/sysconfig/network-scripts/ifcfg-*, we could see if additional files are causing conflicts with what cloud-init is generating.
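The journal errors quoted above suggest that network-scripts found configuration for eth0 and ib0 even though those devices were not present. A short Python sketch can list ifcfg files whose device does not exist on the host, assuming the files use plain KEY=value lines. `parse_ifcfg`, `device_name` and `stale_ifcfg` are hypothetical helpers, and falling back to the file-name suffix when DEVICE is unset is a simplification.

```python
import glob
import os


def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg file into a dict, stripping quotes."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip().strip("'\"")
    return cfg


def device_name(path, cfg):
    """Use DEVICE if set, else the <name> suffix of an ifcfg-<name> file."""
    return cfg.get("DEVICE") or os.path.basename(path)[len("ifcfg-"):]


def stale_ifcfg(scripts_dir="/etc/sysconfig/network-scripts",
                sys_net="/sys/class/net"):
    """Return (path, device) pairs whose device does not exist on this host."""
    present = set(os.listdir(sys_net))
    stale = []
    for path in sorted(glob.glob(os.path.join(scripts_dir, "ifcfg-*"))):
        with open(path) as f:
            cfg = parse_ifcfg(f.read())
        dev = device_name(path, cfg)
        if dev not in present:
            stale.append((path, dev))
    return stale
```

Running `print(stale_ifcfg())` on the node would show, for example, an ifcfg-eth0 entry if no eth0 device exists.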

Madhuri Kumari (madhuri-rai07) wrote :

Hi Ryan,

Thank you for your response. Please find the details below:

network_data.json:
{"services": [], "networks": [{"network_id": "9f3f91e1-5926-4345-953b-14049d48f17e", "link": "tapb2e6093b-d6", "type": "ipv4_dhcp", "id": "network0"}, {"network_id": "843dc3a5-2ff6-4bb6-8594-cf0459ca344b", "link": "tapacc60427-fa", "type": "ipv4_dhcp", "id": "network1"}], "links": [{"vif_id": "b2e6093b-d6a9-4fb4-aa6d-f5a598b216c8", "type": "phy", "ethernet_mac_address": "00:11:75:67:1e:bf", "id": "tapb2e6093b-d6", "mtu": 1500}, {"vif_id": "acc60427-facd-4db3-bd2b-5bce4fdbd57c", "type": "phy", "ethernet_mac_address": "00:1e:67:ed:f2:64", "id": "tapacc60427-fa", "mtu": 1500}]}
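To read the network_data.json above, it helps to join each network to its link. Both networks use ipv4_dhcp, and each link carries the MAC that cloud-init must match to a local NIC. A minimal Python sketch with a hypothetical `summarize` helper, run on the JSON posted above:

```python
import json

# The network_data.json posted above (whitespace added for readability).
NETWORK_DATA = json.loads("""
{"services": [],
 "networks": [
  {"network_id": "9f3f91e1-5926-4345-953b-14049d48f17e", "link": "tapb2e6093b-d6",
   "type": "ipv4_dhcp", "id": "network0"},
  {"network_id": "843dc3a5-2ff6-4bb6-8594-cf0459ca344b", "link": "tapacc60427-fa",
   "type": "ipv4_dhcp", "id": "network1"}],
 "links": [
  {"vif_id": "b2e6093b-d6a9-4fb4-aa6d-f5a598b216c8", "type": "phy",
   "ethernet_mac_address": "00:11:75:67:1e:bf", "id": "tapb2e6093b-d6", "mtu": 1500},
  {"vif_id": "acc60427-facd-4db3-bd2b-5bce4fdbd57c", "type": "phy",
   "ethernet_mac_address": "00:1e:67:ed:f2:64", "id": "tapacc60427-fa", "mtu": 1500}]}
""")


def summarize(network_data):
    """Return (network id, link id, MAC, type) for each network entry."""
    links = {link["id"]: link for link in network_data.get("links", [])}
    rows = []
    for net in network_data.get("networks", []):
        link = links.get(net["link"], {})
        rows.append((net["id"], net["link"],
                     link.get("ethernet_mac_address"), net["type"]))
    return rows


for row in summarize(NETWORK_DATA):
    print(row)
# ('network0', 'tapb2e6093b-d6', '00:11:75:67:1e:bf', 'ipv4_dhcp')
# ('network1', 'tapacc60427-fa', '00:1e:67:ed:f2:64', 'ipv4_dhcp')
```

Neither MAC is 00:1e:67:fe:d2:59, the address cloud-init used for enp5s0f0 in the fallback log line, which is consistent with enp5s0f0 not being one of the requested interfaces.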

The node has 3 ifcfg files:
[centos@opa-node latest]$ ls /etc/sysconfig/network-scripts/ifcfg-*
/etc/sysconfig/network-scripts/ifcfg-enp3s0f0 /etc/sysconfig/network-scripts/ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-lo

James Falcon (falcojr) wrote :
Changed in cloud-init:
status: Incomplete → Expired