init local crash - unknown subnet type 'loopback'

Bug #1671927 reported by Dan Peschman
This bug affects 4 people
Affects              Status        Importance  Assigned to  Milestone
cloud-init           Fix Released  Medium      Unassigned
cloud-init (CentOS)  Unknown       Medium

Bug Description

cloud-init-0.7.8-5.fc25.noarch on Fedora 25
Data source: ConfigDrive
Cloud provider: Openstack

Local init ($ cloud-init init -l) crashes with error [Unknown subnet type 'loopback' found for interface 'lo']. Stack trace attached.

I have a config drive w/ Ubuntu-style interface file at sr0/openstack/content/0000, and network_data.json at sr0/openstack/latest/. Only 0000 defines the loopback device. Screenshots of these files are at http://imgur.com/a/qEElh.

Expected outcome is to render ifcfg-eth0 in /etc/sysconfig/network-scripts, but leave ifcfg-lo in that directory untouched as per previous versions, for instance cloud-init-0.7.6-5.20140218bzr1060.fc23.noarch on Fedora 23.

Dan Peschman (dpeschman) wrote :

Screenshot of openstack/content/0000.

Dan Peschman (dpeschman) wrote :

First bit of stack trace from cloud-init-output.log.

Scott Moser (smoser)
Changed in cloud-init:
status: New → Confirmed
importance: Undecided → Medium
Dan Peschman (dpeschman) wrote :

I don't know if it should be a separate bug or not, but - if you work around this issue by deleting the "lo" interface from the collection, the run finishes cleanly, but the GATEWAY is not rendered to ifcfg-eth0 nor route-eth0 because subnet['routes'] is empty.

I worked around this by adding a route to subnet['routes'] with gateway = subnet['gateway']. Like this in net/network_state.py around line 253:
+ if (subnet.get('routes', None) is None and
+         'gateway' in subnet):
+     subnet['routes'] = [
+         {
+             'gateway': subnet['gateway'],
+             'netmask': '0.0.0.0',
+             'network': '0.0.0.0'
+         }
+     ]

Dan Peschman (dpeschman) wrote :

This also affects CentOS 7 now w/ cloud-init-0.7.9-9.el7.

In , Greg (greg-redhat-bugs) wrote :

Description of problem:

cloud-init throws an exception when processing the "iface lo inet loopback" entry in the config drive (will attach):

ValueError: Unknown subnet type 'loopback' found for interface 'lo'

Version-Release number of selected component (if applicable):

cloud-init-0.7.9-9.el7.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Boot rhel-server-7.4-x86_64-kvm.qcow2 (we're currently running OpenStack Liberty)
2. Observe network is inaccessible
3. Use spice-console to log in and observe traceback in /var/log/cloud.log (will attach)

Actual results:

Missing /etc/sysconfig/network-scripts/ifcfg-lo
Missing /etc/sysconfig/network-scripts/ifcfg-eth0

Expected results:

Populated /etc/sysconfig/network-scripts/ifcfg-lo
Populated /etc/sysconfig/network-scripts/ifcfg-eth0

Additional info:

The following patch appears to resolve the issue:

--- /usr/lib/python2.7/site-packages/cloudinit/net/sysconfig.py.orig 2017-06-22 11:04:23.000000000 -0400
+++ /usr/lib/python2.7/site-packages/cloudinit/net/sysconfig.py 2017-10-05 09:54:28.861469214 -0400
@@ -299,6 +299,9 @@
                 # iface_cfg['BOOTPROTO'] = 'static'
                 if _subnet_is_ipv6(subnet):
                     iface_cfg['IPV6INIT'] = True
+            elif subnet_type == 'loopback':
+                iface_cfg['IPADDR'] = '127.0.0.1'
+                iface_cfg['NETMASK'] = '255.0.0.0'
             else:
                 raise ValueError("Unknown subnet type '%s' found"
                                  " for interface '%s'" % (subnet_type,

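To illustrate the shape of the patch outside of cloud-init, here is a minimal sketch of a subnet-type dispatch with a loopback branch. The function name render_subnet_type and the dhcp branch are assumptions made for illustration and are not the actual code in cloudinit/net/sysconfig.py. Only the loopback values and the error message are taken from the patch above.

```python
def render_subnet_type(iface_cfg, subnet, iface_name):
    """Hypothetical stand-in for the subnet-type branch in sysconfig.py."""
    subnet_type = subnet.get('type')
    if subnet_type in ('dhcp', 'dhcp4'):
        # Assumed branch, simplified for this sketch.
        iface_cfg['BOOTPROTO'] = 'dhcp'
    elif subnet_type == 'loopback':
        # Values taken from the patch above.
        iface_cfg['IPADDR'] = '127.0.0.1'
        iface_cfg['NETMASK'] = '255.0.0.0'
    else:
        # Same message as the ValueError reported in this bug.
        raise ValueError("Unknown subnet type '%s' found"
                         " for interface '%s'" % (subnet_type, iface_name))
    return iface_cfg


print(render_subnet_type({}, {'type': 'loopback'}, 'lo'))
```

Without the loopback branch, the same call would fall through to the else clause and raise the ValueError quoted in the description.
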
In , Greg (greg-redhat-bugs) wrote :

Created attachment 1334952
config_drive

In , Greg (greg-redhat-bugs) wrote :

Created attachment 1334953
/var/log/cloud-init.log

In , Greg (greg-redhat-bugs) wrote :

Created attachment 1334954
Patch for ValueError: Unknown subnet type 'loopback'

In , Ryan (ryan-redhat-bugs) wrote :

Greg, did you submit the patch attached here upstream? I don't see it attached to the bug report. It may help to move things along. If you don't have time or have not signed the CLA, I could do it for you.

In , Greg (greg-redhat-bugs) wrote :

Hi Ryan, I have not submitted the patch upstream, as I wasn't sure whether RHEL builds of cloud-init would follow a policy of tracking upstream (a la Fedora), or if the plan was to backport fixes to the version already in RHEL. Given the number of RHEL patches already being applied, I guess I was assuming that Red Hat would likely backport a patch as opposed to pulling a new version from upstream.

I haven't signed the CLA, as I haven't yet had any direct interaction with the cloud-init people, other than with Josh Harlow (who I work with).

In , Ryan (ryan-redhat-bugs) wrote :

Our rule is that fixes need to be accepted upstream before we ship them. If the problem is fixed in the current upstream version already, then we'd backport the patch, or rework the fix, if it's not possible to directly backport. I didn't get the impression the issue was resolved upstream, though, as the issue on launchpad is still in the confirmed state.

Ryan McCabe (rmccabe) wrote :

Any progress on this? There is a patch attached to the RH BZ that I've linked that's pretty simple and seems to solve the issue.

Changed in cloud-init (CentOS):
importance: Unknown → Undecided
status: Unknown → In Progress
Scott Moser (smoser) wrote :

Hi,

The fix for your stack trace is upstream in commit
 1a2ca7530518d819cbab7287b12f942743427e38
https://git.launchpad.net/cloud-init/commit/?id=1a2ca7530518

I didn't mark that commit as having fixed this bug because I'm really unsure why cloud-init went down the path it did.

The 'interfaces' path that you seem to have hit should only be used if there is no network_data.json on the config drive, and any recent openstack should have provided that.

Dan, Could you perhaps attach the full config drive?

Scott Moser (smoser) wrote :

I've marked this 'incomplete'; set it back to 'confirmed' when you've attached the config drive.
That said, you won't see this bug with master or 17.1.

Changed in cloud-init:
status: Confirmed → Incomplete
Dan Peschman (dpeschman) wrote :

Also - we're on Openstack Liberty, not anything recent.

Scott Moser (smoser) wrote :

Dan,
I've tried to recreate this issue and I'm not able to.
Your log shows:

2017-10-05 09:46:15,582 - openstack.py[DEBUG]: Failed reading optional path /tmp/tmpB4Bmvi/openstack/2015-10-15/network_data.json due to: [Errno 2] No such file or directory: '/tmp/tmpB4Bmvi/openstack/2015-10-15/network_data.json'

which is inconsistent with the tar file you sent: the tar file has a 'network_data.json'.

All in all, I don't think this is an issue any more. I'm going to go ahead and mark the bug fixed in 17.1.
If you are able to reproduce the error and your log contains something like:

2017-10-05 09:46:15,654 - DataSourceConfigDrive.py[DEBUG]: network config provided via converted eni data

then please attach the *full* config drive (dd if=/dev/sr0 of=out.img && gzip out.img) and /var/log/cloud-init.log,
and open a new bug with that information.

Thanks.
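
For anyone checking their own log for the message quoted above, a minimal sketch follows. The helper name used_eni_fallback is hypothetical; the marker string is copied from the log line in this comment, and the assumption is that the message text is unchanged in the version being checked.

```python
# Marker copied from the DataSourceConfigDrive debug line quoted above.
ENI_MARKER = 'network config provided via converted eni data'


def used_eni_fallback(log_text):
    """Return True if any log line reports the converted-ENI network config path."""
    return any(ENI_MARKER in line for line in log_text.splitlines())


sample = ("2017-10-05 09:46:15,654 - DataSourceConfigDrive.py[DEBUG]: "
          "network config provided via converted eni data")
print(used_eni_fallback(sample))
```

To run it against a real log, pass the contents of /var/log/cloud-init.log as log_text.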

Changed in cloud-init:
status: Incomplete → Fix Released
Dan Peschman (dpeschman) wrote :

Hey Scott. Just to close the loop, my config drive does not have network_data.json in 2015-10-15/ - it is only in latest/:
[root@f25 dan]# tree /mnt/z
/mnt/z
├── ec2
│   ├── 2009-04-04
│   │   ├── meta-data.json
│   │   └── user-data
│   └── latest
│       ├── meta-data.json
│       └── user-data
└── openstack
    ├── 2012-08-10
    │   ├── meta_data.json
    │   └── user_data
    ├── 2013-04-04
    │   ├── meta_data.json
    │   └── user_data
    ├── 2013-10-17
    │   ├── meta_data.json
    │   ├── user_data
    │   └── vendor_data.json
    ├── 2015-10-15
    │   ├── meta_data.json
    │   ├── user_data
    │   └── vendor_data.json
    ├── content
    │   ├── 0000
    │   └── 0001
    └── latest
        ├── meta_data.json
        ├── network_data.json
        ├── user_data
        └── vendor_data.json

Scott Moser (smoser) wrote :

Jon,
That seems like a bug in your openstack. The point of 'latest' is that it is supposed to just be a copy of the "latest" (which would be 2015-10-15 for you). It should have the network_data.json file.

The code that generates that is in
https://github.com/openstack/nova/blob/master/nova/api/metadata/base.py

From the openstack that I can get at, I see network_data.json available in 2015-10-15 (liberty) and later which aligns with what the code says should be there.

$ mount /dev/sr0 /mnt
$ ( cd /mnt && find * -type f )
ec2/2009-04-04/meta-data.json
ec2/latest/meta-data.json
openstack/2012-08-10/meta_data.json
openstack/2013-04-04/meta_data.json
openstack/2013-10-17/meta_data.json
openstack/2013-10-17/vendor_data.json
openstack/2015-10-15/meta_data.json
openstack/2015-10-15/network_data.json
openstack/2015-10-15/vendor_data.json
openstack/2016-06-30/meta_data.json
openstack/2016-06-30/network_data.json
openstack/2016-06-30/vendor_data.json
openstack/2016-10-06/meta_data.json
openstack/2016-10-06/network_data.json
openstack/2016-10-06/vendor_data.json
openstack/2016-10-06/vendor_data2.json
openstack/2017-02-22/meta_data.json
openstack/2017-02-22/network_data.json
openstack/2017-02-22/vendor_data.json
openstack/2017-02-22/vendor_data2.json
openstack/latest/meta_data.json
openstack/latest/network_data.json
openstack/latest/vendor_data.json
openstack/latest/vendor_data2.json
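
The comparison between the two config drives can also be scripted. The sketch below is an illustration only: the function name network_data_by_version is hypothetical, and it assumes the drive is already mounted and laid out as openstack/<version>/ as in the listings in this thread.

```python
import os


def network_data_by_version(mount_point):
    """Map each openstack/<version> directory to whether it has network_data.json."""
    base = os.path.join(mount_point, 'openstack')
    found = {}
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        # 'content' holds injected files (e.g. 0000), not a metadata version.
        if name == 'content' or not os.path.isdir(path):
            continue
        found[name] = os.path.isfile(os.path.join(path, 'network_data.json'))
    return found
```

Going by the tree listing earlier in this thread, calling network_data_by_version('/mnt/z') on Dan's drive would report no network_data.json for 2015-10-15 and one for latest, which is the mismatch discussed here.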

In , Ryan (ryan-redhat-bugs) wrote :

I think this may be fixed as a side-effect of another set of patches that had to be pulled in to fix another network config bug. Could you give the package at people.redhat.com/rmccabe/cloud-init/cloud-init-0.7.9-15.el7.x86_64.rpm a shot to see if it resolves the problem?

In , Greg (greg-redhat-bugs) wrote :

I tested against cloud-init-0.7.9-15.el7.x86_64.rpm as provided above, and it appears to resolve the issue.

Tested against RHEL 7.4 and CentOS 7.4 images (same images that were used originally when reporting the error), and I'm no longer seeing the exception.

/etc/sysconfig/network-scripts/ifcfg-eth0 appears to contain the expected values. /etc/sysconfig/network-scripts/ifcfg-lo does not appear to be created but I do see the loopback interface configured.

Changed in cloud-init (CentOS):
status: In Progress → Unknown
Changed in cloud-init (CentOS):
importance: Undecided → Medium