lxd containers created by juju are not receiving IP addresses

Bug #1762700 reported by Jason Hobbs
This bug affects 1 person
Affects         Status        Importance  Assigned to      Milestone
Canonical Juju  Fix Released  High        Heather Lanigan
2.3             Invalid       Undecided   Unassigned
2.4             Fix Released  Undecided   Unassigned
cloud-init      Invalid       Undecided   Unassigned

Bug Description

Using juju 2.3.5, maas 2.3.0-6434-gd354690-0ubuntu1~16.04.1, and lxd 2.0.11-0ubuntu1~16.04.4, containers are not receiving IP addresses:

http://paste.ubuntu.com/p/wdY9ShtrTP/

There is an error in cloud-init-output.log:
http://paste.ubuntu.com/p/k8rdGHjDQq/

Revision history for this message
Chris Gregan (cgregan) wrote :

Escalated to Field Critical due to blocking nature of this issue

Revision history for this message
John A Meinel (jameinel) wrote :

Is that Juju 2.0.11? I'm pretty sure we don't support that version. 2.0
didn't support network spaces very well, 2.1 would certainly be a minimum
recommendation.

Revision history for this message
John A Meinel (jameinel) wrote :

I believe:
 AttributeError: 'NoneType' object has no attribute 'iter_interfaces'

was a bug that we ran into from cloud-init. I don't remember whether it
was fixed upstream or whether we committed our own workaround for it.
Either way, the failure is in the Python code rather than in Juju proper.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

No, it was lxd 2.0.11, sorry, fixed.

description: updated
Revision history for this message
Eric Vasquez (envas) wrote :

Juju crash dump attached

Revision history for this message
John A Meinel (jameinel) wrote :

Can you describe the steps used to create the environment? Are these Trusty machines or Xenial?

It would help to know how they were bootstrapped and what they are targeting (which bundle was deployed, what configuration the bundle used, etc.).

Changed in juju:
importance: Undecided → High
status: New → Triaged
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

The attached crashdump should show how they're being deployed, but
they're all xenial.

Here are some bundles: http://paste.ubuntu.com/p/5gPZhrPPyJ/

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

07:13 < jam> jhobbs: my initial thought is that it is a cloud-init bug, where we might be passing something like "networking=false" because we are going to set up networking ourself
07:14 < jam> and that causes a particular place that always assumed it had a value to iterate, but the object it has is now None.
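
The failure mode jam describes (code that assumes it always has an object to iterate, but is handed None) can be sketched in a few lines of Python. This is a simplified stand-in, not cloud-init's actual code: the names FakeNetworkState, parse_net_config and render_interfaces are hypothetical, and the rule that a config without both a version and a config key parses to None is an assumption made for illustration.

```python
# Simplified stand-in for the failure mode; NOT cloud-init's real code.
# All names here (FakeNetworkState, parse_net_config, render_interfaces)
# are hypothetical.

class FakeNetworkState:
    def __init__(self, config):
        self._config = config

    def iter_interfaces(self):
        # Yield only the entries that describe physical interfaces.
        for entry in self._config:
            if entry.get("type") == "physical":
                yield entry


def parse_net_config(net_config):
    """Return a state object, or None when the input is not usable.

    Assumption for this sketch: a usable config has both keys.
    """
    if "version" in net_config and "config" in net_config:
        return FakeNetworkState(net_config["config"])
    return None


def render_interfaces(network_state):
    # Assumes network_state is never None, which is the bug being shown.
    return [iface["name"] for iface in network_state.iter_interfaces()]


good = {"version": 1, "config": [{"type": "physical", "name": "eth0"}]}
names = render_interfaces(parse_net_config(good))

try:
    render_interfaces(parse_net_config({}))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
```

With the empty config, the parser returns None and the renderer fails with an AttributeError naming iter_interfaces, which matches the shape of the error reported in cloud-init-output.log.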

Revision history for this message
John A Meinel (jameinel) wrote :

Is it possible to get the crashdump from the controller model as well?

Changed in juju:
assignee: nobody → Heather Lanigan (hmlanigan)
Changed in juju:
status: Triaged → In Progress
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :
Revision history for this message
Heather Lanigan (hmlanigan) wrote :

The error at the top of the cloud-init-output.log appears to be a red herring. It's part of the init-local stage that cloud-init runs as soon as possible after boot. I bootstrapped a controller on OpenStack and deployed the ubuntu charm with --to lxd. The container did get an IP address in this case, but it also had the same error at the top of the cloud-init-output.log.

Is it possible to get the following files from one of the containers exhibiting this issue:
/var/lib/cloud/instance/*

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

From the logs provided, juju is determining ip addresses to give to the containers and the cloud-init-output.log says they were assigned:

https://pastebin.canonical.com/p/4CqQrS92VR/

It is not yet clear why lxc list isn't showing them.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Ok, the crashdumps with "da37941a-0cdd-4004-a761-481247ff8bff.tar.gz" are from the wrong test run. It did not exhibit this bug. I'm uploading crashdumps from a run that did.

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :
Revision history for this message
Heather Lanigan (hmlanigan) wrote :

This is a juju 2.4-beta1 bug per the new log files.

I can reproduce with:
juju bootstrap localhost test
juju deploy ubuntu --to lxd

Per var/log/juju/machine-0.log, juju is determining an IP address for the containers and adding it to the network config part of the cloud-init user-data file. Something goes wrong after that.

Not reproducible with juju 2.3.5.

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

Using this log tarball: juju-crashdump-ad91fa98-f4a7-4ab3-90a3-e9c2f3b75f9b.tar.gz

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Hmm, after hitting this with 2.4 I switched to 2.3.5 from the snap
store and hit it. I do not have logs from that reproduction. We will
continue to test and see if we can reproduce on 2.3.5.

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

The good news is that lxc exec to the unhappy containers is possible.

Part of the cloud-init code that juju provides is broken:
$ sudo lxc exec juju-fc6bd6-0-lxd-0 -- python3 /etc/network/interfaces.py --interfaces-file /etc/network/interfaces.tmp
sudo: unable to resolve host juju-fc6bd6-default-0
Parsing ip command output [b'1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1\\ link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00', b'7: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8908 qdisc noqueue state UP mode DEFAULT group default qlen 1000\\ link/ether 00:16:3e:17:53:2a brd ff:ff:ff:ff:ff:ff link-netnsid 0']
Found the following devices: {'00:16:3e:17:53:2a': 'eth0'}
Traceback (most recent call last):
  File "/etc/network/interfaces.py", line 90, in <module>
    main()
  File "/etc/network/interfaces.py", line 84, in main
    if replace_ethernets(args.intf_file, ip_output, (tries != retries - 1)):
  File "/etc/network/interfaces.py", line 41, in replace_ethernets
    with open(interfaces_file + ".templ", "r") as intf_file:
FileNotFoundError: [Errno 2] No such file or directory: '/etc/network/interfaces.tmp.templ'
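
The traceback shows the script opening interfaces_file + ".templ", so the --interfaces-file argument given above resolves to /etc/network/interfaces.tmp.templ. A minimal sketch of that path construction with an explicit existence check follows. The helper names template_path and read_template are hypothetical and are not part of the script juju ships.

```python
import os


def template_path(interfaces_file):
    # Mirrors the expression visible in the traceback:
    # open(interfaces_file + ".templ", "r")
    return interfaces_file + ".templ"


def read_template(interfaces_file):
    """Return the template text, or None if the template file is absent."""
    path = template_path(interfaces_file)
    if not os.path.isfile(path):
        return None
    with open(path, "r") as intf_file:
        return intf_file.read()


resolved = template_path("/etc/network/interfaces.tmp")
```

Checking for the file first turns the unhandled FileNotFoundError into a condition the caller can report or retry on.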

Revision history for this message
Chad Smith (chad.smith) wrote :

Any time I see cloud-init generating tracebacks on network config, I generally end up looking at what metadata was provided to the datasource. If possible, please also attach /run/cloud-init/instance-data.json if it is available (cloud-init v. 17.2 and later).

Revision history for this message
Chad Smith (chad.smith) wrote :

An attachment of the entire /var/log/cloud-init.log helps too.
I'm expecting we can see specifically the network config passed to the cloud-init local stage, with a message like

Applying network configuration from ds ....
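
One way to pull that message out of a log, as a rough sketch: the function name find_network_config_lines is hypothetical, the sample text is a placeholder rather than real log output, and the exact wording cloud-init uses may differ between versions.

```python
def find_network_config_lines(log_text,
                              marker="Applying network configuration from"):
    """Return the log lines that contain the marker text."""
    return [line for line in log_text.splitlines() if marker in line]


# Placeholder text only; a real /var/log/cloud-init.log will look different.
sample_log = "\n".join([
    "first line",
    "Applying network configuration from ds ....",
    "last line",
])
matches = find_network_config_lines(sample_log)
```

To use it on a real container, read /var/log/cloud-init.log into a string and pass that string in.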

Revision history for this message
Heather Lanigan (hmlanigan) wrote :

PR for issue as seen in 2.4-beta1: https://github.com/juju/juju/pull/8582

Revision history for this message
Nicholas Skaggs (nskaggs) wrote :

Removing field critical as this was confirmed to not affect stable juju.

Revision history for this message
Andrew McLeod (admcleod) wrote :

I have replicated this with artful on arm64 and juju 2.3.2:

juju version: 2.3.2-xenial-amd64 (bootstrap node is amd64)
lxd version: 2.18-0ubuntu6
MAAS version: 2.3.0 (6434-gd354690-0ubuntu1~16.04.1)

https://pastebin.canonical.com/p/gWr7dKdpMg/

Attached files from /var/lib/cloud/instance/ of a container

dhclient gets an IP address immediately.

Revision history for this message
Andrew McLeod (admcleod) wrote :

My comment/issue appears to be a slightly different bug as it applies to artful+netplan as opposed to xenial

Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
John A Meinel (jameinel) wrote :

I just ran into this on AWS running Xenial instances and Xenial containers.
I don't know whether AttributeError: 'NoneType' object has no attribute
'iter_interfaces' is a red herring or not, as it does continue onward with
the rest of the actions. (It successfully runs the bootcmd/runcmd stages of
the scripts.)

This was using Juju 2.3.5, not 2.4.
However, I did *not* have the final lines:

Traceback (most recent call last):
  File "/etc/network/interfaces.py", line 90, in <module>
    main()
  File "/etc/network/interfaces.py", line 84, in main
    if replace_ethernets(args.intf_file, ip_output, (tries != retries - 1)):
  File "/etc/network/interfaces.py", line 41, in replace_ethernets
    with open(interfaces_file + ".templ", "r") as intf_file:
FileNotFoundError: [Errno 2] No such file or directory: '/etc/network/interfaces.tmp.templ'

My cloud-init-output.log is:
Cloud-init v. 17.2 running 'init-local' at Wed, 18 Apr 2018 07:59:33 +0000. Up 15.00 seconds.
2018-04-18 07:59:38,076 - stages.py[WARNING]: Failed to rename devices: Failed to apply network config names. Found bad network config version: None
2018-04-18 07:59:38,120 - util.py[WARNING]: failed stage init-local
failed run of stage init-local
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 650, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 357, in main_init
    init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 654, in apply_network_config
    return self.distro.apply_network_config(netcfg, bring_up=bring_up)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 171, in apply_network_config
    dev_names = self._write_network_config(netconfig)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/debian.py", line 119, in _write_network_config
    return self._supported_write_network_config(netconfig)
  File "/usr/lib/python3/dist-packages/cloudinit/distros/__init__.py", line 90, in _supported_write_network_config
    renderer.render_network_config(network_config=network_config)
  File "/usr/lib/python3/dist-packages/cloudinit/net/renderer.py", line 53, in render_network_config
    network_state=parse_net_config_data(network_config), target=target)
  File "/usr/lib/python3/dist-packages/cloudinit/net/eni.py", line 466, in render_network_state
    util.write_file(fpeni, header + self._render_interfaces(network_state))
  File "/usr/lib/python3/dist-packages/cloudinit/net/eni.py", line 423, in _render_interfaces
    for iface in network_state.iter_interfaces():
AttributeError: 'NoneType' object has no attribute 'iter_interfaces'
------------------------------------------------------------
Cloud-init v. 17.2 running 'init' at Wed, 18 Apr 2018 07:59:46 +0000. Up 29.00 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+-----------------------------+-----------+-------+-------------------+
ci-info: | Device | Up | ...

Revision history for this message
Ryan Harper (raharper) wrote :

If this happens in a container, can you provide the contents of the /var/lib/cloud/seed/nocloud-net/ directory?

The traceback in #23 looks like the /var/lib/cloud/seed/nocloud-net/network-config is malformed.

root@b2:/var/lib/cloud/seed/nocloud-net# cat network-config
version: 1
config:
    - type: physical
      name: eth0
      subnets:
          - type: dhcp
            control: auto

The traceback in #25 looks like the same issue. Xenial uses a different renderer (eni, versus netplan in artful/bionic), so the stack is slightly different, but both tracebacks indicate that the network config fed to cloud-init via the datasource (for lxd that's nocloud-net) isn't formatted correctly: it's missing the config: value in the YAML.

So, please do capture /var/lib/cloud/seed/* from any of these failing instances and post it here as soon as possible.
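
As an illustration of the check described above, here is a minimal sketch that reports which top-level keys a parsed network-config is missing. It assumes the YAML has already been loaded into a Python dict. The function name missing_network_config_keys is hypothetical, and treating version and config as the required keys follows the version 1 example shown above rather than any cloud-init API.

```python
REQUIRED_KEYS = ("version", "config")  # taken from the version 1 example above


def missing_network_config_keys(network_config):
    """Return the required top-level keys absent from the parsed config."""
    if not isinstance(network_config, dict):
        # An empty or non-mapping document is missing everything.
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if key not in network_config]


# The example from the comment above, written as a dict.
example = {
    "version": 1,
    "config": [
        {
            "type": "physical",
            "name": "eth0",
            "subnets": [{"type": "dhcp", "control": "auto"}],
        }
    ],
}
ok = missing_network_config_keys(example)
bad = missing_network_config_keys({"version": 1})
```

An empty result means both keys are present. A result naming config corresponds to the malformed case described in this comment.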

Changed in cloud-init:
status: New → Incomplete
Revision history for this message
Ryan Harper (raharper) wrote :

Did juju come to some solution here? Could someone summarize what changed?

Changed in juju:
status: Fix Committed → Fix Released
James Falcon (falcojr)
Changed in cloud-init:
status: Incomplete → Invalid
Revision history for this message
James Falcon (falcojr) wrote :