Canonical Juju

LXD machines on AWS occasionally get stuck with no IP

Bug #1663740 reported by George Kraft on 2017-02-10

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Canonical Juju	Fix Released	High	John A Meinel	Canonical Juju 2.2-alpha1
2.1	Fix Released	High	John A Meinel	Canonical Juju 2.1-rc2

Bug Description

When deploying a unit --to lxd:0 on AWS, occasionally I see the machine 0/lxd/0 get stuck in a "pending" state forever.

I do not recall this happening in 2.1-beta4, but have seen it several times in 2.1-beta5 - probably about 30% of my deployments get stuck here.

From machine 0, I'm able to `lxc exec` into juju-90d10d-0-lxd-0 and see that it came up with no IP:

$ sudo lxc exec juju-90d10d-0-lxd-0 ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:16:3e:da:02:74
          inet6 addr: fe80::216:3eff:feda:274/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:648 (648.0 B) TX bytes:648 (648.0 B)
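
As a rough illustration, the missing address can be checked mechanically rather than by eye. The sketch below assumes the older net-tools `ifconfig` format shown above, where IPv4 addresses appear on an `inet addr:` line; the function name `ipv4_addresses` is hypothetical.

```python
import re
from typing import List

# Excerpt of the ifconfig output quoted above (net-tools format).
IFCONFIG_OUTPUT = """\
eth0 Link encap:Ethernet HWaddr 00:16:3e:da:02:74
          inet6 addr: fe80::216:3eff:feda:274/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
"""


def ipv4_addresses(ifconfig_text: str) -> List[str]:
    """Return IPv4 addresses from 'inet addr:' lines; 'inet6 addr:' lines do not match."""
    return re.findall(r"\binet addr:\s*(\d+\.\d+\.\d+\.\d+)", ifconfig_text)


if __name__ == "__main__":
    # An empty list means the interface came up with no IPv4 address.
    print("IPv4 addresses:", ipv4_addresses(IFCONFIG_OUTPUT))
```

An empty result for `eth0` corresponds to the stuck state described here: the interface is up and has only a link-local IPv6 address.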

I haven't been able to find any leads on this, though the output of /var/log/juju/machine-0.log looks interesting, as I see an unusual message saying "generated network config has no gateway". Will attach.
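
For anyone checking their own machine for the same message, a minimal sketch follows. It only assumes the log path and message text quoted above; the helper name `find_message` is hypothetical, and the script has to run on machine 0 with permission to read the log.

```python
from pathlib import Path
from typing import Iterable, List

# Message text and log path as quoted in this report.
MESSAGE = "generated network config has no gateway"
LOG_PATH = Path("/var/log/juju/machine-0.log")


def find_message(lines: Iterable[str], needle: str = MESSAGE) -> List[str]:
    """Return every line containing the needle, with trailing newlines stripped."""
    return [line.rstrip("\n") for line in lines if needle in line]


if __name__ == "__main__":
    if LOG_PATH.exists():
        with LOG_PATH.open(errors="replace") as log:
            for hit in find_message(log):
                print(hit)
    else:
        print(f"{LOG_PATH} not found; run this on machine 0")
```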

George Kraft (cynerva) wrote on 2017-02-10:

Attachment: Output of `juju status --format yaml` (6.0 KiB, text/plain)

George Kraft (cynerva) wrote on 2017-02-10:

Attachment: machine-0.log (42.5 KiB, text/plain)

description:	updated

George Kraft (cynerva) wrote on 2017-02-10:

I have not found a way to reproduce this reliably; however, I see it occasionally when I do `juju deploy cs:~containers/kubernetes-core` in an AWS model.

George Kraft (cynerva) wrote on 2017-02-10:

Attachment: Output of `lxc config show juju-90d10d-0-lxd-0` (9.2 KiB, text/plain)

George Kraft (cynerva) wrote on 2017-02-13:

This is still occurring for me with a newly bootstrapped controller on Juju 2.1-rc1.

Anastasia (anastasia-macmood) wrote on 2017-02-14:

This seems like a regression from recent network-related changes. We'll have a look.

Thank you for the report!

Changed in juju:
status:	New → Triaged
importance:	Undecided → Critical
milestone:	none → 2.1.0
importance:	Critical → High
tags:	added: network regression

Anastasia (anastasia-macmood) on 2017-02-14

Changed in juju:
milestone:	2.1.0 → 2.2.0-alpha1

John A Meinel (jameinel) on 2017-02-15

Changed in juju:
assignee:	nobody → John A Meinel (jameinel)

John A Meinel (jameinel) wrote on 2017-02-15:

This might be fixed by the fix for bug #1664409

John A Meinel (jameinel) wrote on 2017-02-15:

Are you doing anything different around OS series? Is it only when deploying the kubernetes charm?

Anastasia (anastasia-macmood) on 2017-02-15

Changed in juju:
status:	Triaged → Incomplete

George Kraft (cynerva) wrote on 2017-02-15:

> Are you doing anything different around OS series?

Not that I'm aware of. All machines are running xenial.

> Is it only when deploying the kubernetes charm?

I was able to reproduce it just now using the standard ubuntu charm:

$ juju deploy ubuntu
$ juju add-unit ubuntu --to lxd:0

Although again, it's inconsistent - the problem only reproduced on the third attempt.
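
Since the failure is intermittent, each attempt can be checked mechanically instead of by reading the status output. The sketch below is illustrative only: it assumes `juju status --format json` nests containers under `machines -> <id> -> containers` and reports addresses in an `ip-addresses` list. Those field names are an assumption, not something taken from this report, and the function names are hypothetical.

```python
import json
import subprocess
from typing import Dict, List


def containers_without_address(status: Dict) -> List[str]:
    """List container IDs (e.g. '0/lxd/0') that report no IP address.

    Assumed layout: machines -> <id> -> containers -> <id>, each with an
    optional 'ip-addresses' list. Adjust if your Juju version differs.
    """
    stuck = []
    for machine in status.get("machines", {}).values():
        for cid, container in machine.get("containers", {}).items():
            if not container.get("ip-addresses"):
                stuck.append(cid)
    return sorted(stuck)


def current_status() -> Dict:
    """Run `juju status --format json` against the current model."""
    result = subprocess.run(
        ["juju", "status", "--format", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `containers_without_address(current_status())` a few minutes after each `juju add-unit ubuntu --to lxd:0` would flag an attempt that hit the problem. A container that is still starting also has no address yet, so an early check proves nothing.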

George Kraft (cynerva) wrote on 2017-02-15:

#10

Attachment: Output of `juju status` with ubuntu charm (661 bytes, text/plain)

To clarify, the most recent repro involved only the ubuntu charm. See attachment.

George Kraft (cynerva) wrote on 2017-02-15:

#11

Attachment: /etc/network/interfaces on the LXD machine (205 bytes, text/plain)

Grabbing /etc/network/interfaces since it was mentioned in the PR for bug #1664409.

Anastasia (anastasia-macmood) on 2017-02-15

Changed in juju:
status:	Incomplete → Triaged

John A Meinel (jameinel) wrote on 2017-02-16:

#12

So /etc/network/interfaces having "iface eth0 inet manual" is indeed the symptom of bug #1664490, which is what you're seeing here.
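
A quick way to look for that symptom inside a container is to scan the file for `iface <name> inet manual` stanzas. The following is a minimal sketch: the function name and the sample file are hypothetical, and only the `iface eth0 inet manual` line comes from the attachment discussed here.

```python
from typing import List


def manual_inet_ifaces(interfaces_text: str) -> List[str]:
    """Return interface names declared as `iface <name> inet manual`."""
    names = []
    for raw in interfaces_text.splitlines():
        fields = raw.split()
        # Comment lines start with '#', so they never tokenize to 'iface'.
        if len(fields) >= 4 and fields[0] == "iface" and fields[2:4] == ["inet", "manual"]:
            names.append(fields[1])
    return names


# Hypothetical file content; only the eth0 line is quoted in this thread.
SAMPLE = """\
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
"""

if __name__ == "__main__":
    print(manual_inet_ifaces(SAMPLE))
```

If `eth0` appears in the result, the interface has no address method configured in that file, which matches the container coming up without an IPv4 address.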

I also have `juju deploy cs:~jameinel/ubuntu-lite-nested`, which was used to test some similar issues on LXD but should work on AWS as well.

Are you able to test with a branch version, or only with released versions? (My fix for bug #1664490 should be in the next release, which happens ~tomorrow.)

Anastasia (anastasia-macmood) on 2017-02-16

Changed in juju:
status:	Triaged → Incomplete
milestone:	2.2.0-alpha1 → none

George Kraft (cynerva) wrote on 2017-02-16:

#13

> Are you able to test with a branch version, or only with released versions?

I haven't run a branch version before, but I'll give it a shot and reach out on IRC if I can't figure it out. :)

George Kraft (cynerva) wrote on 2017-02-23:

#14

I haven't hit this bug since I upgraded to 2.1-rc2 about a week ago. I think it's probably fixed. Thanks John!

Anastasia (anastasia-macmood) on 2017-02-23

Changed in juju:
status:	Incomplete → Fix Committed
milestone:	none → 2.2-rc1

Canonical Juju QA Bot (juju-qa-bot) on 2017-06-28

Changed in juju:
status:	Fix Committed → Fix Released
