LXD machines on AWS occasionally get stuck with no IP

Bug #1663740 reported by George Kraft
This bug affects 1 person
Affects         Status        Importance  Assigned to    Milestone
Canonical Juju  Fix Released  High        John A Meinel
2.1             Fix Released  High        John A Meinel

Bug Description

When deploying a unit --to lxd:0 on AWS, occasionally I see the machine 0/lxd/0 get stuck in a "pending" state forever.

I do not recall this happening in 2.1-beta4, but have seen it several times in 2.1-beta5 - probably about 30% of my deployments get stuck here.

From machine 0, I'm able to `lxc exec` into juju-90d10d-0-lxd-0 and see that it came up with no IP:

$ sudo lxc exec juju-90d10d-0-lxd-0 ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:16:3e:da:02:74
          inet6 addr: fe80::216:3eff:feda:274/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:648 (648.0 B) TX bytes:648 (648.0 B)

I haven't been able to find any leads on this, though the output of /var/log/juju/machine-0.log looks interesting, as I see an unusual message saying "generated network config has no gateway". Will attach.
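The check described above (an interface that is up but has only a link-local IPv6 address) can be scripted. The following is a minimal sketch, not part of Juju or LXD: the helper name `ipv4_address` is hypothetical, and it assumes the older net-tools `ifconfig` format shown in this report, where an IPv4 address appears as `inet addr:`.

```python
import re

# Hypothetical helper (not part of Juju or LXD): pull the IPv4 address out of
# net-tools style `ifconfig` output. Assumes the "inet addr:" format used on
# xenial; newer ifconfig versions print "inet <address>" instead.
_INET_RE = re.compile(r"inet addr:(\d{1,3}(?:\.\d{1,3}){3})")


def ipv4_address(ifconfig_output):
    """Return the first IPv4 address found, or None if the interface has none."""
    match = _INET_RE.search(ifconfig_output)
    return match.group(1) if match else None


# Excerpt of the output from the stuck container in this report.
stuck = """eth0 Link encap:Ethernet HWaddr 00:16:3e:da:02:74
          inet6 addr: fe80::216:3eff:feda:274/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1"""

print(ipv4_address(stuck))  # None
```

A `None` result for a container that has been running for a while matches the symptom reported here. The `inet6 addr:` line is deliberately not matched, since a link-local IPv6 address is present even on the stuck container.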

George Kraft (cynerva) wrote :
George Kraft (cynerva) wrote :
description: updated
George Kraft (cynerva) wrote :

I have not found a way to reproduce this reliably; however, I see it occasionally when I do `juju deploy cs:~containers/kubernetes-core` in an AWS model.

George Kraft (cynerva) wrote :
George Kraft (cynerva) wrote :

This is still occurring for me with a newly bootstrapped controller on Juju 2.1-rc1.

Anastasia (anastasia-macmood) wrote :

This seems like a regression from recent network-related changes. We'll have a look.

Thank you for the report!

Changed in juju:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 2.1.0
importance: Critical → High
tags: added: network regression
Changed in juju:
milestone: 2.1.0 → 2.2.0-alpha1
John A Meinel (jameinel)
Changed in juju:
assignee: nobody → John A Meinel (jameinel)
John A Meinel (jameinel) wrote :

This might be fixed by the fix for bug #1664409

John A Meinel (jameinel) wrote :

Are you doing anything different around OS series? Is it only when deploying the kubernetes charm?

Changed in juju:
status: Triaged → Incomplete
George Kraft (cynerva) wrote :

> Are you doing anything different around OS series?

Not that I'm aware of. All machines are running xenial.

> Is it only when deploying the kubernetes charm?

I was able to reproduce it just now using the standard ubuntu charm:

$ juju deploy ubuntu
$ juju add-unit ubuntu --to lxd:0

Although again, it's inconsistent - the problem only reproduced on the third attempt.

George Kraft (cynerva) wrote :

To clarify, the most recent repro involved only the ubuntu charm. See attachment.

George Kraft (cynerva) wrote :

Grabbing /etc/network/interfaces since it was mentioned in the PR for bug #1664409.

Changed in juju:
status: Incomplete → Triaged
John A Meinel (jameinel) wrote :

So /etc/network/interfaces having "iface eth0 inet manual" is indeed the symptom of bug #1664490, which is what you're seeing here.

I also have "juju deploy cs:~jameinel/ubuntu-lite-nested", which was used to test some similar issues on LXD but should work on AWS as well.

Are you able to test with a branch version, or only with released versions? (My fix for bug #1664490 should be in the next release, which happens ~tomorrow.)
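For anyone checking their own containers for the symptom mentioned in this comment, a rough sketch of the test follows. It is an illustration only, not Juju code: the function name `manual_inet_ifaces` is hypothetical, the sample file contents are invented rather than copied from the attachment, and it assumes the ifupdown `/etc/network/interfaces` syntax, where comments occupy whole lines starting with `#`.

```python
# Hypothetical helper (not part of Juju): list the interfaces whose stanza
# reads "iface <name> inet manual" in ifupdown-style configuration text.
def manual_inet_ifaces(interfaces_text):
    names = []
    for line in interfaces_text.splitlines():
        stripped = line.strip()
        # ifupdown treats lines starting with '#' as comments.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if (len(fields) >= 4 and fields[0] == "iface"
                and fields[2] == "inet" and fields[3] == "manual"):
            names.append(fields[1])
    return names


# Invented sample for illustration; not the file attached to this bug.
sample = """auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
"""

print(manual_inet_ifaces(sample))  # ['eth0']
```

If `eth0` shows up in the result inside a container that never received an address, that is consistent with the symptom described above. A `manual` stanza is legitimate in other setups (for example on bridge ports), so this is a hint rather than proof.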

Changed in juju:
status: Triaged → Incomplete
milestone: 2.2.0-alpha1 → none
George Kraft (cynerva) wrote :

> Are you able to test with a branch version, or only with released versions?

I haven't run a branch version before, but I'll give it a shot and reach out on IRC if I can't figure it out. :)

George Kraft (cynerva) wrote :

I haven't hit this bug since I upgraded to 2.1-rc2 about a week ago. I think it's probably fixed. Thanks John!

Changed in juju:
status: Incomplete → Fix Committed
milestone: none → 2.2-rc1
Changed in juju:
status: Fix Committed → Fix Released