LXD with juju storage and local loopbacks

Bug #1605241 reported by Adam Stokes
This bug affects 4 people
Affects: Canonical Juju
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

I am doing a bunch of deployments/teardowns in two separate forms:

The first is,

$ juju bootstrap yung localhost --upload-tools --config image-stream=daily --config enable-os-upgrade=false --bootstrap-series=xenial
$ juju deploy <bundle>
$ juju destroy-model default
$ juju add-model dino
$ juju deploy <bundle>

The second is,

$ juju bootstrap yung localhost --upload-tools --config image-stream=daily --config enable-os-upgrade=false --bootstrap-series=xenial
$ juju deploy <bundle>
$ juju destroy-controller --destroy-all-models
$ juju bootstrap yung localhost --upload-tools --config image-stream=daily --config enable-os-upgrade=false --bootstrap-series=xenial
$ juju deploy <bundle>

In both cases, machines will sometimes stay stuck in the pending/allocating stage:
https://paste.ubuntu.com/20304642/

Digging through the machine log I spotted:

2016-07-21 12:31:18 WARNING juju.network network.go:430 cannot get "lxdbr0" addresses: route ip+net: no such network interface (ignoring)

Even though my network bridge is alive and well:
eth0 Link encap:Ethernet HWaddr b8:ae:ed:74:0b:07
          inet addr:172.16.0.29 Bcast:172.16.0.255 Mask:255.255.255.0
          inet6 addr: fe80::baae:edff:fe74:b07/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:12111864 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4906790 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:17525256923 (17.5 GB) TX bytes:374279915 (374.2 MB)
          Interrupt:20 Memory:f7000000-f7020000

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:2389 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2389 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:1665510 (1.6 MB) TX bytes:1665510 (1.6 MB)

lxdbr0 Link encap:Ethernet HWaddr fe:f3:21:3a:72:73
          inet addr:10.1.78.1 Bcast:0.0.0.0 Mask:255.255.255.0
          inet6 addr: fe80::50bb:10ff:fe3e:53c3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:4619775 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9949792 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:443834026 (443.8 MB) TX bytes:17423911337 (17.4 GB)

openstack0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
          inet addr:10.99.0.1 Bcast:0.0.0.0 Mask:255.255.255.0
          inet6 addr: fe80::7492:c9ff:fee9:43f/64 Scope:Link
          UP BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:1038 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:69592 (69.5 KB) TX bytes:648 (648.0 B)

veth8BTVT4 Link encap:Ethernet HWaddr fe:f3:21:3a:72:73
          inet6 addr: fe80::fcf3:21ff:fe3a:7273/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:44491 errors:0 dropped:0 overruns:0 frame:0
          TX packets:57014 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:60053248 (60.0 MB) TX bytes:94542924 (94.5 MB)
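As a quick sanity check (my own sketch, not what juju does internally), the kernel's view of an interface can be read straight from /sys/class/net, which is enough to tell whether the "no such network interface" warning reflects reality:

```shell
# Minimal sketch: check whether the kernel actually exposes an interface,
# which is what the "no such network interface" warning above suggests
# juju could not do for lxdbr0.
iface_exists() {
    [ -e "/sys/class/net/$1" ]
}

if iface_exists lxdbr0; then
    echo "lxdbr0 present"
else
    echo "lxdbr0 missing (try: sudo dpkg-reconfigure -p medium lxd)"
fi
```

On xenial-era LXD, `dpkg-reconfigure -p medium lxd` was the usual way to (re)create the lxdbr0 bridge if it was missing.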

Another thing I saw is:

2016-07-21 12:40:15 ERROR juju.worker.environ wait.go:58 loaded invalid environment configuration: Error inserting juju-dino into database: UNIQUE constraint failed: profiles.name
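The UNIQUE constraint on profiles.name suggests a leftover LXD profile named after the destroyed model ("juju-dino"). A hypothetical manual recovery (not an official juju workaround) is to pull the name out of the log line and delete the stale profile by hand:

```shell
# Hypothetical recovery sketch: extract the stale profile name from the
# error line, then remove it manually with lxc.
err='Error inserting juju-dino into database: UNIQUE constraint failed: profiles.name'
profile=$(printf '%s\n' "$err" | sed -n 's/^Error inserting \([^ ]*\) into database.*/\1/p')
echo "stale profile: $profile"
# lxc profile delete "$profile"   # only once no containers still use it
```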

Another interesting bit is that you can no longer destroy the controller:

juju destroy-controller yung --destroy-all-models
WARNING! This command will destroy the "yung" controller.
This includes all machines, applications, data and other resources.

Continue [y/N]? y
Destroying controller
Waiting for hosted model resources to be reclaimed
Waiting on 2 models, 3 machines, 3 applications
Waiting on 2 models, 3 machines, 3 applications
Waiting on 1 model, 3 machines, 3 applications
Waiting on 1 model, 3 machines, 3 applications
Waiting on 1 model, 3 machines, 3 applications
..on it goes

This was filed as a separate bug: https://bugs.launchpad.net/juju-core/+bug/1604931

Machine-0 log: http://paste.ubuntu.com/20304298/

tags: added: conjure
description: updated
Revision history for this message
Tim Penhey (thumper) wrote :

Wild stab in the dark, but perhaps this needs to be checked:

2016-07-21 12:31:18 WARNING juju.network network.go:430 cannot get "lxdbr0" addresses: route ip+net: no such network interface (ignoring)

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0-beta14
Changed in juju-core:
assignee: nobody → Richard Harding (rharding)
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote : Re: lxd instances not starting

I've seen this too when deploying a bundle which results in about 10 machines - 2 didn't come out of pending. "lxc list" showed them as running but they didn't even have ssh up. I got a shell using "lxc exec" and found that both containers were idle, only running init, systemd-udevd and dhclient. It looked like cloud-init had run to some degree but hadn't done most of the work it would normally do.

I will attach the cloud init logs and the output from ps.

When I manually rebooted the containers, cloud-init ran again and they came up correctly.

summary: - juju2beta12 lxd instances not starting
+ lxd instances not starting
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta14 → 2.0-beta15
Changed in juju-core:
milestone: 2.0-beta15 → 2.0-beta16
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta16 → none
milestone: none → 2.0-beta16
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-beta16 → 2.0-beta17
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-beta17 → 2.0-beta18
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-beta18 → 2.0-beta19
Changed in juju:
milestone: 2.0-beta19 → 2.0-rc1
Changed in juju:
milestone: 2.0-rc1 → 2.0.0
Changed in juju:
milestone: 2.0.0 → 2.1.0
Changed in juju:
assignee: Richard Harding (rharding) → nobody
Revision history for this message
Anastasia (anastasia-macmood) wrote :

Since this report was originally filed, several fixes have gone into both Juju 2.x (especially around networking) and LXD.

This must have been fixed as part of other work. I can no longer reproduce this behavior with Juju 2.1 (tip) and LXD 2.4.1. I have used apache-processing-mapreduce. All machines came out of 'pending' into 'started' and I can ssh into all of them.

If you are still experiencing the problem, maybe a particular bundle is causing the issue? Feel free to re-open and specify which bundle you were using.

I am marking this as Fix Committed against our current active milestone.

Changed in juju:
status: Triaged → Fix Committed
Revision history for this message
Chris Holcombe (xfactor973) wrote :

I'm hitting this too on juju 2.0.3 and lxd 2.8. I'll try upgrading to juju 2.1.

Revision history for this message
Chris Holcombe (xfactor973) wrote :

I found out what my issue was. LXD was having a problem with juju storage and local loopbacks. I was getting errors like:

gluster/6 brick/6 pending creating volume: could not create block file: allocating loop backing file "/var/lib/juju/storage/loop/volume-8-6": fallocate: fallocate failed: Operation not supported: exit status 1

The problem is that juju debug-log and juju status don't tell you anything happened, other than showing the unit stuck in allocating. I became suspicious when I deployed an ubuntu unit and it worked fine. I dug through the logs and found that the next log entry after my gluster units got stuck was a line about juju storage in the ubuntu unit. So I ran `juju storage` on a hunch and found out that my block devices were stuck. It would really help to have this logged in the regular logs :)
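A small probe (my own sketch, not a juju tool) can confirm whether the filesystem behind the loop directory supports fallocate at all; older ZFS, for instance, rejected it with exactly this "Operation not supported" error:

```shell
# Sketch: probe whether a directory's filesystem supports fallocate(2).
# Older ZFS (a common LXD backing store) returned "Operation not supported",
# matching the volume-creation failure above. JUJU_LOOP_DIR is an assumed
# override; the default is juju's loop storage directory.
probe_fallocate() {
    dir="$1"
    f="$dir/.fallocate-probe.$$"
    if fallocate -l 1M "$f" 2>/dev/null; then
        rm -f "$f"
        echo supported
    else
        rm -f "$f" 2>/dev/null
        echo unsupported
    fi
}

probe_fallocate "${JUJU_LOOP_DIR:-/var/lib/juju/storage/loop}"
```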

Revision history for this message
Chris Holcombe (xfactor973) wrote :

I should also add that I dropped back down to juju 2.0.3 because I thought maybe 2.1 wasn't solving my problem.

summary: - lxd instances not starting
+ LXD with juju storage and local loopbacks
Changed in juju:
status: Fix Committed → Triaged
milestone: 2.1.0 → 2.2.0
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.2-beta1 → 2.2-beta2
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.2-beta2 → 2.2-beta3
Changed in juju:
milestone: 2.2-beta3 → 2.2-beta4
Changed in juju:
milestone: 2.2-beta4 → 2.2-rc1
Revision history for this message
Tim Penhey (thumper) wrote :

Taking the milestone off as no one is targeted to address this just now.

Adam, how often is this a real issue for you?

tags: added: provisioner
Changed in juju:
importance: High → Medium
milestone: 2.2-rc1 → none
Revision history for this message
Adam Stokes (adam-stokes) wrote :

Tim, I haven't seen this issue happen since I filed the report. We could probably mark this incomplete, and if it bubbles up again I'll file a new bug.

Changed in juju:
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired