br0 not brought up by cloud-init script with MAAS provider

Bug #1271144 reported by Nicola Larosa on 2014-01-21
This bug affects 21 people
Affects             Status  Importance  Assigned to       Milestone
juju-core                   Critical    Andrew Wilkins
juju-core (Ubuntu)          Critical    Unassigned
  Trusty                    Critical    Dimiter Naydenov

Bug Description

Setup: a virtual OpenStack deployment on the cabeiri host in qalab.

There are three KVM VMs:

- virtmaas is the MAAS controller;
- virtjuju is the Juju bootstrap node (machine 0);
- virtstack is the OpenStack deployment target (machine 1).

Running the command:

test@virtmaas:~$ juju deploy --config=openstack.cfg ceph --to lxc:1

this error appears after a while:

test@virtmaas:~$ juju status
environment: maas
machines:
  "0":
    agent-state: started
    agent-version: 1.17.0.1
    dns-name: virtjuju.master
    instance-id: /MAAS/api/1.0/nodes/node-6df73c0a-7ed2-11e3-bac3-5254006e0119/
    series: precise
  "1":
    agent-state: started
    agent-version: 1.17.0.1
    dns-name: virtstack.master
    instance-id: /MAAS/api/1.0/nodes/node-ef9619e0-7f84-11e3-b750-5254006e0119/
    series: precise
    containers:
      1/lxc/0:
        agent-state-info: '(error: error executing "lxc-start": command get_init_pid
          failed to receive response)'
        instance-id: pending
        series: precise
services:
  ceph:
    charm: cs:precise/ceph-19
    exposed: false
    relations:
      mon:
      - ceph
    units:
      ceph/0:
        agent-state: pending
        machine: 1/lxc/0

On virtstack:

/var/log/juju/machine-1.log

2014-01-21 11:09:50 INFO juju runner.go:262 worker: start "lxc-provisioner"
2014-01-21 11:09:50 INFO juju.provisioner provisioner_task.go:114 Starting up provisioner task machine-1
2014-01-21 11:09:50 INFO juju.provisioner provisioner_task.go:298 found machine "1/lxc/0" pending provisioning
2014-01-21 11:09:50 INFO juju.provisioner.lxc lxc-broker.go:54 starting lxc container for machineId: 1/lxc/0
2014-01-21 11:10:22 ERROR juju.container.lxc lxc.go:129 container failed to start: error executing "lxc-start": command get_init_pid failed to receive response
2014-01-21 11:10:22 ERROR juju.provisioner.lxc lxc-broker.go:85 failed to start container: error executing "lxc-start": command get_init_pid failed to receive response
2014-01-21 11:10:22 ERROR juju.provisioner provisioner_task.go:399 cannot start instance for machine "1/lxc/0": error executing "lxc-start": command get_init_pid failed to receive response

/var/log/lxc/juju-machine-1-lxc-0.log: empty

Apparently br0, needed by MAAS, is not brought up by Juju's cloud-init script:

ubuntu@virtstack:~$ ifconfig
eth0 Link encap:Ethernet HWaddr 52:54:00:6c:c6:c1
          inet addr:192.168.100.152 Bcast:192.168.100.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe6c:c6c1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:84169 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26339 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:307004880 (307.0 MB) TX bytes:2763330 (2.7 MB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

lxcbr0 Link encap:Ethernet HWaddr 22:14:e3:a6:66:d0
          inet addr:10.0.3.1 Bcast:10.0.3.255 Mask:255.255.255.0
          inet6 addr: fe80::2014:e3ff:fea6:66d0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:2016 (2.0 KB)

Bringing it up manually:

ubuntu@virtstack:~$ sudo bash -c "ifdown eth0; ifup eth0; ifup br0"
 * Disconnecting iSCSI targets
   ...done.
 * Stopping iSCSI initiator service
   ...done.
 * Starting iSCSI initiator service iscsid
   ...done.
 * Setting up iSCSI targets
   ...done.
ssh stop/waiting
ssh start/running, process 1369

Waiting for br0 to get ready (MAXWAIT is 32 seconds).
 * Setting up iSCSI targets
   ...done.
ssh stop/waiting
ssh start/running, process 1486

ubuntu@virtstack:~$ ifconfig
br0 Link encap:Ethernet HWaddr 52:54:00:6c:c6:c1
          inet addr:192.168.100.152 Bcast:192.168.100.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe6c:c6c1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:66 errors:0 dropped:0 overruns:0 frame:0
          TX packets:51 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:5632 (5.6 KB) TX bytes:5539 (5.5 KB)

eth0 Link encap:Ethernet HWaddr 52:54:00:6c:c6:c1
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:84338 errors:0 dropped:0 overruns:0 frame:0
          TX packets:26466 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:307020587 (307.0 MB) TX bytes:2778824 (2.7 MB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

lxcbr0 Link encap:Ethernet HWaddr 22:14:e3:a6:66:d0
          inet addr:10.0.3.1 Bcast:10.0.3.255 Mask:255.255.255.0
          inet6 addr: fe80::2014:e3ff:fea6:66d0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:24 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B) TX bytes:2016 (2.0 KB)

Deploying ceph again now works.
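
The before-and-after comparison above comes down to whether br0 appears among the interface names that ifconfig prints. A minimal sketch of that check, assuming the ifconfig layout shown in this report (interface name in column 0, detail lines indented); the helper names list_ifaces and has_iface are made up for illustration and are not part of Juju or MAAS:

```shell
#!/bin/sh
# Hypothetical helpers: read ifconfig-style output on stdin and report
# which interfaces it lists.

# Print one interface name per line. A line that starts in column 0
# opens a new interface block; indented detail lines are skipped.
list_ifaces() {
    awk 'NF > 0 && substr($0, 1, 1) != " " && substr($0, 1, 1) != "\t" {
        name = $1
        sub(/:$/, "", name)   # newer ifconfig prints "eth0:" with a colon
        print name
    }'
}

# Succeed if the interface named in $1 appears in the output on stdin.
has_iface() {
    list_ifaces | grep -qxF -- "$1"
}
```

On a node this could be used as `ifconfig | has_iface br0 || echo "br0 is missing"`.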


Curtis Hovey (sinzui) on 2014-01-21
tags: added: local-provider lxc
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.0
Andreas Hasenack (ahasenack) wrote :

I get the same behavior with 1.17.3, but NOT 1.16.6. With 1.16.6, the br0 interface is up.

I bootstrap using maas, and then juju ssh 0. This is what I see:
$ ifconfig
eth0 Link encap:Ethernet HWaddr 52:54:00:50:d4:28
          inet addr:10.0.5.104 Bcast:10.0.5.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe50:d428/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:14003 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6573 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:146427029 (146.4 MB) TX bytes:488564 (488.5 KB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:1063 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1063 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:300269 (300.2 KB) TX bytes:300269 (300.2 KB)

This is the bit in cloud-init where it calls "service networking restart":
Cloud-init v. 0.7 running 'modules:final' at Mon, 24 Feb 2014 22:37:12 +0000. Up 9.17 seconds.
stop: Unknown instance:
networking stop/waiting
+ install -D -m 644 /dev/null /var/lib/juju/nonce.txt
+ printf %s\n user-admin:bootstrap
Cloud-init v. 0.7 finished at Mon, 24 Feb 2014 22:37:12 +0000. Datasource DataSourceMAAS [http://10.0.5.10/MAAS/metadata/]. Up 9.34 seconds

Note that there are two networking scripts, but service ends up calling the upstart one:
# ls -la /etc/init/networking.conf /etc/init.d/networking
-rwxr-xr-x 1 root root 2797 Feb 13 2012 /etc/init.d/networking
-rw-r--r-- 1 root root 388 Apr 5 2012 /etc/init/networking.conf

The network config in /etc/network seems fine:
# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
source /etc/network/eth0.config

and
# cat /etc/network/eth0.config
iface eth0 inet manual

auto br0
iface br0 inet dhcp
  bridge_ports eth0

If I call "service networking restart" right now, br0 will show up:
root@n9a4m:~# service networking restart
stop: Unknown instance:
networking stop/waiting
root@n9a4m:~# ifconfig
br0 Link encap:Ethernet HWaddr 52:54:00:50:d4:28
          inet addr:10.0.5.104 Bcast:10.0.5.255 Mask:255.255.255.0
          inet6 addr: fe80::5054:ff:fe50:d428/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:31 errors:0 dropped:0 overruns:0 frame:0
          TX packets:23 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:3145 (3.1 KB) TX bytes:2644 (2.6 KB)

eth0 Link encap:Ethernet HWaddr 52:54:00:50:d4:28
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:14811 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7078 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:14649204...


Andreas Hasenack (ahasenack) wrote :

This happened to me on both KVM with maas, as well as real maas on real hardware.

summary: - br0 not brought up by cloud-init script
+ br0 not brought up by cloud-init script with MAAS provider
Andreas Hasenack (ahasenack) wrote :

Oh, and it's consistent. Happens every time.

Adam Collard (adam-collard) wrote :

This looks very similar to bug 1274210 which is (allegedly ;) ) fixed in 1.17.3. Which is a duplicate of which I'll leave as an exercise to the reader.

tags: added: landscape
Andreas Hasenack (ahasenack) wrote :

Adam found the issue. bridge-utils is installed too late, *after* the first call to networking restart. So the bridge never gets set up.

Cloud-init v. 0.7 running 'modules:final' at Mon, 24 Feb 2014 22:33:10 +0000. Up 42.01 seconds.
stop: Unknown instance:
networking stop/waiting
(...)
The following NEW packages will be installed:
  bridge-utils
0 upgraded, 1 newly installed, 0 to remove and 3 not upgraded.
(...)

Andreas Hasenack (ahasenack) wrote :

And I just found out something important: this only happens with the bootstrap node. The other "normal" nodes get bridge-utils installed just before networking is restarted:
Processing triggers for man-db ...
Setting up bridge-utils (1.5-2ubuntu7) ...
Setting up msr-tools (1.2-3) ...
Setting up cpu-checker (0.7-0ubuntu1) ...
Setting up liberror-perl (0.17-1) ...
Setting up git-man (1:1.7.9.5-1) ...
Setting up git (1:1.7.9.5-1) ...
Cloud-init v. 0.7 running 'modules:final' at Tue, 25 Feb 2014 12:34:06 +0000. Up 215.06 seconds.
stop: Unknown instance:
networking stop/waiting
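
The two logs differ only in ordering: on the bootstrap node the networking restart runs before bridge-utils is set up, while on the other nodes it runs after. A rough sketch of a check for that ordering, assuming the log contains the same "Setting up bridge-utils" and "networking stop/waiting" lines quoted in this thread; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical check: read a cloud-init output log on stdin and succeed
# only if bridge-utils was set up before the first networking restart
# (or was set up and no restart was logged at all).
bridge_utils_before_restart() {
    awk '
        /Setting up bridge-utils/ && !inst { inst = NR }   # first install line
        /networking stop\/waiting/ && !rest { rest = NR }  # first restart line
        END { exit !(inst && (!rest || inst < rest)) }
    '
}
```

For example: `bridge_utils_before_restart < /var/log/cloud-init-output.log && echo "ordering looks fine"`. A log where the package was never installed fails the check as well.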

Curtis Hovey (sinzui) on 2014-02-25
Changed in juju-core:
milestone: 2.0 → 1.18.0
tags: added: maas
Mark Ramm (mark-ramm) on 2014-02-27
Changed in juju-core:
importance: High → Critical
tags: added: regression
John A Meinel (jameinel) on 2014-03-07
Changed in juju-core:
milestone: 1.18.0 → 1.17.5
status: Triaged → In Progress
assignee: nobody → Roger Peppe (rogpeppe)
Tycho Andersen (tycho-s) on 2014-03-11
tags: added: cloud-installer
Wayne Witzel III (wwitzel3) wrote :

On current trunk, everything works for me the same as on the proposed branch. The pastebin is the current status of my MAAS after running the commands below it.

http://paste.ubuntu.com/7082704/

juju bootstrap -e maas --upload-tools --debug
juju deploy --to lxc:0 ubuntu
juju add-unit ubuntu
juju add-unit --to lxc:1 ubuntu
juju ssh 1/lxc/0
juju ssh 0/lxc/0

Wayne Witzel III (wwitzel3) wrote :

After another round of testing I was able to replicate the error on trunk, but not on the fix branch.

http://paste.ubuntu.com/7084302/

Curtis Hovey (sinzui) on 2014-03-14
Changed in juju-core:
status: In Progress → Fix Committed
Curtis Hovey (sinzui) on 2014-03-14
Changed in juju-core:
status: Fix Committed → Fix Released
Andreas Hasenack (ahasenack) wrote :

This is still happening with 1.17.5: br0 is not up in the bootstrap node:

andreas@nsn7:~$ juju bootstrap -v
Flag --verbose is deprecated with the current meaning, use --show-log
2014-03-19 12:49:09 INFO juju.environs.bootstrap bootstrap.go:46 bootstrapping environment "scapestack"
2014-03-19 12:49:12 INFO juju.environs.tools tools.go:85 reading tools with major.minor version 1.17
2014-03-19 12:49:12 INFO juju.environs.tools tools.go:96 filtering tools by series: precise
2014-03-19 12:49:19 INFO juju.environs.bootstrap bootstrap.go:58 picked newest version: 1.17.5
Launching instance
2014-03-19 12:49:20 WARNING juju.provider.maas environ.go:192 picked arbitrary tools &{"1.17.5-precise-amd64" "https://streams.canonical.com/juju/tools/releases/juju-1.17.5-precise-amd64.tgz" "b070028dc9537327885bd3f8deb302f47122e5e0a5d3281fb1e6bb74cadcba1d" %!q(int64=5190393)}
 - /MAAS/api/1.0/nodes/node-ddcaf98c-ab86-11e3-8c77-2c59e54ace74/
Waiting for address
Attempting to connect to darby.scapestack:22
Attempting to connect to 10.96.1.3:22
2014-03-19 12:56:28 INFO juju.cloudinit.sshinit configure.go:39 Provisioning machine agent on ubuntu@10.96.1.3
Warning: Permanently added '10.96.1.3' (ECDSA) to the list of known hosts.
Logging to /var/log/cloud-init-output.log on remote host
Installing add-apt-repository
Adding apt repository: deb http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/cloud-tools main
Running apt-get update
Running apt-get upgrade
Installing package: git
Installing package: cpu-checker
Installing package: bridge-utils
Installing package: rsyslog-gnutls
Installing package: --target-release 'precise-updates/cloud-tools' 'mongodb-server'
Fetching tools: curl -sSfw 'tools from %{url_effective} downloaded: HTTP %{http_code}; time %{time_total}s; size %{size_download} bytes; speed %{speed_download} bytes/s ' -o $bin/tools.tar.gz 'https://streams.canonical.com/juju/tools/releases/juju-1.17.5-precise-amd64.tgz'
Starting MongoDB server (juju-db)
Bootstrapping Juju machine agent
Starting Juju machine agent (jujud-machine-0)
2014-03-19 13:00:13 INFO juju.cmd supercommand.go:302 command finished
andreas@nsn7:~$ juju status
environment: scapestack
machines:
  "0":
    agent-state: started
    agent-version: 1.17.5
    dns-name: darby.scapestack
    instance-id: /MAAS/api/1.0/nodes/node-ddcaf98c-ab86-11e3-8c77-2c59e54ace74/
    series: precise
services: {}
andreas@nsn7:~$ juju ssh 0
ifconfig
Warning: Permanently added 'darby.scapestack,10.96.1.3' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 12.04.4 LTS (GNU/Linux 3.2.0-54-generic x86_64)

 * Documentation: https://help.ubuntu.com/

  System information as of Wed Mar 19 13:01:09 UTC 2014

  System load: 0.45 Processes: 101
  Usage of /: 0.2% of 916.89GB Users logged in: 0
  Memory usage: 2% IP address for eth0: 10.96.1.3
  Swap usage: 0%

  Graph this data and manage this system at:
    https://landscape.canonical.com/

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

  Use Juju to deploy your cloud instances and workloads:
    https://juju.ubuntu.com/#cloud-precise

*** System r...


Andreas Hasenack (ahasenack) wrote :

Attached is another bootstrap but with --debug this time. And the cloud log files.

Adam Collard (adam-collard) wrote :

Cloud-init v. 0.7 running 'modules:final' at Wed, 19 Mar 2014 10:24:03 +0000. Up 36.05 seconds.
Reading package lists...
Building dependency tree...
Reading state information...
E: Unable to locate package bridge-utils
stop: Unknown instance:
networking stop/waiting

Relevant snippet in the logs

Curtis Hovey (sinzui) on 2014-03-19
Changed in juju-core:
status: Fix Released → Triaged
milestone: 1.17.5 → 1.17.6
James Page (james-page) wrote :

Confirming (again) - seen with 1.17.5 on trusty - bootstrap node does not have bridge networking.

Raising distro task for tracking.

Changed in juju-core (Ubuntu):
importance: Undecided → Critical
Changed in juju-core (Ubuntu Trusty):
status: New → Triaged
Andreas Hasenack (ahasenack) wrote :

I applied this patch and it worked:
=== modified file 'provider/maas/environ.go'
--- provider/maas/environ.go revid:tarmac-20140314141156-nsn5oeamfi31t1ct
+++ provider/maas/environ.go 2014-03-19 16:54:28 +0000
@@ -266,7 +266,13 @@
  userdata, err := environs.ComposeUserData(
   args.MachineConfig,
   runCmd,
- "apt-get install bridge-utils",
+ "cat /etc/apt/sources.list",
+ runCmd,
+ "cat /etc/apt/sources.list.d/*.list",
+ runCmd,
+ "apt-get update",
+ runCmd,
+ "apt-get install -y bridge-utils",
   createBridgeNetwork(),
   linkBridgeInInterfaces(),
   "service networking restart",

(of course, the cat calls are just for extra debugging)

cloud-init-log:
Cloud-init v. 0.7 running 'modules:config' at Wed, 19 Mar 2014 17:34:46 +0000. Up 35.09 seconds.
Generating locales...
  en_US.UTF-8... done
Generation complete.
Cloud-init v. 0.7 running 'modules:final' at Wed, 19 Mar 2014 17:34:50 +0000. Up 38.23 seconds.
deb http://us.archive.ubuntu.com//ubuntu precise main restricted universe multiverse
deb http://us.archive.ubuntu.com//ubuntu precise-updates main restricted universe multiverse
deb http://us.archive.ubuntu.com//ubuntu precise-security main restricted universe multiverse
#deb http://ppa.launchpad.net/maas-maintainers/maas-ephemeral-images/ubuntu precise main
Get:1 http://us.archive.ubuntu.com precise Release.gpg [198 B]
(...)
Unpacking bridge-utils (from .../bridge-utils_1.5-2ubuntu7_amd64.deb) ...
Processing triggers for man-db ...
Setting up bridge-utils (1.5-2ubuntu7) ...
stop: Unknown instance:
networking stop/waiting

Andreas Hasenack (ahasenack) wrote :

<ahasenack> hazmat: remember the br0 not coming up bug on the bootstrap node? There were two cloud-init configs involved, right? Because br0 does get up in services deployed to new machines, just not in the bootstrap one
<ahasenack> hinting that the one used for bootstrap is different than the rest
<ahasenack> there was a long discussion here in the channel when that was first filed
<hazmat> ahasenack, yes.. there was.. i thought it was addressed with 1.17.5
<ahasenack> hazmat: nope, still happening
<hazmat> ahasenack, nutshell was on bootstrap node previously bridge-utils wasn't installed because of a divergent code path for bootstrap when constructing cloud-init
<ahasenack> hazmat: in my case, a simple 'apt-get update' just before 'apt-get install bridge-utils' worked
<ahasenack> hazmat: so the fix was to just add "apt-get install bridge-utils" in bootstrap's cloud-init?
<hazmat> ahasenack, ah.. the package is old..
<ahasenack> hazmat: right
<ahasenack> hazmat: I think it was assumed that apt_update: was set to True in cloud-init
<ahasenack> hazmat: which it might as well be, but not for bootstrap's cloud-init

Tim Penhey (thumper) wrote :

Can someone please test this?
    lp:~thumper/juju-core/maas-no-restart-networking

I don't have a MAAS setup to hand, but according to the code above, this *should* work.

Now we go
     ifdown eth0
before messing with the network config, and
     ifup eth0
     ifup br0
after, and no 'restart networking'
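
The ordering described above can be written down as a dry run. This is only a sketch of the sequence as stated in this comment, not the contents of the proposed branch; the function name is hypothetical, and the commands are printed rather than executed so the order can be reviewed safely:

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command sequence instead of
# running it. $1 = physical interface (default eth0), $2 = bridge
# (default br0).
bridge_bringup_plan() {
    iface=${1:-eth0}
    bridge=${2:-br0}
    printf '%s\n' \
        "ifdown $iface" \
        "# rewrite the network config here so $bridge uses $iface as its port" \
        "ifup $iface" \
        "ifup $bridge"
}

bridge_bringup_plan eth0 br0
```

Taking the interface down before its configuration changes, then bringing the interface and the bridge up explicitly, avoids relying on a full networking restart.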

Andrew Wilkins (axwalk) on 2014-03-20
Changed in juju-core:
assignee: Roger Peppe (rogpeppe) → Andrew Wilkins (axwalk)
status: Triaged → In Progress
Andrew Wilkins (axwalk) on 2014-03-20
Changed in juju-core:
status: In Progress → Fix Committed
Adam Collard (adam-collard) wrote :

The reason why the test in comment 8 was passing but we were still seeing errors is that the APT index was up to date for Wayne, but not for us - debootstrap vs. fastpath installer.

This might be worth noting for any future MAAS bugs, to be sure to test it in both scenarios.

Curtis Hovey (sinzui) on 2014-03-20
Changed in juju-core:
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package juju-core - 1.17.6-0ubuntu1

---------------
juju-core (1.17.6-0ubuntu1) trusty; urgency=medium

  * New upstream point release, including fixes for:
    - br0 not bought up by cloud-init with MAAS provider (LP: #1271144).
    - ppc64el enablement for juju/lxc (LP: #1273769).
    - juju userdata should not restart networking (LP: #1248283).
    - error detecting hardware characteristics (LP: #1276909).
    - juju instances not including the default security group (LP: #1129720).
    - juju bootstrap does not honor https_proxy (LP: #1240260).
  * d/control,rules: Drop BD on bash-completion, install bash-completion
    direct from upstream source code.
  * d/rules: Set HOME prior to generating man pages.
  * d/control: Drop alternative dependency on mongodb-server; juju now only
    works on trusty with juju-mongodb.
 -- James Page <email address hidden> Mon, 24 Mar 2014 16:05:44 +0000

Changed in juju-core (Ubuntu Trusty):
status: Triaged → Fix Released
Jonathan (jfrancis-p) wrote :

This appears to be happening again on juju-core 1.18.1. When I attempted to create a new LXC container on the bootstrapped node (machine: 0), I received an lxc-start error. After reviewing this bug report, I manually added the br0 interface to /etc/network/interfaces and ran ifup br0. Following that, I was able to successfully add an LXC container to machine 0.

Andreas Hasenack (ahasenack) wrote :

As a counterpoint, it worked for me with juju 1.18.1:

$ juju status
environment: scapestack
machines:
  "0":
    agent-state: started
    agent-version: 1.18.1
    dns-name: some.node
    instance-id: /MAAS/api/1.0/nodes/node-some-uuid
    series: precise
services: {}

$ juju ssh 0 ifconfig br0
br0 Link encap:Ethernet HWaddr aa:aa:aa:aa:aa:aa
          inet addr:1.2.3.4 Bcast:1.2.3.255 Mask:255.255.255.0
(...)

Is your bootstrap node on precise or trusty?

Jonathan (jfrancis-p) wrote :

Forgot to include that detail. Trusty

> On Apr 25, 2014, at 2:58 PM, Andreas Hasenack <email address hidden> wrote:

Ante Karamatić (ivoks) wrote :

Confirming this issue on trusty with juju 1.18.4. br0 is not created.

Ante Karamatić (ivoks) wrote :

So, my nodes don't have eth0, but p1p1, if that makes any difference. I noticed that br0 was not configured, so I manually added it to /etc/network/interfaces. After I brought up the interface, I was able to deploy a charm within an LXC container.

Nodes are deployed by MAAS.

So, obviously, this is still broken in 14.04. Should br0 always be set on a node?

Changed in juju-core (Ubuntu Trusty):
status: Fix Released → Confirmed
Andrew Wilkins (axwalk) wrote :

> So, my nodes don't have eth0, but p1p1, if that makes any difference.

This is most likely the key; the MAAS provider code assumes it's dealing with eth0. Thanks for the information.
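
To make the eth0 assumption concrete, here is a minimal sketch of how the bridge stanza quoted earlier in this thread could be parametrised by interface name. It is not the MAAS provider's actual code; both function names are hypothetical, and picking the interface from the default route is only one possible heuristic:

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Read `ip route` output on stdin and print the device of the first
# default route, e.g. "default via <gateway> dev p1p1" -> p1p1.
default_route_iface() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++)
            if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Print the bridge stanza from this thread with the interface name in $1
# substituted for the hard-coded eth0.
render_bridge_config() {
    printf 'iface %s inet manual\n\nauto br0\niface br0 inet dhcp\n  bridge_ports %s\n' "$1" "$1"
}
```

Usage would look like `render_bridge_config "$(ip route | default_route_iface)"`. The matching "auto eth0" and "source" lines in /etc/network/interfaces would presumably need the same substitution.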

Shiv Prasad Rao (shivrao) wrote :

Confirming this issue on trusty with juju 1.19.4 and maas 1.5.1

----
cloud-init still creates eth0.config:
iface eth0 inet manual

auto br0
iface br0 inet dhcp
  bridge_ports eth0

And this is how it looks in /etc/network/interfaces:

# The loopback network interface
auto lo
iface lo inet loopback

auto eth0
source /etc/network/eth0.config

--------
This causes the lxc containers to be stuck in pending state when services are deployed in lxc on this maas-node.
https://bugs.launchpad.net/juju-core/+bug/1280461

Andreas Hasenack (ahasenack) wrote :

We are deploying lxc to bootstrap daily, using 1.5.1+bzr2269-0ubuntu0.1 from trusty-updates. Which one are you using?

Shiv Prasad Rao (shivrao) wrote :

Sorry, I forgot to mention this: the management interface on this bootstrap node is eth2. And, as Ante reported earlier in the thread, br0 is not being created.

Some more info: I am using the fast-path installer for this node.

MAAS version: 1.5.1+bzr2269-0ubuntu0.1

Jorge Niedbalski (niedbalski) wrote :

Hello @ivoks, @shivrao,

I opened bug LP:#1337091 to address the problem of juju trying to bring up a new bridge on an interface other than eth0; the interface should be configurable via the network-bridge configuration directive.

James Troup (elmo) on 2014-07-09
tags: added: canonical-is
Rick Masters (grick23) wrote :

There has been a lot of activity on this bug with more than one issue involved and so I just wanted to confirm:

This problem still exists with maas 1.5.2+bzr2282 and juju 1.18.1 (the current stable releases) when you bootstrap a node with trusty.

Therefore, it breaks Canonical's OpenStack "Golden Happy Path". What I mean by that is, if you went to Canonical's web site and followed the instructions for deploying OpenStack using the latest stable, updated software (trusty, maas, and juju), then you will fail due to this bug. Right? Or is there anyone out there who is able to bootstrap juju correctly (br0 comes up) on Trusty via MaaS right now with the latest software?

Also, it's very difficult to know what software component is at fault so you can report the problem or look for existing bugs. Even if you guessed it was juju, this bug doesn't even show up on the default list of bugs because it's marked "Fix Released" as its primary status.

BTW, I doubt that the fix for #1337091 will fix this issue with trusty, because even if I force my new trusty NIC names (em1, etc.) back to eth0 using the biosdevname=0 boot option, manually configure /etc/network/interfaces, and reboot, br0 is still not brought up when it should be. The result is that cloud-init-nonet complains about the network not being up. Note that immediately after cloud-init-nonet gives up, ifup -a is run (by whom is a total mystery, as the parent process is 1) and then br0 finally comes up, but it's too late.

I tried to debug the problem but I cannot find any documentation as to what the boot sequence for networking is supposed to be when juju/cloud-init is involved. I notice that some hardware interfaces are brought up due to udev events, but the software bridges are not brought up until later. I guess that's why the cloud-init scripts try to bring up br0 on their own. Except it's not happening. Are the cloud-init scripts supposed to bring up br0 on every boot, or just after the first boot when they munge the network config?

Since this bug involves different issues, it's not clear whether anyone is still working on this (the Trusty target is Unassigned) or is even aware that a problem still exists.

For all these reasons, this bug is very frustrating. It would be very helpful if someone could provide status as to what is being done or will be done to resolve it. If folks are still not convinced that a real problem is still outstanding, I have equipment and time and can provide whatever information and will follow whatever instructions are necessary to change their minds. If I've done something wrong, I'm eager to determine what it was. Also, if it is unreasonable to expect 14.04 deployments to work anytime soon, please let me know and I'll go back to deploying 12.04 for now.

Andrew Wilkins (axwalk) wrote :

Rick,

Sorry for the frustration. It is certainly not unreasonable to expect 14.04 to be working.

br0 is only brought up on first boot, so rebooting would not have triggered that part of cloud-config again.

I believe we are looking at back-porting Juju 1.20.2 (the next stable release) into 14.04. There are some issues relating to Mongo replica sets in Juju 1.20.1 that we're working on, but it should be stable enough to try out. Would you please see if 1.20.1 makes any difference for you?

Rick Masters (grick23) wrote :

Thanks for the reply. I'm willing to try anything that is recommended.

The primary NIC on Trusty comes up as em1, not eth0. Therefore, I've been assuming that the fix for #1337091 will, at least, be required to address this issue. Whether it will be sufficient is an open question, but I'm skeptical for the reasons explained in comment #32.

Note that #1337091 is targeted to 1.21 alpha and is apparently unreleased. So #1337091 would not be included in 1.20.1 would it? If that's the case, I think it is unlikely that 1.20.1 will work. That said, I'm still willing to try it.

Version 1.20.1 is not a version I've heard of before. Is it available now? Where can I get it and how can I install it?
Thanks!

Andreas Hasenack (ahasenack) wrote :

1.20.1 is in the juju stable ppa:

sudo add-apt-repository ppa:juju/stable

Andrew Wilkins (axwalk) wrote :

Sorry, I was thinking that lp:1337091 was released already. We will look at getting it into 1.20.2.

If you set the kernel parameter via MAAS [0] prior to first boot, then I would concur that it's probably not going to help. If, however, you modified GRUB after it came up and rebooted, I think that would not have helped due to the network configuration actions only happening on first boot.

[0] http://maas.ubuntu.com/docs/kernel-options.html

Andrew Wilkins (axwalk) on 2014-07-22
no longer affects: juju-core/1.20
danieru (samuraidanieru) wrote :

Hi,

I'm seeing this problem in juju-core/1.20.5.

During 'juju bootstrap' it finally fails with:

"ERROR bootstrap failed: waited for 10m0s without being able to connect: ssh: connect to host 192.168.122.105 port 22: No route to host"

And on the guest console it complains that br0 was not brought up because it doesn't exist.

My environment:

I'm using the juju packages from the juju ppa. The exact package is '1.20.5-0ubuntu1~12.04.1~juju1'

The MAAS server is on trusty, with the default MAAS packages from the trusty repo. I'm running MAAS and its nodes in KVM for testing purposes.

It looks like this bug is marked as fixed for juju-core/1.20. Is it possible to re-open it?

Andrew Wilkins (axwalk) wrote :

@danieru: what network interfaces show up? is the primary network interface called something other than "eth0"? If so, you'd need to set the "network-bridge" attribute in environments.yaml.
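The question about interface names can be answered on the node by listing sysfs. A minimal sketch, assuming a Linux guest with /sys mounted; the function name list_nics and its optional directory argument are illustrative and not part of juju or MAAS.

```shell
#!/bin/sh
# list_nics [NET_DIR]
# Print the name of every network interface except loopback.
# NET_DIR is a hypothetical override of /sys/class/net, present only
# so the function can be exercised against a scratch directory.
list_nics() {
    net_dir="${1:-/sys/class/net}"
    for path in "$net_dir"/*; do
        if [ ! -e "$path" ]; then
            continue                    # directory empty or missing
        fi
        name="${path##*/}"
        if [ "$name" = "lo" ]; then
            continue                    # skip loopback
        fi
        printf '%s\n' "$name"
    done
}

list_nics
```

If the output shows em1 or p1p1 rather than eth0, the primary interface is not the one the bridge setup expects.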

tags: added: cts
Kapil Thangavelu (hazmat) wrote :

just got a report of the same issue and resolution with 1.20.11 on orangebox maas. same s/br0/lxcbr0 fix for the container on machine 0. we tried again and got a slightly different issue on machine 2's container

Forensics

status output -> https://pastebin.canonical.com/119751/
failed container log output -> https://pastebin.canonical.com/119742/
juju machine agent log -> https://pastebin.canonical.com/119746/
cloudinit log -> https://pastebin.canonical.com/119748/
ifconfig -> https://pastebin.canonical.com/119752/
container config -> https://pastebin.canonical.com/119753/

oddly on machine 0 the container had br0 for the bridge. it's also failing on machine 2, but there the lxc container appears to be targeted towards the correct bridge.

Kapil Thangavelu (hazmat) wrote :

on machine 0 (orangebox kvm maas instance)

the ifconfig output is https://pastebin.canonical.com/119756/
the lxc conf is https://pastebin.canonical.com/119757/

the container there also failed to start.

Kevin Metz (pertinent) wrote :

Using 14.04, MAAS in a KVM, and Juju in a KVM. Same issue: the lxc container launches with br0 instead of lxcbr0.

juju status -> https://pastebin.canonical.com/119755/
Unable to find failed container log output
juju machine agent log -> https://pastebin.canonical.com/119746/
cloudinit log -> https://pastebin.canonical.com/119748/
ifconfig -> https://pastebin.canonical.com/119752/
container config -> https://pastebin.canonical.com/119753/

Same issue, lxc container does not launch

Kevin Metz (pertinent) wrote :

I've found a workaround for the issue to let me launch lxc containers via juju

After the first container fails to launch, ssh to the hypervisor
edit /var/lib/lxc/juju-trusty-lxc-template/config
change br0 -> lxcbr0

After editing the template config all subsequent lxc containers will launch via juju
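The edit described above can be scripted. A minimal sketch, assuming the template config names the bridge on an "lxc.network.link = br0" line (the LXC 1.x key; check the file first). The function name retarget_bridge is illustrative.

```shell
#!/bin/sh
# retarget_bridge FILE [OLD] [NEW]
# Rewrite the lxc.network.link line in FILE from OLD (default br0)
# to NEW (default lxcbr0). The original is kept as FILE.bak.
retarget_bridge() {
    file="$1"
    old="${2:-br0}"
    new="${3:-lxcbr0}"
    cp "$file" "$file.bak" &&
    sed "s/^\([[:space:]]*lxc\.network\.link[[:space:]]*=[[:space:]]*\)$old[[:space:]]*\$/\1$new/" \
        "$file.bak" > "$file"
}

# On the host, as root (path taken from the comment above):
# retarget_bridge /var/lib/lxc/juju-trusty-lxc-template/config
```

Only a line whose value is exactly the old bridge name is changed, so running it a second time leaves the file as it is.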

Dimiter Naydenov (dimitern) wrote :

With this change that landed on juju-core trunk: https://github.com/juju/juju/pull/1046 and was also backported to the 1.21 and 1.20 branches, the issues with LXC/KVM containers that got stuck in "pending" state or failed to start due to incorrect networking (bridge name, interface to use, etc.) configuration should be fixed. The bridge name is now "juju-br0" for both LXC and KVM containers on MAAS.

Changed in juju-core (Ubuntu Trusty):
status: Confirmed → Fix Committed
assignee: nobody → Dimiter Naydenov (dimitern)
Mark W Wenning (mwenning) wrote :

I'm seeing the same error on my hardware MAAS cluster with the latest maas stable and juju stable . Failing bridge on the bootstrap node is called "juju-br0". I will post more info on Monday.

Mark Dickie (mark-dickie) wrote :

Same as comment #44: I'm running hardware MAAS and I get this on every deployment. If I let cloud-init error out I can get logged in and juju-br0 is up by that point, but it doesn't come up until after cloud-init tries to do its stuff. This makes the nodes I deploy unusable.

I'm going to see if I can come up with a fix and I'll post here if I do.

Dimiter Naydenov (dimitern) wrote :

@mark-dickie Can you please try to reproduce this with the latest juju-core built from source (master branch) to see if you still have an issue? If you do, please attach the following logs to help us analyze the issue better:

Before anything, please add "logging-config: <root>=TRACE" to your maas environment config in ~/environments.yaml.
The result of running $ juju bootstrap --upload-tools --debug &> ~/maas-bootstrap.log (you mentioned you're using the latest

From the bootstrap node:
/var/log/cloud-init.log
/var/log/cloud-init-output.log
/var/log/juju/machine-0.log
/var/log/juju/machine-0-lxc-0.log

Dimiter Naydenov (dimitern) wrote :

@mwenning - What's the output of $ juju version? Can you try what I've proposed in comment #46?

Mark Dickie (mark-dickie) wrote :

@dimitern Thanks for the swift reply. I'll do this now, for your information I'm currently running juju 1.21-beta3-trusty-amd64 from the juju-proposed ppa.

Mark Dickie (mark-dickie) wrote :

Also I've found that if I edit /etc/init/networking.conf and change the "start on" line to:

start on stopped cloud-init-local

Then everything works fine although I realise that this is a very poor solution. Best I could manage with my limited upstart knowledge.

Dimiter Naydenov (dimitern) wrote :

@mark-dickie - I can't see 1.21-beta3 in https://launchpad.net/~juju/+archive/ubuntu/proposed are you sure you didn't mean https://launchpad.net/~juju/+archive/ubuntu/devel ?

Thanks for sharing the workaround - at least it gives me something to look into! :)

Mark Dickie (mark-dickie) wrote :

@dimitern

Quite right, it is the devel branch I'm using. I was trying to find a new version which might already have a fix. I'm not very up on building software based on go. I'm getting the following when I try to run "go install -v launchpad.net/juju-core/..."

go build launchpad.net/juju-core/...
# launchpad.net/juju-core/testing/filetesting
testing/filetesting/filetesting.go:194: cannot use checkers.Satisfies (type check.Checker) as type gocheck.Checker in function argument:
        check.Checker does not implement gocheck.Checker (wrong type for Info method)
                have Info() *check.CheckerInfo
                want Info() *gocheck.CheckerInfo
# launchpad.net/juju-core/worker/uniter/charm
worker/uniter/charm/bundles.go:52: cannot use &err (type *error) as type error in function argument:
        *error is pointer to interface, not interface
# launchpad.net/juju-core/utils/ssh
utils/ssh/ssh_gocrypto.go:84: undefined: ssh.ClientConn
make: *** [build] Error 2

Dimiter Naydenov (dimitern) wrote :

@mark-dickie: Ok, 1.21-beta3 actually should have the fix - no need to build from source. Can you:

1. Edit your ~/.juju/environments.yaml to include "logging-config: <root>=TRACE" for your maas environment.
2. Run $ juju bootstrap --debug &> ~/maas-bootstrap.log
3. Once the machine is up and you can SSH into it (via juju ssh 0 or otherwise), get /var/log/juju/*.log and /var/log/cloud*.log, zip them and attach them to the bug please.

Of course, provided you can reproduce the issue still.
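Step 3 can be done in one go on the bootstrap node. A minimal sketch that bundles the requested logs into a gzip-compressed tarball (swapped in here for zip, which may not be installed); the function name collect_logs and its ROOT argument are illustrative.

```shell
#!/bin/sh
# collect_logs OUT [ROOT]
# Bundle var/log/juju/*.log and var/log/cloud*.log into the tarball OUT.
# OUT should be an absolute path. ROOT is a hypothetical prefix
# (default /) that exists only so the function can be tested.
collect_logs() {
    out="$1"
    root="${2:-/}"
    (
        cd "$root" || exit 1
        set --                          # build the file list in "$@"
        for f in var/log/juju/*.log var/log/cloud*.log; do
            if [ -f "$f" ]; then        # drops patterns that matched nothing
                set -- "$@" "$f"
            fi
        done
        if [ "$#" -eq 0 ]; then
            echo "no logs found under $root" >&2
            exit 1
        fi
        tar czf "$out" "$@"
    )
}

# On the node (some logs may need root to read):
# sudo sh -c '. ./collect_logs.sh; collect_logs /tmp/bug-1271144-logs.tar.gz'
```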

Thanks!

Mark Dickie (mark-dickie) wrote :

Here is the bootstrap log created when I run "juju bootstrap --to db2.ed1.mavennetwork.co.uk --debug &> ~/maas-bootstrap.log"

Mark Dickie (mark-dickie) wrote :

And these are the juju and cloud logs from the node which was bootstrapped. This node is working fine right now but once it is rebooted I will get the cloud-init problem. I'll reboot it now and then post the logs after the problematic boot.

Mark Dickie (mark-dickie) wrote :

And lastly the logs from the node after it is rebooted. On this boot it takes a very long time to come up as cloud-init waits for the network bridge to come up and then eventually errors out.

Dimiter Naydenov (dimitern) wrote :

Thanks for the logs! None of them show anything alarming.
However something is wrong with your setup. How did you reboot the instance after the bootstrap was done? Once MAAS provisions a machine cloud-init should not run anymore - it's only used to boot the initial OS. Did you recommission or deallocate the node from MAAS?

Another request - since the initial bootstrap completes ok, can you please retry the same and attach the contents of /etc/network/interfaces and /etc/network/interfaces.d/*.* ? If the problem happens after reboot something might be wrong with how the interfaces are configured there.

Mark Dickie (mark-dickie) wrote :

I simply issued "sudo reboot" and this happened. So I assume then that I have a different issue from the original reporter? Should I open a separate bug and stop polluting this one?

In the meantime I'll pull the requested config files.

Mark Dickie (mark-dickie) wrote :

/etc/network/interfaces

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0

iface eth0 inet manual

auto juju-br0
iface juju-br0 inet dhcp
    bridge_ports eth0

/etc/network/interfaces.d is empty
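One sanity check for a file like the one above is whether the interface named on the bridge_ports line exists on the host, since the bridge has no port when the primary NIC is not called eth0. A minimal sketch, assuming ports are listed by plain interface name (not "none", "all" or a pattern); the function names and the NET_DIR argument are illustrative.

```shell
#!/bin/sh
# bridge_ports FILE
# Print each interface named on a bridge_ports line of an ifupdown
# interfaces file, one per line.
bridge_ports() {
    awk '$1 == "bridge_ports" { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# missing_ports FILE [NET_DIR]
# Print the bridge ports that do not exist as interfaces on this host.
# NET_DIR is a hypothetical override of /sys/class/net for testing.
missing_ports() {
    net_dir="${2:-/sys/class/net}"
    for port in $(bridge_ports "$1"); do
        if [ ! -e "$net_dir/$port" ]; then
            printf '%s\n' "$port"
        fi
    done
}

# On the node:
# missing_ports /etc/network/interfaces
```

No output means every bridge port named in the file is present on the host.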

Dimiter Naydenov (dimitern) wrote :

Yes, it appears you're having a separate issue. I'd appreciate it if you opened a new bug for it so we can continue there.

Alexander Gabert (alzxander) wrote :

This hits me when doing an OpenStack Autopilot install with MAAS: juju tries to add eth0 to the juju-br0 bridge.

But my NIC is p1p1, not eth0.

Alexander Gabert (alzxander) wrote :

I am using Ubuntu 14.10 LDS (landscape local server) and 14.04 as target on the machines bootstrapped by a 14.04 MAAS.

Dimiter Naydenov (dimitern) wrote :

Which version of juju are you using? What's the content of the lshw xml dump for the instance with the p1p1 NIC? The MAAS provider uses that lshw output generated during the node's commissioning to determine which is the primary NIC on that node, so it can be added into the bridge.

Curtis Hovey (sinzui) wrote :

This issue was fixed when trusty-updates got 1.22.6.

Changed in juju-core (Ubuntu Trusty):
status: Fix Committed → Fix Released