juju failed to deploy lxd when making bridge

Bug #1704376 reported by Seyeong Kim
This bug affects 3 people
Affects: Canonical Juju
Status: Expired
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

This issue appears starting with juju 2.2.

When juju deploys a charm to an LXD container on a host, it tries to bridge the host's NIC.

If /etc/network/interfaces.d/eth0.cfg exists with the contents below:

pre-up [ -z `cat /proc/net/bonding/* | grep Slave | grep eth0` ] && true

the deployment fails with the error messages in [1].

I tested that if we remove that content and deploy to lxd, it works.

The code tracing is here:
https://1drv.ms/u/s!AmUioEoZTo2Lg_p2YOJKaaUAXm0o2g

This issue was introduced by this commit [4].

#####################
[1]
#####################
WARNING juju.provisioner provisioner_task.go:739 failed to start instance (failed to bridge devices: bridge activaction error: misplaced option), retrying in 10s (7 more attempts)
2017-07-13 12:20:37 TRACE juju.rpc.jsoncodec codec.go:225 -> {"request-id":2930,"type":"Provisioner","version":3,"request":"SetInstanceStatus","params":{"entities":[{"tag":"machine-2-lxd-1","status":"allocating","info":"failed to start instance (failed to bridge devices: bridge activaction error: misplaced option), retrying in 10s (7 more attempts)","data":null}]}}

#####################
[2]
#####################
https://pastebin.canonical.com/193850/

#####################
[3]
#####################
https://pastebin.canonical.com/193848/

#####################
[4]
#####################
commit 56461ed70ab337b4934642bd7bd3545869c8ac3c
Author: Andrew McDermott <email address hidden>
Date: Sun Dec 11 15:19:06 2016 +0000

Go implementation of bridge script

Seyeong Kim (seyeongkim)
tags: added: sts
Seyeong Kim (seyeongkim) wrote :

I removed the post- and pre- lines manually and retried the deployment,

but got the same error.

Seyeong Kim (seyeongkim)
summary: - juju failed to parse pre-up, pre-down when making bridge
+ juju failed to deploy lxd when making bridge
description: updated
Changed in juju:
status: New → Triaged
importance: Undecided → High
tags: added: bridge network
John A Meinel (jameinel) wrote :

I don't think you're supposed to have just a bare "pre-up" statement such as that, because then it isn't clear what device it is supposed to be associated with (it starts to depend on sort order, import order, etc.)

In your example here:
  https://pastebin.canonical.com/193850/

What device are you trying to associate those with? Are all of those supposed to be tied to ens8? (If they are, then when we bridge ens8 we should then be trying to apply all of those pieces to br-ens8.)

It does sound like we could try to add a bit more context around what lines are failing. (misplaced option doesn't tell you what the wrong option was, or how to fix it.)

John A Meinel (jameinel) wrote :

BTW, looking at the raw pastebins, I noticed something funny:
https://pastebin.canonical.com/193850/plain/

If you have the wrong character encoding, those show up as:
iface lo inet loopback
    dns-nameservers 8.8.8.8
    dns-search maas

I wonder if the actual problem is that your editor was inserting characters that aren't plain space (0x20) or plain tab (0x09).
From the looks of it, those are:
u'\xa0'
http://www.utf8-chartable.de/unicode-utf8-table.pl?start=128&number=128&utf8=string-literal&unicodeinhtml=hex

That is "Unicode Non-breaking Space", which is probably *not* accepted in an eth0.cfg.
Certainly it isn't a character that we had been supporting parsing. I'm a bit surprised that the configuration was working. (It could be that the python parser is doing a unicode regex \s and that handles non-ascii characters ok.)

Can you just replace the '\xa0' non-breaking-space characters with plain '\x20' spaces?
I don't know how that file is getting written. I'm guessing it's some sort of generated template, and you ended up with '\xa0' in your template file.
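
A quick way to check a file for the non-breaking spaces described above is sketched here in Python. The helper names (find_nbsp_lines, replace_nbsp) and the sample text are made up for illustration and are not part of Juju or ifupdown. The last lines also show that, in Python 3, a Unicode-aware \s matches U+00A0 while an ASCII-only \s does not, which is the kind of difference guessed at above.

```python
import re

NBSP = "\u00a0"  # Unicode non-breaking space (bytes 0xC2 0xA0 in UTF-8)


def find_nbsp_lines(text):
    """Return the 1-based numbers of lines that contain a non-breaking space."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if NBSP in line
    ]


def replace_nbsp(text):
    """Return text with every non-breaking space replaced by a plain space."""
    return text.replace(NBSP, " ")


# Hypothetical sample: the second line is indented with non-breaking spaces.
sample = (
    "iface lo inet loopback\n"
    "\u00a0\u00a0\u00a0\u00a0dns-nameservers 8.8.8.8\n"
    "    dns-search maas\n"
)

print(find_nbsp_lines(sample))                # lines that need fixing
print(NBSP in replace_nbsp(sample))           # False once replaced

# For str patterns, Python 3's \s is Unicode-aware unless re.ASCII is given.
unicode_match = re.match(r"\s", NBSP) is not None
ascii_match = re.match(r"\s", NBSP, re.ASCII) is not None
print(unicode_match, ascii_match)
```

To clean a real file, read it as UTF-8, pass the contents through replace_nbsp, and write the result back after keeping a backup.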

Changed in juju:
status: Triaged → Incomplete
Seyeong Kim (seyeongkim) wrote :

I don't think it's an encoding issue.

I tried removing the contents of /etc/network/interfaces.d/eth0.cfg

pre-up [ -z `cat /proc/net/bonding/* | grep Slave | grep eth0` ] && true

and it worked.

I haven't found any clue yet as to what inserts this line,

but this line seems to be the key so far.

Thanks

Seyeong Kim (seyeongkim) wrote :

Ah, sorry if I caused confusion.

I attached /etc/network/interfaces as the pastebin,

and /etc/network/interfaces.d/eth0.cfg has only one line:

pre-up [ -z `cat /proc/net/bonding/* | grep Slave | grep eth0` ] && true

Seyeong Kim (seyeongkim) wrote :

https://pastebin.canonical.com/194078/plain/

This is the newly attached one.

I copied this string from one machine to another, so it seems the problem was introduced in between.

Changed in juju:
status: Incomplete → Confirmed
John A Meinel (jameinel) wrote :

My first point was that a bare line in interfaces.d/eth0.cfg without it having an associated stanza is improper configuration.
A "pre-up" line is done in association with a device. (When bringing up a device, run this script before you finish bringing it up.)

Otherwise, *when do you want that script to run* ?

Something like:
iface eth0 inet dhcp
    pre-up /usr/local/sbin/firewall $IFACE

Tells us that "when you bring up eth0, run this script"

Now, it may be that the way the actual /e/n/i parser runs means that whenever it hits an include statement it just continues its current parsing, so you could get this behavior with:
iface eth0 inet dhcp
  source interfaces.d/eth0.cfg

However, you're actually just relying on the "source interfaces.d/*" line.
Which means that maybe it gets associated with a device you want, but as soon as anything changes in that file, it may not get run at the right time.
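
The "bare option" problem described above can be checked for with a small script. This is only a sketch under simplifying assumptions: it treats iface and mapping as the only stanza starters, skips auto, allow-*, source and source-directory lines, ignores backslash line continuations, and looks at one file in isolation. The function name find_bare_options is hypothetical, and this is not how ifupdown or Juju parse the file.

```python
STANZA_STARTERS = ("iface", "mapping")
TOP_LEVEL = ("auto", "source", "source-directory")


def find_bare_options(text):
    """Return (line_number, line) pairs for options seen before any stanza."""
    in_stanza = False
    bare = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no options
        keyword = line.split()[0]
        if keyword in STANZA_STARTERS:
            in_stanza = True  # later options belong to this stanza
        elif keyword in TOP_LEVEL or keyword.startswith("allow-"):
            continue  # top-level directives are valid outside a stanza
        elif not in_stanza:
            bare.append((number, line))  # option with no device to attach to
    return bare


# The single line reported in this bug, checked as a standalone file.
eth0_cfg = "pre-up [ -z `cat /proc/net/bonding/* | grep Slave | grep eth0` ] && true\n"

# The stanza form from the comment above, where pre-up has a device.
good_cfg = "iface eth0 inet dhcp\n    pre-up /usr/local/sbin/firewall $IFACE\n"

print(find_bare_options(eth0_cfg))
print(find_bare_options(good_cfg))
```

Run against the eth0.cfg from this report, the check flags line 1, because nothing in that file says which interface the pre-up belongs to.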

Given the content of: https://pastebin.canonical.com/194078/

I think you're missing how pre-up and post-up are meant to be used. Specifically, you have lots of routes being added, but you aren't grouping the routes-being-added with the device-that-will-carry those routes.

For example:
auto ens3
iface ens3 inet static
    address 10.2.1.146/23
    gateway 10.2.0.88
    dns-nameservers 90.147.165.82
    mtu 1500
...
auto ens8
iface ens8 inet static
    address 10.2.4.166/24
    dns-nameservers 10.3.4.210
    mtu 1500
post-up route add -net 10.3.0.0 netmask 255.255.254.0 gw 10.2.0.1 metric 0 || true
pre-down route del -net 10.3.0.0 netmask 255.255.254.0 gw 10.2.0.1 metric 0 || true
post-up route add -net 10.4.0.0 netmask 255.255.254.0 gw 10.2.0.1 metric 0 || true
pre-down route del -net 10.4.0.0 netmask 255.255.254.0 gw 10.2.0.1 metric 0 || true
post-up route add -net 10.3.4.0 netmask 255.255.255.0 gw 10.2.4.1 metric 0 || true
pre-down route del -net 10.3.4.0 netmask 255.255.255.0 gw 10.2.4.1 metric 0 || true

That specifically says: when I "ifup ens3", run *no* scripts, but when I "ifup ens8", add routes for 10.3.0.0/23 via 10.2.0.1, 10.4.0.0/23 via 10.2.0.1, and 10.3.4.0/24 via 10.2.4.1.

However, if you did:
 ifdown ens3
 ifup ens8
You would end up fairly broken, because now you have routes that say "if you want to get to 10.3.0.0/23 you should talk to 10.2.0.1", but the most likely device they would have wanted to use to *talk* to 10.2.0.1 was ens3, which is currently down.

More likely, what you actually want is:

auto ens3
iface ens3 inet static
    address 10.2.1.146/23
    gateway 10.2.0.88
    dns-nameservers 90.147.165.82
    mtu 1500
    post-up route add -net 10.3.0.0 netmask 255.255.254.0 gw 10.2.0.1 metric 0 || true
    pre-down route del -net 10.3.0.0 netmask 255.255.254.0 gw 10.2.0.1 metric 0 || true
    post-up route add -net 10.4.0.0 netmask 255.255.254.0 gw 10.2.0.1 metric 0 || true
    pre-down route del -net 10.4.0.0 netmask 255.255.254.0 gw 10.2.0.1 metric 0 || true

auto ens8
iface ens8 inet static
    address 10.2.4.166/24
    dns-nameservers 10.3.4.210
    mtu 1500
    post-up route add -net 10.3.4.0 netmask 255.255.255.0 gw 10.2.4.1 metric 0 || true
    pre-down route del -net 10.3.4.0 netmask 255.255.255.0 gw 10.2.4.1 metric 0 || tru...

Seyeong Kim (seyeongkim) wrote :

Actually, I didn't touch the configuration file; MAAS generated it.

Should I report this to MAAS instead of Juju?

However, juju 2.1 works fine; this issue starts with juju 2.2.

John A Meinel (jameinel) wrote : Re: [Bug 1704376] Re: juju failed to deploy lxd when making bridge

The parser in 2.1 completely ignores all includes. We added support for
reading sourced files, which is why we're seeing it now.
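
To illustrate why reading sourced files changes what a parser sees, here is a rough Python sketch that inlines source lines. It assumes relative paths are resolved against the directory of the file containing the source line, that matches are read in sorted order, and that there are no include cycles. The function name expand_sources and the demo files are hypothetical; this is not Juju's implementation.

```python
import glob
import os
import tempfile


def expand_sources(path):
    """Return the lines of path with `source` directives replaced by file contents."""
    base = os.path.dirname(os.path.abspath(path))
    lines = []
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.rstrip("\n")
            words = line.split()
            if len(words) == 2 and words[0] == "source":
                # An absolute pattern is kept as-is by os.path.join.
                pattern = os.path.join(base, words[1])
                for included in sorted(glob.glob(pattern)):
                    lines.extend(expand_sources(included))
            else:
                lines.append(line)
    return lines


# Demo with throwaway files: a main file that sources interfaces.d/*.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "interfaces.d"))
    main_path = os.path.join(root, "interfaces")
    with open(main_path, "w", encoding="utf-8") as handle:
        handle.write("auto lo\niface lo inet loopback\nsource interfaces.d/*\n")
    cfg_path = os.path.join(root, "interfaces.d", "eth0.cfg")
    with open(cfg_path, "w", encoding="utf-8") as handle:
        handle.write("pre-up true\n")
    expanded = expand_sources(main_path)

print(expanded)
```

A parser that skips the source line never sees the pre-up at all, while one that expands it has to decide which stanza, if any, the line belongs to.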


Changed in juju:
status: Confirmed → Triaged
Seyeong Kim (seyeongkim) wrote :

I confirmed that the content of the file /etc/network/interfaces.d/eth0.cfg

pre-up [ -z `cat /proc/net/bonding/* | grep Slave | grep eth0` ] && true

is set manually by preseed.

Still, 2.2 needs to either handle this or ignore it as 2.1 did.

Please give me your opinion.

Thanks.

Witold Krecicki (wpk) wrote :

@xtrusia could you paste the complete e/n/i setup as output by
grep "" -r /etc/network/interfaces{,.d/}

?

Seyeong Kim (seyeongkim) wrote :

hey wpk,

pre-up [ -z `cat /proc/net/bonding/* | grep Slave | grep eth0` ] && true

this is all

Tim Penhey (thumper) wrote :

@xtrusia what exactly is this line supposed to do?

This line does not appear to be attached to any particular interface; what are your expectations of this line?

Changed in juju:
status: Triaged → Incomplete
Biju Mon V (vbijumon) wrote :

I was trying to use conjure-up, and I am also facing the same issue.

Error from machine 0 :
Location:

/var/log/juju$ sudo cat machine-0.log

2017-08-31 14:31:36 WARNING juju.provisioner provisioner_task.go:739 failed to start instance (failed to bridge devices: bridge activaction error: bridge activation failed: Killed old client process
2017-08-31 14:31:49 WARNING juju.provisioner provisioner_task.go:739 failed to start instance (failed to bridge devices: bridge activaction error: bridge activation failed: Killed old client process
2017-08-31 14:32:01 ERROR juju.provisioner provisioner_task.go:707 cannot start instance for machine "0/lxd/2": failed to bridge devices: bridge activaction error: bridge activation failed: Killed old client process

juju status error:

0/lxd/3 down pending xenial failed to bridge devices: bridge activaction error: bridge activation failed: Killed old client process
Internet Systems Consortium DHCP Client 4.3.3
Copyright 2004-2015 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/ens2f0/0c:c4:7a:19:69:5c
Sending on LPF/ens2f0/0c:c4:7a:19:69:5c
Sending on Socket/fallback
DHCPRELEASE on ens2f0 to 10.205.76.4 port 67 (xid=0xcf0222c)
RTNETLINK answers: Operation not permitted
RTNETLINK answers: Operation not permitted
Bringing up bridged interfaces failed, see system logs and /etc/network/interfaces.new
Internet Systems Consortium DHCP Client 4.3.3
Copyright 2004-2015 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/ens2f0/0c:c4:7a:19:69:5c
Sending on LPF/ens2f0/0c:c4:7a:19:69:5c
Sending on Socket/fallback
DHCPDISCOVER on ens2f0 to 255.255.255.255 port 67 interval 3 (xid=0xddb8be72)
DHCPREQUEST of 10.205.76.6 on ens2f0 to 255.255.255.255 port 67 (xid=0x72beb8dd)
DHCPOFFER of 10.205.76.6 from 10.205.76.4
DHCPACK of 10.205.76.6 from 10.205.76.4
bound to 10.205.76.6 -- renewal in 286 seconds.
RTNETLINK answers: Operation not permitted
RTNETLINK answers: Operation not permitted

Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired