Juju-2.2 does not create interfaces in an LXD for all spaces

Bug #1698443 reported by David Lawson
This bug affects 6 people

Affects          Status         Importance   Assigned to       Milestone
Canonical Juju   Fix Released   Critical     Witold Krecicki   2.3-alpha1
  2.2            Fix Released   Critical     Witold Krecicki   2.2.1

Bug Description

I have a juju environment with metal in multiple spaces. LXDs deployed to those metal hosts appear to get only a single interface on a single space, regardless of what space bindings are requested. The host machine does have bridges created for all spaces, but the LXD is attached to only a single bridge.

$ juju spaces
Space Subnets
management 10.189.128.0/20
openstack 10.189.144.0/20
            10.189.160.0/20

The LXD was deployed with:

$ juju deploy ./neutron-gateway --to lxd:1 --bind "management data=openstack"

The network config for the host: https://pastebin.canonical.com/191170/

Network config for the LXD: https://pastebin.canonical.com/191171/

Is there some prerequisite for the metal to allow LXDs on it to have multiple interfaces? I'm deploying the metal without space bindings because its interfaces are managed by MAAS and will exist in both spaces regardless. This may be related to lp:1681902 if that hasn't been fixed; that bug seems like it could produce this kind of behaviour.
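The reporter's expectation is one container NIC per distinct space named in the bindings. A minimal sketch, assuming the `--bind` form `<default-space> <endpoint>=<space> ...`, of how the expected set of spaces follows from the command above; `parse_bind` and `expected_spaces` are hypothetical helpers for illustration, not Juju's own parser.

```python
from typing import Dict, Optional, Set, Tuple


def parse_bind(expr: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Split a bind expression into (default space, {endpoint: space}).

    Assumes the form "<default-space> <endpoint>=<space> ...", where at
    most one bare token names the default space.
    """
    default: Optional[str] = None
    endpoints: Dict[str, str] = {}
    for token in expr.split():
        if "=" in token:
            endpoint, space = token.split("=", 1)
            endpoints[endpoint] = space
        elif default is None:
            default = token
        else:
            raise ValueError(f"more than one default space in {expr!r}")
    return default, endpoints


def expected_spaces(expr: str) -> Set[str]:
    """Distinct spaces the bindings refer to (the reporter expects one NIC each)."""
    default, endpoints = parse_bind(expr)
    spaces = set(endpoints.values())
    if default is not None:
        spaces.add(default)
    return spaces


print(sorted(expected_spaces("management data=openstack")))
# ['management', 'openstack']
```

Under that assumption, the neutron-gateway container should have ended up with two interfaces rather than one.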

David Lawson (deej)
description: updated
David Lawson (deej) wrote :

Could this be a regression since 2.1? One of the bootstack guys reports that he has it working with juju 2.1 and a config that is effectively identical to mine.

JuanJo Ciarlante (jjo) wrote :

FYI, this seems to be a juju 2.2 regression. I had exactly
the same issue; downgrading to juju 2.1 let the LXDs get
one interface per requested binding. My juju-deployer excerpt
for this service:

    ceph-mon:
      charm: ceph-mon
      num_units: 3
      to: [ "lxc:storage=0", "lxc:storage=1", "lxc:storage=2" ]
      bindings:
        public: 02-ceph-access-space
        cluster: 03-ceph-replica-space
        "": 00-mgmt-space
      options:
        monitor-count: 3
        fsid: [...]

* For the record, with juju-2.1 the deployment ends OK:

- lxc list:
| juju-d32283-17-lxd-0 | RUNNING | 172.40.0.7 (eth2)   |  | PERSISTENT | 0 |
|                      |         | 172.35.1.4 (eth1)   |  |            |   |
|                      |         | 172.31.1.143 (eth0) |  |            |   |

- machine-17.log:
2017-06-16 22:11:43 INFO juju.provisioner provisioner_task.go:782 started machine 17/lxd/0 as instance juju-d32283-17-lxd-0 with hardware <nil>, network config [{DeviceIndex:0 MACAddress:00:16:3e:xx:xx:xx CIDR:172.31.0.0/16 MTU:1500 ProviderId:7349 ProviderSubnetId:5 ProviderSpaceId: ProviderAddressId:20326 ProviderVLANId:5027 VLANTag:0 InterfaceName:eth0 ParentInterfaceName:br-bond0 InterfaceType:ethernet Disabled:false NoAutoStart:false ConfigType:static Address:172.31.1.143 DNSServers:[172.30.60.65 172.31.1.2] DNSSearchDomains:[maas] GatewayAddress:172.31.1.2 Routes:[]} {DeviceIndex:0 MACAddress:00:16:3e:xx:xx:xx CIDR:172.35.0.0/16 MTU:9000 ProviderId:7350 ProviderSubnetId:7 ProviderSpaceId: ProviderAddressId:20328 ProviderVLANId:5038 VLANTag:0 InterfaceName:eth1 ParentInterfaceName:br-bond2 InterfaceType:ethernet Disabled:false NoAutoStart:false ConfigType:static Address:172.35.1.4 DNSServers:[] DNSSearchDomains:[] GatewayAddress: Routes:[]} {DeviceIndex:0 MACAddress:00:16:3e:xx:xx:xx CIDR:172.40.0.0/16 MTU:9000 ProviderId:7351 ProviderSubnetId:8 ProviderSpaceId: ProviderAddressId:20330 ProviderVLANId:5039 VLANTag:600 InterfaceName:eth2 ParentInterfaceName:br-bond2.600 InterfaceType:ethernet Disabled:false NoAutoStart:false ConfigType:static Address:172.40.0.7 DNSServers:[] DNSSearchDomains:[] GatewayAddress: Routes:[]}], volumes [], volume attachments map[], subnets to zones map[]

* while with juju-2.2 it ends as:

- lxc list:
| juju-17dbcf-17-lxd-3 | RUNNING | 172.31.1.114 (eth0) | | PERSISTENT | 0 |

- machine-17.log:
2017-06-16 20:45:19 INFO juju.provisioner provisioner_task.go:782 started machine 17/lxd/3 as instance juju-17dbcf-17-lxd-3 with hardware <nil>, network config [{DeviceIndex:0 MACAddress:00:16:3e:xx:xx:xx CIDR:172.31.0.0/16 MTU:1500 ProviderId:7345 ProviderSubnetId:5 ProviderSpaceId: ProviderAddressId:20232 ProviderVLANId:5027 VLANTag:0 InterfaceName:eth0 ParentInterfaceName:br-bond0 InterfaceType:ethernet Disabled:false NoAutoStart:false ConfigType:static Address:172.31.1.117 DNSServers:[172.30.60.65 172.31.1.2] DNSSearchDomains:[maas] GatewayAddress:172.31.1.2 Routes:[]}], volumes [], volume attachments map[], subnets to zones map[]

- it's worth noting that juju *does* create the bridges needed on the
  respective host interfaces to support the LXD's extra VIFs
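To compare the 2.1 and 2.2 logs mechanically, here is a minimal sketch that pulls each container NIC and its host parent bridge out of a `started machine ... network config [...]` line. It assumes the `InterfaceName:... ParentInterfaceName:...` field layout shown in the log lines above; `container_nics` is a hypothetical helper name.

```python
import re
from typing import List, Tuple

# "\b" stops the pattern from matching inside "ParentInterfaceName".
NIC_RE = re.compile(r"\bInterfaceName:(\S+) ParentInterfaceName:(\S+)")


def container_nics(log_line: str) -> List[Tuple[str, str]]:
    """Return (container interface, host parent) pairs from a provisioner log line."""
    return NIC_RE.findall(log_line)


# Trimmed from the juju-2.2 machine-17.log line above.
line_22 = (
    "started machine 17/lxd/3 as instance juju-17dbcf-17-lxd-3 with hardware <nil>, "
    "network config [{DeviceIndex:0 CIDR:172.31.0.0/16 InterfaceName:eth0 "
    "ParentInterfaceName:br-bond0 InterfaceType:ethernet}]"
)
print(container_nics(line_22))  # [('eth0', 'br-bond0')]
```

On the full 2.1 line the same function yields three pairs (eth0/br-bond0, eth1/br-bond2, eth2/br-bond2.600), which matches the three addresses in `lxc list`.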

tags: added: canonical-bootstack
JuanJo Ciarlante (jjo)
summary: - Juju does not create interfaces in an LXD for all spaces
+ Juju-2.2 does not create interfaces in an LXD for all spaces
Witold Krecicki (wpk)
Changed in juju:
assignee: nobody → Witold Krecicki (wpk)
tags: added: 4010
Witold Krecicki (wpk)
Changed in juju:
status: New → In Progress
importance: Undecided → High
John A Meinel (jameinel)
Changed in juju:
importance: High → Critical
milestone: none → 2.3-alpha1
tags: added: containers eda maas-provider network
Witold Krecicki (wpk) wrote :
Witold Krecicki (wpk) wrote :
tags: added: cpe cpe-sa
Revision history for this message
Sandor Zeestraten (szeestraten) wrote :

Just bumped into this one while testing our OpenStack bundle with spaces. How come this wasn't caught before release? Are deployments with spaces, such as OpenStack ones, not tested?

Ante Karamatić (ivoks) wrote :

Hm, I've deployed more than 20 OpenStacks during the alpha, beta and GA state of 2.2. I have not hit this issue.

I notice you used juju-deployer. I always deployed only with juju (directly or with bundles).

Witold Krecicki (wpk) wrote : Re: [Bug 1698443] Re: Juju-2.2 does not create interfaces in an LXD for all spaces

This bug was introduced in April.

Dmitrii Shcherbakov (dmitriis) wrote :

I have one environment where the same situation is reproducible even on 2.1.2 or 2.1.3.

Once I am able to VPN into it, I will try 2.2.1 and see whether it works there.

The provisioner entries from the machine log with 2.1.2 are as follows:

2017-06-22 22:29:07 INFO juju.provisioner provisioner_task.go:406 found machine pending provisioning id:0/lxd/0, details:0/lxd/0
2017-06-22 22:29:07 INFO juju.provisioner provisioner_task.go:249 provisioner-harvest-mode is set to destroyed; unknown instances not stopped []
2017-06-22 22:31:29 INFO juju.network bridge.go:70 bridgescript command=/usr/bin/python2 - --interfaces-to-bridge="enp4s0f0 enp4s0f0.2730" --activate --bridge-prefix=br- --reconfigure-delay=0 /etc/network/interfaces <<'EOF'
<script-redacted>
EOF
2017-06-22 22:31:34 INFO juju.network bridge.go:75 bridgescript result=0, timeout=false
2017-06-22 22:31:35 WARNING juju.provisioner provisioner_task.go:739 failed to start instance (unable to find host bridge for space(s) "public-space" for container "0/lxd/0"), retrying in 10s (3 more attempts)

It complains about the lack of a bridge for the public space, although the enp4s0f0.2730 interface and its bridge are present on the node and VLAN 2730 is associated with public-space.

---

If it still does not work there, I will look at the MAAS PostgreSQL database and try to find the reason. Juju itself does not store more provider-specific state than it needs at a given moment, so IDs might be messed up in MAAS, since I have changed VLAN <-> space associations on this installation before. Those are my unconfirmed assumptions anyway.

John A Meinel (jameinel) wrote :

I'm pretty sure that is "bridge names too long" which is a different bug,
also fixed (I believe) in 2.2.

https://bugs.launchpad.net/juju/+bug/1672327

The issue is that the default naming scheme in 2.1 produced br-enp4s0f0.2730,
which is over the 15-character maximum for device names. That means we would
*try* to create a bridge, fail, and then have no bridge for the container.
That should have been fixed in 2.2.beta4.

I'm pretty sure 2.2 doesn't suffer from "bridge names too long", but it might
instead be failing because of this bug (which should be fixed in 2.2.1).
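The length limit can be checked before creating a bridge. A minimal sketch, assuming the "br-" prefix scheme described above; `bridge_name` and `fits_ifname` are hypothetical helpers, and only the length rule is checked, not the other characters the kernel rejects. It does not reproduce whatever naming 2.2 switched to.

```python
# Linux reserves IFNAMSIZ (16) bytes for an interface name, including the
# terminating NUL, so a usable name is at most 15 characters.
IFNAMSIZ = 16
MAX_IFNAME_LEN = IFNAMSIZ - 1


def bridge_name(iface: str, prefix: str = "br-") -> str:
    """Prefix-style bridge name, as in the 2.1 naming scheme described above."""
    return prefix + iface


def fits_ifname(name: str) -> bool:
    """True if `name` is short enough to be a Linux interface name."""
    return len(name.encode("utf-8")) <= MAX_IFNAME_LEN


for iface in ("enp4s0f0.2730", "enp2s0f0.11", "bond2.600"):
    name = bridge_name(iface)
    print(name, len(name), fits_ifname(name))
# br-enp4s0f0.2730 16 False
# br-enp2s0f0.11 14 True
# br-bond2.600 12 True
```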

Dmitrii Shcherbakov (dmitriis) wrote :

John,

Right, I remember that. I should have counted before trying; thanks for the reminder!

echo -n 'br-enp4s0f0.2730' | wc -c
16

Dmitrii Shcherbakov (dmitriis) wrote :

Manual verification in case anybody is looking for it:

https://paste.ubuntu.com/24934816/

Witold Krecicki (wpk)
Changed in juju:
status: In Progress → Fix Committed
Ben (bjenkins-x) wrote :

I am not sure if my problem is a variation of this bug. I am using Juju 2.2.1 and MAAS 2.2.1.

If I deploy a physical machine with MAAS using 'juju add-machine server-01' and then try to add a charm to that machine in an LXD with 'juju deploy --config config.yaml charm --to lxd:0 --bind "service=space"', the container starts but the physical server does not get the new bridges needed to fully launch it. If I go to /etc/network on the physical server, move interfaces.new to interfaces, and reboot, everything begins to work fine and the charm completes its deployment. I cannot find a clever way to force the bridges to come up without a reboot, and if I do not manually move the interfaces.new file, the bridges never come up. I have tried this with many types of physical servers. None of my NIC names are longer than 15 characters, but the symptoms listed here are exactly what I see (no interfaces for the containers). I also get an error in juju status saying an interface cannot be found. Juju 2.1 did not have this issue. Hope this helps; if not, I will open a new report.

Witold Krecicki (wpk) wrote :

@bjenkins-x Could you please deploy the application and, with /etc/network/interfaces still intact, run ifdown on the device to be bridged and paste the output? The bridge script fails if any step fails, and running ifdown on the to-be-bridged device is the last step before /etc/network/interfaces.new is moved to /etc/network/interfaces.
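The ordering described here is a fail-fast pattern: every device is taken down first, and the new file is committed only if all of them succeed. A minimal sketch under that assumption; this is not Juju's actual bridge script, and `apply_bridged_config`, `BridgeError`, and the injectable `run`/`commit` parameters are hypothetical. Bringing the new bridges up afterwards is left out.

```python
import os
import subprocess
from typing import Callable, Sequence


class BridgeError(RuntimeError):
    """Raised when a step fails before the new config is put in place."""


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status (ifdown needs root)."""
    return subprocess.run(list(cmd)).returncode


def apply_bridged_config(
    devices: Sequence[str],
    new_path: str = "/etc/network/interfaces.new",
    live_path: str = "/etc/network/interfaces",
    run: Callable[[Sequence[str]], int] = run_command,
    commit: Callable[[str, str], None] = os.replace,
) -> None:
    # Take every to-be-bridged device down first; stop at the first failure
    # so the live interfaces file is never replaced half-way.
    for dev in devices:
        status = run(["ifdown", dev])
        if status != 0:
            raise BridgeError(f"ifdown {dev} exited with {status}; {live_path} left unchanged")
    # Only after all ifdowns succeed is interfaces.new moved into place.
    commit(new_path, live_path)
```

Injecting `run` and `commit` keeps the ordering testable without touching a real host.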

Revision history for this message
Ben (bjenkins-x) wrote :

ubuntu@controller01:/etc/network$ sudo ifdown enp2s0f0.11
Removed VLAN -:enp2s0f0.11:-
Cannot find device "enp2s0f0.11"

Here is the /etc/network/interfaces section for the VLAN:
auto enp2s0f0.11
iface enp2s0f0.11 inet static
    address 10.11.0.3/16
    vlan-raw-device enp2s0f0
    mtu 9000
    vlan_id 11

iface enp2s0f0.11 inet6 static
    address xxxx:xxxx:xxxx:3:0:1:0:1/64

After the physical server is deployed and a charm is deployed to it, there are two routes per subnet:

ubuntu@controller01:~$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.1.0.1        0.0.0.0         UG    0      0        0 br-enp2s0f0
10.0.225.0      *               255.255.255.0   U     0      0        0 lxdbr0
10.1.0.0        *               255.255.0.0     U     0      0        0 br-enp2s0f0
10.1.0.0        *               255.255.0.0     U     0      0        0 enp2s0f0
10.11.0.0       *               255.255.0.0     U     0      0        0 br-enp2s0f0.11
10.11.0.0       *               255.255.0.0     U     0      0        0 enp2s0f0.11
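Two connected routes for the same subnet, one via the bridge and one via the raw device, most likely mean an address in that subnet is still configured on the underlying interface as well as on the bridge. A minimal sketch that flags such duplicates in `route` output, assuming the eight-column layout above; `duplicate_routes` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def duplicate_routes(route_output: str) -> Dict[Tuple[str, str], List[str]]:
    """Map (destination, genmask) to interfaces, keeping only repeated entries."""
    by_dest: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for line in route_output.splitlines():
        fields = line.split()
        # Data rows of `route` have 8 columns; skip the title and header lines.
        if len(fields) != 8 or fields[0] == "Destination":
            continue
        dest, _gateway, mask = fields[:3]
        by_dest[(dest, mask)].append(fields[-1])
    return {key: ifaces for key, ifaces in by_dest.items() if len(ifaces) > 1}
```

Fed the table above, it reports 10.1.0.0/255.255.0.0 on br-enp2s0f0 and enp2s0f0, and 10.11.0.0/255.255.0.0 on br-enp2s0f0.11 and enp2s0f0.11.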

and the juju status error:

1/lxd/0 down pending xenial failed to bridge devices: bridge activaction error: bridge activation failed: ifdown:
interface enp2s0f0 not configured
Cannot find device "enp2s0f0.11"
/etc/network/if-pre-up.d/mtuipv6: line 9: /sys/class/net/enp2s0f0.11/mtu: No such file or directory
/etc/network/if-pre-up.d/mtuipv6: line 10: /proc/sys/net/ipv6/conf/enp2s0f0.11/mtu: No such file or directory
ifup: recursion detected for interface enp2s0f0 in parent-lock phase
ifup: waiting for lock on /run/network/ifstate.enp2s0f0
RTNETLINK answers: File exists
Bringing up bridged interfaces failed, see system logs and /etc/network/interfaces.new
RTNETLINK answers: File exists
/etc/network/if-pre-up.d/mtuipv6: line 9: /sys/class/net/enp2s0f0.11/mtu: No such file or directory
/etc/network/if-pre-up.d/mtuipv6: line 10: /proc/sys/net/ipv6/conf/enp2s0f0.11/mtu: No such file or directory
ifup: recursion detected for interface enp2s0f0 in parent-lock phase
ifup: waiting for lock on /run/network/ifstate.enp2s0f0

Witold Krecicki (wpk) wrote :

What series of Ubuntu is this machine running?
Could you deploy this machine using MAAS directly and there check if ifdown works? We suspect a bug in ifupdown scripts, but we need to confirm it.

Ben (bjenkins-x) wrote :

It is running 16.04 LTS. I did a fresh MAAS deploy and ran ifdown again:

ubuntu@server01:/etc/network$ sudo ifdown enp2s0f0.11
Removed VLAN -:enp2s0f0.11:-
Cannot find device "enp2s0f0.11"

After the ifdown command, the interface is dropped even though ifdown reports that it cannot find it. ifup brings it back up, but with errors:

ubuntu@server01:/etc/network$ sudo ifup enp2s0f0.11
/etc/network/if-pre-up.d/mtuipv6: line 9: /sys/class/net/enp2s0f0.11/mtu: No such file or directory
/etc/network/if-pre-up.d/mtuipv6: line 10: /proc/sys/net/ipv6/conf/enp2s0f0.11/mtu: No such file or directory
Set name-type for VLAN subsystem. Should be visible in /proc/net/vlan/config
ifup: recursion detected for interface enp2s0f0 in parent-lock phase
Added VLAN with VID == 11 to IF -:enp2s0f0:-
Waiting for DAD... Done

Sorry, this may be a MAAS problem. I will redeploy without IPv6 and see if I get the same errors.

Ben (bjenkins-x) wrote :

Opened bug #1703689 with MAAS team. Without IPv6 everything works fine with charms. Sorry to hijack this bug.

Subhranshu Dwivedi (subhranshu) wrote :

It seems like a dist-upgrade resolves this issue.

Changed in juju:
status: Fix Committed → Fix Released