VLANs on an unconfigured parent device error with "cannot set link-layer device addresses of machine "0": invalid address"

Bug #1566791 reported by Andrew McDermott
This bug affects 4 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Dimiter Naydenov
Milestone: (not listed)

Bug Description

I have the following /etc/network/interfaces (/e/n/i):

ubuntu@maas19-node4:/var/log/juju$ cat /etc/network/interfaces
auto lo
iface lo inet loopback
    dns-nameservers 10.17.20.200
    dns-search maas19

iface eth0 inet manual

auto br-eth0
iface br-eth0 inet static
    address 10.17.20.212/24
    gateway 10.17.20.1
    mtu 1500
    bridge_ports eth0
    bridge_stp off
    bridge_maxwait 0
    dns-nameservers 10.17.20.200

auto eth1
iface eth1 inet manual
    mtu 1500

auto eth2
iface eth2 inet manual
    mtu 1500

auto eth3
iface eth3 inet manual
    mtu 1500

iface eth1.16 inet manual
    address 10.245.184.100/24
    mtu 1500
    vlan_id 16
    vlan-raw-device eth1

auto br-eth1.16
iface br-eth1.16 inet static
    address 10.245.184.100/24
    mtu 1500
    bridge_ports eth1.16
    bridge_stp off
    bridge_maxwait 0

iface eth1.17 inet manual
    address 10.245.187.100/24
    mtu 1500
    vlan_id 17
    vlan-raw-device eth1

auto br-eth1.17
iface br-eth1.17 inet static
    address 10.245.187.100/24
    mtu 1500
    bridge_ports eth1.17
    bridge_stp off
    bridge_maxwait 0

So the VLANs (16 and 17) are on eth1, but eth1 is unconfigured (i.e., "manual"). With this setup I get the following error repeatedly:

2016-04-06 10:58:39 ERROR juju.worker.dependency engine.go:522 "machiner" manifold worker returned unexpected error: cannot update observed network config: cannot set link-layer device addresses of machine "0": invalid address "10.245.184.100/24": DeviceName "br-eth1.16" on machine "0" not found (not found)

This should be a valid configuration, as eth1 is marked auto and so can pass traffic.
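
The condition described above (a VLAN interface whose parent stanza is "manual" with no address) can be checked mechanically. The following is a minimal Python sketch, not anything Juju itself does: the function names parse_stanzas and vlans_on_unaddressed_parents are hypothetical, the parser only understands the simple stanza layout shown in this report, and SAMPLE is a trimmed copy of the configuration above.

```python
def parse_stanzas(text):
    """Return {iface_name: {"method": str, "options": {key: value}}}.

    Only handles the simple "iface NAME FAMILY METHOD" layout used in
    this report; it is not a full ifupdown parser.
    """
    stanzas = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "iface":
            current = None
            if len(words) >= 4:
                current = {"method": words[3], "options": {}}
                stanzas[words[1]] = current
        elif words[0] in ("auto", "allow-hotplug", "source", "mapping"):
            # These lines end the previous stanza.
            current = None
        elif current is not None:
            current["options"][words[0]] = " ".join(words[1:])
    return stanzas


def vlans_on_unaddressed_parents(text):
    """List (vlan, parent) pairs whose parent is "manual" with no address."""
    stanzas = parse_stanzas(text)
    found = []
    for name, stanza in stanzas.items():
        parent = stanza["options"].get("vlan-raw-device")
        if parent is None:
            continue
        # A parent with no stanza at all is treated like an unconfigured one.
        parent_stanza = stanzas.get(parent, {"method": "manual", "options": {}})
        if (parent_stanza["method"] == "manual"
                and "address" not in parent_stanza["options"]):
            found.append((name, parent))
    return found


SAMPLE = """\
auto eth1
iface eth1 inet manual
    mtu 1500

iface eth1.16 inet manual
    mtu 1500
    vlan_id 16
    vlan-raw-device eth1

auto br-eth1.16
iface br-eth1.16 inet static
    address 10.245.184.100/24
    bridge_ports eth1.16
"""

if __name__ == "__main__":
    for vlan, parent in vlans_on_unaddressed_parents(SAMPLE):
        print(f"{vlan}: parent {parent} has no address")
```

Run against the full file from this report, a check like this would be expected to flag both eth1.16 and eth1.17, since both sit on eth1.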

Andrew McDermott (frobware) wrote :

It's possible to work around this issue by ensuring that a VLAN's parent device is always configured with an address.

Changed in juju-core:
importance: Undecided → High
Changed in juju-core:
status: New → Triaged
Changed in juju-core:
milestone: none → 2.0-beta4
Changed in juju-core:
milestone: 2.0-beta4 → 2.0.0
Changed in juju-core:
milestone: 2.0.0 → 2.0-beta8
Changed in juju-core:
milestone: 2.0-beta8 → 2.0-beta9
Changed in juju-core:
status: Triaged → In Progress
assignee: nobody → Dimiter Naydenov (dimitern)
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta9 → 2.0-beta10
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta10 → 2.0-beta11
Matthew Rees (matthew-rees) wrote :

I am also experiencing this issue using the ppa:maas/next and ppa:juju/devel. It prevents any deployments from targeting machine 0 as the deployment gets stuck waiting for the agent.

Dimiter Naydenov (dimitern) wrote :

As a workaround until the fix is done, you can ensure all physical interfaces on the node have addresses, esp. if they have VLAN interfaces with addresses on top.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta11 → 2.0-beta12
Changed in juju-core:
milestone: 2.0-beta12 → 2.0-beta13
tags: added: 2.0
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta13 → 2.0-beta14
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta14 → 2.0-beta15
Changed in juju-core:
milestone: 2.0-beta15 → 2.0-beta16
affects: juju-core → juju
Changed in juju:
milestone: 2.0-beta16 → none
milestone: none → 2.0-beta16
Changed in juju:
status: In Progress → Triaged
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-beta16 → 2.0-beta17
Ante Karamatić (ivoks)
tags: added: 4010 cpec
Ante Karamatić (ivoks) wrote :

I'm not sure the workaround works. I still end up with a silly message:

2016-08-26 04:26:07 WARNING juju.provisioner lxd-broker.go:62 failed to prepare container "33/lxd/1" network config: cannot get subnet "172.16.0.0/24" used by address "172.16.0.2" of host machine device "br-bond1": subnet "172.16.0.0/24" not found

And I do have an IP on the underlying interface (juju created the bridge and moved the IP):

auto bond1
iface bond1 inet manual
    bond-lacp_rate fast
    bond-xmit_hash_policy layer3+4
    bond-miimon 100
    mtu 1500
    bond-mode 802.3ad
    hwaddress 3c:fd:fe:9d:18:3c
    bond-slaves none

auto br-bond1
iface br-bond1 inet static
    address 172.16.0.2/24
    hwaddress 3c:fd:fe:9d:18:3c
    bridge_ports bond1
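
For reference, the subnet named in the warning above is simply the network containing the address configured on br-bond1. A minimal Python sketch using the standard-library ipaddress module shows the relationship; subnet_of is a hypothetical helper name and is not part of Juju:

```python
import ipaddress


def subnet_of(address_with_prefix):
    """Return the CIDR of the network that contains the given address."""
    return str(ipaddress.ip_interface(address_with_prefix).network)


if __name__ == "__main__":
    # Address taken from the br-bond1 stanza above.
    print(subnet_of("172.16.0.2/24"))  # prints 172.16.0.0/24
```

So the warning says that this derived subnet is not one Juju knows about, not that the address itself is missing from the interface.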

And instead of attaching the container to the three bridges I have, juju attaches it to lxdbr0, which it just created:

2016-08-26 04:26:07 INFO juju.container.lxd lxd.go:152 instance "juju-75ccbb-33-lxd-1" configured with map[eth0:map[type:nic nictype:bridged name:eth0 parent:lxdbr0]] network devices

IMHO, this should be a Critical bug.

Ante Karamatić (ivoks) wrote :

I can confirm this with multiple tests. The workaround just turns the ERROR into a WARNING. However, containers get attached only to lxdbr0 and not to the actual bridges on the system.

Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-beta17 → 2.0-beta18
Changed in juju:
status: Triaged → In Progress
Andrew McDermott (frobware) wrote :

I have modified the bridge script to bridge all interfaces:

  https://github.com/frobware/juju/tree/master-bridge-all-interfaces

This is not the complete fix for this bug; Dimiter is working on the rest of the bug.

Andrew McDermott (frobware) wrote :
Dimiter Naydenov (dimitern) wrote :

A couple of PRs necessary for fixing this bug have already been reviewed and landed:
1. http://reviews.vapour.ws/r/5586/
2. http://reviews.vapour.ws/r/5597/

There will be at least one more prerequisite PR, which I'm currently working on, before the actual fix.

Andrew McDermott (frobware) wrote :

My PR in comment #7 should not be merged until the other PRs listed in comment #8 land.

Dimiter Naydenov (dimitern) wrote :

Next step proposed:
3) http://reviews.vapour.ws/r/5598/ should land soon

I have 3 more follow-ups lined up that need to land before the bug can be considered fixed.

Changed in juju:
milestone: 2.0-beta18 → 2.0-rc1
Dimiter Naydenov (dimitern) wrote :

The last 3 PRs fixing the bug are proposed in a stacked branch:
https://github.com/juju/juju/pull/6258

We're currently testing it hard on multiple configurations of MAAS and aim to land it by EOW, if all tests pass OK.

Dimiter Naydenov (dimitern) wrote :

gomaasapi updated to allow bridging containers on host nodes' bridges without an address: https://github.com/juju/gomaasapi/pull/59

frobware and I have successfully tested different network configurations on MAAS 1.9 and 2.0 with Juju 2.0-rc1. The branch (https://github.com/juju/juju/pull/6258), which has all the fixes for this bug, is currently being pre-tested with our CI infrastructure, and if all goes well we'll land it by EOD (or tomorrow, if we hit issues).

Andrew McDermott (frobware) wrote :

I pushed a new branch 'lp-1566791-bridge-all'[1] to juju/juju so that we can get a CI run. If that pans out OK then we plan to merge the branch as the fix for this bug.

[1] https://github.com/juju/juju/tree/lp-1566791-bridge-all

Changed in juju:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released