maas: incomplete bridge configuration for VLANs

Bug #1784834 reported by James Page
This bug affects 2 people
Affects              Status      Importance  Assigned to  Milestone
Canonical Juju       Incomplete  Low         Unassigned
Netplan              New         Undecided   Unassigned
netplan.io (Ubuntu)  Confirmed   Undecided   Unassigned

Bug Description

Juju: 2.4.1
MAAS: 2.3.3
Ubuntu: 18.04 with latest updates

Relatively simple four-machine deployment; machines have connectivity via two physical NICs, one with two spaces configured and the second with just a single space configured (see cloud-init-netplan.yaml for provisioned information).

I think I'm seeing a race between Juju reworking the base netplan configuration with bridges and LXD containers being started, which results in missing bridges to two spaces on some machines (see juju-98-netplan.yaml).

On machines where I see missing bridges I also see:

ubuntu@node-urey:~$ ls -lrt /etc/netplan/
total 12
-rw-r--r-- 1 root root 1712 Aug 1 09:40 50-cloud-init.yaml.bak.1533116635
-rw-r--r-- 1 root root 1166 Aug 1 09:43 99-juju.yaml.bak.1533116651
-rw-r--r-- 1 root root 1166 Aug 1 09:44 98-juju.yaml

vs

ubuntu@node-pytheas:/etc/netplan$ ls -lrt
total 16
-rw-r--r-- 1 root root 1713 Aug 1 09:40 50-cloud-init.yaml.bak.1533116672
-rw-r--r-- 1 root root 1270 Aug 1 09:44 99-juju.yaml.bak.1533116688
-rw-r--r-- 1 root root 1270 Aug 1 09:44 98-juju.yaml.bak.1533116704
-rw-r--r-- 1 root root 1270 Aug 1 09:45 99-juju.yaml

on one where the bridges have been configured (see juju-99-netplan.yaml).
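
As a rough aid for comparing the two listings: netplan only loads files in /etc/netplan whose names end in .yaml, so the .bak.<timestamp> copies are ignored. The sketch below (the helper name split_netplan_dir is hypothetical, not part of Juju or netplan) separates the files netplan would load from the backups, using the file names shown above.

```python
def split_netplan_dir(names):
    """Split a directory listing into (active, backups).

    Assumption: netplan loads only names ending in ".yaml"; the
    ".yaml.bak.<timestamp>" pattern is taken from the listings above.
    """
    active = sorted(n for n in names if n.endswith(".yaml"))
    backups = sorted(n for n in names if ".yaml.bak." in n)
    return active, backups


# File names copied from the two listings in this report.
urey = [
    "50-cloud-init.yaml.bak.1533116635",
    "99-juju.yaml.bak.1533116651",
    "98-juju.yaml",
]
pytheas = [
    "50-cloud-init.yaml.bak.1533116672",
    "99-juju.yaml.bak.1533116688",
    "98-juju.yaml.bak.1533116704",
    "99-juju.yaml",
]

print(split_netplan_dir(urey)[0])     # active file on the machine with missing bridges
print(split_netplan_dir(pytheas)[0])  # active file on the machine with working bridges
```

On node-urey the only active file is 98-juju.yaml, while on node-pytheas it is 99-juju.yaml with one more backup, which is consistent with the rewrite sequence stopping one step early on the failing machine.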

description: updated
James Page (james-page) wrote :

Also seeing this in the machine logs on each server:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x10fad84]

goroutine 378 [running]:
github.com/juju/juju/container/lxd.(*Server).CreateContainerFromSpec(0xc4203becc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /workspace/_build/src/github.com/juju/juju/container/lxd/container.go:203 +0x84
github.com/juju/juju/container/lxd.(*containerManager).CreateContainer(0xc4206a6900, 0xc421316780, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /workspace/_build/src/github.com/juju/juju/container/lxd/manager.go:102 +0x18f
github.com/juju/juju/worker/provisioner.(*lxdBroker).StartInstance(0xc4203bed00, 0x3563760, 0xc42074a2c0, 0xc42086a180, 0x24, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /workspace/_build/src/github.com/juju/juju/worker/provisioner/lxd-broker.go:135 +0x831
github.com/juju/juju/worker/provisioner.(*provisionerTask).startMachine(0xc42071c580, 0xc4205ada10, 0x0, 0x0, 0x0, 0xc42061af30, 0xc42029ad80)
        /workspace/_build/src/github.com/juju/juju/worker/provisioner/provisioner_task.go:1071 +0x320
github.com/juju/juju/worker/provisioner.(*provisionerTask).startMachines.func1(0xc4206e8d90, 0xc42071c580, 0xc420379400, 0x2, 0x2, 0xc4205ada10, 0x0, 0x0, 0x0, 0x0)
        /workspace/_build/src/github.com/juju/juju/worker/provisioner/provisioner_task.go:920 +0x8d
created by github.com/juju/juju/worker/provisioner.(*provisionerTask).startMachines
        /workspace/_build/src/github.com/juju/juju/worker/provisioner/provisioner_task.go:918 +0x2fb

James Page (james-page) wrote :

The error in #4 is seen across all units AFAICT

James Page (james-page) wrote :

Status output:

http://paste.ubuntu.com/p/V5DbKfrKWP/

You can see that an LXD unit failed due to missing bridge configuration.

Joseph Phillips (manadart) wrote :

The panics in #4 were addressed before 2.4.1 was released, but later than the accepted release tag.
They will not be present in the 2.4 edge build.

See:
https://bugs.launchpad.net/juju/+bug/1779897

Changed in juju:
milestone: none → 2.4.2
importance: Undecided → High
status: New → Triaged
James Page (james-page) wrote :

OK so I think I'm seeing something external to Juju causing this, because I switched back to 2.3.8 and saw exactly the same symptoms.

AFAICT the bridging calls fail for some reason when bridging the VLAN interface associated with the primary network interface. Start state:

   eno1 (XX.YY.ZZ.AA)
     eno1.2726 (YY.ZZ.AA.BB)

to

   br-eno1 (XX.YY.ZZ.AA) -> eno1
     br-eno1.2726 (YY.ZZ.AA.BB) -> eno1.2726

eno1.2726 disappears during the operation, leaving the bridge carrier-less and nonfunctional.

Restarting systemd-networkd clears the issue and the interfaces are created successfully.
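
A minimal sketch of how the failure state above could be detected on a machine, assuming the bridge-to-member mapping from the example (br-eno1 over eno1, br-eno1.2726 over eno1.2726). The helper names are hypothetical; reading /sys/class/net is only one way to obtain the list of present interfaces.

```python
import os


def present_interfaces(sysfs="/sys/class/net"):
    """List entries under /sys/class/net (interface names on Linux)."""
    return os.listdir(sysfs) if os.path.isdir(sysfs) else []


def bridges_missing_member(expected, present):
    """Return bridges that exist while their expected member does not.

    expected: dict mapping bridge name -> member interface name
    present:  iterable of interface names currently on the host
    """
    present = set(present)
    return sorted(
        bridge
        for bridge, member in expected.items()
        if bridge in present and member not in present
    )


# Mapping taken from the target state described in this comment.
expected = {"br-eno1": "eno1", "br-eno1.2726": "eno1.2726"}

# Failure state as described: eno1.2726 has disappeared.
broken = ["lo", "eno1", "br-eno1", "br-eno1.2726"]
print(bridges_missing_member(expected, broken))
```

On a live host, passing present_interfaces() as the second argument would run the same check against the current interface list.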

Raising a netplan task for this bug.

James Page (james-page) wrote :

OK so I tested with a deployment from bionic-proposed, and I've not seen this issue on the first run - will re-validate again in +24hrs

James Page (james-page) wrote :

A lucky run I guess; repeated the test and had two machines exhibit the symptom.

I think this is something to do with the specific hierarchy of interfaces I have configured, and netplan applying them to systemd-networkd; if I restart systemd-networkd, the IP configuration is as I expect, but running netplan apply results in eno1.2726 disappearing.

Richard Harding (rharding) wrote :

With that in mind, I'm going to mark this incomplete. If we hear back from the netplan folks that Juju should be doing something differently, we can narrow down and correct the issue on our end.

Changed in juju:
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in netplan.io (Ubuntu):
status: New → Confirmed
Changed in juju:
milestone: 2.4.2 → none
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: High → Low
tags: added: expirebugs-bot