LXD unit binding to incorrect MAAS space with no subnets crashes with error: runtime error: invalid memory address or nil pointer dereference

Bug #1994124 reported by Trent Lloyd
This bug affects 1 person

Affects: Canonical Juju
Status: Fix Released
Importance: Critical
Assigned to: Simon Richardson

Bug Description

Application deployments to an LXD container are failing for me with the following error:

[juju status]
Machine State Address Inst id Series AZ Message
0 started 10.230.63.159 harhall focal default Deployed
0/lxd/0 down pending focal runtime error: invalid memory address or nil pointer dereference

This appears to be caused by using the alpha space, which exists by default but has no subnets.

(1) MAAS installation has a single space called "oam" (with a single subnet)

$ juju spaces
Name Space ID Subnets
alpha 0
oam 1 10.230.56.0/21

(2) The bundle was incorrectly binding each application's default space to "alpha"

app_name:
  bindings:
    '': alpha

Changing the bindings to oam deploys containers successfully (they would also deploy when specifying no binding). This was using a freshly deployed controller on 2.9.35 (latest at time of writing).
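For reference, a minimal sketch of the corrected bundle fragment, assuming an application named `app_name` as in the snippet above: the empty-string key binds every endpoint not bound explicitly, so pointing it at a space that actually has subnets (here "oam") avoids the alpha space entirely.

```yaml
applications:
  app_name:
    bindings:
      # '' is the default binding for all endpoints of this application;
      # bind it to a space that has at least one subnet.
      '': oam
```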

The full error/crash was in the controller log (it was not in the metal machine logs):

2022-10-25 03:02:24 CRITICAL juju.rpc server.go:557 panic running request {MethodCaller:0xc000d07f50 transformErrors:0x2b76980 hdr:{RequestId:487 Request:{Type:Provisioner Version:11 Id: Action:PrepareContainerInterfaceInfo} Error: ErrorCode: ErrorInfo:map[] Version:1}} with arg {Entities:[{Tag:machine-1-lxd-4}]}: runtime error: invalid memory address or nil pointer dereference
goroutine 461632 [running]:
runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/juju/juju/rpc.(*Conn).runRequest.func1()
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/rpc/server.go:558 +0x105
panic({0x4dd1aa0, 0x9103ed0})
        /usr/local/go/src/runtime/panic.go:884 +0x212
github.com/juju/juju/provider/maas.(*maasEnviron).createAndPopulateDevice(0xc002bdb400, {{0xc002eba1f8, 0x13}, {0x0, 0x0}, {0xc002d1c348, 0x11}, {0x587a5e0, 0x4}, {0xc001b8ec00, ...}, ...})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/provider/maas/devices.go:414 +0x8db
github.com/juju/juju/provider/maas.(*maasEnviron).allocateContainerAddresses2(0xc002bdb400, {0x7?, 0xc0059612d8?}, {0xc000780990, 0x6}, {{0xc000780ae0?, 0x7?}}, {0xc001b8ec00, 0x5, 0x5})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/provider/maas/environ.go:2336 +0x34c
github.com/juju/juju/provider/maas.(*maasEnviron).AllocateContainerAddresses(0xc002bdb400, {0x623fd40, 0xc004bbefd8}, {0xc000780990, 0x6}, {{0xc000780ae0?, 0x30?}}, {0xc001b8ec00, 0x5, 0x5})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/provider/maas/environ.go:2183 +0x15a
github.com/juju/juju/apiserver/facades/agent/provisioner.(*prepareOrGetContext).ProcessOneContainer(0xc003a4ff20, {0x6264d10?, 0xc002bdb400}, {0x623fd40, 0xc004bbefd8}, {0x6214478, 0xc000cae540}, 0x0, {0x625d3f0, 0xc000642678}, ...)
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/apiserver/facades/agent/provisioner/provisioner.go:1043 +0x343
github.com/juju/juju/apiserver/facades/agent/provisioner.(*ProvisionerAPI).processEachContainer(0xc0012d9700, {{0xc004753340?, 0x0?, 0x0?}}, {0x6224090, 0xc003a4ff20})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/apiserver/facades/agent/provisioner/provisioner.go:972 +0x3ca
github.com/juju/juju/apiserver/facades/agent/provisioner.(*ProvisionerAPI).prepareOrGetContainerInterfaceInfo(0x4aa07e?, {{0xc004753340?, 0x0?, 0x4?}}, 0x0)
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/apiserver/facades/agent/provisioner/provisioner.go:1069 +0xb4
github.com/juju/juju/apiserver/facades/agent/provisioner.(*ProvisionerAPI).PrepareContainerInterfaceInfo(0x8?, {{0xc004753340?, 0x2?, 0x468759?}})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/apiserver/facades/agent/provisioner/provisioner.go:892 +0x25
reflect.Value.call({0x573e9e0?, 0xc00012bc38?, 0x40db47?}, {0x5879e80, 0x4}, {0xc0046eb668, 0x1, 0x1617a5d?})
        /usr/local/go/src/reflect/value.go:584 +0x8c5
reflect.Value.Call({0x573e9e0?, 0xc00012bc38?, 0x4f56c40?}, {0xc0046eb668?, 0x0?, 0xc00336ad78?})
        /usr/local/go/src/reflect/value.go:368 +0xbc
github.com/juju/rpcreflect.newMethod.func8({0x623ab48, 0xc004753380}, {0x573e9e0?, 0xc00012bc38?, 0xc0056f2d80?}, {0x4f56c40?, 0xc0046eb5c0?, 0xc003c5bc00?})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/vendor/github.com/juju/rpcreflect/type.go:344 +0xce
github.com/juju/juju/apiserver.(*srvCaller).Call(0xc000d07f50, {0x623ab48, 0xc004753380}, {0x0?, 0xc0003c9620?}, {0x4f56c40?, 0xc0046eb5c0?, 0x40d7ff?})
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/apiserver/root.go:188 +0xa6
github.com/juju/juju/rpc.(*Conn).runRequest(0xc000c90280, {{0x6223ee0, 0xc000d07f50}, 0x5ac55e0, {0x1e7, {{0xc00227db70, 0xb}, 0xb, {0x0, 0x0}, ...}, ...}}, ...)
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/rpc/server.go:571 +0x1b6
created by github.com/juju/juju/rpc.(*Conn).handleRequest
        /home/jenkins/workspace/build-juju/build/src/github.com/juju/juju/rpc/server.go:475 +0x651

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.9.37
status: New → Triaged
importance: Undecided → Critical
Changed in juju:
milestone: 2.9.37 → 2.9.38
Revision history for this message
Harry Pidcock (hpidcock) wrote :

This kind of looks like the primary nic doesn't have a VLAN?

Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
Changed in juju:
milestone: 2.9.38 → 2.9.39
Revision history for this message
Simon Richardson (simonrichardson) wrote :

PR to prevent the panic: https://github.com/juju/juju/pull/15025. It's not quite obvious what the correct solution is here without speaking to Joe.

Changed in juju:
status: Triaged → In Progress
Revision history for this message
Joseph Phillips (manadart) wrote (last edit):

The real issue looks to be further upstream in linkLayerDevicesForSpaces.

See https://github.com/juju/juju/blob/94f389f505178439f7d0b0136e1e0e51502be5d0/network/containerizer/bridgepolicy.go#L287

As the comment describes, we include devices without addresses in the default (alpha) space.

Later when we iterate over these to create container NICs paired with parents, we assign names "eth" + i in PopulateContainerLinkLayerDevices.

After that we choose "eth0" as the primary NIC.

What I think is happening here is that there is a bridge on the host that is not configured, which happens to be at index 0 based on the sorting by name.

What we should do to fix it is omit devices without addresses from consideration (see the linked line above).
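The proposed fix can be sketched roughly as follows. This is not the actual Juju code (the real logic lives in `linkLayerDevicesForSpaces` in network/containerizer/bridgepolicy.go); the `device` type and `devicesForSpace` function here are hypothetical stand-ins illustrating why skipping address-less devices prevents an unconfigured bridge from sorting to index 0 and becoming the container's eth0.

```go
package main

import (
	"fmt"
	"sort"
)

// device is a minimal stand-in for Juju's link-layer device type.
type device struct {
	name      string
	addresses []string
}

// devicesForSpace illustrates the proposed fix: devices with no
// addresses are omitted from consideration instead of being lumped
// into the default (alpha) space, so an unconfigured host bridge can
// never be paired with the container's primary NIC.
func devicesForSpace(devs []device) []device {
	var out []device
	for _, d := range devs {
		if len(d.addresses) == 0 {
			continue // unconfigured device: skip it
		}
		out = append(out, d)
	}
	// Container NIC names are assigned as "eth" + index after sorting
	// by name, so ordering determines which device backs eth0.
	sort.Slice(out, func(i, j int) bool { return out[i].name < out[j].name })
	return out
}

func main() {
	devs := []device{
		{name: "br-empty"}, // no address; previously sorted first and became eth0
		{name: "br-oam", addresses: []string{"10.230.56.10/21"}},
	}
	for i, d := range devicesForSpace(devs) {
		fmt.Printf("eth%d <- %s\n", i, d.name)
	}
}
```

With the filter in place, only `br-oam` survives, so eth0 is backed by a device that actually has an address.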

Changed in juju:
assignee: Joseph Phillips (manadart) → Simon Richardson (simonrichardson)
status: In Progress → Fix Committed
Changed in juju:
milestone: 2.9.39 → 2.9.42
Changed in juju:
status: Fix Committed → Fix Released