apiaddress not in agent.conf when adding manual machine

Bug #1964513 reported by Erik Lönroth
Affects: Canonical Juju
Status: Triaged
Importance: Medium
Assigned to: Nicolas Vinuesa

Bug Description

I have a juju 2.9.25 controller in a lxd cloud.

When I'm adding a physical machine manually:

    juju add-machine ssh:ubuntu@192.168.2.3

Then the machine never leaves the pending state in juju status. (The agent gets into trouble somehow.)

See discourse post about all this: https://discourse.charmhub.io/t/adding-manual-machine-fails-on-address-not-valid/5823/

When I add the apiaddress manually to the agent.conf and restart the machine agent:

    systemctl restart jujud-machine-0.service
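
For reference, the entries I add look roughly like this (the same values that show up in the log excerpt further down; I'm assuming the file lives at /var/lib/juju/agents/machine-<id>/agent.conf):

    apiaddresses:
    - 192.168.2.224:17070
    apipassword: kkHnTKwUr/hf7cWst0Ijbb8k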

Then the apipassword key is gone from the config and a new error appears.

If I add the entries TWICE, the agent seems to manage to connect, but then something happens and I'm back to square one.

Here are the logs from such an event, following a restart of the machine agent:

2022-03-10 19:55:06 INFO juju.cmd supercommand.go:56 running jujud [2.9.25 0 695d9bc09df0725a168df52707336bb6a3a92ff7 gc go1.17]
2022-03-10 19:55:06 DEBUG juju.cmd supercommand.go:57 args: []string{"/var/lib/juju/tools/machine-4/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "4", "--debug"}
2022-03-10 19:55:06 DEBUG juju.utils gomaxprocs.go:24 setting GOMAXPROCS to 16
2022-03-10 19:55:06 DEBUG juju.agent agent.go:592 read agent config, format "2.0"
2022-03-10 19:55:06 INFO juju.agent.setup agentconf.go:128 setting logging config to "<root>=INFO"
2022-03-10 19:55:06 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 2.9.25 have already been run.
2022-03-10 19:55:06 INFO juju.api apiclient.go:673 connection established to "wss://192.168.2.224:17070/model/fb0a48a3-72b3-430f-88c0-ed246604a5eb/api"
2022-03-10 19:55:06 INFO juju.worker.apicaller connect.go:163 [fb0a48] "machine-4" successfully connected to "192.168.2.224:17070"
2022-03-10 19:55:06 INFO juju.api apiclient.go:673 connection established to "wss://192.168.2.224:17070/model/fb0a48a3-72b3-430f-88c0-ed246604a5eb/api"
2022-03-10 19:55:06 INFO juju.worker.apicaller connect.go:163 [fb0a48] "machine-4" successfully connected to "192.168.2.224:17070"
2022-03-10 19:55:06 INFO juju.worker.upgrader upgrader.go:244 desired agent binary version: 2.9.25
2022-03-10 19:55:06 INFO juju.worker.deployer nested.go:159 new context: units "", stopped ""
2022-03-10 19:55:06 INFO juju.worker.migrationminion worker.go:140 migration phase is now: NONE
2022-03-10 19:55:06 INFO juju.worker.logger logger.go:120 logger worker started
2022-03-10 19:55:06 INFO juju.worker.diskmanager diskmanager.go:67 block devices changed: []storage.BlockDevice{storage.BlockDevice{DeviceName:"loop0", DeviceLinks:[]string(nil), Label:"", UUID:"", HardwareId:"", WWN:"", BusAddress:"", Size:0x6e, FilesystemType:"squashfs", InUse:true, MountPoint:"/snap/core/12725", SerialId:""}, storage.BlockDevice{DeviceName:"loop1", DeviceLinks:[]string(nil), Label:"", UUID:"", HardwareId:"", WWN:"", BusAddress:"", Size:0x37, FilesystemType:"squashfs", InUse:true, MountPoint:"/snap/core18/2284", SerialId:""}, storage.BlockDevice{DeviceName:"loop2", DeviceLinks:[]string(nil), Label:"", UUID:"", HardwareId:"", WWN:"", BusAddress:"", Size:0x3d, FilesystemType:"squashfs", InUse:true, MountPoint:"/snap/core20/1361", SerialId:""}, storage.BlockDevice{DeviceName:"loop3", DeviceLinks:[]string(nil), Label:"", UUID:"", HardwareId:"", WWN:"", BusAddress:"", Size:0x37, FilesystemType:"squashfs", InUse:true, MountPoint:"/snap/core18/2253", SerialId:""}, storage.BlockDevice{DeviceName:"loop4", DeviceLinks:[]string(nil), Label:"", UUID:"", HardwareId:"", WWN:"", BusAddress:"", Size:0x0, FilesystemType:"squashfs", InUse:true, MountPoint:"/snap/jq/6", SerialId:""}, storage.BlockDevice{DeviceName:"loop5", DeviceLinks:[]string(nil), Label:"", UUID:"", HardwareId:"", WWN:"", BusAddress:"", Size:0x3d, FilesystemType:"squashfs", InUse:true, MountPoint:"/snap/core20/1328", SerialId:""}, storage.BlockDevice{DeviceName:"loop6", DeviceLinks:[]string(nil), Label:"", UUID:"", HardwareId:"", WWN:"", BusAddress:"", Size:0x43, FilesystemType:"squashfs", InUse:true, MountPoint:"/snap/lxd/21835", SerialId:""}, storage.BlockDevice{DeviceName:"loop7", DeviceLinks:[]string(nil), Label:"", UUID:"", HardwareId:"", WWN:"", BusAddress:"", Size:0x2b, FilesystemType:"squashfs", InUse:true, MountPoint:"/snap/snapd/14978", SerialId:""}, storage.BlockDevice{DeviceName:"loop8", DeviceLinks:[]string(nil), Label:"", UUID:"", HardwareId:"", WWN:"", BusAddress:"", Size:0x43, FilesystemType:"squashfs", InUse:true, MountPoint:"/snap/lxd/22526", SerialId:""}, storage.BlockDevice{DeviceName:"loop9", DeviceLinks:[]string{"/dev/disk/by-label/juju-btrfs", "/dev/disk/by-uuid/acbc28ce-45e7-491b-a105-5fccd73f7a44"}, Label:"juju-btrfs", UUID:"acbc28ce-45e7-491b-a105-5fccd73f7a44", HardwareId:"", WWN:"", BusAddress:"", Size:0x60db, FilesystemType:"btrfs", InUse:true, MountPoint:"", SerialId:""}, storage.BlockDevice{DeviceName:"sda", DeviceLinks:[]string{"/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000", "/dev/disk/by-id/scsi-3600508b1001c01ece85c00bfdede757f", "/dev/disk/by-id/scsi-SHP_LOGICAL_VOLUME_500143802258BED0", "/dev/disk/by-id/wwn-0x600508b1001c01ece85c00bfdede757f", "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0"}, Label:"", UUID:"", HardwareId:"scsi-3600508b1001c01ece85c00bfdede757f", WWN:"0x600508b1001c01ece85c00bfdede757f", BusAddress:"scsi@2:1.0.0", Size:0x4459c, FilesystemType:"", InUse:true, MountPoint:"", SerialId:"3600508b1001c01ece85c00bfdede757f"}, storage.BlockDevice{DeviceName:"sda1", DeviceLinks:[]string{"/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000-part1", "/dev/disk/by-id/scsi-3600508b1001c01ece85c00bfdede757f-part1", "/dev/disk/by-id/scsi-SHP_LOGICAL_VOLUME_500143802258BED0-part1", "/dev/disk/by-id/wwn-0x600508b1001c01ece85c00bfdede757f-part1", "/dev/disk/by-partuuid/284c0cd2-b98e-49c6-8583-1c8d3502b760", "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0-part1"}, Label:"", UUID:"", HardwareId:"scsi-3600508b1001c01ece85c00bfdede757f", WWN:"0x600508b1001c01ece85c00bfdede757f", 
BusAddress:"scsi@2:1.0.0", Size:0x1, FilesystemType:"", InUse:false, MountPoint:"", SerialId:"3600508b1001c01ece85c00bfdede757f"}, storage.BlockDevice{DeviceName:"sda2", DeviceLinks:[]string{"/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000-part2", "/dev/disk/by-id/scsi-3600508b1001c01ece85c00bfdede757f-part2", "/dev/disk/by-id/scsi-SHP_LOGICAL_VOLUME_500143802258BED0-part2", "/dev/disk/by-id/wwn-0x600508b1001c01ece85c00bfdede757f-part2", "/dev/disk/by-partuuid/4328ed3d-5cb8-4bbe-a46f-a4d0c3b35ae6", "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0-part2", "/dev/disk/by-uuid/c3c46a1c-de95-4ccc-bac8-03c15896ca2a"}, Label:"", UUID:"c3c46a1c-de95-4ccc-bac8-03c15896ca2a", HardwareId:"scsi-3600508b1001c01ece85c00bfdede757f", WWN:"0x600508b1001c01ece85c00bfdede757f", BusAddress:"scsi@2:1.0.0", Size:0x400, FilesystemType:"ext4", InUse:true, MountPoint:"/boot", SerialId:"3600508b1001c01ece85c00bfdede757f"}, storage.BlockDevice{DeviceName:"sda3", DeviceLinks:[]string{"/dev/disk/by-id/lvm-pv-uuid-QdesTc-ouRX-s4fP-qCcB-bjAA-OoGV-JjT9fm", "/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_00000000-part3", "/dev/disk/by-id/scsi-3600508b1001c01ece85c00bfdede757f-part3", "/dev/disk/by-id/scsi-SHP_LOGICAL_VOLUME_500143802258BED0-part3", "/dev/disk/by-id/wwn-0x600508b1001c01ece85c00bfdede757f-part3", "/dev/disk/by-partuuid/1428fd62-5554-4aa6-bd46-885dd0566e85", "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:0-part3"}, Label:"", UUID:"QdesTc-ouRX-s4fP-qCcB-bjAA-OoGV-JjT9fm", HardwareId:"scsi-3600508b1001c01ece85c00bfdede757f", WWN:"0x600508b1001c01ece85c00bfdede757f", BusAddress:"scsi@2:1.0.0", Size:0x44199, FilesystemType:"LVM2_member", InUse:true, MountPoint:"", SerialId:"3600508b1001c01ece85c00bfdede757f"}, storage.BlockDevice{DeviceName:"sdb", DeviceLinks:[]string{"/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_01000000", "/dev/disk/by-id/scsi-3600508b1001c0ec3cce625c6c0c5813c", "/dev/disk/by-id/scsi-SHP_LOGICAL_VOLUME_500143802258BED0", "/dev/disk/by-id/wwn-0x600508b1001c0ec3cce625c6c0c5813c", "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:1"}, Label:"", UUID:"", HardwareId:"scsi-3600508b1001c0ec3cce625c6c0c5813c", WWN:"0x600508b1001c0ec3cce625c6c0c5813c", BusAddress:"scsi@2:1.0.1", Size:0x8baec, FilesystemType:"", InUse:true, MountPoint:"", SerialId:"3600508b1001c0ec3cce625c6c0c5813c"}, storage.BlockDevice{DeviceName:"sdb1", DeviceLinks:[]string{"/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_01000000-part1", "/dev/disk/by-id/scsi-3600508b1001c0ec3cce625c6c0c5813c-part1", "/dev/disk/by-id/scsi-SHP_LOGICAL_VOLUME_500143802258BED0-part1", "/dev/disk/by-id/wwn-0x600508b1001c0ec3cce625c6c0c5813c-part1", "/dev/disk/by-label/lxdhosts", "/dev/disk/by-partlabel/zfs-a7a91d3dfe14c8c1", "/dev/disk/by-partuuid/b106e153-04ca-b24a-96ba-bd76476ad7e1", "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:1-part1", "/dev/disk/by-uuid/13632863783802810807"}, Label:"lxdhosts", UUID:"13632863783802810807", HardwareId:"scsi-3600508b1001c0ec3cce625c6c0c5813c", WWN:"0x600508b1001c0ec3cce625c6c0c5813c", BusAddress:"scsi@2:1.0.1", Size:0x8bae3, FilesystemType:"zfs_member", InUse:true, MountPoint:"", SerialId:"3600508b1001c0ec3cce625c6c0c5813c"}, storage.BlockDevice{DeviceName:"sdb9", DeviceLinks:[]string{"/dev/disk/by-id/scsi-0HP_LOGICAL_VOLUME_01000000-part9", "/dev/disk/by-id/scsi-3600508b1001c0ec3cce625c6c0c5813c-part9", "/dev/disk/by-id/scsi-SHP_LOGICAL_VOLUME_500143802258BED0-part9", "/dev/disk/by-id/wwn-0x600508b1001c0ec3cce625c6c0c5813c-part9", "/dev/disk/by-partuuid/5203134c-6a2d-ce46-adea-73db0e666772", 
"/dev/disk/by-path/pci-0000:02:00.0-scsi-0:1:0:1-part9"}, Label:"", UUID:"", HardwareId:"scsi-3600508b1001c0ec3cce625c6c0c5813c", WWN:"0x600508b1001c0ec3cce625c6c0c5813c", BusAddress:"scsi@2:1.0.1", Size:0x8, FilesystemType:"", InUse:false, MountPoint:"", SerialId:"3600508b1001c0ec3cce625c6c0c5813c"}}
2022-03-10 19:55:07 INFO juju.worker.upgradeseries worker.go:161 no series upgrade lock present
2022-03-10 19:55:07 INFO juju.cmd.jujud.runner runner.go:556 start "4-container-watcher"
2022-03-10 19:55:07 INFO juju.worker.machiner machiner.go:162 setting addresses for "machine-4" to [local-machine:127.0.0.1 local-machine:::1]
2022-03-10 19:55:07 INFO juju.container.lxd manager.go:70 Availability zone will be empty for this container manager
2022-03-10 19:55:07 INFO juju.worker.authenticationworker worker.go:103 "machine-4" key updater worker started
2022-03-10 19:55:07 INFO juju.cmd.jujud.runner runner.go:386 runner is dying
2022-03-10 19:55:07 INFO juju.worker.logger logger.go:136 logger worker stopped
2022-03-10 19:55:07 INFO juju.cmd.jujud.runner runner.go:587 stopped "4-container-watcher", err: connection is shut down
2022-03-10 19:55:07 ERROR juju.cmd.jujud.runner runner.go:459 fatal "4-container-watcher": connection is shut down
2022-03-10 19:55:07 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [fb0a48] "machine-4" cannot open api: validating info for opening an API connection: missing addresses not valid
2022-03-10 19:55:07 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [fb0a48] "machine-4" cannot open api: validating info for opening an API connection: missing addresses not valid
2022-03-10 19:55:11 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [fb0a48] "machine-4" cannot open api: validating info for opening an API connection: missing addresses not valid
apipassword: kkHnTKwUr/hf7cWst0Ijbb8k
apiaddresses:
- 192.168.2.224:17070

2022-03-10 19:55:15 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [fb0a48] "machine-4" cannot open api: validating info for opening an API connection: missing addresses not valid

Please note that the agent.conf has both apiaddress and apipassword keys removed after this event...
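
A quick way to check that is something like this (assuming the standard agent.conf path; machine 4 in my case):

    grep -E 'apiaddresses|apipassword' /var/lib/juju/agents/machine-4/agent.conf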

Erik Lönroth (erik-lonroth) wrote :

2022-03-10 20:13:20 INFO juju.cmd.jujud.runner runner.go:556 start "4-container-watcher"
2022-03-10 20:13:20 INFO juju.container.lxd manager.go:70 Availability zone will be empty for this container manager
2022-03-10 20:13:20 INFO juju.worker.upgradeseries worker.go:161 no series upgrade lock present
2022-03-10 20:13:20 ERROR juju.worker.dependency engine.go:693 "instance-mutater" manifold worker returned unexpected error: cannot start machine instancemutater worker: websocket: close sent
2022-03-10 20:13:20 INFO juju.worker.authenticationworker worker.go:100 starting key updater worker: connection is shut down
2022-03-10 20:13:20 INFO juju.cmd.jujud.runner runner.go:587 stopped "4-container-watcher", err: worker "4-container-watcher" exited: connection is shut down
2022-03-10 20:13:20 INFO juju.worker.logger logger.go:136 logger worker stopped
2022-03-10 20:13:20 ERROR juju.worker.dependency engine.go:693 "reboot-executor" manifold worker returned unexpected error: connection is shut down
2022-03-10 20:13:20 ERROR juju.worker.dependency engine.go:693 "fan-configurer" manifold worker returned unexpected error: connection is shut down
2022-03-10 20:13:20 INFO juju.cmd.jujud.runner runner.go:386 runner is dying
2022-03-10 20:13:20 ERROR juju.cmd.jujud.runner runner.go:459 fatal "4-container-watcher": worker "4-container-watcher" exited: connection is shut down
2022-03-10 20:13:20 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [fb0a48] "machine-4" cannot open api: validating info for opening an API connection: missing addresses not valid

John A Meinel (jameinel) wrote :

This might be related to https://bugs.launchpad.net/juju/+bug/1888453 but as this is 2.9.25 it is certainly not exactly the old bug.

John A Meinel (jameinel) wrote :

This was brought up as an 'lxd controller' adding a manually provisioned machine, using bridged networking and LXD containers. I believe we have some code around how we handle 'lxdbr0' to avoid cases (when *not* bridged) where a machine would report its local bridge addresses as potential routable addresses, only to have them duplicated and confuse conversations outside of that machine. It may be that something around that filtering is triggering, causing us not to trust the real addresses for the controller. (The 'ip a s' for the controller machine doesn't give any hint why we wouldn't be reporting the eth0 address, though.)

Erik Lönroth (erik-lonroth) wrote :

What can I do to help out here, since I'm dependent on getting this hardware up and monitored with juju?

Erik Lönroth (erik-lonroth) wrote :

Any chance of getting help on this? I still can't monitor my hosts...

Erik Lönroth (erik-lonroth) wrote :

# I've tried with the latest version (2.9.27)

# The machine has started the agent:

root@iceberg:/var/log/juju# systemctl status jujud-machine-12.service
● jujud-machine-12.service - juju agent for machine-12
     Loaded: loaded (/etc/systemd/system/jujud-machine-12.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-03-25 22:18:05 UTC; 59s ago
   Main PID: 3196067 (bash)
      Tasks: 21 (limit: 464208)
     Memory: 23.1M
     CGroup: /system.slice/jujud-machine-12.service
             ├─3196067 bash /etc/systemd/system/jujud-machine-12-exec-start.sh
             └─3196072 /var/lib/juju/tools/machine-12/jujud machine --data-dir /var/lib/juju --machine-id 12 --debug

Mar 25 22:18:05 iceberg systemd[1]: Started juju agent for machine-12.

# but the machine is still in pending

12 pending 192.168.2.2 manual:192.168.2.2 focal Manually provisioned machine

# In /var/log/juju/machine-12.log

2022-03-25 22:20:31 ERROR juju.worker.dependency engine.go:693 "api-caller" manifold worker returned unexpected error: [fb0a48] "machine-12" cannot open api: validating info for opening an API connection: missing addresses not valid

Ian Booth (wallyworld) wrote :

I tried to reproduce this on tip of 2.9 without luck.
I had an LXD controller and manually started another LXD machine which I used with
juju add-machine ssh:...

The apiaddresses value in agent.conf was there and remained that way.

The only way I can see that the apiaddresses would be set to empty is via the SetAPIHostPorts() api call. There's a debug line that prints what the new address value is; you can turn on DEBUG for the "juju.agent" and "juju.worker.apiaddressupdater" loggers and you'll see lines like

DEBUG juju.worker.apiaddressupdater updating API hostPorts to [[10.46.214.88:17070 127.0.0.1:17070 [::1]:17070]]
DEBUG juju.agent API server address details [["10.46.214.88:17070" "127.0.0.1:17070" "[::1]:17070"]] written to agent config as ["10.46.214.88:17070"]

(I can't recall if you need to do this in the controller model or the model where the machine is added; won't hurt to set logging-config in both just to be sure)
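
For example, something like the following should do it (a sketch; adjust the model name as needed, and note it replaces the whole logging-config value):

    juju model-config -m controller logging-config="<root>=INFO;juju.agent=DEBUG;juju.worker.apiaddressupdater=DEBUG"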

The api addresses are updated whenever an address detail on any controller changes. All network addresses on the controller are gathered and processed. Assuming no juju management space has been configured, this in effect means all controller IP addresses are used.

There are two levels of filtering: the worker routine filters out any lxc bridge addresses, and the final agent API call filters out any addresses which are not private cloud, i.e. addresses reachable by other cloud nodes.

So in the above case, localhost addresses are removed and just the 10.* address is written to agent.conf

So let's see what the logging shows and maybe that will help narrow down the point at which the address list goes to empty.

John A Meinel (jameinel) wrote :

He mentioned in a different thread that he was running this controller in an LXD container that was using bridged networking, and I *believe* it was trying to provision the host that was holding the LXD container (so he provisioned a bridged LXD container on the machine, and then added the host as a manual machine). It is likely (IMO) that the code around filtering out the LXD bridge is filtering out the non-local LXD bridge just because it is the lxd bridge.

Erik Lönroth (erik-lonroth) wrote :

My physical machine (iceberg) is an lxd host.

iceberg has a bridge (lxdbr0) with a static IPv4 address: 192.168.2.2

lxdbr0 is "unmanaged", so all lxc containers get their IPs from my router's DHCP, which is also the default gateway: 192.168.2.1

The juju controller is running in an lxd container on the iceberg lxd host.

The issue occurs when I try to add iceberg as a manual machine to a model in this controller.

John A Meinel (jameinel) wrote :

I *think* if you just changed it so that you weren't using lxdbr0 it probably would work, since Juju is explicitly filtering addresses from that bridge for historical reasons (cases where LXD did not do a good job of generating unique bridge addresses, causing us to run into problems when the remote machine also had an lxd bridge whose address range overlapped with your local LXD controller).

Erik Lönroth (erik-lonroth) wrote :

Can you explain more explicitly what you mean by "not using lxdbr0"?

Rename it? I don't understand...

Erik Lönroth (erik-lonroth) wrote :

I've tried:

* adding a dummy IP and adding it to the lxdbr0 - same issue.
* adding "just" the dummy IP (not part of the bridge) - same issue.
* 2.9.27 - same issue.

Do I need to completely reconfigure the lxd host network to not use lxdbr0 as the default network?

If so, wouldn't all my containers likely be messed up, and/or wouldn't it cause major side effects for juju and/or lxd?

Erik Lönroth (erik-lonroth) wrote :

I still can't add my machine, and I don't fully grasp the consequences of changing the bridge in my running environment.

If I change the bridge, will I lose my running containers?

Will my environment break?

Will this bug be accepted?

I really would like to know, since this is the default bridge name and it would affect everyone that uses lxd with an unmanaged bridge, which also happens to be the recommended way to run lxd as far as I have understood.

Erik Lönroth (erik-lonroth) wrote :

So basically, I currently have a bridge like this, managed by netplan, plus my router providing DHCP over vlan2:

# This is the network config written by 'subiquity'
network:
  version: 2
  ethernets:
    ens5f0:
      dhcp4: no
      dhcp6: no
  bridges:
    lxdbr0:
      interfaces: [ vlan2 ]
      addresses: [ 192.168.2.2/24 ]
      gateway4: 192.168.2.1
      nameservers:
        addresses:
        - 192.168.2.1
        search:
        - mydomain
      parameters:
        stp: true
        forward-delay: 4
      dhcp4: no
      dhcp6: no

  vlans:
    vlan2:
      id: 2
      link: ens5f0
      dhcp4: no
      dhcp6: no

lxc has the following default profile:

lxc profile show default
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr0
    type: nic
  root:
    path: /
    pool: lxdhosts
    type: disk
name: default
used_by:
- /1.0/instances/juju-fe5353-0
- /1.0/instances/juju-f4cf5f-4
- /1.0/instances/caspians-dator
...
...
...
- /1.0/instances/juju-75f241-0
- /1.0/instances/juju-75f241-1

So, how can I change the name of my bridge without blowing up my containers, which already use parent: lxdbr0?

I imagine the clients surviving a process like:

1. Change the bridge name to, let's say, BR001 in netplan.
2. Add BR001 to lxc.
3. Change the default profile in lxc to make use of BR001.
4. Try to start/restart some existing units.
5. Try to spawn some new units.
6. Hope for the best and reboot the lxd host.

Is this the path? Anything more?
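
Concretely, I imagine steps 1-3 looking something like this (just a sketch on my part; the lxc syntax may differ between LXD versions):

    # after renaming the bridge stanza from lxdbr0 to BR001 in netplan:
    netplan apply
    # point the default profile's eth0 device at the new (unmanaged) bridge:
    lxc profile device set default eth0 parent=BR001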

Erik Lönroth (erik-lonroth) wrote :

I'm running out of options so I did:

1. Changed the lxdbr0 interface name in netplan to br0.
2. Edited the default profile for lxd to use br0 instead of lxdbr0.
3. This caused lxd to automatically move the lxc containers to the new br0.
4. lxdbr0 was left behind empty, but I didn't remove it with brctl delbr lxdbr0...

I then tried again to add the manual machine with juju add-machine ssh:ubuntu@192.168.2.2

But, that failed.

I then saw, for some reason, some references to lxdbr0 in machine-0.log:

2022-04-30 14:08:01 DEBUG juju.network network.go:181 "lxdbr0" has addresses [192.168.2.2/24]
2022-04-30 14:08:01 DEBUG juju.network network.go:178 cannot get "virbr0" addresses: route ip+net: no such network interface (ignoring)
2022-04-30 14:08:01 DEBUG juju.network network.go:130 filtering "lxdbr0" address local-cloud:192.168.2.224 for machine
2022-04-30 14:08:01 DEBUG juju.network network.go:127 including address local-machine:127.0.0.1 for machine
2022-04-30 14:08:01 DEBUG juju.network network.go:127 including address local-machine:::1 for machine
2022-04-30 14:08:01 DEBUG juju.network network.go:196 addresses after filtering: [local-machine:127.0.0.1 local-machine:::1]

.... so, I removed the lxdbr0 with

    brctl delbr lxdbr0

* Then rebooted the controller (to perhaps get rid of potential leftovers).

* Then removed the traces of the juju machine agent on the host 192.168.2.2:
    /usr/sbin/remove-juju-services

* Then I tried again:

    juju add-machine ssh:ubuntu@192.168.2.2 --debug

Success!

So, yes, renaming lxdbr0 -> br0 worked.

However, this is a major issue since lxdbr0 is the default bridge name for lxd today and this will happen again.

Also, this workaround caused some lxc containers to have to be rebooted, as not all of them survived the name change.

Changed in juju:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Nicolas Vinuesa (nvinuesa)
tags: added: lxd-provider lxdbr network