Canonical Juju

network-get: incorrect resolution of interface name when LXD/bridges in use

Bug #1939018 reported by James Page on 2021-08-05

Affects          Status         Importance   Assigned to       Milestone
Canonical Juju   Fix Released   High         Joseph Phillips   Canonical Juju 2.9.14
Flannel Charm    New            Undecided    Unassigned

Bug Description

Juju 2.9.9
Ubuntu Focal
Charmed K8S (bundle attached)

The flannel charm uses the network_get helper to determine the interface name for the 'cni' binding.

In my deployment, LXD containers on the k8s worker nodes host other services required by the deployment. This causes a bridge to be created over the underlying interface:

$ juju run --unit kubernetes-worker/0 "ip addr"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-enp1s0 state UP group default qlen 1000
    link/ether 52:54:00:03:01:01 brd ff:ff:ff:ff:ff:ff
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:03:01:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe03:102/64 scope link
       valid_lft forever preferred_lft forever
4: br-enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2a:6e:b3:cd:14:41 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.146/24 brd 10.0.0.255 scope global br-enp1s0
       valid_lft forever preferred_lft forever
    inet6 fe80::48d9:8aff:fef4:c8c3/64 scope link
       valid_lft forever preferred_lft forever
5: lxdbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:16:3e:6a:d7:ee brd ff:ff:ff:ff:ff:ff
    inet 10.119.34.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
7: 0lxd0-0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-enp1s0 state UP group default qlen 1000
    link/ether ae:29:a4:0b:7b:66 brd ff:ff:ff:ff:ff:ff link-netnsid 0
9: 0lxd1-0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-enp1s0 state UP group default qlen 1000
    link/ether 2a:6e:b3:cd:14:41 brd ff:ff:ff:ff:ff:ff link-netnsid 1
11: 0lxd2-0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-enp1s0 state UP group default qlen 1000
    link/ether 52:8d:07:d5:60:fe brd ff:ff:ff:ff:ff:ff link-netnsid 2

$ juju run --unit kubernetes-worker/0 "network-get cni"
bind-addresses:
- mac-address: "52:54:00:03:01:01"
  interface-name: enp1s0
  addresses:
  - hostname: ""
    address: 10.0.0.146
    cidr: 10.0.0.0/24
  macaddress: "52:54:00:03:01:01"
  interfacename: enp1s0
egress-subnets:
- 10.0.0.146/32
ingress-addresses:
- 10.0.0.146

The flannel charm thus passes enp1s0 to flanneld, which then fails to start because the IP is bound to the bridge and not to the underlying interface.
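
To illustrate the mismatch, here is a minimal Python sketch of how a charm could look up which interface really carries the address that network-get returned, by scanning "ip addr" output. The helper name interface_for_address and the trimmed SAMPLE text are hypothetical and are not part of the flannel charm or of Juju; this is only one possible charm-side workaround, not the fix that was applied.

```python
import re

# Trimmed copy of the "ip addr" output from this report (illustration only).
SAMPLE = """\
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-enp1s0 state UP group default qlen 1000
    link/ether 52:54:00:03:01:01 brd ff:ff:ff:ff:ff:ff
4: br-enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2a:6e:b3:cd:14:41 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.146/24 brd 10.0.0.255 scope global br-enp1s0
       valid_lft forever preferred_lft forever
7: 0lxd0-0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-enp1s0 state UP group default qlen 1000
    link/ether ae:29:a4:0b:7b:66 brd ff:ff:ff:ff:ff:ff link-netnsid 0
"""


def interface_for_address(ip_addr_output, address):
    """Return the name of the interface that carries `address`, or None.

    Hypothetical helper: it parses the text format printed by "ip addr".
    """
    current = None
    for line in ip_addr_output.splitlines():
        # Interface header lines look like "4: br-enp1s0: <FLAGS> ...";
        # veth peers carry an "@ifN" suffix that is not part of the name.
        header = re.match(r"^\d+:\s+([^:@\s]+)(?:@\S+)?:", line)
        if header:
            current = header.group(1)
            continue
        # Address lines look like "    inet 10.0.0.146/24 brd ...".
        inet = re.match(r"^\s+inet6?\s+([0-9a-fA-F.:]+)/\d+", line)
        if inet and inet.group(1) == address:
            return current
    return None


# network-get reported enp1s0, but the address actually lives on the bridge.
actual = interface_for_address(SAMPLE, "10.0.0.146")
```

With the sample above, the lookup resolves 10.0.0.146 to br-enp1s0 rather than the enp1s0 name that network-get reported.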

James Page (james-page) wrote on 2021-08-05:

k8s-flannel-ceph.yaml (2.4 KiB, text/html)

James Page (james-page) on 2021-08-05

tags:

added: o7k-k8s

James Page (james-page) on 2021-08-05

summary:

- incorrect resolution of interface name when LXD/bridges in use
+ network-get: incorrect resolution of interface name when LXD/bridges in
+ use

Joseph Phillips (manadart) wrote on 2021-08-12:

What is the output of show-machine for the host of kubernetes-worker/0?

When we bridge interfaces of a machine in order to provision containers on it, Juju triggers a re-detection of its network interfaces.

I would expect subsequent network-get calls to use this updated info and meet the expected behaviour here.

Joseph Phillips (manadart) wrote on 2021-08-12:

And the underlying machine is an O7k VM?

Changed in juju:
status:	New → Triaged
importance:	Undecided → High
assignee:	nobody → Joseph Phillips (manadart)

Joseph Phillips (manadart) wrote on 2021-08-19:

This happens because, although we re-detect the network config on the machine and correctly move the address to the bridge, when the instance-poller runs the provider still reports the IP as attached to the NIC.

Rather than attempting to address this by coordinating between the machiner and instance-poller, we can handle it in the server-side logic for network-get.
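
The idea can be sketched in Python as follows. This is an illustrative model only and is not the code from the Juju fix: the Device type, the resolve_bind_device function and the DEVICES data are hypothetical, and the rule shown (prefer the parent bridge when the NIC is a bridge port and the bridge holds the address) is an assumption about what such server-side handling might look like.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Device:
    """Hypothetical model of a link-layer device as detected on the machine."""
    name: str
    mac: str
    parent: Optional[str] = None  # name of the bridge this device is a port of
    addresses: List[str] = field(default_factory=list)


def resolve_bind_device(provider_device: str, address: str,
                        devices: Dict[str, Device]) -> Device:
    """Pick the device to report for `address`.

    `provider_device` is the NIC the provider believes holds the address.
    If that NIC is a bridge port and the machine-detected data shows the
    address on the bridge, report the bridge instead.
    """
    nic = devices[provider_device]
    if address in nic.addresses or nic.parent is None:
        return nic
    bridge = devices.get(nic.parent)
    if bridge is not None and address in bridge.addresses:
        return bridge
    return nic


# Data mirroring the machine in this report (illustration only).
DEVICES = {
    "enp1s0": Device("enp1s0", "52:54:00:03:01:01", parent="br-enp1s0"),
    "br-enp1s0": Device("br-enp1s0", "2a:6e:b3:cd:14:41",
                        addresses=["10.0.0.146"]),
}

resolved = resolve_bind_device("enp1s0", "10.0.0.146", DEVICES)
```

In this model the provider's view (enp1s0) is corrected to br-enp1s0 at query time, without any coordination between the machiner and the instance-poller.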

Changed in juju:
status:	Triaged → In Progress
milestone:	none → 2.9.12

Canonical Juju QA Bot (juju-qa-bot) on 2021-08-25

Changed in juju:
milestone:	2.9.12 → 2.9.13

Joseph Phillips (manadart) wrote on 2021-08-26:

Addressed by https://github.com/juju/juju/pull/13282.

Changed in juju:
status:	In Progress → Fix Committed

Ian Booth (wallyworld) on 2021-09-10

Changed in juju:
milestone:	2.9.13 → 2.9.14

Canonical Juju QA Bot (juju-qa-bot) on 2021-09-13

Changed in juju:
status:	Fix Committed → Fix Released
