network-get: incorrect resolution of interface name when LXD/bridges in use

Bug #1939018 reported by James Page
This bug affects 1 person
Affects          Status         Importance   Assigned to       Milestone
Canonical Juju   Fix Released   High         Joseph Phillips
Flannel Charm    New            Undecided    Unassigned

Bug Description

Juju 2.9.9
Ubuntu Focal
Charmed K8S (bundle attached)

The flannel charm uses the network_get helper to determine the interface name for the 'cni' binding.

In my deployment, LXD containers are used on the k8s worker nodes to host other services required by the deployment, which causes a bridge to be created over the underlying interface:

$ juju run --unit kubernetes-worker/0 "ip addr"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br-enp1s0 state UP group default qlen 1000
    link/ether 52:54:00:03:01:01 brd ff:ff:ff:ff:ff:ff
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:03:01:02 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::5054:ff:fe03:102/64 scope link
       valid_lft forever preferred_lft forever
4: br-enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2a:6e:b3:cd:14:41 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.146/24 brd 10.0.0.255 scope global br-enp1s0
       valid_lft forever preferred_lft forever
    inet6 fe80::48d9:8aff:fef4:c8c3/64 scope link
       valid_lft forever preferred_lft forever
5: lxdbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:16:3e:6a:d7:ee brd ff:ff:ff:ff:ff:ff
    inet 10.119.34.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
7: 0lxd0-0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-enp1s0 state UP group default qlen 1000
    link/ether ae:29:a4:0b:7b:66 brd ff:ff:ff:ff:ff:ff link-netnsid 0
9: 0lxd1-0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-enp1s0 state UP group default qlen 1000
    link/ether 2a:6e:b3:cd:14:41 brd ff:ff:ff:ff:ff:ff link-netnsid 1
11: 0lxd2-0@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br-enp1s0 state UP group default qlen 1000
    link/ether 52:8d:07:d5:60:fe brd ff:ff:ff:ff:ff:ff link-netnsid 2

$ juju run --unit kubernetes-worker/0 "network-get cni"
bind-addresses:
- mac-address: "52:54:00:03:01:01"
  interface-name: enp1s0
  addresses:
  - hostname: ""
    address: 10.0.0.146
    cidr: 10.0.0.0/24
  macaddress: "52:54:00:03:01:01"
  interfacename: enp1s0
egress-subnets:
- 10.0.0.146/32
ingress-addresses:
- 10.0.0.146

The flannel charm thus passes enp1s0 to flanneld, which then fails to start because the IP address is bound to the bridge rather than to the underlying interface.
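As a rough illustration of the mismatch, a charm could cross-check the reported name against the device that actually carries the address. The sketch below is not the flannel charm's code; the function name interface_for_address and the trimmed sample data are hypothetical, and it assumes the JSON text printed by `ip -j addr show` is available on the unit.

```python
import json


def interface_for_address(ip_addr_json, address):
    """Return the name of the device that carries `address`, or None.

    `ip_addr_json` is the JSON text printed by `ip -j addr show`.
    """
    for link in json.loads(ip_addr_json):
        for info in link.get("addr_info", []):
            if info.get("local") == address:
                return link.get("ifname")
    return None


# Trimmed, hand-written sample mirroring the worker above:
# enp1s0 is a bridge port with no address, the bridge holds 10.0.0.146.
SAMPLE = json.dumps([
    {"ifname": "enp1s0", "master": "br-enp1s0", "addr_info": []},
    {"ifname": "br-enp1s0", "addr_info": [
        {"family": "inet", "local": "10.0.0.146", "prefixlen": 24},
    ]},
])

print(interface_for_address(SAMPLE, "10.0.0.146"))  # prints br-enp1s0
```

With the data from this report, the lookup by address yields br-enp1s0, whereas network-get reports enp1s0.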

Tags: o7k-k8s
James Page (james-page)
tags: added: o7k-k8s
James Page (james-page)
summary: - incorrect resolution of interface name when LXD/bridges in use
+ network-get: incorrect resolution of interface name when LXD/bridges in use
Joseph Phillips (manadart) wrote :

What is the output of show-machine for the host of kubernetes-worker/0?

When we bridge interfaces of a machine in order to provision containers on it, Juju triggers a re-detection of its network interfaces.

I would expect subsequent network-get calls to use this updated info and meet the expected behaviour here.

Joseph Phillips (manadart) wrote :

And the underlying machine is an O7k VM?

Changed in juju:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Joseph Phillips (manadart)
Joseph Phillips (manadart) wrote :

This happens because, although we do re-detect the network config on the machine and correctly move the address to the bridge, when the instance-poller runs the provider still reports the IP as attached to the NIC.

Rather than attempting to address this by coordinating between the machiner and instance-poller, we can handle it in the server-side logic for network-get.
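The idea of resolving this at lookup time can be sketched as follows. This is only an illustration of the approach described above, not Juju's implementation (Juju is written in Go); the Device type, its fields, and resolve_bind_device are hypothetical names, and the rule used (prefer the parent bridge when the machine detected the address there) is an assumption about how such a fallback might look.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Device:
    name: str
    mac: str
    parent: Optional[str] = None      # bridge this device is a port of, if any
    addresses: Tuple[str, ...] = ()   # addresses detected on the machine itself


def resolve_bind_device(devices, provider_device, address):
    """Pick the device to report for `address`.

    `provider_device` is the device the provider claims holds the address.
    If the machine's own view shows the address on that device's parent
    bridge instead, report the bridge.
    """
    by_name = {d.name: d for d in devices}
    dev = by_name[provider_device]
    if address in dev.addresses:
        return dev
    parent = by_name.get(dev.parent) if dev.parent else None
    if parent is not None and address in parent.addresses:
        return parent
    # No better information available: fall back to the provider's answer.
    return dev


devices = [
    Device("enp1s0", "52:54:00:03:01:01", parent="br-enp1s0"),
    Device("br-enp1s0", "2a:6e:b3:cd:14:41", addresses=("10.0.0.146",)),
]
print(resolve_bind_device(devices, "enp1s0", "10.0.0.146").name)  # prints br-enp1s0
```

Handling the mismatch at lookup time avoids having to coordinate the two workers that write address data.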

Changed in juju:
status: Triaged → In Progress
milestone: none → 2.9.12
Changed in juju:
milestone: 2.9.12 → 2.9.13
Joseph Phillips (manadart) wrote :
Changed in juju:
status: In Progress → Fix Committed
Ian Booth (wallyworld)
Changed in juju:
milestone: 2.9.13 → 2.9.14
Changed in juju:
status: Fix Committed → Fix Released