network-get starts returning flannel address instead of host nic address

Bug #1897115 reported by Nobuto Murata
Affects: Canonical Juju
Status: Triaged
Importance: High
Assigned to: Joseph Phillips
Milestone: 3.0.4

Bug Description

juju version: 2.8.3-bionic-amd64
provider: vSphere

A Charmed Kubernetes deployment doesn't settle and is not usable, with kubernetes-master stuck in "waiting: Waiting to retry addon deployment". The root cause is that Juju believes the /32 addresses on the flannel vxlan interfaces of the kubernetes-master units are the ones other services should use, instead of the "main" /26 addresses the units have. By the nature of a /32 in IPv4, nobody can reach it.
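
(For illustration only, not Juju code: a minimal Go check using the addresses from the reproducer below shows why the /32 is useless as an ingress address; the /26 is a subnet shared with peers, while the /32 covers nothing but the flannel address itself.)

package main

import (
	"fmt"
	"net"
)

func main() {
	// Addresses taken from the reproducer below.
	_, eth0Net, _ := net.ParseCIDR("100.64.0.48/26") // host NIC -> network 100.64.0.0/26
	_, flanNet, _ := net.ParseCIDR("10.1.27.0/32")   // flannel vxlan

	fmt.Println(eth0Net.Contains(net.ParseIP("100.64.0.30"))) // true: a peer unit on the same /26
	fmt.Println(flanNet.Contains(net.ParseIP("10.1.27.0")))   // true: only the address itself
	fmt.Println(flanNet.Contains(net.ParseIP("10.1.27.1")))   // false: a /32 has no room for peers
}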

How to reproduce:

1. Bootstrap with the local/LXD provider

2. Define a separate bridge as follows:

$ lxc network create lxdbr-rfc6598 \
    ipv4.address=100.64.0.1/26 \
    ipv4.dhcp.ranges=100.64.0.11-100.64.0.62 \
    ipv4.nat=true \
    ipv6.address=none

3. Define a profile with it:

$ lxc profile create juju-rfc6598

$ lxc profile edit juju-rfc6598 <<EOF
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: lxdbr-rfc6598
    type: nic
EOF

4. Create a model whose name matches the profile (Juju's LXD provider applies a profile named "juju-<model>" to the containers it creates, so the profile above is picked up):

$ juju add-model rfc6598

5. Make sure the logging level is DEBUG or higher, and deploy:

$ juju model-config logging-config
<root>=DEBUG

$ juju deploy ./reproducer.yaml

reproducer.yaml:
https://bugs.launchpad.net/juju/+bug/1897115/+attachment/5414828/+files/reproducer.yaml

6. Once the model settles, wait 10 to 20 minutes until the "observed network config updated" event is triggered.

> DEBUG juju.worker.machiner machiner.go:181 observed network config updated

[Actual]

The unit has two IP addresses: the /26 on eth0 and the /32 on flannel.1. Juju's network-get selects the /32 as the ingress-address.

$ juju show-machine 0
...
    network-interfaces:
      eth0:
        ip-addresses:
        - 100.64.0.48
        mac-address: 00:16:3e:51:24:68
        gateway: 100.64.0.1
        is-up: true
      flannel.1:
        ip-addresses:
        - 10.1.27.0
        mac-address: 7e:f7:e2:9a:11:b6
        is-up: true

$ juju run --unit kubernetes-master/0 -- ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
flannel.1 UNKNOWN 10.1.27.0/32 fe80::7cf7:e2ff:fe9a:11b6/64
eth0@if31 UP 100.64.0.48/26 fe80::216:3eff:fe51:2468/64

$ juju run --unit kubernetes-master/0 -- network-get kube-api-endpoint --ingress-address
10.1.27.0

$ juju run --unit kubernetes-master/0 -- network-get kube-api-endpoint
bind-addresses:
- macaddress: 7e:f7:e2:9a:11:b6
  interfacename: flannel.1
  addresses:
  - hostname: ""
    address: 10.1.27.0
    cidr: 10.1.27.0/32 <<<<<<<<<<
egress-subnets:
- 10.1.27.0/32
ingress-addresses:
- 10.1.27.0

[Expected]

Juju should pick the /26 address instead of the /32, which is not reachable from other units at all.

Revision history for this message
Nobuto Murata (nobuto) wrote :
Revision history for this message
Nobuto Murata (nobuto) wrote :

Subscribing ~field-high.

I personally cannot reproduce it on my testbed. I created a manual-provider environment (which also doesn't support network spaces), added a vxlan interface with a /32 address on it like flannel does, and restarted jujud on the unit, but Juju still returned the IP address I expected. So I'm not sure what actually triggers this behaviour. I've attached a juju-crashdump above; some details such as the actual IP addresses may differ because the customer retried the deployment multiple times, without luck.

Revision history for this message
Nobuto Murata (nobuto) wrote :

For example, in 4/baremetal/var/log/juju/unit-kubeapi-load-balancer-0.log, the 133.X.X.X address is initially picked up by Juju:

2020-09-23 10:09:06 INFO juju-log Wrote vhost config {'host': '127.0.0.1', 'port': 443, 'server_name': '_', 'services': [{'service_name': 'kubernetes-master', 'hosts': [{'hostname': '133.X.X.X', 'private-address': '133.X.X.X', 'port': '6443'}, {'hostname': '172.16.8.0', 'private-address': '172.16.8.0', 'port': '6443'}]}], 'server_certificate': '/srv/kubernetes/server.crt', 'server_key': '/srv/kubernetes/server.key', 'proxy_read_timeout': 600} to apilb.conf

But once flannel is fully ready, Juju drops the 133.X.X.X address in favour of flannel's /32 address.

2020-09-23 10:23:18 INFO juju-log Wrote vhost config {'host': '127.0.0.1', 'port': 443, 'server_name': '_', 'services': [{'service_name': 'kubernetes-master', 'hosts': [{'hostname': '172.16.13.0', 'private-address': '172.16.13.0', 'port': '6443'}, {'hostname': '172.16.8.0', 'private-address': '172.16.8.0', 'port': '6443'}]}], 'server_certificate': '/srv/kubernetes/server.crt', 'server_key': '/srv/kubernetes/server.key', 'proxy_read_timeout': 600} to apilb.conf

Revision history for this message
Nobuto Murata (nobuto) wrote :

I think this may be an equivalent line in 6/baremetal/var/log/juju/machine-6.log:

2020-09-23 10:22:00 DEBUG juju.worker.machiner machiner.go:181 observed network config updated for "machine-6" to [{DeviceIndex:1 MACAddress: CIDR:127.0.0.0/8 MTU:65536 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderSpaceId: ProviderAddressId: ProviderVLANId: VLANTag:0 InterfaceName:lo ParentInterfaceName: InterfaceType:loopback Disabled:false NoAutoStart:false ConfigType:loopback Address:127.0.0.1 Addresses:[] ShadowAddresses:[] DNSServers:[] DNSSearchDomains:[] GatewayAddress: Routes:[] IsDefaultGateway:false NetworkOrigin:machine} {DeviceIndex:1 MACAddress: CIDR:::1/128 MTU:65536 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderSpaceId: ProviderAddressId: ProviderVLANId: VLANTag:0 InterfaceName:lo ParentInterfaceName: InterfaceType:loopback Disabled:false NoAutoStart:false ConfigType:loopback Address:::1 Addresses:[] ShadowAddresses:[] DNSServers:[] DNSSearchDomains:[] GatewayAddress: Routes:[] IsDefaultGateway:false NetworkOrigin:machine} {DeviceIndex:2 MACAddress:XX:XX:XX:XX:XX:XX CIDR:133.X.X.X/26 MTU:1500 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderSpaceId: ProviderAddressId: ProviderVLANId: VLANTag:0 InterfaceName:ens192 ParentInterfaceName: InterfaceType:ethernet Disabled:false NoAutoStart:false ConfigType:static Address:133.X.X.X Addresses:[] ShadowAddresses:[] DNSServers:[] DNSSearchDomains:[] GatewayAddress:133.X.X.X Routes:[] IsDefaultGateway:true NetworkOrigin:machine} {DeviceIndex:2 MACAddress:XX:XX:XX:XX:XX:XX CIDR: MTU:1500 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderSpaceId: ProviderAddressId: ProviderVLANId: VLANTag:0 InterfaceName:ens192 ParentInterfaceName: InterfaceType:ethernet Disabled:false NoAutoStart:false ConfigType:manual Address: Addresses:[] ShadowAddresses:[] DNSServers:[] DNSSearchDomains:[] GatewayAddress:133.X.X.X Routes:[] IsDefaultGateway:true NetworkOrigin:machine} {DeviceIndex:3 MACAddress:XX:XX:XX:XX:XX:XX CIDR:172.16.13.0/32 MTU:1450 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderSpaceId: ProviderAddressId: ProviderVLANId: VLANTag:0 InterfaceName:flannel.1 ParentInterfaceName: InterfaceType:ethernet Disabled:false NoAutoStart:false ConfigType:static Address:172.16.13.0 Addresses:[] ShadowAddresses:[] DNSServers:[] DNSSearchDomains:[] GatewayAddress: Routes:[] IsDefaultGateway:false NetworkOrigin:machine} {DeviceIndex:3 MACAddress:XX:XX:XX:XX:XX:XX CIDR: MTU:1450 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderSpaceId: ProviderAddressId: ProviderVLANId: VLANTag:0 InterfaceName:flannel.1 ParentInterfaceName: InterfaceType:ethernet Disabled:false NoAutoStart:false ConfigType:manual Address: Addresses:[] ShadowAddresses:[] DNSServers:[] DNSSearchDomains:[] GatewayAddress: Routes:[] IsDefaultGateway:false NetworkOrigin:machine}]

Revision history for this message
John A Meinel (jameinel) wrote :

I'm a bit surprised to see all the devices listed twice, but maybe it is just IPv4 vs IPv6, since the first case is 127.* and then ::1, and the second is 172.* and nothing. (But if it were just IPv6, why would we report a device at all if it didn't have an address? Maybe we start by considering the link-local fe80 address but discard it later?)

My best guess is that 172.16.8.0 is considered ScopeCloudLocal, but so is 10.*, since both come from RFC 1918 (https://tools.ietf.org/html/rfc1918) as private addresses (as are 192.168/16 addresses).
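
(A hedged illustration of that guess, not the actual Juju scope code: classifying purely by the RFC 1918 ranges puts the flannel address and the customer's 172.16.8.0 address in the same bucket, so a "prefer cloud-local" rule cannot tell them apart. 133.0.0.1 below is just a stand-in for the redacted 133.X.X.X address.)

package main

import (
	"fmt"
	"net"
)

// The private ranges from RFC 1918; net.IP.IsPrivate (Go 1.17+) performs the
// equivalent check for IPv4.
var rfc1918 = []string{"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"}

func isRFC1918(ip net.IP) bool {
	for _, cidr := range rfc1918 {
		_, block, _ := net.ParseCIDR(cidr)
		if block.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	for _, a := range []string{"172.16.8.0", "10.1.27.0", "133.0.0.1"} {
		fmt.Printf("%-12s RFC1918=%v\n", a, isRFC1918(net.ParseIP(a)))
	}
	// 172.16.8.0 and 10.1.27.0 both look "cloud local"; only 133.0.0.1 looks public.
}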

In your local testing, are you also using only private addresses for both the host machines and the flannel bridge?

I don't see why we wouldn't be returning both addresses from the wider 'network-get' call.

summary: - network-get returns /32 address and units cannot talk to each other
+ network-get starts returning flannel address instead of host nic address
Revision history for this message
John A Meinel (jameinel) wrote :

Since we can't reproduce this yet, I'll mark it incomplete, but it is something that seems quite confusing.

Changed in juju:
importance: Undecided → High
status: New → Incomplete
Revision history for this message
Nobuto Murata (nobuto) wrote :

> My best guess is that 172.16.8.0 is considered ScopeCloudLocal but so is 10.*. Since both of them come from RFC 1918 (https://tools.ietf.org/html/rfc1918) as private addresses. (As are 192.168/16 addresses)

Does network-get give priority to RFC 1918 addresses over non-RFC 1918 addresses? Or, put the other way around, does it give lower priority to non-RFC 1918 addresses?

If that's the case, it may explain why we see both addresses in the machine status but only one address in network-get.

  "6":
    juju-status:
      current: started
      since: 23 Sep 2020 18:58:14+09:00
      version: 2.8.3
    dns-name: 133.X.X.X
    ip-addresses:
    - 133.X.X.X
    - 172.16.13.0
    instance-id: juju-d7d00e-6
...
    series: bionic
    network-interfaces:
      ens192:
        ip-addresses:
        - 133.X.X.X
        mac-address: XX:XX:XX:XX:XX:XX
        gateway: 133.X.X.X
        is-up: true
      flannel.1:
        ip-addresses:
        - 172.16.13.0
        mac-address: 06:4a:27:1b:ea:39
        is-up: true

> In your local testing, are you also only using Private addresses for both the host machines and for the flannel bridge?

I tested it with the following address mapping (both inside RFC 1918).

10.1.59.0 (flannel.1)
10.0.8.82 (eth0)

So the key is to use a global IP together with flannel's /32 IP? The issue is reproducible in the specific customer environment. Can you please investigate it further and try to come up with a firm reproducer in a test env?

I see some test cases for network-get here. Would it be possible to cover this scenario ("a global IP and a flannel /32 IP") by extending them?
https://github.com/juju/juju/blob/2.8/worker/uniter/runner/jujuc/network-get_test.go
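
(A sketch of what such a test case might express; pickIngress is a hypothetical helper standing in for the real selection logic, not something from the Juju tree.)

package selection_test

import (
	"net"
	"testing"
)

// pickIngress is a hypothetical stand-in for the selection network-get performs:
// prefer an IPv4 address whose prefix peers can actually share, i.e. skip /32
// host routes such as flannel's.
func pickIngress(cidrs []string) string {
	for _, c := range cidrs {
		ip, ipNet, err := net.ParseCIDR(c)
		if err != nil || ip.To4() == nil {
			continue
		}
		if ones, bits := ipNet.Mask.Size(); ones < bits { // not a /32
			return ip.String()
		}
	}
	return ""
}

// TestGlobalNICBeatsFlannel32 encodes the scenario from this bug:
// a "global" /26 on the host NIC plus a /32 on flannel.1.
func TestGlobalNICBeatsFlannel32(t *testing.T) {
	got := pickIngress([]string{"10.1.27.0/32", "100.64.0.48/26"})
	if got != "100.64.0.48" {
		t.Fatalf("expected the host NIC address, got %q", got)
	}
}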

Changed in juju:
status: Incomplete → New
Revision history for this message
Nobuto Murata (nobuto) wrote :

Okay, I believe I have a reproducer now. The /32 is chosen, so units cannot communicate with each other.

$ juju run --unit kubernetes-master/0 -- ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
flannel.1 UNKNOWN 10.1.27.0/32 fe80::7cf7:e2ff:fe9a:11b6/64
eth0@if31 UP 100.64.0.48/26 fe80::216:3eff:fe51:2468/64

$ juju run --unit kubernetes-master/0 -- network-get kube-api-endpoint --ingress-address
10.1.27.0

$ juju run --unit kubernetes-master/0 -- network-get kube-api-endpoint
bind-addresses:
- macaddress: 7e:f7:e2:9a:11:b6
  interfacename: flannel.1
  addresses:
  - hostname: ""
    address: 10.1.27.0
    cidr: 10.1.27.0/32 <<<<<<<<<<
egress-subnets:
- 10.1.27.0/32
ingress-addresses:
- 10.1.27.0

2020-09-28 01:02:45 DEBUG juju.worker.machiner machiner.go:181 observed network config updated for "machine-0" to [{DeviceIndex:1 MACAddress: CIDR:127.0.0.0/8 MTU:65536 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderSpaceId: ProviderAddressId: ProviderVLANId: VLANTag:0 InterfaceName:lo ParentInterfaceName: InterfaceType:loopback Disabled:false NoAutoStart:false ConfigType:loopback Address:127.0.0.1 Addresses:[] ShadowAddresses:[] DNSServers:[] DNSSearchDomains:[] GatewayAddress: Routes:[] IsDefaultGateway:false NetworkOrigin:machine} {DeviceIndex:1 MACAddress: CIDR:::1/128 MTU:65536 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderSpaceId: ProviderAddressId: ProviderVLANId: VLANTag:0 InterfaceName:lo ParentInterfaceName: InterfaceType:loopback Disabled:false NoAutoStart:false ConfigType:loopback Address:::1 Addresses:[] ShadowAddresses:[] DNSServers:[] DNSSearchDomains:[] GatewayAddress: Routes:[] IsDefaultGateway:false NetworkOrigin:machine} {DeviceIndex:2 MACAddress:7e:f7:e2:9a:11:b6 CIDR:10.1.27.0/32 MTU:1450 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderSpaceId: ProviderAddressId: ProviderVLANId: VLANTag:0 InterfaceName:flannel.1 ParentInterfaceName: InterfaceType:ethernet Disabled:false NoAutoStart:false ConfigType:static Address:10.1.27.0 Addresses:[] ShadowAddresses:[] DNSServers:[] DNSSearchDomains:[] GatewayAddress: Routes:[] IsDefaultGateway:false NetworkOrigin:machine} {DeviceIndex:2 MACAddress:7e:f7:e2:9a:11:b6 CIDR: MTU:1450 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderSpaceId: ProviderAddressId: ProviderVLANId: VLANTag:0 InterfaceName:flannel.1 ParentInterfaceName: InterfaceType:ethernet Disabled:false NoAutoStart:false ConfigType:manual Address: Addresses:[] ShadowAddresses:[] DNSServers:[] DNSSearchDomains:[] GatewayAddress: Routes:[] IsDefaultGateway:false NetworkOrigin:machine} {DeviceIndex:30 MACAddress:00:16:3e:51:24:68 CIDR:100.64.0.0/26 MTU:1500 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderSpaceId: ProviderAddressId: ProviderVLANId: VLANTag:0 InterfaceName:eth0 ParentInterfaceName: InterfaceType:ethernet Disabled:false NoAutoStart:false ConfigType:static Address:100.64.0.48 Addresses:[] ShadowAddresses:[] DNSServers:[] DNSSearchDomains:[] GatewayAddress:100.64.0.1 Routes:[] IsDefaultGateway:true NetworkOrigin:machine} {DeviceIndex:30 MACAddress:00:16:3e:51:24:68 CIDR: MTU:1500 ProviderId: ProviderNetworkId: ProviderSubnetId: ProviderS...


Revision history for this message
Nobuto Murata (nobuto) wrote :
Nobuto Murata (nobuto)
description: updated
Nobuto Murata (nobuto)
description: updated
Revision history for this message
Nobuto Murata (nobuto) wrote :

Escalating to ~field-critical.

It looks like there is no workaround other than asking to change the network assignment fundamentally, which is not feasible. FWIW, the vSphere provider doesn't support `juju spaces`, so there is no way to select the network explicitly.

Let me know if the reproducer in the bug description is incomplete.

Revision history for this message
Nobuto Murata (nobuto) wrote :

I think I've found a workaround. In this specific case, assigning "global" IPs to flannel does the trick to avoid the RFC 1918 vs non-RFC 1918 case. 133.X.X.X won over 100.64.0.0/20, but I don't know what happens if the real IP is 90.Y.Y.Y or something (no idea how Juju sorts and determines the main IP address).

In any case, I'm downgrading this to ~field-high since we have one possible workaround to keep going.

$ ip -br a
lo UNKNOWN 127.0.0.1/8 ::1/128
flannel.1 UNKNOWN 100.64.8.0/32 fe80::9c37:6eff:fe35:e8b7/64
eth0@if55 UP 133.X.X.X/26 fe80::216:3eff:fef5:bd0a/64

$ juju run --unit kubernetes-master/0 -- network-get kube-api-endpoint --ingress-address
133.X.X.X

Revision history for this message
Pen Gale (pengale) wrote :

Thank you for all the detailed troubleshooting info and reproducer, Nobuto. We're juggling a lot of things with the 2.9 beta and 2.8.4 release, but this is on our radar, and on the list of things to fix as soon as we can.

Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
status: New → In Progress
Revision history for this message
Joseph Phillips (manadart) wrote :

The reproducer is going to have different behaviour on LXD compared to vSphere.

LXD now has subnet discovery, which affects how network-get works. In particular, the entries in the machine's link-layer device collection have their subnet data resolved so that they can in turn be chosen by space. You can verify this by running "juju subnets".

When I attempt to execute "juju run --unit kubernetes-master/0 -- network-get kube-api-endpoint", I can see controller model logs that look like this:

machine-0: 11:35:50 DEBUG juju.apiserver.uniter Looking for address from [static address "10.161.87.77" of device "eth0" on machine "0" static address "10.1.99.0" of device "flannel.1" on machine "0" loopback address "127.0.0.1" of device "lo" on machine "0" loopback address "::1" of device "lo" on machine "0"] in spaces [0]
machine-0: 11:35:50 DEBUG juju.apiserver.uniter skipping static address "10.1.99.0" of device "flannel.1" on machine "0": not linked to a known subnet (subnet "10.1.99.0/32" not found)

What happens here is that only the non-flannel address is returned - the flannel subnet was not discovered.

On vSphere, there is no subnet discovery, so the fall-back is used, which would tend to indicate that the flannel IP has been set as the machine's preferred private address. Is it possible to verify this? One would need to look at the "machines" collection in Mongo.

Revision history for this message
Joseph Phillips (manadart) wrote :

Damn. Let me go back to that. My model is from the edge branch.

Revision history for this message
Joseph Phillips (manadart) wrote :

I've reproduced.

It is as I suggested: after some time, the flannel address is chosen as the preferred private address, and that is what network-get falls back to in this case.

Revision history for this message
Joseph Phillips (manadart) wrote :

This is a symptom of the fact that vSphere does not implement subnet discovery.

When network-get runs, we attempt to resolve addresses in spaces via their subnets. When we can't do this, there is a fall-back to the unit's (via its machine) preferred local-cloud address.

In this case, after flannel comes up and the machine updates its link-layer devices and addresses, the machine's preferred cloud-local address changes to the flannel address, because that address is in the RFC 1918 space while the model's subnet is interpreted as public.

This can be resolved by implementing subnet discovery for vSphere, which I am investigating now.
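
(A minimal sketch of that fall-back under the assumptions above; hypothetical code, not the actual apiserver logic. 133.0.0.1 stands in for the redacted public address.)

package main

import "fmt"

// deviceAddr is a simplified view of a machine address for this sketch.
type deviceAddr struct {
	value       string
	subnetKnown bool // true when the provider discovered the subnet (LXD); false on vSphere
}

// ingressFor mirrors the behaviour described above: prefer an address whose
// subnet Juju knows about; otherwise fall back to the machine's preferred
// cloud-local address.
func ingressFor(addrs []deviceAddr, preferredCloudLocal string) string {
	for _, a := range addrs {
		if a.subnetKnown {
			return a.value
		}
	}
	return preferredCloudLocal
}

func main() {
	addrs := []deviceAddr{
		{value: "133.0.0.1", subnetKnown: false},   // host NIC; no subnet discovery on vSphere
		{value: "172.16.13.0", subnetKnown: false}, // flannel /32; also unknown
	}
	// Once flannel is up, its RFC 1918 address displaces the public one as
	// the preferred cloud-local address, so the fall-back returns it.
	fmt.Println(ingressFor(addrs, "172.16.13.0")) // prints 172.16.13.0
}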

Revision history for this message
Joseph Phillips (manadart) wrote :

Can you tell me the address allocation method for this particular deployment?

I.e. DHCP, IP pool etc.

Revision history for this message
Nobuto Murata (nobuto) wrote :

> Can you tell me the address allocation method for this particular deployment?
>
> I.e. DHCP, IP pool etc.

It was DHCP, since that's a requirement of the vSphere provider as far as I understand.
https://juju.is/docs/vsphere-cloud

Revision history for this message
Pen Gale (pengale) wrote :

We have roadmap work scheduled for next cycle to support spaces in vSphere, which should solve this bug.

We're blocked on a solution in the meantime.

Changed in juju:
milestone: none → 3.0.0
status: In Progress → Triaged
Changed in juju:
milestone: 3.0.0 → 3.0.1
Changed in juju:
milestone: 3.0.1 → 3.0.2
Changed in juju:
milestone: 3.0.2 → 3.0.3
Changed in juju:
milestone: 3.0.3 → 3.0.4
Revision history for this message
John Puskar (jpuskar-amtrust) wrote :

Is there any sort of manual workaround I can use to fix up a charmed k8s worker if I change its IP and juju status then reports the canal/flannel IP instead of the host IP? Even if it's a MongoDB upsert?

Revision history for this message
Joseph Phillips (manadart) wrote :

If you've changed the IP on the machine, restarting jujud should cause the machine worker to update its addresses. Let me know if this is not the case.
