Different networking behavior with 2nd IP on 2nd NIC

Bug #2101860 reported by James Falcon
This bug affects 1 person
Affects              Status  Importance  Assigned to  Milestone
cloud-init (Ubuntu)  New     Undecided   Unassigned
netplan.io (Ubuntu)  New     Undecided   Unassigned
systemd (Ubuntu)     New     Undecided   Unassigned

Bug Description

This integration test is currently failing on plucky:
https://github.com/canonical/cloud-init/blob/a136a979dd5a5084f71f5def5abf6d80d7cc263a/tests/integration_tests/test_networking.py#L313

This test launches an EC2 instance, attaches a 2nd NIC, and then assigns two (total) IP addresses to that NIC. We then check that `nc` can connect to the SSH port via the public IP for each of the 3 IP addresses. This works consistently on Oracular and below, but not on Plucky, where the secondary IP on the secondary NIC can no longer connect. The generated netplan configurations and the generated networkd configurations are identical between the two instances (apart from the IP addresses). Downgrading netplan on Plucky to the Oracular version does not help either. I have also manually verified that nc does work on the internal secondary IP address.

Edit: Many of the comments here are me dumping my debugging journey, but the main problem seems to be that a policy-based routing rule is often missing from the routing policy database. Calling `netplan apply` (with no changes to the configuration) sometimes fixes it, but most of the time `netplan apply` still leaves the necessary rule missing.

cloud-init's rendered Netplan configuration on plucky:
# cat /etc/netplan/50-cloud-init.yaml
network:
  version: 2
  ethernets:
    ens5:
      match:
        macaddress: "06:f1:2f:0b:5f:7d"
      dhcp4: true
      dhcp4-overrides:
        route-metric: 100
      dhcp6: true
      dhcp6-overrides:
        route-metric: 100
      set-name: "ens5"
    ens6:
      match:
        macaddress: "06:f0:7b:f6:b6:5f"
      addresses:
      - "192.168.15.230/20"
      dhcp4: true
      dhcp4-overrides:
        use-routes: true
        route-metric: 200
      dhcp6: true
      dhcp6-overrides:
        use-routes: true
        route-metric: 200
      set-name: "ens6"
      routes:
      - table: 101
        to: "0.0.0.0/0"
        via: "192.168.0.1"
      - scope: "link"
        table: 101
        to: "192.168.0.0/20"
      - scope: "link"
        table: 101
        to: "2600:1f16:67f:f200:0:0:0:0/64"
      routing-policy:
      - table: 101
        from: "192.168.12.6"
      - table: 101
        from: "192.168.15.230"
      - table: 101
        from: "2600:1f16:67f:f200:25ac:a1c8:2d0d:ae87"

cloud-init's rendered Netplan configuration on oracular:
root@ip-192-168-1-227:/home/ubuntu# cat /etc/netplan/50-cloud-init.yaml
network:
  version: 2
  ethernets:
    ens5:
      match:
        macaddress: "06:34:a4:59:1c:8f"
      dhcp4: true
      dhcp4-overrides:
        route-metric: 100
      dhcp6: true
      dhcp6-overrides:
        route-metric: 100
      set-name: "ens5"
    ens6:
      match:
        macaddress: "06:48:2a:91:51:79"
      addresses:
      - "192.168.2.219/20"
      dhcp4: true
      dhcp4-overrides:
        use-routes: true
        route-metric: 200
      dhcp6: true
      dhcp6-overrides:
        use-routes: true
        route-metric: 200
      set-name: "ens6"
      routes:
      - table: 101
        to: "0.0.0.0/0"
        via: "192.168.0.1"
      - scope: "link"
        table: 101
        to: "192.168.0.0/20"
      - scope: "link"
        table: 101
        to: "2600:1f16:67f:f200:0:0:0:0/64"
      routing-policy:
      - table: 101
        from: "192.168.12.51"
      - table: 101
        from: "192.168.2.219"
      - table: 101
        from: "2600:1f16:67f:f200:9225:d84d:9f5e:462d"

plucky networkd configurations:
root@ip-192-168-11-242:/home/ubuntu# cat /run/systemd/network/10-netplan-ens5.link
[Match]
PermanentMACAddress=06:f1:2f:0b:5f:7d

[Link]
Name=ens5
WakeOnLan=off
root@ip-192-168-11-242:/home/ubuntu# cat /run/systemd/network/10-netplan-ens5.network
[Match]
PermanentMACAddress=06:f1:2f:0b:5f:7d
Name=ens5

[Network]
DHCP=yes
LinkLocalAddressing=ipv6

[DHCP]
RouteMetric=100
UseMTU=true
root@ip-192-168-11-242:/home/ubuntu# cat /run/systemd/network/10-netplan-ens6.link
[Match]
PermanentMACAddress=06:f0:7b:f6:b6:5f

[Link]
Name=ens6
WakeOnLan=off
root@ip-192-168-11-242:/home/ubuntu# cat /run/systemd/network/10-netplan-ens6.network
[Match]
PermanentMACAddress=06:f0:7b:f6:b6:5f
Name=ens6

[Network]
DHCP=yes
LinkLocalAddressing=ipv6
Address=192.168.15.230/20

[Route]
Destination=0.0.0.0/0
Gateway=192.168.0.1
Table=101

[Route]
Destination=192.168.0.0/20
Scope=link
Table=101

[Route]
Destination=2600:1f16:67f:f200:0:0:0:0/64
Scope=link
Table=101

[RoutingPolicyRule]
From=192.168.12.6
Table=101

[RoutingPolicyRule]
From=192.168.15.230
Table=101

[RoutingPolicyRule]
From=2600:1f16:67f:f200:25ac:a1c8:2d0d:ae87
Table=101

[DHCP]
RouteMetric=200
UseMTU=true
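
To make the later comparison against `ip rule` mechanical, the expected rules can be pulled out of the rendered .network file. This is only a minimal sketch: `routing_policy_rules` is a hypothetical helper, not part of netplan or systemd, and it assumes the simple `Key=Value` layout shown above. A hand-written parser is used because the file repeats the `[RoutingPolicyRule]` section name.

```python
def routing_policy_rules(network_text):
    """Return (From, Table) string pairs, one per [RoutingPolicyRule] section."""
    rules = []
    current = None  # dict for the rule section being read, else None
    for raw in network_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            # Each section header starts a new block; only rule blocks are kept.
            current = {} if line == "[RoutingPolicyRule]" else None
            if current is not None:
                rules.append(current)
            continue
        if current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return [(rule.get("From"), rule.get("Table")) for rule in rules]


# Abbreviated copy of 10-netplan-ens6.network from this report.
SAMPLE = """\
[Network]
DHCP=yes
Address=192.168.15.230/20

[RoutingPolicyRule]
From=192.168.12.6
Table=101

[RoutingPolicyRule]
From=192.168.15.230
Table=101

[RoutingPolicyRule]
From=2600:1f16:67f:f200:25ac:a1c8:2d0d:ae87
Table=101

[DHCP]
RouteMetric=200
"""

for source, table in routing_policy_rules(SAMPLE):
    print(source, "->", table)
```

All three rules are present in the rendered file, so anything missing at runtime was lost after rendering.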

James Falcon (falcojr) wrote:

plucky networkctl output:
root@ip-192-168-11-242:/home/ubuntu# networkctl status
● Interfaces: 1, 2, 3
       State: routable
Online state: online
     Address: 192.168.11.242 on ens5
              192.168.12.6 on ens6
              192.168.15.230 on ens6
              2600:1f16:67f:f200:9786:9334:1f2c:8d8 on ens5
              2600:1f16:67f:f200:25ac:a1c8:2d0d:ae87 on ens6
              fe80::4f1:2fff:fe0b:5f7d on ens5
              fe80::4f0:7bff:fef6:b65f on ens6
     Gateway: 192.168.0.1 on ens5
              192.168.0.1 on ens6
              fe80::4f8:4ff:feed:10b4 on ens5
              fe80::4f8:4ff:feed:10b4 on ens6
         DNS: 192.168.0.2

Mar 10 16:05:00 ip-192-168-11-242 systemd[1]: Starting systemd-networkd-wait-online.service - Wait for Network to be Configured...
Mar 10 16:05:00 ip-192-168-11-242 systemd-networkd[607]: ens6: Configuring with /run/systemd/network/10-netplan-ens6.network.
Mar 10 16:05:00 ip-192-168-11-242 systemd-networkd[607]: ens5: Link UP
Mar 10 16:05:00 ip-192-168-11-242 systemd-networkd[607]: ens5: Gained carrier
Mar 10 16:05:00 ip-192-168-11-242 systemd-networkd[607]: ens5: DHCPv4 address 192.168.11.242/20, gateway 192.168.0.1 acquired from 192.168.0.1
Mar 10 16:05:00 ip-192-168-11-242 systemd-networkd[607]: ens6: DHCPv4 address 192.168.12.6/20, gateway 192.168.0.1 acquired from 192.168.0.1
Mar 10 16:05:01 ip-192-168-11-242 systemd-networkd[607]: ens5: Gained IPv6LL
Mar 10 16:05:01 ip-192-168-11-242 systemd-networkd[607]: ens6: DHCPv6 address 2600:1f16:67f:f200:25ac:a1c8:2d0d:ae87/128 (valid for 7min 29s, preferred for 2min 19s)
Mar 10 16:05:01 ip-192-168-11-242 systemd[1]: Finished systemd-networkd-wait-online.service - Wait for Network to be Configured.
Mar 10 16:05:02 ip-192-168-11-242 systemd-networkd[607]: ens5: DHCPv6 address 2600:1f16:67f:f200:9786:9334:1f2c:8d8/128 (valid for 7min 29s, preferred for 2min 19s)

root@ip-192-168-11-242:/home/ubuntu# networkctl status ens6
● 3: ens6
                   Link File: /usr/lib/systemd/network/99-default.link
                Network File: /run/systemd/network/10-netplan-ens6.network
                       State: routable (configured)
                Online state: online
                        Type: ether
                        Path: pci-0000:00:06.0
                      Driver: ena
                      Vendor: Amazon.com, Inc.
                       Model: Elastic Network Adapter (ENA)
           Alternative Names: enp0s6
                              enx06f07bf6b65f
            Hardware Address: 06:f0:7b:f6:b6:5f
                         MTU: 9001 (min: 128, max: 9216)
                       QDisc: mq
IPv6 Address Generation Mode: eui64
    Number of Queues (Tx/Rx): 2/2
                     Address: 192.168.12.6 (DHCPv4 via 192.168.0.1)
                              192.168.15.230
                              2600:1f16:67f:f200:25ac:a1c8:2d0d:ae87
                              fe80::4f0:7bff:fef6:b65f
                     Gateway: 192.168.0.1
                              fe80::4f8:4ff:feed:10b4
                         DNS: 192.168.0.2...

James Falcon (falcojr) wrote:

Plucky routes:
ubuntu@ip-192-168-11-242:~$ ip route
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.11.242 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.6 metric 1003 mtu 9001
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.11.242 metric 100
192.168.0.0/20 dev ens6 proto dhcp scope link src 192.168.12.6 metric 1003 mtu 9001
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.11.242 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.12.6 metric 200
192.168.0.2 dev ens5 proto dhcp scope link src 192.168.11.242 metric 100
192.168.0.2 dev ens6 proto dhcp scope link src 192.168.12.6 metric 200

ubuntu@ip-192-168-11-242:~$ ip route show table 101
default via 192.168.0.1 dev ens6 proto static
192.168.0.0/20 dev ens6 proto static scope link

Oracular routes:
root@ip-192-168-1-227:/home/ubuntu# ip route
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.1.227 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.51 metric 200
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.51 metric 1003 mtu 9001
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.1.227 metric 100
192.168.0.0/20 dev ens6 proto dhcp scope link src 192.168.12.51 metric 1003 mtu 9001
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.1.227 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.12.51 metric 200
192.168.0.2 dev ens5 proto dhcp scope link src 192.168.1.227 metric 100
192.168.0.2 dev ens6 proto dhcp scope link src 192.168.12.51 metric 200
root@ip-192-168-1-227:/home/ubuntu# ip route show table 101
default via 192.168.0.1 dev ens6 proto static
192.168.0.0/20 dev ens6 proto static scope link

Oracular has one additional (seemingly unnecessary) route: a second default route via ens6 with metric 200. After running
"ip route add default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.6 metric 200"
to make Plucky match Oracular, I still see the same problem.
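
For reference, the route comparison can be reproduced without eyeballing the two listings. The sketch below is an illustration only: `normalize_routes` is a hypothetical helper that blanks out the per-instance `src` address so the main tables of the two hosts can be compared as sets. The route text is copied from the outputs above.

```python
import re


def normalize_routes(route_output):
    """Replace per-instance source addresses so two hosts can be compared."""
    return {
        re.sub(r"\bsrc \S+", "src <addr>", line.strip())
        for line in route_output.splitlines()
        if line.strip()
    }


PLUCKY = """\
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.11.242 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.6 metric 1003 mtu 9001
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.11.242 metric 100
192.168.0.0/20 dev ens6 proto dhcp scope link src 192.168.12.6 metric 1003 mtu 9001
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.11.242 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.12.6 metric 200
192.168.0.2 dev ens5 proto dhcp scope link src 192.168.11.242 metric 100
192.168.0.2 dev ens6 proto dhcp scope link src 192.168.12.6 metric 200
"""

ORACULAR = """\
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.1.227 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.51 metric 200
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.12.51 metric 1003 mtu 9001
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.1.227 metric 100
192.168.0.0/20 dev ens6 proto dhcp scope link src 192.168.12.51 metric 1003 mtu 9001
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.1.227 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.12.51 metric 200
192.168.0.2 dev ens5 proto dhcp scope link src 192.168.1.227 metric 100
192.168.0.2 dev ens6 proto dhcp scope link src 192.168.12.51 metric 200
"""

only_oracular = sorted(normalize_routes(ORACULAR) - normalize_routes(PLUCKY))
only_plucky = sorted(normalize_routes(PLUCKY) - normalize_routes(ORACULAR))
print("only on Oracular:", only_oracular)
print("only on Plucky:", only_plucky)
```

The only difference is the metric 200 default route on ens6, and table 101 is identical on both hosts.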

James Falcon (falcojr) wrote:

It appears Plucky is missing the rule for the DHCP-assigned address on ens6 (192.168.12.6):
root@ip-192-168-11-242:~# ip rule
0: from all lookup local
32765: from 192.168.15.230 lookup 101 proto static
32766: from all lookup main
32767: from all lookup default

Compared to Oracular:
root@ip-192-168-1-227:/usr/lib/systemd/network# ip rule
0: from all lookup local
32764: from 192.168.12.51 lookup 101 proto static
32765: from 192.168.2.219 lookup 101 proto static
32766: from all lookup main
32767: from all lookup default
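
A check like this could be scripted so that the missing rule shows up directly instead of as an `nc` timeout. The following is a rough sketch, not cloud-init code: `rules_for_table` and `missing_rules` are hypothetical helpers that parse the plain-text `ip rule` output shown above. IPv6 rules would need the output of `ip -6 rule` instead.

```python
import re

# Matches lines such as "32765: from 192.168.15.230 lookup 101 proto static".
RULE_RE = re.compile(r"^\s*(\d+):\s+from\s+(\S+)\s+lookup\s+(\S+)")


def rules_for_table(ip_rule_output, table):
    """Return the set of 'from' selectors that look up the given table."""
    sources = set()
    for line in ip_rule_output.splitlines():
        match = RULE_RE.match(line)
        if match and match.group(3) == table:
            sources.add(match.group(2))
    return sources


def missing_rules(ip_rule_output, expected_sources, table="101"):
    """Return the expected source addresses that have no rule for the table."""
    present = rules_for_table(ip_rule_output, table)
    return sorted(set(expected_sources) - present)


PLUCKY_RULES = """\
0: from all lookup local
32765: from 192.168.15.230 lookup 101 proto static
32766: from all lookup main
32767: from all lookup default
"""

ORACULAR_RULES = """\
0: from all lookup local
32764: from 192.168.12.51 lookup 101 proto static
32765: from 192.168.2.219 lookup 101 proto static
32766: from all lookup main
32767: from all lookup default
"""

print(missing_rules(PLUCKY_RULES, ["192.168.12.6", "192.168.15.230"]))
# ['192.168.12.6']
print(missing_rules(ORACULAR_RULES, ["192.168.12.51", "192.168.2.219"]))
# []
```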

Lukas Märdian (slyon) wrote:

Good find in comment #3. Adding this policy-based routing (PBR) rule manually seems to fix the issue:

# ip rule add from 192.168.12.6 lookup 101 proto static

But it's defined in Netplan config:
"""
      routing-policy:
      - table: 101
        from: "192.168.12.6"
"""

And rendered to systemd-networkd:
"""
[RoutingPolicyRule]
From=192.168.12.6
Table=101
"""

When removing the "use-routes" dhcp-overrides for ens6 and running "netplan apply" it seems to be working. Are those DHCP routes needed? This sounds like a race condition. Maybe the static routes/rules are set, but then dropped when the DHCPv4 response is received and routes/rules are reconfigured.

I wonder why this is not happening in Oracular, though. Can you confirm timing of DHCPv4 config vs static routes/rule config from systemd-networkd debug log?

James Falcon (falcojr) wrote:

> When removing the "use-routes" dhcp-overrides for ens6 and running "netplan apply" it seems to be working.

The default for this is true, so unless the documentation is wrong, I don't think that changing this is actually doing anything.

There is certainly some racy behavior though. Using a script[1] to check connectivity of the IPs, I ran `netplan apply; check_ips.sh` several times without changing any configuration. Sometimes everything worked fine. Other times, I got a failure. This happens regardless of whether `use-routes` is true or false.

E.g.:
root@ip-192-168-11-242:/home/ubuntu# netplan apply; ./check_ips.sh
Checking 192.168.11.242 on port 22...
Connection to 192.168.11.242 22 port [tcp/ssh] succeeded!
Checking 18.116.64.168 on port 22...
Connection to 18.116.64.168 22 port [tcp/ssh] succeeded!
Checking 192.168.12.6 on port 22...
Connection to 192.168.12.6 22 port [tcp/ssh] succeeded!
Checking 3.147.240.160 on port 22...
Connection to 3.147.240.160 22 port [tcp/ssh] succeeded!
Checking 192.168.15.230 on port 22...
Connection to 192.168.15.230 22 port [tcp/ssh] succeeded!
Checking 3.146.111.120 on port 22...
nc: connect to 3.146.111.120 port 22 (tcp) timed out: Operation now in progress
root@ip-192-168-11-242:/home/ubuntu# netplan apply; ./check_ips.sh
Checking 192.168.11.242 on port 22...
Connection to 192.168.11.242 22 port [tcp/ssh] succeeded!
Checking 18.116.64.168 on port 22...
Connection to 18.116.64.168 22 port [tcp/ssh] succeeded!
Checking 192.168.12.6 on port 22...
Connection to 192.168.12.6 22 port [tcp/ssh] succeeded!
Checking 3.147.240.160 on port 22...
Connection to 3.147.240.160 22 port [tcp/ssh] succeeded!
Checking 192.168.15.230 on port 22...
Connection to 192.168.15.230 22 port [tcp/ssh] succeeded!
Checking 3.146.111.120 on port 22...
Connection to 3.146.111.120 22 port [tcp/ssh] succeeded!

[1] check_ips.sh:
#!/bin/bash

# 3 pairs of private ip followed by corresponding public ip
IPS=("192.168.11.242" "18.116.64.168" "192.168.12.6" "3.147.240.160" "192.168.15.230" "3.146.111.120")

# Define the port to check
PORT=22

# Loop through each IP and run the command
for IP in "${IPS[@]}"; do
    echo "Checking $IP on port $PORT..."
    nc -w 1 -zv "$IP" "$PORT"
done
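
Since the failure is intermittent, it may help to count failures over many runs rather than reading individual `nc` results. Below is a minimal Python sketch of the same check, under assumptions: `port_open` and `failure_counts` are hypothetical names, and `before_each` is a placeholder for whatever re-applies the configuration (for example a callable that runs `netplan apply`). The demo uses a stub check so that it runs without network access.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def failure_counts(hosts, port, trials, check=port_open, before_each=None):
    """Run the check `trials` times per host and count the failures."""
    failures = {host: 0 for host in hosts}
    for _ in range(trials):
        if before_each is not None:
            before_each()  # hypothetical hook, e.g. re-apply the network config
        for host in hosts:
            if not check(host, port):
                failures[host] += 1
    return failures


# Stub standing in for port_open: 3.146.111.120 is the public address that
# timed out in the first run above, so the stub always fails it.
def stub_check(host, port):
    return host != "3.146.111.120"


result = failure_counts(
    ["192.168.15.230", "3.146.111.120"], 22, trials=3, check=stub_check
)
print(result)
# {'192.168.15.230': 0, '3.146.111.120': 3}
```

On a real instance, leaving `check` at its default and passing the six addresses from check_ips.sh would give a failure rate per address.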

James Falcon (falcojr) wrote:

networkd debug logs attached from after reboot on plucky

James Falcon (falcojr) wrote:

networkd debug logs attached from after reboot on oracular

James Falcon (falcojr) wrote:

Also adding netplan and systemd here as the rendered networkd config doesn't change, and the netplan apply call is racy.
