netplan/systemd-networkd: route metric not applied to routes to the local subnet

Bug #2055397 reported by Alberto Contreras
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| cloud-init (Ubuntu) | Invalid | Undecided | Unassigned | |
| netplan.io (Ubuntu) | Invalid | Undecided | Unassigned | |
| systemd (Ubuntu) | Fix Released | Undecided | Unassigned | |
| systemd (Ubuntu Focal) | Fix Committed | Undecided | Unassigned | |

Bug Description

[SRU TEMPLATE]

[DESCRIPTION]

Cloud-init introduced a feature to configure policy routing on AWS EC2 instances with multiple NICs in
https://github.com/canonical/cloud-init/commit/0ca5f31043e2d98eab31a43d9dde9bdaef1435cb targeting v24.1.

Cloud-init generates the following netplan config:

```
$ cat /etc/netplan/50-cloud-init.yaml
network:
    ethernets:
        ens5:
            dhcp4: true
            dhcp4-overrides: &id001
                route-metric: 100
            dhcp6: true
            dhcp6-overrides: *id001
            match:
                macaddress: 0a:c8:ab:90:c2:fb
            set-name: ens5
        ens6:
            dhcp4: true
            dhcp4-overrides:
                route-metric: 200
                use-routes: true
            dhcp6: false
            match:
                macaddress: 0a:c6:55:a1:dc:3b
            routes:
            -   table: 101
                to: 0.0.0.0/0
                via: 192.168.0.1
            -   table: 101
                to: 192.168.0.0/20
            routing-policy:
            -   from: 192.168.10.212
                table: 101
            set-name: ens6
    version: 2
```

Which renders the following systemd-networkd config files:

```
$ cat 10-netplan-ens5.link
[Match]
MACAddress=0a:c8:ab:90:c2:fb

[Link]
Name=ens5
WakeOnLan=off

$ cat 10-netplan-ens5.network
[Match]
MACAddress=0a:c8:ab:90:c2:fb
Name=ens5

[Network]
DHCP=yes
LinkLocalAddressing=ipv6

[DHCP]
RouteMetric=100
UseMTU=true

$ cat 10-netplan-ens6.link
[Match]
MACAddress=0a:c6:55:a1:dc:3b

[Link]
Name=ens6
WakeOnLan=off

$ cat 10-netplan-ens6.network
[Match]
MACAddress=0a:c6:55:a1:dc:3b
Name=ens6

[Network]
DHCP=ipv4
LinkLocalAddressing=ipv6

[Route]
Destination=0.0.0.0/0
Gateway=192.168.0.1
Table=101

[Route]
Destination=192.168.0.0/20
Scope=link
Table=101

[RoutingPolicyRule]
From=192.168.10.212
Table=101

[DHCP]
RouteMetric=200
UseMTU=true
```

Which configures the instance with the following state in Ubuntu Focal:

```
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0a:c8:ab:90:c2:fb brd ff:ff:ff:ff:ff:ff
    inet 192.168.12.94/20 brd 192.168.15.255 scope global dynamic ens5
       valid_lft 2087sec preferred_lft 2087sec
    inet6 2a05:d012:ea0:c500:6d12:2b20:5fef:a502/128 scope global dynamic noprefixroute
       valid_lft 440sec preferred_lft 130sec
    inet6 fe80::8c8:abff:fe90:c2fb/64 scope link
       valid_lft forever preferred_lft forever
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0a:c6:55:a1:dc:3b brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.212/20 brd 192.168.15.255 scope global dynamic ens6
       valid_lft 2083sec preferred_lft 2083sec
    inet6 fe80::8c6:55ff:fea1:dc3b/64 scope link
       valid_lft forever preferred_lft forever

$ ip route show
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.12.94 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.10.212 metric 200
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.12.94
192.168.0.0/20 dev ens6 proto kernel scope link src 192.168.10.212
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.12.94 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.10.212 metric 200

$ ip rule show
0: from all lookup local
0: from 192.168.10.212 lookup 101
32766: from all lookup main
32767: from all lookup default

$ ip route show table 101
default via 192.168.0.1 dev ens6 proto static onlink
192.168.0.0/20 dev ens6 proto static scope link
```

The issue is that the instance is not reachable from the same subnet via the private IPv4 address of the primary NIC:
packets are routed to egress via ens6 and dropped.
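
A simplified model of the lookup in the main table may help illustrate the problem. The sketch below is not how the kernel is implemented: it only ranks routes by longest prefix and then by lowest metric, ignores policy rules, and treats a missing metric as 0. The `Route` tuple, the `best_routes` helper, and the peer address `192.168.5.10` are hypothetical names chosen for illustration.

```python
import ipaddress
from typing import NamedTuple


class Route(NamedTuple):
    prefix: str  # destination network
    dev: str     # egress interface
    metric: int  # 0 when `ip route show` prints no metric


def best_routes(routes, dst):
    """Return the routes that tie for best match:
    longest prefix first, then lowest metric (simplified model)."""
    addr = ipaddress.ip_address(dst)
    matching = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    if not matching:
        return []

    def rank(route):
        return (-ipaddress.ip_network(route.prefix).prefixlen, route.metric)

    best = min(rank(r) for r in matching)
    return [r for r in matching if rank(r) == best]


# Main table as seen on Focal: the subnet routes carry no metric.
observed = [
    Route("0.0.0.0/0", "ens5", 100),
    Route("0.0.0.0/0", "ens6", 200),
    Route("192.168.0.0/20", "ens5", 0),
    Route("192.168.0.0/20", "ens6", 0),
]
# Main table with the interface metrics applied to the subnet routes.
expected = [
    Route("0.0.0.0/0", "ens5", 100),
    Route("0.0.0.0/0", "ens6", 200),
    Route("192.168.0.0/20", "ens5", 100),
    Route("192.168.0.0/20", "ens6", 200),
]

peer = "192.168.5.10"  # hypothetical host in the same subnet
print([r.dev for r in best_routes(observed, peer)])  # ['ens5', 'ens6']
print([r.dev for r in best_routes(expected, peer)])  # ['ens5']
```

In this model the two subnet routes tie when neither has a metric, so nothing favours ens5 over ens6. Once each route carries its interface metric, only the ens5 route remains as the best match.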

The cause is that interface metrics are not applied to local subnet routes with systemd 245 (245.4-4ubuntu3.23).
On newer systemd versions, as in Jammy, the metrics are correctly applied.
Correcting them manually fixes the issue in Focal.
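
The report does not say how the metrics were corrected by hand. One possible approach, assuming the per-interface metrics from the netplan config above (100 for ens5, 200 for ens6), is to delete each route that has no metric and re-add it with the metric. The sketch below only builds the `ip route` command strings from `ip route show` output and does not run them; `metric_fix_commands` is a hypothetical helper.

```python
def metric_fix_commands(ip_route_output, iface_metrics):
    """Build `ip route del` / `ip route add` command pairs for every
    route that has no metric and whose device has a known metric."""
    commands = []
    for line in ip_route_output.strip().splitlines():
        fields = line.split()
        if "metric" in fields or "dev" not in fields:
            continue  # already has a metric, or no device to look up
        dev = fields[fields.index("dev") + 1]
        if dev not in iface_metrics:
            continue
        route = " ".join(fields)
        commands.append(f"ip route del {route}")
        commands.append(f"ip route add {route} metric {iface_metrics[dev]}")
    return commands


# Subnet routes as reported on Focal.
focal_routes = """
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.12.94
192.168.0.0/20 dev ens6 proto kernel scope link src 192.168.10.212
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.12.94 metric 100
"""

for cmd in metric_fix_commands(focal_routes, {"ens5": 100, "ens6": 200}):
    print(cmd)
```

Such commands would need root on the instance, and the change is not persistent: it is only a way to confirm the diagnosis, not a substitute for the systemd fix.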

Expected main route table:

default via 192.168.0.1 dev ens5 proto dhcp src 192.168.12.94 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.10.212 metric 200
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.12.94 metric 100
192.168.0.0/20 dev ens6 proto kernel scope link src 192.168.10.212 metric 200
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.12.94 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.10.212 metric 200
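
The difference between the Focal table and the expected one can also be checked mechanically. A minimal sketch, assuming the `ip route show` output format shown in this report; `routes_without_metric` is a hypothetical helper, not part of cloud-init or iproute2:

```python
def routes_without_metric(ip_route_output):
    """Return (destination, device) pairs for routes printed without a metric."""
    missing = []
    for line in ip_route_output.strip().splitlines():
        fields = line.split()
        if not fields or "metric" in fields:
            continue
        dev = fields[fields.index("dev") + 1] if "dev" in fields else None
        missing.append((fields[0], dev))
    return missing


focal_table = """
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.12.94 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.10.212 metric 200
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.12.94
192.168.0.0/20 dev ens6 proto kernel scope link src 192.168.10.212
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.12.94 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.10.212 metric 200
"""

expected_table = """
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.12.94 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.10.212 metric 200
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.12.94 metric 100
192.168.0.0/20 dev ens6 proto kernel scope link src 192.168.10.212 metric 200
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.12.94 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.10.212 metric 200
"""

print(routes_without_metric(focal_table))
# [('192.168.0.0/20', 'ens5'), ('192.168.0.0/20', 'ens6')]
print(routes_without_metric(expected_table))
# []
```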

It looks like the upstream systemd issue and PR fixing this problem are:

https://github.com/systemd/systemd/issues/928
https://github.com/systemd/systemd/pull/19344

[TESTING]

As described above.

[REGRESSION POTENTIAL]

The backport targets Focal.
The fixing patches touch network-related code, so any regression would affect the networking part of systemd,
especially address configuration.
In particular:

* https://github.com/systemd/systemd/commit/aa550d2a51b025681ff8399e597338d35f540195
This patch adds sd_netlink_message_append_s* functions and types without modifying existing code.

* https://github.com/systemd/systemd/commit/0e7bb99ff919bf8e6030ab7c3c178b87caf166a2
This one just adds missing address types.

* https://github.com/systemd/systemd/commit/c4ff0629dd450a40c5733b759eda08e6a032fae3
This one adds the RouteMetric option for [Address]. It adds code to the address_configure() function.

* https://github.com/systemd/systemd/commit/415deef9c3e97211c862f39aceabf8e1f1485a41
This one adds the RouteMetric option to [DHCPv4].

[OTHER]

The upstream patches fixing this issue are the following:

https://github.com/systemd/systemd/commit/aa550d2a51b025681ff8399e597338d35f540195
https://github.com/systemd/systemd/commit/0e7bb99ff919bf8e6030ab7c3c178b87caf166a2
https://github.com/systemd/systemd/commit/c4ff0629dd450a40c5733b759eda08e6a032fae3
https://github.com/systemd/systemd/commit/415deef9c3e97211c862f39aceabf8e1f1485a41

They originate in PR [1] and are backported to Focal in MR [2].
There's also a test package in [3].

[1] https://github.com/systemd/systemd/pull/19344
[2] https://code.launchpad.net/~joalif/ubuntu/+source/systemd/+git/systemd/+ref/lp2055397
[3] https://launchpad.net/~joalif/+archive/ubuntu/systemd-focal


no longer affects: systemd (Ubuntu)
no longer affects: cloud-init (Ubuntu Focal)
no longer affects: netplan.io (Ubuntu Focal)
Changed in systemd (Ubuntu):
status: New → Fix Released
Changed in cloud-init (Ubuntu):
status: New → Invalid
Alberto Contreras (aciba) wrote :

Marking cloud-init as invalid as I think there is no workaround to fix this issue.
Adding netplan.io for reference and awareness.

After confirmation from systemd-networkd, I wonder if it would be feasible / reasonable to backport https://github.com/systemd/systemd/pull/19344 to focal.

Alberto Contreras (aciba) wrote :

Attaching cloud-init logs, which include journal logs.

Alberto Contreras (aciba) wrote :

cloud-init upstream PR adding test coverage for this issue: https://github.com/canonical/cloud-init/pull/4982

Lukas Märdian (slyon) wrote :

Thanks for the heads-up for Netplan! IIUC this will be fixed by a systemd SRU, so closing it as "Invalid" for Netplan. Please re-open if you feel there is something to do on our side.

Changed in netplan.io (Ubuntu):
status: New → Invalid
description: updated
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Alberto, or anyone else affected,

Accepted systemd into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/245.4-4ubuntu3.24 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Focal):
status: New → Fix Committed
tags: added: verification-needed verification-needed-focal
Alberto Contreras (aciba) wrote :

Thanks for the work here. I can confirm that systemd 245.4-4ubuntu3.24 properly renders route metrics and fixes the routing issue that this bug refers to. With the following diff applied to cloud-init, which installs systemd from focal-proposed and adds extra assertions, the test passes; see the attached log.

```diff
diff --git a/tests/integration_tests/modules/test_hotplug.py b/tests/integration_tests/modules/test_hotplug.py
index 8c7bc7839..82d4a2cd1 100644
--- a/tests/integration_tests/modules/test_hotplug.py
+++ b/tests/integration_tests/modules/test_hotplug.py
@@ -299,7 +299,6 @@ def test_multi_nic_hotplug(setup_image, session_cloud: IntegrationCloud):
         verify_clean_log(log_content)

-@pytest.mark.skipif(CURRENT_RELEASE <= FOCAL, reason="See LP: #2055397")
 @pytest.mark.skipif(PLATFORM != "ec2", reason="test is ec2 specific")
 def test_multi_nic_hotplug_vpc(setup_image, session_cloud: IntegrationCloud):
     """Tests that additional secondary NICs are routable from local
@@ -308,6 +307,19 @@ def test_multi_nic_hotplug_vpc(setup_image, session_cloud: IntegrationCloud):
     with session_cloud.launch(
         user_data=USER_DATA
     ) as client, session_cloud.launch() as bastion:
+        assert client.execute("""\
+sudo sh -c "echo 'deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -cs)-proposed restricted main multiverse universe' >> /etc/apt/sources.list.d/proposed-repositories.list"
+        """).ok
+        assert client.execute("sudo apt update").ok
+        assert client.execute("sudo apt upgrade systemd -y").ok
+        systemd_resp = client.execute("apt policy systemd | grep Installed | cut -d ':' -f 2 | tr -d ' '")
+        assert systemd_resp.ok
+        assert systemd_resp.stdout == "245.4-4ubuntu3.24"
+
+        assert client.execute("cloud-init clean --logs")
+        client.restart()
+        wait_for_cloud_init(client)
+
         ips_before = _get_ip_addr(client)
         primary_priv_ip4 = ips_before[1].ip4
         primary_priv_ip6 = ips_before[1].ip6
@@ -343,18 +355,23 @@ def test_multi_nic_hotplug_vpc(setup_image, session_cloud: IntegrationCloud):
         assert r.ok, r.stdout
         r = bastion.execute(f"ping -c1 {secondary_priv_ip4}")
         assert r.ok, r.stdout
-        r = bastion.execute(f"ping -c1 {primary_priv_ip6}")
+        r = bastion.execute(f"ping -c3 {primary_priv_ip6}")
         assert r.ok, r.stdout
-        r = bastion.execute(f"ping -c1 {secondary_priv_ip6}")
+        r = bastion.execute(f"ping -c3 {secondary_priv_ip6}")
         assert r.ok, r.stdout

+        ip_route_show = client.execute("ip route show")
+        assert ip_route_show.ok, ip_route_show.stderr
+        for route in ip_route_show.splitlines():
+            assert "metric" in route, "Expected metric to be configured in route"
+
         # Remove new NIC
         client.instance.remove_network_interface(secondary_priv_ip4)
         _wait_till_hotplug_complete(client, expected_runs=2)

         # ping to primary NIC works
         assert bastion.execute(f"ping -c1 {primary_priv_ip4}").ok
-        assert bastion.execute(f"ping -c1 {primary_priv_ip6}").ok
+        assert bastion.execute(f"ping -c3 {primary_...

```

Alberto Contreras (aciba) wrote :

Cloud-init PR to re-enable the test: https://github.com/canonical/cloud-init/pull/5492
