Comment 0 for bug 2055397

Alberto Contreras (aciba) wrote:

Cloud-init introduced a feature, targeting v24.1, to configure policy routing on AWS EC2 instances with multiple NICs:
https://github.com/canonical/cloud-init/commit/0ca5f31043e2d98eab31a43d9dde9bdaef1435cb

Cloud-init generates the following netplan config:

```
$ cat /etc/netplan/50-cloud-init.yaml
network:
    ethernets:
        ens5:
            dhcp4: true
            dhcp4-overrides: &id001
                route-metric: 100
            dhcp6: true
            dhcp6-overrides: *id001
            match:
                macaddress: 0a:c8:ab:90:c2:fb
            set-name: ens5
        ens6:
            dhcp4: true
            dhcp4-overrides:
                route-metric: 200
                use-routes: true
            dhcp6: false
            match:
                macaddress: 0a:c6:55:a1:dc:3b
            routes:
            -   table: 101
                to: 0.0.0.0/0
                via: 192.168.0.1
            -   table: 101
                to: 192.168.0.0/20
            routing-policy:
            -   from: 192.168.10.212
                table: 101
            set-name: ens6
    version: 2
```

Netplan renders this into the following systemd-networkd configuration files:

```
$ cat 10-netplan-ens5.link
[Match]
MACAddress=0a:c8:ab:90:c2:fb

[Link]
Name=ens5
WakeOnLan=off

$ cat 10-netplan-ens5.network
[Match]
MACAddress=0a:c8:ab:90:c2:fb
Name=ens5

[Network]
DHCP=yes
LinkLocalAddressing=ipv6

[DHCP]
RouteMetric=100
UseMTU=true

$ cat 10-netplan-ens6.link
[Match]
MACAddress=0a:c6:55:a1:dc:3b

[Link]
Name=ens6
WakeOnLan=off

$ cat 10-netplan-ens6.network
[Match]
MACAddress=0a:c6:55:a1:dc:3b
Name=ens6

[Network]
DHCP=ipv4
LinkLocalAddressing=ipv6

[Route]
Destination=0.0.0.0/0
Gateway=192.168.0.1
Table=101

[Route]
Destination=192.168.0.0/20
Scope=link
Table=101

[RoutingPolicyRule]
From=192.168.10.212
Table=101

[DHCP]
RouteMetric=200
UseMTU=true
```
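For readers more used to iproute2, the ens6 [Route] and [RoutingPolicyRule] sections correspond roughly to the commands printed below. This is only a sketch: systemd-networkd programs these objects over netlink rather than by running ip, and the helper name print_policy_routing_cmds is hypothetical. It prints the commands instead of executing them, so it can be tried without root.

```shell
# Hypothetical helper: print iproute2 commands roughly equivalent to the
# [Route] and [RoutingPolicyRule] sections rendered for ens6.
# Arguments: interface, interface address, gateway, subnet, table id.
print_policy_routing_cmds() {
    iface=$1 addr=$2 gw=$3 subnet=$4 table=$5
    # Default route in the per-NIC table (listed as "onlink" in table 101).
    echo "ip route add default via $gw dev $iface table $table onlink"
    # Link-scoped route for the local subnet in the same table.
    echo "ip route add $subnet dev $iface scope link table $table"
    # Traffic sourced from this NIC's address is looked up in the per-NIC table.
    echo "ip rule add from $addr lookup $table"
}

print_policy_routing_cmds ens6 192.168.10.212 192.168.0.1 192.168.0.0/20 101
```

Note that the rule only matches traffic sourced from the ens6 address; everything else, including traffic sourced from the ens5 address, still falls through to the main table.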

This results in the following network state on Ubuntu Focal:

```
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0a:c8:ab:90:c2:fb brd ff:ff:ff:ff:ff:ff
    inet 192.168.12.94/20 brd 192.168.15.255 scope global dynamic ens5
       valid_lft 2087sec preferred_lft 2087sec
    inet6 2a05:d012:ea0:c500:6d12:2b20:5fef:a502/128 scope global dynamic noprefixroute
       valid_lft 440sec preferred_lft 130sec
    inet6 fe80::8c8:abff:fe90:c2fb/64 scope link
       valid_lft forever preferred_lft forever
3: ens6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0a:c6:55:a1:dc:3b brd ff:ff:ff:ff:ff:ff
    inet 192.168.10.212/20 brd 192.168.15.255 scope global dynamic ens6
       valid_lft 2083sec preferred_lft 2083sec
    inet6 fe80::8c6:55ff:fea1:dc3b/64 scope link
       valid_lft forever preferred_lft forever

$ ip route show
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.12.94 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.10.212 metric 200
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.12.94
192.168.0.0/20 dev ens6 proto kernel scope link src 192.168.10.212
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.12.94 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.10.212 metric 200

$ ip rule show
0: from all lookup local
0: from 192.168.10.212 lookup 101
32766: from all lookup main
32767: from all lookup default

$ ip route show table 101
default via 192.168.0.1 dev ens6 proto static onlink
192.168.0.0/20 dev ens6 proto static scope link
```

The issue is that the instance is not reachable from the same subnet via the private IPv4 address of the primary NIC (ens5):
packets are routed to egress via ens6 and dropped.
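One way to spot this condition is to look for a destination that the main table reaches through more than one device at the same metric. The awk filter below is a sketch that assumes the text format of ip -4 route show seen above, where a route printed without a metric keyword has metric 0; the function name find_ambiguous_prefixes is hypothetical.

```shell
# Hypothetical check: read `ip -4 route show` output on stdin and print each
# destination that appears on two or more devices with the same metric.
find_ambiguous_prefixes() {
    awk '
    {
        dest = $1; dev = ""; metric = 0   # routes printed without "metric" use 0
        for (i = 2; i < NF; i++) {
            if ($i == "dev") dev = $(i + 1)
            if ($i == "metric") metric = $(i + 1)
        }
        key = dest " " metric
        if (!(key in seen)) { seen[key] = dev; next }
        # Same destination and metric on a different device: ambiguous.
        if (seen[key] != dev && !(key in reported)) {
            reported[key] = 1
            print dest " (metric " metric ")"
        }
    }'
}
```

On the Focal instance it would be run as `ip -4 route show | find_ambiguous_prefixes`; fed the main table listed above, it should report 192.168.0.0/20 (metric 0), and it should print nothing for a table in which the two subnet routes carry metrics 100 and 200.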

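The cause is that systemd 245 (245.4-4ubuntu3.23) does not apply the interface route metrics to the local subnet routes:
both 192.168.0.0/20 routes in the main table above are listed without a metric (i.e. metric 0), so the ens5 route is not preferred over the ens6 one.
On newer systemd versions, such as the one in Jammy, the metrics are applied correctly.
Correcting them manually fixes the issue on Focal.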

Expected main routing table:

```
default via 192.168.0.1 dev ens5 proto dhcp src 192.168.12.94 metric 100
default via 192.168.0.1 dev ens6 proto dhcp src 192.168.10.212 metric 200
192.168.0.0/20 dev ens5 proto kernel scope link src 192.168.12.94 metric 100
192.168.0.0/20 dev ens6 proto kernel scope link src 192.168.10.212 metric 200
192.168.0.1 dev ens5 proto dhcp scope link src 192.168.12.94 metric 100
192.168.0.1 dev ens6 proto dhcp scope link src 192.168.10.212 metric 200
```
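The manual correction can be sketched as deleting each metric-less subnet route and re-adding it with the interface's DHCP route metric (100 for ens5, 200 for ens6, per the netplan config). The helper print_prefix_route_fix is hypothetical and only prints the ip commands for review; running them requires root, and the change is assumed not to be persistent, since a reboot or a reconfiguration by systemd-networkd may recreate the original routes.

```shell
# Hypothetical helper: print commands that replace the kernel-created subnet
# route (metric 0) with one carrying the given metric.
# Arguments: device, prefix, preferred source address, metric.
print_prefix_route_fix() {
    dev=$1 prefix=$2 src=$3 metric=$4
    # Remove the metric-0 route for this device only.
    echo "ip route del $prefix dev $dev proto kernel scope link src $src"
    # Re-add it with the metric the interface's DHCP routes already use.
    echo "ip route add $prefix dev $dev proto kernel scope link src $src metric $metric"
}

print_prefix_route_fix ens5 192.168.0.0/20 192.168.12.94 100
print_prefix_route_fix ens6 192.168.0.0/20 192.168.10.212 200
```

Once both pairs are applied, the subnet routes match the expected table, and the lower metric makes ens5 the preferred path for traffic to 192.168.0.0/20 that is not captured by the table 101 rule.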

It looks like the upstream systemd issue and PR fixing this problem are:

https://github.com/systemd/systemd/issues/928
https://github.com/systemd/systemd/pull/19344