systemd-resolved switches primary interface for name resolution after suspend/resume cycle

Bug #2060778 reported by Csillagszemű Pityesz
This bug affects 1 person
Affects           Status   Importance  Assigned to  Milestone
Netplan           Invalid  Undecided   Unassigned
systemd (Ubuntu)  New      Undecided   Unassigned

Bug Description

Hardware/network: PC with single NIC, multiple vlans:
 - untagged: default LAN, should be used by the host
 - vlan 15: VM network, should only be used by the VM(s) running on the host

Netplan config:

network:
  version: 2
  renderer: networkd
  ethernets:
    lan:
      match:
        macaddress: "XX:XX:XX:XX:XX:XX"
      set-name: lan
      mtu: 9000
      dhcp4: yes
      dhcp6: yes
      ipv6-privacy: true
  bridges:
    vm-br0:
      dhcp4: yes
      interfaces: [vm]
      dhcp4-overrides:
        route-metric: 200
  vlans:
    vm:
      id: 15
      link: lan

(With networkd as the renderer, some apps [App Store, Settings/Online Accounts, ...] think I'm offline in Ubuntu.)
The main problem with this setup is that after resume the route metrics get mixed up, and the host tries to use the VM network as its default route.
Adding a 'dhcp4-overrides: {route-metric: 10}' stanza to 'lan', as the netplan documentation suggests, results in the interfaces not coming up. (Running 'netplan try' prints: "Warning: The unit file, source configuration file or drop-ins of netplan-ovs-cleanup.service changed on disk. Run 'systemctl daemon-reload' to reload units.")

Csillagszemű Pityesz (hippi-viking) wrote :

Sorry, this was overlooked on my side, it is not a bug:

'If both dhcp4 and dhcp6 are true, the networkd back end requires that dhcp4-overrides and dhcp6-overrides contain the same keys and values. If the values do not match, an error will be shown and the network configuration will not be applied.'
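
For illustration, a minimal sketch of what matching overrides on 'lan' could look like. The metric value 10 is taken from the stanza mentioned in the description; the rest of the interface definition is copied from the config above. This is an untested example, not a confirmed fix:

```yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    lan:
      match:
        macaddress: "XX:XX:XX:XX:XX:XX"
      set-name: lan
      mtu: 9000
      dhcp4: yes
      dhcp6: yes
      ipv6-privacy: true
      # Both override blocks carry the same keys and values, which the
      # networkd back end requires when dhcp4 and dhcp6 are both enabled.
      dhcp4-overrides:
        route-metric: 10
      dhcp6-overrides:
        route-metric: 10
```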

Though the original (potentially nasty) issue still persists (even after setting route metrics manually): route metrics are ignored after resume, so routes are mixed up, and a bad default is used!

(So is the 'offline-when-using-networkd' issue.)

Csillagszemű Pityesz (hippi-viking) wrote :

Update: adding a 'dhcp4-overrides: {use-routes: false}' stanza to 'vm-br0' correctly prevents the default route from being created on the host. BUT after resume, when trying to ping the upstream router, the host still tries to do that through 'vm-br0', even though the default route says it should go through 'lan'!
Manually running 'sudo netplan apply' after resume of course solves the issue, but shouldn't it be working correctly without manual intervention - which is not feasible - in the first place? Is this netplan or systemd trying to cut corners again?
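
For reference, a sketch of the 'vm-br0' bridge with the 'use-routes: false' override described above. Whether 'route-metric: 200' was kept alongside it is an assumption; the comment does not say:

```yaml
network:
  version: 2
  renderer: networkd
  bridges:
    vm-br0:
      dhcp4: yes
      interfaces: [vm]
      dhcp4-overrides:
        # Assumption: the original metric is left in place.
        route-metric: 200
        # Do not install routes received via DHCP on the VM bridge.
        use-routes: false
```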

Lukas Märdian (slyon) wrote :

Are you using NetworkManager in parallel here? Could you please provide the output of `nmcli dev`? The applications you mention (App Store, Settings/Online Accounts,...) seem to be Desktop centric and might rely on NetworkManager functionality for the connectivity checker, which systemd-networkd might not be able to provide.

Changed in netplan:
status: New → Incomplete

Csillagszemű Pityesz (hippi-viking) wrote :

As far as I know I am not using NetworkManager, 'nmcli dev' shows the following:

DEVICE  TYPE      STATE       CONNECTION
virbr0  bridge    nem kezelt  --
vm-br0  bridge    nem kezelt  --
lan     ethernet  nem kezelt  --
lo      loopback  nem kezelt  --
iscsi   vlan      nem kezelt  --
vm      vlan      nem kezelt  --

('nem kezelt' means not managed; the 'iscsi' interface is a statically configured vlan device to access the VM storage backend)

My use case is somewhat special: I use the desktop distribution with a more complicated setup than usual because I need networked VMs for my work. I am using networkd because I thought it would be more appropriate than NetworkManager.

Upon further investigation into the original matter, I believe it is really an issue with systemd-resolved rather than netplan, so sorry for taking your time. My local router resolves hostnames differently for the host and the VM networks. After startup, or after issuing 'netplan apply' following a resume, the resolver for the host ('lan' network) is used as primary. BUT after a resume without 'netplan apply', the resolver for the VM network somehow takes precedence, which is bad. The host then obviously tries to reach the 'misresolved' systems through the wrong gateway, which is really only the symptom, not the cause.
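
A possible workaround sketch, not something confirmed in this report: netplan's 'dhcp4-overrides' also accepts a 'use-dns' key, so the DNS servers offered on the VM network could be ignored for 'vm-br0'. This assumes the host itself never needs the VM network's resolver, and it does not explain why the resolver order changes after resume:

```yaml
network:
  version: 2
  renderer: networkd
  bridges:
    vm-br0:
      dhcp4: yes
      interfaces: [vm]
      dhcp4-overrides:
        use-routes: false
        # Assumption: the host should only resolve names via 'lan',
        # so DNS servers received via DHCP on the VM bridge are ignored.
        use-dns: false
```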

Lukas Märdian (slyon) wrote :

Thanks for the additional details!

In this case the output of "resolvectl" after "netplan apply" and after resume (without "netplan apply") might be useful, in addition to debug logs from your systemd-resolved:

$ sudo systemctl edit systemd-resolved

Adding:
```
[Service]
Environment=SYSTEMD_LOG_LEVEL=debug
```

$ sudo systemctl daemon-reload
$ sudo systemctl restart systemd-resolved # (or reboot)

Afterwards:
$ journalctl -u systemd-resolved # (for a full "netplan apply", suspend, resume cycle)

Changed in netplan:
status: Incomplete → Invalid
summary: - netplan, multiple dhcp route with metric failure
+ systemd-resolved switches primary interface for name resolution after
+ suspend/resume cycle