networkd not applying config - missing events?

Bug #1775566 reported by Christian Ehrhardt  on 2018-06-07
This bug affects 5 people
Affects              Importance  Assigned to
netplan.io (Ubuntu)  Undecided   Unassigned
systemd (Ubuntu)     Undecided   Unassigned

Bug Description

Hi,

TL;DR:
- networkd config written by netplan
- it seems we can eliminate netplan from this and still have the issue
- networkd seems to miss the events for the devices and therefore considers them unmanaged
- rebinding them makes it work
- the only trigger I have found so far is q35 KVM guests (PCIe), but
  there might be more

---

I am missing some hidden trigger in "netplan apply" that would explain the following case.

I have KVM guests; you can spawn your own to reproduce via:
 $ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 release=bionic label=daily
 $ uvt-kvm create --password ubuntu bionic-netplan arch=amd64 release=bionic label=daily

So far all is good, but I wanted to run q35-type guests, which are PCIe-based (instead of PCI) and more modern. To do so:
1. shut down your guest
2. run virsh edit bionic-netplan
2.1 replace pc-i440fx-bionic -> pc-q35-bionic
2.2 replace pci-root with pcie-root
2.3 replace piix3-uhci -> piix4-uhci
3. start the guest again
   virsh start bionic-netplan
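
The manual edit in step 2 can also be scripted. The following is a minimal sketch, assuming the three substitutions listed above are the only changes needed; to_q35 is a hypothetical helper name and the output file name is arbitrary.

```shell
# Hypothetical helper: read i440fx domain XML on stdin and write the q35
# variant to stdout by applying the three substitutions from step 2.
to_q35() {
    sed -e 's/pc-i440fx-bionic/pc-q35-bionic/g' \
        -e 's/pci-root/pcie-root/g' \
        -e 's/piix3-uhci/piix4-uhci/g'
}

# Usage on the host, with the guest shut down (not run here):
#   virsh dumpxml bionic-netplan | to_q35 > bionic-netplan-q35.xml
#   virsh define bionic-netplan-q35.xml
#   virsh start bionic-netplan
```

Keeping the rewrite as a stdin/stdout filter makes it easy to inspect the resulting XML before defining the guest.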

The guest won't get a network connection; this is where I started debugging.
I thought the devices might be wrong now or something similar, but it is more puzzling than that.

First I realized that the device names changed from ens3 -> enp0s3 (the kernel naming).
So I thought this entry might have a problem:
     ethernets:
        ens3:
            dhcp4: true
            match:
                macaddress: 52:54:00:68:4b:62

I tried renaming the entry to enp0s3 to match, but it made no difference. That is consistent with the netplan man page:
   If there are match: rules, then the ID field is a purely opaque name which is only being used
   for references from definitions of compound devices in the config

And I found it works just fine when I run "sudo netplan apply".

This was odd, so to summarize up to here:
- PCIe based virt guest
- netplan generated config not working after (re)boot
- "netplan apply" makes it work

I disabled cloud-init's network configuration, as recommended by the comment, via:
  /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg
  network: {config: disabled}
That way I can rely on my netplan YAML staying as it is.
I tried various things, but so far I can't find what magic "netplan apply" does that is missing from my boot.

After a reboot I checked, and networkctl considers the devices unmanaged:
$ networkctl list
IDX LINK TYPE OPERATIONAL SETUP
  1 lo loopback carrier unmanaged
  2 enp0s3 ether off unmanaged
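
To pick out the affected links in a script, the table above can be filtered on its SETUP column. This is a minimal sketch, assuming the column layout shown above; unmanaged_links is a hypothetical helper name. Loopback is skipped because it is listed as unmanaged even in the working case.

```shell
# Hypothetical helper: read `networkctl list` output on stdin and print the
# names of non-loopback links whose SETUP column (last field) is "unmanaged".
unmanaged_links() {
    awk '$NF == "unmanaged" && $3 != "loopback" { print $2 }'
}

# Usage inside the guest (not run here):
#   networkctl list | unmanaged_links
```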

But the config was generated:
$ ll /run/systemd/network/
total 32
drwxr-xr-x 2 root root 200 Jun 7 09:02 ./
drwxr-xr-x 22 root root 500 Jun 7 09:02 ../
-rw-r--r-- 1 root root 69 Jun 7 09:02 10-netplan-enp0s3.link
-rw-r--r-- 1 root root 104 Jun 7 09:02 10-netplan-enp0s3.network

I checked the log and saw that apply restarts networkd.
So I thought I might just restart networkd myself, and ran
 $ sudo systemctl restart systemd-networkd.service
But things stayed as-is without the config being picked up.

After some helpful discussion on IRC I also tried disabling netplan to check whether this is a networkd-only problem.
# make this static networkd
$ sudo cp /run/systemd/network/10-netplan-enp0s* /etc/systemd/network/
# no netplan config
$ sudo mv /etc/netplan/* /root

That was supposed to show if networkd itself (or its config files) had issues.
And with that it still did not work, so is the error in networkd instead?
If so what magic thing does "netplan apply" do to fix it?

Since it might be networkd as well I added a bug task for it.

Even "netplan apply" with NO yaml file fixes it.
That is, while running with the static networkd config (the files formerly created by netplan, copied as outlined above).

Then with debug enabled it does:
$ networkctl list
IDX LINK TYPE OPERATIONAL SETUP
  1 lo loopback carrier unmanaged
  2 enp0s3 ether off unmanaged

$ sudo netplan --debug apply
DEBUG:no netplan generated networkd configuration exists
DEBUG:no netplan generated NM configuration exists
DEBUG:replug enp0s3: unbinding virtio0 from /sys/bus/virtio/drivers/virtio_net
DEBUG:replug enp0s3: rebinding virtio0 to /sys/bus/virtio/drivers/virtio_net

$ networkctl list
IDX LINK TYPE OPERATIONAL SETUP
  1 lo loopback carrier unmanaged
  8 enp3 ether routable configured

So after a reboot I did the same by hand:
echo virtio0 | sudo tee /sys/bus/virtio/drivers/virtio_net/virtio0/driver/unbind
echo virtio0 | sudo tee /sys/bus/virtio/drivers/virtio_net/bind
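
The two commands above hard-code virtio0 and virtio_net. A minimal sketch that looks both up from the interface name instead, assuming the usual sysfs layout where /sys/class/net/IFACE/device points at the bus device; replug and the SYSFS variable are hypothetical names, and this is not necessarily how netplan implements its replug.

```shell
# Root of sysfs; overridable so the logic can be tried on a scratch directory.
SYSFS="${SYSFS:-/sys}"

# Hypothetical helper: unbind and rebind the bus device behind an interface.
replug() {
    iface="$1"
    [ -e "$SYSFS/class/net/$iface/device/driver" ] || return 1
    # .../device resolves to the bus device (virtio0 in this report),
    # .../device/driver to the driver directory (virtio_net).
    dev="$(basename "$(readlink -f "$SYSFS/class/net/$iface/device")")"
    drv="$(readlink -f "$SYSFS/class/net/$iface/device/driver")"
    echo "$dev" > "$drv/unbind"
    echo "$dev" > "$drv/bind"
}

# Usage as root inside the guest (not run here). The interface may come back
# under a different name, as it did in this report:
#   replug enp0s3
```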

$ networkctl list
IDX LINK TYPE OPERATIONAL SETUP
  6 enp3 ether routable configured

So it works after the replug, and this replug was the magic done by "netplan apply".
Why this fails with just a static networkd config, I don't know.
But I'd say it is a networkd bug, right?

The earlier rename event did occur at boot:
[ 0.726055] virtio_net virtio0 enp0s3: renamed from eth0
This is the one networkd should have picked up. After the replug, the log shows:
[ 481.972268] virtio_net virtio0 enp3: renamed from eth0
[ 481.983710] IPv6: ADDRCONF(NETDEV_UP): enp3: link is not ready
[ 482.976135] IPv6: ADDRCONF(NETDEV_CHANGE): enp3: link becomes ready

I don't know where or how to address this further, but I hope the reproducer and debug output help to fix this.

summary: - networkd not applying config written by netplan on boot in q35 (PCIe) KVM guests
+ networkd not applying config - missing events?
description: updated

I thought I might have oversimplified the PCIe setup for q35 when I compared it to i440fx (where this works AFAICS).

But adding libvirt's default devices for this case did not change anything.
For documentation purposes, the difference showing how to add them is still attached.

The easiest way, though, is to let libvirt do all that.
To do so, copy the uvtool template
 $ cp /usr/share/uvtool/libvirt/template.xml q35-template.xml
 # Then edit the type line to be
 <type arch='x86_64' machine='pc-q35-bionic'>hvm</type>

When spawning with uvtool and this template you'll get the full proper defaults.
 $ uvt-kvm create --template /home/paelzer/q35-template.xml --password ubuntu bionic-q35-normal-extended arch=amd64 release=bionic label=daily

For this bug it makes no difference: with the full set of PCI roots and related devices available, the event is still missed.
(The extended setup is required for e.g. hotplugging.)

Since we ruled out netplan, mark it invalid.
Focus on systemd-networkd for now.

Changed in netplan.io (Ubuntu):
status: New → Invalid
tags: added: patch

I found that systems I spawn as q35 from the start work.
Only those where the initial cloud-init run happens as i440fx and which I then change to q35 are affected.

I compared configurations and eventually even got my converted instance running, at which point it was hard to tell why.

I found that the root cause is this difference in config:
BAD:
network:
    version: 2
    ethernets:
        enp3:
            dhcp4: true
            match:
                macaddress: 52:54:00:ba:23:d6
            set-name: enp3

GOOD:
network:
    version: 2
    ethernets:
        enp1s0:
            dhcp4: true
            match:
                macaddress: 52:54:00:ba:23:d6
            set-name: enp1s0

Now yes, the device is enp1s0 at the moment:
$ dmesg | grep enp
[ 0.898280] virtio_net virtio0 enp1s0: renamed from eth0

But according to the netplan spec this should not matter, right?
It has a match, so the top-level name is just an ID.
And set-name can be whatever we want.

Bad case networkd files:
$ tail /run/systemd/network/10-netplan-enp3.*
==> /run/systemd/network/10-netplan-enp3.link <==
[Match]
MACAddress=52:54:00:ba:23:d6

[Link]
Name=enp3
WakeOnLan=off

==> /run/systemd/network/10-netplan-enp3.network <==
[Match]
MACAddress=52:54:00:ba:23:d6
Name=enp3

[Network]
DHCP=ipv4

[DHCP]
UseMTU=true
RouteMetric=100

Good case networkd files:
$ tail /run/systemd/network/10-netplan-enp1s0.*
==> /run/systemd/network/10-netplan-enp1s0.link <==
[Match]
MACAddress=52:54:00:ba:23:d6

[Link]
Name=enp1s0
WakeOnLan=off

==> /run/systemd/network/10-netplan-enp1s0.network <==
[Match]
MACAddress=52:54:00:ba:23:d6
Name=enp1s0

[Network]
DHCP=ipv4

[DHCP]
UseMTU=true
RouteMetric=100

It works when cloud-init generates them correctly the first time.
If we go in with the old names it fails, although according to the man page it should not.
Whether this is an issue in netplan or rather in networkd I'm not sure, so I am leaving it to the owners of the packages.
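
One visible difference between the two cases is whether the Name= entry in the [Match] section of the .network file agrees with the name the interface has at boot (enp1s0 per dmesg). Per the systemd.network documentation, a file only applies when every entry in its [Match] section is satisfied, so this may be relevant. A minimal sketch for checking it, assuming the file layout shown above; match_name is a hypothetical helper name.

```shell
# Hypothetical helper: print the Name= value from the [Match] section of the
# systemd .network file given as the first argument.
match_name() {
    awk -F= '
        /^\[/ { in_match = ($0 == "[Match]") }   # track the current section
        in_match && $1 == "Name" { print $2 }
    ' "$1"
}

# Usage inside the guest (not run here): report when no interface currently
# carries the name the .network file matches on.
#   n="$(match_name /run/systemd/network/10-netplan-enp3.network)"
#   [ -e "/sys/class/net/$n" ] || echo "no interface named $n"
```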

Changed in netplan.io (Ubuntu):
status: Invalid → New
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in netplan.io (Ubuntu):
status: New → Confirmed
Changed in systemd (Ubuntu):
status: New → Confirmed