I think the underlying problem is improper fragmentation of netlink messages sent to the WireGuard device by systemd v237 in the set_wireguard_interface function:
https://github.com/systemd/systemd/blob/v237/src/network/netdev/wireguard.c#L107
Appending netlink message data can fail if the message size limit has been exceeded. This can happen if there are too many peers or IP masks in the netdev file, and the v237 code doesn't seem to handle this properly. It's supposed to split the data up into message fragments, but instead it can write incoherent data to the netlink socket or get stuck in an infinite loop.
This issue was fixed in systemd v241 by reworking the code over a few commits:
https://github.com/systemd/systemd/pull/11418
https://github.com/systemd/systemd/pull/11580 (this fixed issues with the first PR)
I found some comments (now resolved) on one of the commits illuminating:
https://github.com/systemd/systemd/pull/11418/commits/e1f717d4a02e15ae11a191dd4962b2f4d117678d
Mic92 on 2019-01-15:
> The idea is that netlink's messages are limited in size. If an interface has many peers, addresses or ip masks then the configuration might not fit into one message and has to be split across different messages.
yuwata on 2019-01-15:
> Yeah. I guess there was some bug in the cancellation logic, and it causes infinite loop with the magic number 23.
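
To make the splitting and cancellation idea from these comments concrete, here is a minimal Python sketch. It is not systemd's code and does not use the netlink API: the function name pack_messages, the byte sizes, and the header_size parameter are all hypothetical, chosen only to illustrate the general technique. When an entry doesn't fit, the append is abandoned, the current message is closed, and the entry is retried in a fresh message. An entry that can never fit is rejected up front, since retrying it forever is one way a loop like this could fail to terminate.

```python
from typing import List


def pack_messages(entry_sizes: List[int], max_size: int,
                  header_size: int = 0) -> List[List[int]]:
    """Group entry indices into messages that each stay within max_size.

    Hypothetical illustration only: entry_sizes stands in for the encoded
    size of each peer (or allowed-IP) entry, header_size for the fixed
    per-message overhead.
    """
    messages: List[List[int]] = []
    current: List[int] = []
    used = header_size

    for index, size in enumerate(entry_sizes):
        # An entry that cannot fit even in an empty message can never be
        # sent; fail loudly instead of retrying it forever.
        if header_size + size > max_size:
            raise ValueError(f"entry {index} cannot fit in any message")

        if used + size > max_size:
            # "Cancel" the append: the entry is left out of the current
            # message, which is closed as-is, and a new message is started.
            messages.append(current)
            current = []
            used = header_size

        current.append(index)
        used += size

    if current:
        messages.append(current)
    return messages
```

For example, with a 100-byte limit, a 10-byte header, and three 40-byte entries, the first two entries share a message and the third starts a new one.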
The infinite loop with 23 peers yuwata mentions is a reference to Leonid's bug report from January:
https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1811149
I expect that backporting these fixes from v241 to bionic's systemd v237 branch would resolve both my issue and the issue reported by Leonid.
I realize this is a non-trivial change and there's a regression risk.