netplan causes unresponsive system with certain nsswitch config

Bug #2071747 reported by Adam Saponara
This bug affects 3 people
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| netplan.io (Ubuntu) | Triaged | High | Unassigned | |

Bug Description

A recent patch appears to chown networkd-related files to `root:systemd-network`[1]. If nsswitch.conf is configured with `group: systemd files`, this appears to create a circular dependency, as systemd relies on netplan via systemd-networkd. On the next `systemctl daemon-reload`, pid 1 invokes netplan, and netplan queries systemd for the group info of `systemd-network`, but systemd cannot respond yet because it is waiting on netplan. Any program making libc calls that NSS routes to systemd during this time is blocked. Something in systemd eventually SIGTERMs netplan after ~45s.
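For illustration, the kind of group lookup involved can be sketched as follows. This is a minimal sketch and not netplan's actual code: it assumes Python's standard `grp` module, whose `getgrnam` goes through libc and therefore through the sources on the `group:` line of nsswitch.conf. `lookup_gid` is a hypothetical helper name.

```python
import grp


def lookup_gid(group_name):
    """Return the numeric gid for group_name, or None if no NSS source knows it."""
    try:
        # grp.getgrnam wraps libc's group lookup, so the sources listed on the
        # `group:` line of nsswitch.conf are consulted in order.
        return grp.getgrnam(group_name).gr_gid
    except KeyError:
        return None


if __name__ == "__main__":
    # On an affected system, this is the kind of call that would be expected
    # to stall while systemd is busy with the reload.
    print(lookup_gid("systemd-network"))
```

With `group: systemd files`, a lookup like this reaches systemd first, which matches the `io.systemd.UserDatabase` request visible in the trace.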

Here is an strace log of pid 1 during a reload illustrating the problem:

```
30854<(sd-executor)> 1719955780.753479 <... waitid resumed>{si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30866, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WEXITED, NULL) = 0 <0.022023>
30854<(sd-executor)> 1719955780.753519 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30866, si_uid=0, si_status=0, si_utime=1, si_stime=3} ---
30854<(sd-executor)> 1719955780.753561 waitid(P_PID, 30856<friendly-recove>, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30856, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WEXITED, NULL) = 0 <0.000039>
30854<(sd-executor)> 1719955780.753646 waitid(P_PID, 30868<systemd-rc-loca>, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30868, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WEXITED, NULL) = 0 <0.000023>
30854<(sd-executor)> 1719955780.753711 waitid(P_PID, 30869<systemd-run-gen>, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30869, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WEXITED, NULL) = 0 <0.000023>
30854<(sd-executor)> 1719955780.753773 waitid(P_PID, 30861<systemd-bless-b>, {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30861, si_uid=0, si_status=0, si_utime=0, si_stime=0}, WEXITED, NULL) = 0 <0.000022>
30854<(sd-executor)> 1719955780.753840 waitid(P_PID, 30858<netplan>, <unfinished ...>
30858<netplan> <snip> (netplan looking up systemd-network group)
30858<netplan> 1719955825.602714 sendto(4<UNIX-STREAM:[182429]>, "{\"method\":\"io.systemd.UserDatabase.GetMemberships\",\"parameters\":{\"groupName\":\"systemd-network\",\"service\":\"io.systemd.DynamicUser\"},\"more\":true}\0", 144, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0)>
30858<netplan> 1719955825.602771 epoll_ctl(5<anon_inode:[eventpoll]>, EPOLL_CTL_MOD, 4<UNIX-STREAM:[182429]>, {events=EPOLLIN, data={u32=3132670720, u64=106458192069376}}) = 0 <0.000010>
30858<netplan> 1719955825.602823 epoll_wait(5<anon_inode:[eventpoll]>, [], 8, 0) = 0 <0.000010>
30858<netplan> 1719955825.602859 brk(0x60d2babee000) = 0x60d2babee000 <0.000017>
30858<netplan> 1719955825.602901 recvfrom(4<UNIX-STREAM:[182429]>, 0x60d2babad2e0, 131080, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000011>
30858<netplan> 1719955825.602951 epoll_wait(5<anon_inode:[eventpoll]>, <unfinished ...>
30854<(sd-executor)> 1719955870.201033 <... waitid resumed>0x7fffaec9c570, WEXITED, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <89.447162>
30854<(sd-executor)> 1719955870.201147 --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
30854<(sd-executor)> 1719955870.201607 +++ killed by SIGALRM +++
30858<netplan> 1719955870.201625 <... epoll_wait resumed>0x60d2baba48b0, 8, -1) = -1 EINTR (Interrupted system call) <44.598663>
30858<netplan> 1719955870.201670 --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=30854, si_uid=0} ---

```
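The durations can be cross-checked from the epoch timestamps in the trace. The following is a small sketch using only values copied from the log above; `elapsed` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding on large timestamps.

```python
from decimal import Decimal


def elapsed(start, end):
    """Seconds between two strace epoch timestamps given as strings."""
    return Decimal(end) - Decimal(start)


# sd-executor: waitid on netplan started, then was interrupted by SIGALRM
executor_wait = elapsed("1719955780.753840", "1719955870.201033")
# netplan: final epoll_wait started, then was interrupted by SIGTERM
netplan_wait = elapsed("1719955825.602951", "1719955870.201625")

print(executor_wait)  # 89.447193
print(netplan_wait)   # 44.598674
```

These agree closely with the `<89.447162>` and `<44.598663>` durations that strace itself reports: the executor waited about 89s in total, and netplan's last wait on the systemd socket lasted about 45s.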

Changing the nsswitch.conf entry to `group: files systemd`, or removing `systemd` from it, fixes the problem.
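To check whether a host has the ordering described here, the `group:` line can be inspected with a short script. This is a minimal sketch under stated assumptions: `group_sources` and `systemd_precedes_files` are hypothetical helpers, the parser handles only the common `database: source [action] source` layout, and a line that lists `systemd` without `files` is treated as affected.

```python
import re


def group_sources(nsswitch_text):
    """Return the ordered source names on the `group:` line, or [] if absent."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line.startswith("group:"):
            continue
        spec = line[len("group:"):]
        # Remove bracketed actions such as [NOTFOUND=return].
        spec = re.sub(r"\[[^\]]*\]", " ", spec)
        return spec.split()
    return []


def systemd_precedes_files(nsswitch_text):
    """True if `systemd` is consulted before `files` for group lookups."""
    sources = group_sources(nsswitch_text)
    if "systemd" not in sources:
        return False
    if "files" not in sources:
        return True  # assumption: systemd-only is treated as affected
    return sources.index("systemd") < sources.index("files")


if __name__ == "__main__":
    try:
        with open("/etc/nsswitch.conf") as fh:
            print(systemd_precedes_files(fh.read()))
    except OSError:
        print("nsswitch.conf not readable")
```

A result of `True` corresponds to the `group: systemd files` configuration that triggers the hang in this report.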

Note this is not resolved by the patch added for a recent similar bug[2].

[1] https://git.launchpad.net/~ubuntu-core-dev/netplan/+git/ubuntu/tree/debian/patches/lp2065738/0013-libnetplan-use-more-restrictive-file-permissions.patch?h=ubuntu-jammy&id=6836c2bf27a209090ed9eb2c3deceb4cb2c9d85c#n88

[2] https://bugs.launchpad.net/ubuntu/+source/netplan.io/+bug/2071333

Launchpad Janitor (janitor) wrote:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in netplan.io (Ubuntu):
status: New → Confirmed

Lukas Märdian (slyon)
Changed in netplan.io (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → High
tags: added: foundations-todo