rtkit fails to start in focal "pthread_create failed: Resource temporarily unavailable"

Bug #1871543 reported by Christian Ehrhardt 
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
rtkit (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

I was debugging something else and found rtkit broken on my system.

Apr 08 06:09:22 Keschdeichel rtkit-daemon[1726502]: Successfully called chroot.
Apr 08 06:09:22 Keschdeichel rtkit-daemon[1726502]: Successfully dropped privileges.
Apr 08 06:09:22 Keschdeichel rtkit-daemon[1726502]: Successfully limited resources.
Apr 08 06:09:22 Keschdeichel rtkit-daemon[1726502]: pthread_create failed: Resource temporarily unavailable
Apr 08 06:09:22 Keschdeichel rtkit-daemon[1726502]: Canary thread running.
Apr 08 06:09:22 Keschdeichel rtkit-daemon[1726502]: Exiting canary thread.
Apr 08 06:09:22 Keschdeichel rtkit-daemon[1726502]: Demoting known real-time threads.
Apr 08 06:09:22 Keschdeichel rtkit-daemon[1726502]: Demoted 0 threads

That state is reproducible through restarts.
I haven't looked any further yet.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: rtkit 0.12-4
ProcVersionSignature: Ubuntu 5.3.0-45.37~18.04.1-generic 5.3.18
Uname: Linux 5.3.0-45-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu24
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Wed Apr 8 07:59:58 2020
InstallationDate: Installed on 2018-10-12 (543 days ago)
InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
SourcePackage: rtkit
UpgradeStatus: Upgraded to focal on 2020-04-03 (4 days ago)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The issue is on my workstation which might have all kind of config history.
On a fresh focal system it looks like "default off, start works"

root@f:~# systemctl status rtkit-daemon
● rtkit-daemon.service - RealtimeKit Scheduling Policy Service
     Loaded: loaded (/lib/systemd/system/rtkit-daemon.service; disabled; vendor preset: enabled)
     Active: inactive (dead)

root@f:~# systemctl start rtkit-daemon.service
root@f:~# systemctl status rtkit-daemon
● rtkit-daemon.service - RealtimeKit Scheduling Policy Service
     Loaded: loaded (/lib/systemd/system/rtkit-daemon.service; disabled; vendor preset: enabled)
     Active: active (running) since Wed 2020-04-08 06:12:27 UTC; 1s ago
   Main PID: 76699 (rtkit-daemon)
      Tasks: 3 (limit: 38284)
     Memory: 684.0K
     CGroup: /system.slice/rtkit-daemon.service
             └─76699 /usr/libexec/rtkit-daemon

Apr 08 06:12:27 f systemd[1]: Starting RealtimeKit Scheduling Policy Service...
Apr 08 06:12:27 f systemd[1]: Started RealtimeKit Scheduling Policy Service.
Apr 08 06:12:27 f rtkit-daemon[76699]: Successfully called chroot.
Apr 08 06:12:27 f rtkit-daemon[76699]: Successfully dropped privileges.
Apr 08 06:12:27 f rtkit-daemon[76699]: Successfully limited resources.
Apr 08 06:12:27 f rtkit-daemon[76699]: Running.
Apr 08 06:12:27 f rtkit-daemon[76699]: Canary thread running.
Apr 08 06:12:27 f rtkit-daemon[76699]: Watchdog thread running.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Reproducible:
$ sudo /usr/libexec/rtkit-daemon --stderr
rtkit-daemon[1764664]: Successfully called chroot.
rtkit-daemon[1764664]: Successfully dropped privileges.
rtkit-daemon[1764664]: Successfully limited resources.
rtkit-daemon[1764664]: pthread_create failed: Resource temporarily unavailable
rtkit-daemon[1764664]: Canary thread running.
rtkit-daemon[1764664]: Exiting canary thread.
rtkit-daemon[1764664]: Demoting known real-time threads.
rtkit-daemon[1764664]: Demoted 0 threads.

It seems I can "fix" it by adding: --no-drop-privileges

So does it drop more privs on my system than on those where it works?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Start through dbus fails the same way.
$ rtkitctl --start

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Breakpoint 2, start_canary () at rtkit-daemon.c:2300
2300 if (start_canary() < 0)
(gdb) n
1670 if ((canary_fd = eventfd(0, EFD_NONBLOCK|EFD_CLOEXEC)) < 0 ||
(gdb)
1677 if ((r = -pthread_create(&canary_thread_id, NULL, canary_thread, NULL)) < 0 ||
(gdb)

Breakpoint 1, __pthread_create_2_1 (newthread=0x555555565178 <canary_thread_id>, attr=0x0, start_routine=0x55555555abd0 <canary_thread>, arg=0x0) at pthread_create.c:625
625 pthread_create.c: No such file or directory.
(gdb) fin
Run till exit from #0 __pthread_create_2_1 (newthread=0x555555565178 <canary_thread_id>, attr=0x0, start_routine=0x55555555abd0 <canary_thread>, arg=0x0) at pthread_create.c:625
[New Thread 0x7ffff7aa3700 (LWP 1795548)]
rtkit-daemon[1795542]: Canary thread running.
0x00005555555584bf in start_canary () at rtkit-daemon.c:1677
1677 if ((r = -pthread_create(&canary_thread_id, NULL, canary_thread, NULL)) < 0 ||
Value returned is $1 = 0
(gdb) n

Thread 1 "rtkit-daemon" hit Breakpoint 1, __pthread_create_2_1 (newthread=0x555555565170 <watchdog_thread_id>, attr=0x0, start_routine=0x55555555a800 <watchdog_thread>, arg=0x0)
    at pthread_create.c:625
625 pthread_create.c: No such file or directory.
(gdb) fin
Run till exit from #0 __pthread_create_2_1 (newthread=0x555555565170 <watchdog_thread_id>, attr=0x0, start_routine=0x55555555a800 <watchdog_thread>, arg=0x0) at pthread_create.c:625
0x00005555555584de in start_canary () at rtkit-daemon.c:1677
1677 if ((r = -pthread_create(&canary_thread_id, NULL, canary_thread, NULL)) < 0 ||
Value returned is $2 = 11

It doesn't even reach watchdog_thread() it fails to create the thread before that.

Revision history for this message
Sebastien Bacher (seb128) wrote :

Thanks Christian for the bug report. I haven't see that issue reported before. The rtkit user is correctly available on your system? Do you have any warning/hint in the journal maybe?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Similar:
https://superuser.com/questions/1440725/rtkit-fails-to-start-on-reboot
=> but I have no docker (nor runc/containerd)
https://bbs.archlinux.org/viewtopic.php?id=230079
=> re-install doesn't help for me (also I have no user issue)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

It is the raw clone call that fails:
101 if (__glibc_unlikely (ARCH_CLONE (&start_thread, STACK_VARIABLES_ARGS,
102 »···»···»···»··· clone_flags, pd, &pd->tid, tp, &pd->tid)
103 »···»···»···== -1))
104 return errno;

That returns 11

Limits are the same in good & bad case
$ sudo prlimit --pid 1839368
RESOURCE DESCRIPTION SOFT HARD UNITS
AS address space limit unlimited unlimited bytes
CORE max core file size 0 unlimited bytes
CPU CPU time unlimited unlimited seconds
DATA max data size unlimited unlimited bytes
FSIZE max file size 0 0 bytes
LOCKS max number of file locks held unlimited unlimited locks
MEMLOCK max locked-in-memory address space 0 0 bytes
MSGQUEUE max bytes in POSIX mqueues 0 0 bytes
NICE max nice prio allowed to raise 0 0
NOFILE max number of open files 50 50 files
NPROC max number of processes 3 3 processes
RSS max resident set size unlimited unlimited bytes
RTPRIO max real-time priority 0 0
RTTIME timeout for real-time tasks 200000 200000 microsecs
SIGPENDING max number of pending signals 127615 127615 signals
STACK max stack size 8388608 unlimited bytes

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Knowing that --no-drop-privileges is related I was breaking that into sub-sections.

I set -O0 for better debugging.
Then I dropped code of the drop-priv section.

This section is it:
1755 if (setgroups(0, NULL) < 0 ||
1756 setresgid(pw->pw_gid, pw->pw_gid, pw->pw_gid) < 0 ||
1757 setresuid(pw->pw_uid, pw->pw_uid, pw->pw_uid) < 0) {
1758 r = -errno;
1759 syslog(LOG_ERR, "Failed to become %s: %s\n", username, strerror(errno));
1760 return r;
1761 }

All three work:
__GI_setgroups (n=0, groups=0x0)
__GI___setresgid (rgid=114, egid=114, sgid=114)
__GI___setresuid (ruid=109, euid=109, suid=109)

IDs match:

$ id 109
uid=109(rtkit) gid=114(rtkit) groups=114(rtkit)

So it is actually indeed user-related like https://bbs.archlinux.org/viewtopic.php?id=230079 was.

I see no difference in the user in my test system vs the failing one.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This would work as well as a workaround:
  --user-name root

setgroups and setresgid are safe, it is the user set via
  setresuid
that makes it fail eventually.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

$ sudo userdel --remove rtkit
$ id rtkit
id: ‘rtkit’: no such user
$ sudo apt install --reinstall rtkit
$ id rtkit
uid=109(rtkit) gid=114(rtkit) groups=114(rtkit)

The issue does not go away by removing and recreating the user.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I was playing with different similar users (system user with nologin)
  $ for u in $(grep nologin /etc/passwd | cut -d':' -f 1); do echo Trying $u; /tmp/rtkit-0.12/rtkit-daemon --stderr --user-name $u; done

Working:
daemon bin sys games man lp mail news uucp proxy www-data backup list irc gnats nobody dnsmasq cups-pk-helper saned pulse geoclue nm-openvpn nm-openconnect Debian-exim radvd sanlock gluster statd tcpdump systemd-coredump

Failing:
systemd-network systemd-resolve syslog messagebus _apt uuidd avahi-autoipd usbmux kernoops avahi colord libvirt-qemu libvirt-dnsmasq mpd apt-cacher-ng _rpc systemd-timesync rtkit

Now what differentiates those two groups ?

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I compared e.g. systemd-coredump vs rtkit.
Similar user add calls:

rtkit has:
--disabled-password

I recreated it without, and it made no difference.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

A full system restart resolved the issue for me, but thereby also removed my chances to debug further. I hope the bug gives others that might hit it as well a head start.

For now I'm marking it incomplete

Changed in rtkit (Ubuntu):
status: New → Incomplete
Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

I can reliably reproduce this:
1. Inside several aarch64 systemd-machined containers (with PrivateUsers=0 in nspawn conf) with focal and running Gnome Shell session (accessed via TigerVNC server). Hosts in NanoPC-T4.
2. On one bare metal NanoPC-T4 host with focal, running Gnome Shell Wayland session. No container/virtualization stuff, only bare metal.

Symptoms looks the same?

~$ sudo /usr/libexec/rtkit-daemon --stderr
rtkit-daemon[5786]: Successfully called chroot.
rtkit-daemon[5786]: Successfully dropped privileges.
rtkit-daemon[5786]: Successfully limited resources.
rtkit-daemon[5786]: pthread_create failed: Resource temporarily unavailable
rtkit-daemon[5786]: Demoting known real-time threads.
rtkit-daemon[5786]: Demoted 0 threads.

With --no-drop-privileges it works:
~$ sudo /usr/libexec/rtkit-daemon --stderr --no-drop-privileges
rtkit-daemon[5781]: Successfully called chroot.
rtkit-daemon[5781]: Successfully limited resources.
rtkit-daemon[5781]: Running.
rtkit-daemon[5781]: Canary thread running.
rtkit-daemon[5781]: Watchdog thread running.

However, in both cases I have to use Armbian kernel.
For the record, on RPi3B running generic Ubuntu 20.04 aarch64 image (boot via UEFI) this issue is not reproducible.

I can help with further debug here, or my issue is different and I have to go to Armbian forum? In the former case what exactly I need to de to help debug this?

Changed in rtkit (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I have not seen it again in any environment since then.
Interesting that you hit it in such a specific environment.

I'd have expected that it might be incompatible components.
Do you have other things than the kernel from outside the Ubuntu Archive?

Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

> Do you have other things than the kernel from outside the Ubuntu Archive?

Not much, just two packages:

uboot bootloader: https://apt.armbian.com/pool/main/l/linux-u-boot-nanopct4-current/linux-u-boot-current-nanopct4_20.02.7_arm64.deb
board-specific configs and various tools, such as monitoring tool, installer, etc.: https://apt.armbian.com/pool/main/l/linux-focal-root-current-nanopct4/linux-focal-root-current-nanopct4_20.02.7_arm64.deb (take a peek inside to get idea).

Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

Out of curiosity I tried to reproduce this on other SBC - NanoPi-M1 which also running Armbian. Interestingly, it's not reproducible on this board.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.