Wait timeout for systemd-udevd worker process is too short

Bug #1297248 reported by Tetsuo Handa on 2014-03-25
This bug affects 1 person
Affects: systemd (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Coming from https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1276705 .

Commit 786235ee in the upstream kernel made kthread_create() return
immediately upon SIGKILL. On several machines, the device initialization
phase takes more than 30 seconds before kthread_create() is called.

Since systemd-udevd unconditionally sends SIGKILL after a hardcoded
30-second timeout, boot fails on such machines. The conclusion of the
LKML discussion seems to be that we should fix the systemd-udevd side
rather than the kthread_create() side, because the hardcoded 30-second
timeout is considered a bug in systemd. Therefore, please consider
allowing a configurable (or at least much longer) timeout for
systemd-udevd worker processes.
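
To make the request concrete, here is a minimal user-space sketch of a supervisor that kills a worker after a timeout, with the timeout passed in as a parameter rather than hardcoded. This is not systemd-udevd's code; the names run_worker, slow_worker and timeout_sec are made up for illustration, and the sketch assumes a POSIX system where a killed child reports a negative signal number as its return code.

```python
import signal
import subprocess
import sys


def run_worker(argv, timeout_sec):
    """Run argv as a child process and SIGKILL it if it outlives timeout_sec.

    Returns (returncode, killed). Hypothetical helper, not udevd's logic.
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=timeout_sec), False
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        return proc.wait(), True


def slow_worker(seconds):
    """Build a command line for a stand-in worker that sleeps for `seconds`."""
    return [sys.executable, "-c", f"import time; time.sleep({seconds})"]


if __name__ == "__main__":
    # The same slow worker is killed under a short timeout and survives a longer one.
    print(run_worker(slow_worker(2), timeout_sec=0.5))
    print(run_worker(slow_worker(2), timeout_sec=5))
```

With a hardcoded value, a worker that legitimately needs longer can never complete; making the value a parameter leaves that decision to the administrator.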

Martin Pitt (pitti) wrote :

Can you please explain in more detail (perhaps with an example) how the kthread_create() change affects the killing of RUN/IMPORT rules? udev has always behaved that way to guard against hanging RUN programs. The boot as a whole can take much longer than 30 seconds; it is only each individual RUN action that should be much, much faster (otherwise it is written the wrong way). In fact, I consider 30 seconds far too generous.

Thanks!

Changed in systemd (Ubuntu):
status: New → Incomplete
Martin Pitt (pitti) wrote :

Also, is that really the udev rules, or is that perhaps initramfs-tools' 30 second timeout for wait-for-root? ($ROOTDELAY in /usr/share/initramfs-tools/scripts/local)

All the detail is in Bug #1276705.

(1) Currently, finit_module() of the mptsas kernel module needs more than
    30 seconds to initialize the LSI SAS1068E disk.

(2) Currently, systemd-udevd unconditionally sends SIGKILL after a hardcoded
    30-second timeout. As a result, finit_module() of the mptsas kernel
    module receives SIGKILL while waiting for the error handler thread to
    be started.

(3) Before commit 786235ee was applied, finit_module() receiving SIGKILL
    was not a problem, because kthread_create() ignored SIGKILL while
    waiting for the error handler thread to be started. After commit
    786235ee was applied, it is a problem, because kthread_create() no
    longer ignores SIGKILL while waiting for the error handler thread to
    be started. As a result, finit_module() of the mptsas kernel module
    fails to initialize the LSI SAS1068E disk, leading to a boot failure.
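
The difference described in (3) can be illustrated with a toy user-space model. This is not kernel code: wait_for_thread_start, the two events and the result strings are all hypothetical, and the only point is to show how the same pending kill leads to different outcomes depending on whether the wait is killable.

```python
import threading


def wait_for_thread_start(done, killed, killable, poll=0.01):
    """Toy model of waiting for a new thread to start.

    done     -- event set once the (simulated) thread has started
    killed   -- event set once a (simulated) SIGKILL is pending
    killable -- False models the old behaviour (kill ignored while waiting),
                True models the behaviour after the change (give up at once)
    """
    while not done.is_set():
        if killable and killed.is_set():
            return "interrupted"
        done.wait(poll)
    return "started"


if __name__ == "__main__":
    killed = threading.Event()
    killed.set()  # the kill is already pending before the wait begins

    for killable in (False, True):
        done = threading.Event()
        # The simulated thread needs 0.2 s to start.
        threading.Timer(0.2, done.set).start()
        print(killable, wait_for_thread_start(done, killed, killable))
```

In the non-killable case the caller still gets its thread despite the pending kill; in the killable case it gives up, which corresponds to the failed disk initialization above.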

Commit 786235ee was meant to help the OOM killer terminate the victim
process immediately when the victim cannot otherwise be terminated
because it is waiting for the kthreadd process to complete a memory
allocation.

Kernel developers think that this is a systemd bug, because any thread
that receives SIGKILL has a right to terminate immediately. Therefore,
reverting commit 786235ee is not acceptable to kernel developers.

On the other hand, systemd developers think that this is a kernel bug,
because finit_module() should return within 30 seconds. Therefore,
changing to a longer timeout is not acceptable to systemd developers.

Since there was no time to wait for systemd to allow a longer timeout,
Bug #1276705 used a SAUCE patch that allows kthread_create() to ignore
SIGKILL for up to 10 seconds. We used the SAUCE patch for Ubuntu 14.04,
but we don't want to carry it forever.
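
The idea of ignoring SIGKILL only for a bounded time can be sketched as a toy user-space model. This is not the SAUCE patch itself: wait_with_grace and its parameters are hypothetical, and it is an assumption of this sketch that the grace period is counted from the moment the pending kill is first noticed.

```python
import threading
import time


def wait_with_grace(done, killed, grace_sec, poll=0.01):
    """Toy model: keep waiting for `done` for up to grace_sec after a kill.

    done      -- event set once the (simulated) thread has started
    killed    -- event set once a (simulated) SIGKILL is pending
    grace_sec -- how long a pending kill is tolerated before giving up
    """
    kill_seen_at = None
    while not done.is_set():
        if killed.is_set():
            now = time.monotonic()
            if kill_seen_at is None:
                kill_seen_at = now  # start the grace period
            elif now - kill_seen_at >= grace_sec:
                return "interrupted"
        done.wait(poll)
    return "started"


if __name__ == "__main__":
    killed = threading.Event()
    killed.set()

    # Thread starts within the grace period: the kill is effectively ignored.
    done = threading.Event()
    threading.Timer(0.1, done.set).start()
    print(wait_with_grace(done, killed, grace_sec=1.0))

    # Thread never starts: the wait gives up once the grace period expires.
    print(wait_with_grace(threading.Event(), killed, grace_sec=0.1))
```

A bounded grace period keeps the caller killable in the end, while still letting a slow but healthy thread creation finish.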

Launchpad Janitor (janitor) wrote :

[Expired for systemd (Ubuntu) because there has been no activity for 60 days.]

Changed in systemd (Ubuntu):
status: Incomplete → Expired