Comment 39 for bug 1276705

Tetsuo Handa (9-launchpad-i-love-sakura-ne-jp) wrote :

That return statement is reached only when wait_for_completion_killable()
returns an error, that is, when the caller receives SIGKILL while waiting for
kthreadd to create a kernel thread.
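
As a rough illustration of that control flow, here is a userspace model (not
the kernel code). All model_* names are hypothetical, and -EINTR only stands in
for whatever error the kernel's killable wait actually returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a completion; not the kernel's struct completion. */
struct model_completion {
    int done;                 /* nonzero once the worker has finished */
    int fatal_signal_pending; /* nonzero if the waiter received SIGKILL */
};

/*
 * Models a killable wait: returns 0 if the work completed, or a negative
 * value if the waiter was killed before completion.
 */
static int model_wait_killable(const struct model_completion *c)
{
    assert(c != NULL);
    if (c->done)
        return 0;
    if (c->fatal_signal_pending)
        return -EINTR; /* illustrative error code only */
    /* A real wait would block here; the model assumes the worker finishes. */
    return 0;
}

/* Models the creation path: the early return happens only on a killed wait. */
static int model_kthread_create(const struct model_completion *c)
{
    int err = model_wait_killable(c);

    if (err)
        return err; /* the return statement discussed above */
    return 0;
}
```

In this model, the error return is taken only when the waiter was killed
before the completion was signalled.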

That matches your bisection result, because commit 786235ee changed
kthread_create() to return to the caller when the caller receives SIGKILL, so
that the OOM killer can kill a process that is waiting for kthreadd to create
a kernel thread.
The changelog I expected for that commit is shown below.

----------
[PATCH] kthread: Make kthread_create() killable.

Any user process callers of wait_for_completion() except global init process
might be chosen by the OOM killer while waiting for completion() call by some
other process which does memory allocation.

When such users are chosen by the OOM killer when they are waiting for
completion() in TASK_UNINTERRUPTIBLE, the system will be kept stressed
due to memory starvation because the OOM killer cannot kill such users.

kthread_create() is one of such users and this patch fixes the problem for
kthreadd by making kthread_create() killable.

Signed-off-by: Tetsuo Handa <email address hidden>
Cc: Oleg Nesterov <email address hidden>
Acked-by: David Rientjes <email address hidden>
Signed-off-by: Andrew Morton <email address hidden>
----------

I think there are two problems listed below.

  (a) Somebody is sending SIGKILL to the caller of kthread_create().

        Is that somebody "systemd", after waiting for a timeout?
        Is the caller "PID: 9847 Comm: systemd-udevd"?

  (b) Error handling of the caller of kthread_create() is wrong.

        mptsas_probe() calls mptsas_remove() when scsi_host_alloc()
        returns NULL because the caller received SIGKILL.

        But mptsas_remove() assumes that "ioc->sh = sh;" has already been
        executed with sh != NULL, which means it assumes that
        scsi_host_alloc() did not return NULL.

        scsi_host_alloc() can also return NULL when kzalloc() returns NULL.
        In other words, the caller of scsi_host_alloc() must be prepared for
        scsi_host_alloc() returning NULL even if the caller did not receive
        SIGKILL while waiting for kthreadd to create a kernel thread.
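
To make (b) concrete, here is a minimal userspace sketch of an error path that
tolerates the allocation returning NULL. This is not the mptsas driver code and
not a proposed patch; the model_* names and the choice of -ENOMEM are
assumptions for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures; not the kernel types. */
struct model_host { int unused; };
struct model_ioc { struct model_host *sh; };

/*
 * Stand-in for scsi_host_alloc(). It returns NULL when asked to fail, which
 * models both a kzalloc() failure and SIGKILL during kthread_create().
 */
static struct model_host *model_host_alloc(int fail)
{
    if (fail)
        return NULL;
    return calloc(1, sizeof(struct model_host));
}

/* Teardown that does not assume ioc->sh was ever assigned. */
static void model_remove(struct model_ioc *ioc)
{
    assert(ioc != NULL);
    if (ioc->sh == NULL)
        return; /* nothing was allocated, so there is nothing to release */
    free(ioc->sh);
    ioc->sh = NULL;
}

/* Probe that reports the failure instead of using a NULL host. */
static int model_probe(struct model_ioc *ioc, int alloc_fails)
{
    struct model_host *sh;

    assert(ioc != NULL);
    sh = model_host_alloc(alloc_fails);
    if (sh == NULL) {
        model_remove(ioc); /* safe because model_remove() checks ioc->sh */
        return -ENOMEM;
    }
    ioc->sh = sh;
    return 0;
}
```

The point of the sketch is only that the teardown path checks whether the host
was ever assigned, so it behaves the same whether the allocation failed due to
memory pressure or due to SIGKILL.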

Therefore, I don't think reverting commit 786235ee is appropriate, because
the problem will happen again whenever kzalloc() in scsi_host_alloc() fails.