Kworker process stuck in uninterruptible sleep

Bug #1813881 reported by overlord
This bug affects 2 people
Affects: linux-hwe (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

2019-01-22T09:03:34.968028+00:00 localhost kernel: [80233.315906] INFO: task kworker/u30:3:5705 blocked for more than 120 seconds.
2019-01-22T09:03:34.968049+00:00 localhost kernel: [80233.321444] Tainted: P O 4.15.0-43-generic #46~16.04.1-Ubuntu
2019-01-22T09:03:34.980485+00:00 localhost kernel: [80233.327648] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-01-22T09:03:34.980519+00:00 localhost kernel: [80233.333902] kworker/u30:3 D 0 5705 2 0x80000000
2019-01-22T09:03:34.980521+00:00 localhost kernel: [80233.333909] Workqueue: events_unbound fsnotify_mark_destroy_workfn
2019-01-22T09:03:34.980522+00:00 localhost kernel: [80233.333910] Call Trace:
2019-01-22T09:03:34.980526+00:00 localhost kernel: [80233.333914] __schedule+0x3d6/0x8b0
2019-01-22T09:03:34.980527+00:00 localhost kernel: [80233.333918] schedule+0x36/0x80
2019-01-22T09:03:34.980528+00:00 localhost kernel: [80233.333920] schedule_timeout+0x1db/0x370
2019-01-22T09:03:34.980530+00:00 localhost kernel: [80233.333927] ? __enqueue_entity+0x5c/0x60
2019-01-22T09:03:34.980531+00:00 localhost kernel: [80233.333932] ? enqueue_entity+0x112/0x670
2019-01-22T09:03:34.980547+00:00 localhost kernel: [80233.333937] wait_for_completion+0xb4/0x140
2019-01-22T09:03:34.980554+00:00 localhost kernel: [80233.333939] ? wake_up_q+0x70/0x70
2019-01-22T09:03:34.980556+00:00 localhost kernel: [80233.333944] __synchronize_srcu.part.13+0x85/0xb0
2019-01-22T09:03:34.980557+00:00 localhost kernel: [80233.333947] ? trace_raw_output_rcu_utilization+0x50/0x50
2019-01-22T09:03:34.980558+00:00 localhost kernel: [80233.333950] synchronize_srcu+0xd3/0xe0
2019-01-22T09:03:34.980559+00:00 localhost kernel: [80233.333956] ? synchronize_srcu+0xd3/0xe0
2019-01-22T09:03:34.980560+00:00 localhost kernel: [80233.333962] fsnotify_mark_destroy_workfn+0x7c/0xe0
2019-01-22T09:03:34.980568+00:00 localhost kernel: [80233.333966] process_one_work+0x14d/0x410
2019-01-22T09:03:34.980570+00:00 localhost kernel: [80233.333968] worker_thread+0x22b/0x460
2019-01-22T09:03:34.980571+00:00 localhost kernel: [80233.333971] kthread+0x105/0x140
2019-01-22T09:03:34.980572+00:00 localhost kernel: [80233.333974] ? process_one_work+0x410/0x410
2019-01-22T09:03:34.980573+00:00 localhost kernel: [80233.333976] ? kthread_destroy_worker+0x50/0x50
2019-01-22T09:03:34.980574+00:00 localhost kernel: [80233.333979] ret_from_fork+0x35/0x40
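
For anyone triaging similar logs, here is a minimal sketch of how the hung-task reports could be pulled out of a syslog file. It assumes the message format shown above ("INFO: task NAME:PID blocked for more than N seconds"). The function name `find_hung_tasks` is made up for illustration and is not part of any existing tool.

```python
import re

# Matches the hung-task watchdog line. The task name may itself contain ':'
# (e.g. kworker/u30:3), so the name group is greedy and the PID is taken as
# the last colon-separated number before " blocked".
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (task name, pid, seconds) for every hung-task report in lines."""
    hits = []
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append(
                (m.group("name"), int(m.group("pid")), int(m.group("secs")))
            )
    return hits
```

Passing the lines of a saved kernel log (for example, an open file object) to `find_hung_tasks` gives one tuple per blocked task, which makes it easier to see whether the same workers keep reappearing.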

The kernel taint comes from the zfs module.
Other processes, such as dockerd and systemd (init), also end up in the same uninterruptible sleep (D) state:

2019-01-22T09:03:34.949861+00:00 localhost kernel: [80233.299475] INFO: task dockerd:2809 blocked for more than 120 seconds.
2019-01-22T09:03:34.949863+00:00 localhost kernel: [80233.303136] Tainted: P O 4.15.0-43-generic #46~16.04.1-Ubuntu
2019-01-22T09:03:34.962084+00:00 localhost kernel: [80233.309016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2019-01-22T09:03:34.962114+00:00 localhost kernel: [80233.315513] dockerd D 0 2809 1 0x00000000
2019-01-22T09:03:34.962118+00:00 localhost kernel: [80233.315516] Call Trace:
2019-01-22T09:03:34.962120+00:00 localhost kernel: [80233.315521] __schedule+0x3d6/0x8b0
2019-01-22T09:03:34.962122+00:00 localhost kernel: [80233.315528] ? xen_smp_send_reschedule+0x10/0x20
2019-01-22T09:03:34.962137+00:00 localhost kernel: [80233.315532] schedule+0x36/0x80
2019-01-22T09:03:34.962139+00:00 localhost kernel: [80233.315535] schedule_timeout+0x1db/0x370
2019-01-22T09:03:34.962140+00:00 localhost kernel: [80233.315537] ? try_to_wake_up+0x59/0x4a0
2019-01-22T09:03:34.962164+00:00 localhost kernel: [80233.315539] wait_for_completion+0xb4/0x140
2019-01-22T09:03:34.962168+00:00 localhost kernel: [80233.315541] ? wake_up_q+0x70/0x70
2019-01-22T09:03:34.962170+00:00 localhost kernel: [80233.315547] flush_work+0x129/0x1e0
2019-01-22T09:03:34.962171+00:00 localhost kernel: [80233.315552] ? worker_detach_from_pool+0xb0/0xb0
2019-01-22T09:03:34.962186+00:00 localhost kernel: [80233.315555] flush_delayed_work+0x3f/0x50
2019-01-22T09:03:34.962194+00:00 localhost kernel: [80233.315559] fsnotify_wait_marks_destroyed+0x15/0x20
2019-01-22T09:03:34.962195+00:00 localhost kernel: [80233.315561] fsnotify_destroy_group+0x48/0xd0
2019-01-22T09:03:34.962196+00:00 localhost kernel: [80233.315563] inotify_release+0x1e/0x50
2019-01-22T09:03:34.962197+00:00 localhost kernel: [80233.315565] __fput+0xea/0x220
2019-01-22T09:03:34.962198+00:00 localhost kernel: [80233.315567] ____fput+0xe/0x10
2019-01-22T09:03:34.962200+00:00 localhost kernel: [80233.315569] task_work_run+0x8a/0xb0
2019-01-22T09:03:34.962202+00:00 localhost kernel: [80233.315571] exit_to_usermode_loop+0xc4/0xd0
2019-01-22T09:03:34.962203+00:00 localhost kernel: [80233.315573] do_syscall_64+0xf4/0x130
2019-01-22T09:03:34.962205+00:00 localhost kernel: [80233.315575] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
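
To check on a live system which tasks are currently in state D, something along these lines might help. It is only a sketch: it assumes a Linux /proc filesystem, reads only the per-process `/proc/<pid>/stat` record (not individual threads under `/proc/<pid>/task`), and the helper names `parse_stat` and `tasks_in_d_state` are hypothetical.

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat record into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' rather than on whitespace.
    """
    head, _, tail = text.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]


def tasks_in_d_state(proc="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    stuck = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or record unreadable while scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

A task can sit in D state briefly during normal I/O, so a single positive result is not proof of a hang. Running the check a few times and looking for the same PIDs is more telling.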

lsb_release -rd
Description: Ubuntu 16.04.5 LTS
Release: 16.04

The issue seems to be related to this fix: https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1802021
It also reproduces on Ubuntu 16.04.5 LTS with kernel version 4.15.0-43-generic.

I am opening this bug for better traceability of the backported fix.

Tags: bionic
overlord (lazamarius1)
affects: linux-azure (Ubuntu) → linux (Ubuntu)
description: updated
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1813881

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
overlord (lazamarius1) wrote :

I am unable to run that command: it is missing from the system, and installing it is not an option because the bug causes installs to time out.

description: updated
description: updated
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
overlord (lazamarius1)
affects: linux (Ubuntu) → linux-hwe (Ubuntu)
Andrei S (darthside) wrote :

This bug affects many production instances. We depend on this backport as upgrading to another distro is not possible at this point.
Do we know if this is a regression? Maybe a downgrade will solve the issue for now.
