kworker and power_saving processes stuck in D state

Bug #1035216 reported by Frederik Himpe
This bug affects 3 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone:

Bug Description

On a Dell PowerEdge R420, the kworker and power_saving kernel threads get stuck in the D (uninterruptible sleep) state after some time. Because three threads are stuck, top reports a constant load average of 3.

root 17635 0.0 0.0 0 0 ? D Aug09 0:02 [kworker/0:2]
root 18581 0.0 0.0 0 0 ? D 08:35 0:01 [power_saving/0]
root 18582 0.0 0.0 0 0 ? D 08:35 0:01 [power_saving/1]
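For anyone who wants to check a machine for the same symptom, here is a minimal sketch that filters `ps aux`-style output for processes in the D state. It assumes the standard `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the function name `find_d_state` and the `/sbin/init` sample line are made up for illustration and are not part of the original report.

```python
def find_d_state(ps_output):
    """Return (pid, command) for every process whose STAT field starts with 'D'."""
    stuck = []
    for line in ps_output.splitlines():
        # Assumed ps aux columns:
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        if fields[7].startswith("D"):
            stuck.append((int(fields[1]), fields[10].rstrip()))
    return stuck


# The first two lines are taken from this report; the last one is a
# hypothetical healthy process added only for contrast.
sample = """\
root 17635 0.0 0.0 0 0 ? D Aug09 0:02 [kworker/0:2]
root 18581 0.0 0.0 0 0 ? D 08:35 0:01 [power_saving/0]
root 1 0.0 0.0 24456 2400 ? Ss Aug09 0:01 /sbin/init
"""

for pid, command in find_d_state(sample):
    print(pid, command)
```

On a live system the same function could be fed the text captured from `ps aux`.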

dmesg logs contain backtraces:
[ 0.060845] ------------[ cut here ]------------
[ 0.060960] WARNING: at /build/buildd/linux-3.2.0/drivers/iommu/intr_remapping.c:558 enable_intr_remapping+0x77/0x1ab()
[ 0.061107] Hardware name: PowerEdge R420
[ 0.061216] Your BIOS is broken and requested that x2apic be disabled
[ 0.061217] This will leave your machine vulnerable to irq-injection attacks
[ 0.061218] Use 'intremap=no_x2apic_optout' to override BIOS request
[ 0.061560] Modules linked in:
[ 0.061745] Pid: 1, comm: swapper/0 Not tainted 3.2.0-27-generic #43-Ubuntu
[ 0.061862] Call Trace:
[ 0.061973] [<ffffffff8106729f>] warn_slowpath_common+0x7f/0xc0
[ 0.062089] [<ffffffff81067396>] warn_slowpath_fmt+0x46/0x50
[ 0.062204] [<ffffffff81d3a0fb>] enable_intr_remapping+0x77/0x1ab
[ 0.062320] [<ffffffff81d0b63b>] enable_IR+0x39/0x42
[ 0.062433] [<ffffffff81d0b6cb>] enable_IR_x2apic+0x87/0x1de
[ 0.062550] [<ffffffff816377b7>] ? set_cpu_sibling_map+0x326/0x344
[ 0.062666] [<ffffffff81d0d7d9>] default_setup_apic_routing+0x12/0x78
[ 0.062786] [<ffffffff81d09561>] native_smp_prepare_cpus+0x1af/0x2a2
[ 0.062903] [<ffffffff81cfbc90>] kernel_init+0x80/0x158
[ 0.063019] [<ffffffff816643f4>] kernel_thread_helper+0x4/0x10
[ 0.063135] [<ffffffff81cfbc10>] ? start_kernel+0x3bd/0x3bd
[ 0.063250] [<ffffffff816643f0>] ? gs_change+0x13/0x13
[ 0.063366] ---[ end trace a7919e7f17c0a725 ]---

[66519.456289] INFO: task kworker/0:2:17635 blocked for more than 120 seconds.
[66519.456335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[66519.456406] kworker/0:2 D ffffffff81806080 0 17635 2 0x00000000
[66519.456413] ffff8807cfec3b30 0000000000000046 0000000000000001 0000000000000001
[66519.456425] ffff8807cfec3fd8 ffff8807cfec3fd8 ffff8807cfec3fd8 0000000000013780
[66519.456433] ffffffff81c0d020 ffff8807c39f2de0 ffff8807cfec3b10 7fffffffffffffff
[66519.456442] Call Trace:
[66519.456456] [<ffffffff81657d8f>] schedule+0x3f/0x60
[66519.456463] [<ffffffff816583d5>] schedule_timeout+0x2a5/0x320
[66519.456473] [<ffffffff8138897d>] ? acpi_ns_check_package_elements+0x43/0x98
[66519.456483] [<ffffffff811646fc>] ? kmem_cache_alloc+0x10c/0x140
[66519.456490] [<ffffffff8116224f>] ? kmem_cache_free+0x2f/0x110
[66519.456496] [<ffffffff81657bcf>] wait_for_common+0xdf/0x180
[66519.456503] [<ffffffff8103dcf9>] ? default_spin_lock_flags+0x9/0x10
[66519.456512] [<ffffffff8105fae0>] ? try_to_wake_up+0x200/0x200
[66519.456520] [<ffffffff8136bd84>] ? acpi_os_wait_events_complete+0x23/0x23
[66519.456526] [<ffffffff81657d4d>] wait_for_completion+0x1d/0x20
[66519.456533] [<ffffffff8108a5e6>] kthread_stop+0x46/0x110
[66519.456572] [<ffffffffa00160b3>] set_power_saving_task_num+0xb3/0xd8 [acpi_pad]
[66519.456585] [<ffffffffa0016107>] acpi_pad_idle_cpus+0x2f/0x38 [acpi_pad]
[66519.456597] [<ffffffffa00163ee>] acpi_pad_handle_notify+0x98/0x111 [acpi_pad]
[66519.456605] [<ffffffff8137cb8a>] ? acpi_ev_finish_gpe+0x30/0x30
[66519.456613] [<ffffffff810831f5>] ? queue_work_on+0x25/0x30
[66519.456619] [<ffffffff8136be5e>] ? __acpi_os_execute+0xa6/0xd3
[66519.456631] [<ffffffffa0016559>] acpi_pad_notify+0x1c/0x65 [acpi_pad]
[66519.456638] [<ffffffff8137b703>] acpi_ev_notify_dispatch+0x67/0x7e
[66519.456643] [<ffffffff8136bdab>] acpi_os_execute_deferred+0x27/0x34
[66519.456650] [<ffffffff81084f8a>] process_one_work+0x11a/0x480
[66519.456657] [<ffffffff81085d34>] worker_thread+0x164/0x370
[66519.456664] [<ffffffff81085bd0>] ? manage_workers.isra.29+0x130/0x130
[66519.456669] [<ffffffff8108a58c>] kthread+0x8c/0xa0
[66519.456677] [<ffffffff816643f4>] kernel_thread_helper+0x4/0x10
[66519.456682] [<ffffffff8108a500>] ? flush_kthread_worker+0xa0/0xa0
[66519.456688] [<ffffffff816643f0>] ? gs_change+0x13/0x13

[66519.456693] INFO: task power_saving/0:18581 blocked for more than 120 seconds.
[66519.456760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[66519.456827] power_saving/0 D ffffffff81806080 0 18581 2 0x00000000
[66519.456833] ffff8807cffbbd80 0000000000000046 ffff8807cffbbd30 ffff88102fc13780
[66519.456841] ffff8807cffbbfd8 ffff8807cffbbfd8 ffff8807cffbbfd8 0000000000013780
[66519.456849] ffff8807fc4e44d0 ffff8807f0255bc0 ffff8807cffbbdd0 ffffffffa0018170
[66519.456858] Call Trace:
[66519.456873] [<ffffffff81657d8f>] schedule+0x3f/0x60
[66519.456880] [<ffffffff81658b97>] __mutex_lock_slowpath+0xd7/0x150
[66519.456893] [<ffffffff816587aa>] mutex_lock+0x2a/0x50
[66519.456906] [<ffffffffa00165e9>] round_robin_cpu+0x34/0x197 [acpi_pad]
[66519.456918] [<ffffffffa00167e2>] power_saving_thread+0x96/0x220 [acpi_pad]
[66519.456930] [<ffffffffa001674c>] ? round_robin_cpu+0x197/0x197 [acpi_pad]
[66519.456936] [<ffffffff8108a58c>] kthread+0x8c/0xa0
[66519.456941] [<ffffffff816643f4>] kernel_thread_helper+0x4/0x10
[66519.456947] [<ffffffff8108a500>] ? flush_kthread_worker+0xa0/0xa0
[66519.456952] [<ffffffff816643f0>] ? gs_change+0x13/0x13
[66519.456956] INFO: task power_saving/1:18582 blocked for more than 120 seconds.
[66519.457019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[66519.457081] power_saving/1 D ffffffff81806080 0 18582 2 0x00000000
[66519.457086] ffff8807f15efd80 0000000000000046 ffff8807f15efd30 ffff88080fc13780
[66519.457094] ffff8807f15effd8 ffff8807f15effd8 ffff8807f15effd8 0000000000013780
[66519.457102] ffffffff81c0d020 ffff8807f0252de0 ffff8807f15efdd0 ffffffffa0018170
[66519.457110] Call Trace:
[66519.457126] [<ffffffff81657d8f>] schedule+0x3f/0x60
[66519.457132] [<ffffffff81658b97>] __mutex_lock_slowpath+0xd7/0x150
[66519.457145] [<ffffffff816587aa>] mutex_lock+0x2a/0x50
[66519.457157] [<ffffffffa00165e9>] round_robin_cpu+0x34/0x197 [acpi_pad]
[66519.457169] [<ffffffffa00167e2>] power_saving_thread+0x96/0x220 [acpi_pad]
[66519.457181] [<ffffffffa001674c>] ? round_robin_cpu+0x197/0x197 [acpi_pad]
[66519.457187] [<ffffffff8108a58c>] kthread+0x8c/0xa0
[66519.457192] [<ffffffff816643f4>] kernel_thread_helper+0x4/0x10
[66519.457198] [<ffffffff8108a500>] ? flush_kthread_worker+0xa0/0xa0
[66519.457203] [<ffffffff816643f0>] ? gs_change+0x13/0x13

[66639.303094] INFO: task kworker/0:2:17635 blocked for more than 120 seconds.
[66639.303143] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[66639.303205] kworker/0:2 D ffffffff81806080 0 17635 2 0x00000000
[66639.303213] ffff8807cfec3b30 0000000000000046 0000000000000001 0000000000000001
[66639.303223] ffff8807cfec3fd8 ffff8807cfec3fd8 ffff8807cfec3fd8 0000000000013780
[66639.303232] ffffffff81c0d020 ffff8807c39f2de0 ffff8807cfec3b10 7fffffffffffffff
[66639.303240] Call Trace:
[66639.303250] [<ffffffff81657d8f>] schedule+0x3f/0x60
[66639.303257] [<ffffffff816583d5>] schedule_timeout+0x2a5/0x320
[66639.303265] [<ffffffff8138897d>] ? acpi_ns_check_package_elements+0x43/0x98
[66639.303273] [<ffffffff811646fc>] ? kmem_cache_alloc+0x10c/0x140
[66639.303279] [<ffffffff8116224f>] ? kmem_cache_free+0x2f/0x110
[66639.303285] [<ffffffff81657bcf>] wait_for_common+0xdf/0x180
[66639.303291] [<ffffffff8103dcf9>] ? default_spin_lock_flags+0x9/0x10
[66639.303298] [<ffffffff8105fae0>] ? try_to_wake_up+0x200/0x200
[66639.303303] [<ffffffff8136bd84>] ? acpi_os_wait_events_complete+0x23/0x23
[66639.303310] [<ffffffff81657d4d>] wait_for_completion+0x1d/0x20
[66639.303315] [<ffffffff8108a5e6>] kthread_stop+0x46/0x110
[66639.303336] [<ffffffffa00160b3>] set_power_saving_task_num+0xb3/0xd8 [acpi_pad]
[66639.303348] [<ffffffffa0016107>] acpi_pad_idle_cpus+0x2f/0x38 [acpi_pad]
[66639.303360] [<ffffffffa00163ee>] acpi_pad_handle_notify+0x98/0x111 [acpi_pad]
[66639.303368] [<ffffffff8137cb8a>] ? acpi_ev_finish_gpe+0x30/0x30
[66639.303374] [<ffffffff810831f5>] ? queue_work_on+0x25/0x30
[66639.303379] [<ffffffff8136be5e>] ? __acpi_os_execute+0xa6/0xd3
[66639.303391] [<ffffffffa0016559>] acpi_pad_notify+0x1c/0x65 [acpi_pad]
[66639.303398] [<ffffffff8137b703>] acpi_ev_notify_dispatch+0x67/0x7e
[66639.303403] [<ffffffff8136bdab>] acpi_os_execute_deferred+0x27/0x34
[66639.303410] [<ffffffff81084f8a>] process_one_work+0x11a/0x480
[66639.303416] [<ffffffff81085d34>] worker_thread+0x164/0x370
[66639.303423] [<ffffffff81085bd0>] ? manage_workers.isra.29+0x130/0x130
[66639.303428] [<ffffffff8108a58c>] kthread+0x8c/0xa0
[66639.303435] [<ffffffff816643f4>] kernel_thread_helper+0x4/0x10
[66639.303440] [<ffffffff8108a500>] ? flush_kthread_worker+0xa0/0xa0
[66639.303446] [<ffffffff816643f0>] ? gs_change+0x13/0x13

and more...
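Since the same hung-task messages repeat every 120 seconds, it can help to summarise them per task. The sketch below counts the "blocked for more than N seconds" lines in a saved dmesg dump. The regular expression is based only on the message format visible in the log above, and the names `HUNG_RE` and `hung_tasks` are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "INFO: task kworker/0:2:17635 blocked for more than 120 seconds."
# The task name may itself contain colons, so the greedy (.+) leaves only the
# final ":<pid>" for the second group.
HUNG_RE = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")


def hung_tasks(dmesg_text):
    """Count hung-task warnings per (task name, pid)."""
    counts = Counter()
    for match in HUNG_RE.finditer(dmesg_text):
        counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# Lines copied from the log in this report.
sample = """\
[66519.456289] INFO: task kworker/0:2:17635 blocked for more than 120 seconds.
[66519.456693] INFO: task power_saving/0:18581 blocked for more than 120 seconds.
[66519.456956] INFO: task power_saving/1:18582 blocked for more than 120 seconds.
[66639.303094] INFO: task kworker/0:2:17635 blocked for more than 120 seconds.
"""

for (name, pid), count in hung_tasks(sample).items():
    print(name, pid, count)
```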

I found this report, which mentions a fix in linux 3.5-rc2: http://en.community.dell.com/support-forums/servers/f/1466/p/19456558/20147617.aspx

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-27-generic 3.2.0-27.43
ProcVersionSignature: Ubuntu 3.2.0-27.43-generic 3.2.21
Uname: Linux 3.2.0-27-generic x86_64
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Aug 9 14:09 seq
 crw-rw---T 1 root audio 116, 33 Aug 9 14:09 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu12
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Date: Fri Aug 10 10:33:53 2012
HibernationDevice: RESUME=UUID=f6035f1e-803a-4c84-8cdb-b83302b75dbd
InstallationMedia: Ubuntu-Server 12.04 LTS "Precise Pangolin" - Release amd64 (20120424.1)
IwConfig:
 lo no wireless extensions.

 eth1 no wireless extensions.

 eth0 no wireless extensions.
MachineType: Dell Inc. PowerEdge R420
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_US:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-27-generic root=/dev/mapper/linux-root ro transparent_hugepage=always elevator=deadline
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-27-generic N/A
 linux-backports-modules-3.2.0-27-generic N/A
 linux-firmware 1.79
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
StagingDrivers: mei
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/11/2012
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.2.4
dmi.board.name: 072XWF
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.2.4:bd05/11/2012:svnDellInc.:pnPowerEdgeR420:pvr:rvnDellInc.:rn072XWF:rvrA01:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R420
dmi.sys.vendor: Dell Inc.

Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the v3.5-rc2 kernel[0]? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . You will need to install both the linux-image and linux-image-extra .deb packages.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5-rc2-quantal/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: added: kernel-da-key needs-bisect needs-upstream-testing
Frederik Himpe (fhimpe) wrote :

Upstream bug report: https://bugzilla.kernel.org/show_bug.cgi?id=42981
Patch: https://patchwork.kernel.org/patch/1133941/

It seems the patch finally went into 3.5-rc5 and 3.2.22: https://www.kernel.org/pub/linux/kernel/v3.0/ChangeLog-3.2.22

So I guess this should be fixed by the latest kernel that went into precise-updates. I will let you know if the problem still occurs.
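As a rough way to check whether a given 3.2-series Ubuntu kernel is based on a stable release that includes the patch, the upstream version can be compared against 3.2.22. This sketch assumes that the last field of ProcVersionSignature (for example `3.2.21` in this report) is the upstream stable version the package is based on. The function names are hypothetical. It only covers the 3.2.y series and cannot detect a fix that was cherry-picked separately.

```python
def upstream_version(signature):
    """Parse the last field of a ProcVersionSignature string into a tuple of ints.

    Assumption: the last whitespace-separated field is the upstream version,
    as in "Ubuntu 3.2.0-27.43-generic 3.2.21".
    """
    return tuple(int(part) for part in signature.split()[-1].split("."))


def has_stable_fix(signature, fixed=(3, 2, 22)):
    """True if the kernel is in the same x.y series as `fixed` and at least as new."""
    version = upstream_version(signature)
    return version[:2] == fixed[:2] and version >= fixed


# Signature taken from this report: based on 3.2.21, so the fix is not included.
print(has_stable_fix("Ubuntu 3.2.0-27.43-generic 3.2.21"))
```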

Bryan Quigley (bryanquigley) wrote :

Has this issue been fixed for you?

Frederik Himpe (fhimpe)
Changed in linux (Ubuntu):
status: Incomplete → Fix Released