BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:1119]

Bug #1788817 reported by Juerg Haefliger on 2018-08-24
This bug affects 1 person
Affects: linux (Ubuntu)
Assigned to: Juerg Haefliger

Bug Description

== SRU Justification ==

Booting Trusty with kernel 3.13.0-156-generic on a fairly new Intel box (an Intel S2600WFT board, per the log below) results in stuck CPU warnings:

[ 33.551942] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:1119]
[ 33.558560] Modules linked in: hid_generic ast(+) crct10dif_pclmul syscopyarea crc32_pclmul sysfillrect ghash_clmulni_intel sysimgblt aesni_intel i2c_algo_bit aes_x86_64 lrw ttm gf128mul glue_helper drm_kms_helper ablk_helper usbhid cryptd ahci drm hid libahci wmi
[ 33.583971] CPU: 0 PID: 1119 Comm: kworker/0:1 Not tainted 3.13.0-156-generic #206-Ubuntu
[ 33.592270] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.01.00.0294.101220161543 10/12/2016
[ 33.602836] Workqueue: events work_for_cpu_fn
[ 33.607351] task: ffff88084ddbc6b0 ti: ffff88084dd98000 task.ti: ffff88084dd98000
[ 33.614957] RIP: 0010:[<ffffffff81383bb2>] [<ffffffff81383bb2>] ioread32+0x42/0x50
[ 33.622805] RSP: 0018:ffff88084dd99d18 EFLAGS: 00000292
[ 33.628201] RAX: 00000000ffffffff RBX: ffff880851265000 RCX: 00000000000028a6
[ 33.635429] RDX: ffffc90016bd0000 RSI: ffffc90016bd0000 RDI: ffffc90016bd0000
[ 33.642653] RBP: ffff88084dd99d40 R08: 0000000000000092 R09: 000000000000079e
[ 33.649877] R10: ffffffff813df570 R11: ffff88084dd99a46 R12: 0000000016bc0000
[ 33.657104] R13: ffffffffa01b2ab0 R14: ffff88084dd99cc8 R15: 0000000000000000
[ 33.664329] FS: 0000000000000000(0000) GS:ffff88085f200000(0000) knlGS:0000000000000000
[ 33.672541] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 33.678380] CR2: 00007f3613be4000 CR3: 0000000001c0e000 CR4: 0000000000360770
[ 33.685605] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 33.692829] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 33.700052] Stack:
[ 33.702150] ffffffffa01a89e0 ffff880851265000 0000000000000000 0000000000000000
[ 33.709948] ffffffffa01b4100 ffff88084dd99d68 ffffffffa0073e88 ffff880851265000
[ 33.717751] ffff880850dee000 ffff880850dee098 ffff88084dd99db8 ffffffffa0075d92
[ 33.725550] Call Trace:
[ 33.728085] [<ffffffffa01a89e0>] ? ast_driver_load+0x320/0x540 [ast]
[ 33.734634] [<ffffffffa0073e88>] drm_dev_register+0xa8/0x1f0 [drm]
[ 33.741000] [<ffffffffa0075d92>] drm_get_pci_dev+0x92/0x140 [drm]
[ 33.747268] [<ffffffffa01a81e5>] ast_pci_probe+0x15/0x20 [ast]
[ 33.753282] [<ffffffff813b5f5a>] local_pci_probe+0x4a/0xb0
[ 33.758947] [<ffffffff8108746a>] work_for_cpu_fn+0x1a/0x30
[ 33.764613] [<ffffffff8108a5fe>] process_one_work+0x17e/0x480
[ 33.770536] [<ffffffff8108b5ab>] worker_thread+0x29b/0x410
[ 33.776205] [<ffffffff8108b310>] ? rescuer_thread+0x430/0x430
[ 33.782130] [<ffffffff810922cb>] kthread+0xcb/0xf0
[ 33.787101] [<ffffffff81092200>] ? kthread_create_on_node+0x1c0/0x1c0
[ 33.793721] [<ffffffff8174758e>] ret_from_fork+0x6e/0xa0
[ 33.799211] [<ffffffff81092200>] ? kthread_create_on_node+0x1c0/0x1c0
[ 33.805828] Code: 66 0f 1f 84 00 00 00 00 00 55 48 c7 c6 80 57 a9 81 48 89 e5 e8 f0 fe ff ff b8 ff ff ff ff 5d c3 66 0f 1f 84 00 00 00 00 00 8b 07 <c3> 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 81 fe ff ff 03 00
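
For triage, the most useful fields in a trace like the one above are the lockup header, the RIP symbol, and the call-trace frames. The following is a minimal Python sketch, not part of the original report, that extracts them with regular expressions. The function name `parse_soft_lockup` and the layout of the returned dictionary are assumptions made for illustration, and the patterns have only been written against the 3.13-era log format quoted here.

```python
import re

# Patterns written against the 3.13-era log format quoted in this report.
HEADER_RE = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]")
RIP_RE = re.compile(r"RIP: .*\] (\S+)\+0x[0-9a-f]+/0x[0-9a-f]+")
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] (\? )?(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def parse_soft_lockup(log_text):
    """Return the lockup header, RIP symbol, and call-trace frames, or None."""
    header = HEADER_RE.search(log_text)
    if header is None:
        return None
    result = {
        "cpu": int(header.group(1)),
        "seconds": int(header.group(2)),
        "task": header.group(3),
        "pid": int(header.group(4)),
        "rip": None,
        "frames": [],
    }
    in_trace = False
    for line in log_text.splitlines():
        if result["rip"] is None:
            rip = RIP_RE.search(line)
            if rip:
                result["rip"] = rip.group(1)
                continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            if frame is None:
                break  # first non-frame line (e.g. "Code:") ends the trace
            result["frames"].append({
                "symbol": frame.group(2),
                "module": frame.group(3),            # None for built-in kernel code
                "reliable": frame.group(1) is None,  # "?" marks an unverified frame
            })
    return result
```

Applied to the trace above, this should report `ioread32` as the RIP symbol and `ast_driver_load` in module `ast` as the first (unverified) frame, which points at the ast driver's load path as the place where the CPU was spinning.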

== Fix ==

Backport commit 6c971c09f387 ("drm/ast: Fixed system hanged if disable P2A")

== Regression Potential ==

Low. The patch only touches the ast driver and the modifications are fairly small. The patch is also in Xenial (via a stable upstream update) and received testing there.

== Test Case ==

Booted a patched kernel on HW to verify it comes up without stuck CPU warnings.
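
A check like this can be scripted. The sketch below is an assumption about how one might automate it, not the procedure actually used for this SRU: it scans kernel log text (for example, the captured output of `dmesg`) for soft lockup lines. The function name `find_soft_lockups` is hypothetical.

```python
import re

# Matches the warning line format shown in the bug description.
LOCKUP_RE = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def find_soft_lockups(log_text):
    """Return a list of (cpu, seconds) tuples, one per soft lockup line."""
    return [(int(cpu), int(secs)) for cpu, secs in LOCKUP_RE.findall(log_text)]
```

On a kernel that carries the fix, feeding this the boot log should yield an empty list; any non-empty result means the warning is still being logged.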

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1788817

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Juerg Haefliger (juergh) wrote :

Per git bisect, the fix for this should be:

commit 6c971c09f38704513c426ba6515f22fb3d6c87d5
Author: Y.C. Chen <email address hidden>
Date: Thu Jan 26 09:45:40 2017 +0800

    drm/ast: Fixed system hanged if disable P2A

    The original ast driver will access some BMC configuration through P2A bridge
    that can be disabled since AST2300 and after.
    It will cause system hanged if P2A bridge is disabled.
    Here is the update to fix it.

    Signed-off-by: Y.C. Chen <email address hidden>
    Signed-off-by: Dave Airlie <email address hidden>

Juerg Haefliger (juergh) on 2018-08-24
description: updated
Changed in linux (Ubuntu):
assignee: nobody → Juerg Haefliger (juergh)
Changed in linux (Ubuntu Trusty):
status: New → Triaged
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Incomplete → Triaged
Changed in linux (Ubuntu Trusty):
status: Triaged → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'. If the problem still exists, change the tag 'verification-needed-trusty' to 'verification-failed-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Juerg Haefliger (juergh) on 2018-09-28
tags: added: verification-done-trusty
removed: verification-needed-trusty
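
The tag change recorded above follows the rule spelled out in the verification request. As an illustration only, that rule could be modeled as below; `record_verification` is a hypothetical helper, not a Launchpad API, and it operates on a plain Python set of tag strings.

```python
def record_verification(tags, series, passed):
    """Return a new tag set with verification-needed-<series> swapped for
    verification-done-<series> (passed) or verification-failed-<series>."""
    needed = f"verification-needed-{series}"
    if needed not in tags:
        raise ValueError(f"{needed} is not set on this bug")
    outcome = "done" if passed else "failed"
    return (set(tags) - {needed}) | {f"verification-{outcome}-{series}"}
```

The input set is left unmodified, so the before and after states can be compared when logging the change.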
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-160.210

linux (3.13.0-160.210) trusty; urgency=medium

  * CVE-2018-14633
    - iscsi target: Use hex2bin instead of a re-implementation

  * CVE-2018-14634
    - exec: Limit arg stack to at most 75% of _STK_LIM

linux (3.13.0-159.209) trusty; urgency=medium

  * linux: 3.13.0-159.209 -proposed tracker (LP: #1791754)

  * L1TF mitigation not effective in some CPU and RAM combinations
    (LP: #1788563) // CVE-2018-3620 // CVE-2018-3646
    - x86/speculation/l1tf: Fix overflow in l1tf_pfn_limit() on 32bit
    - x86/speculation/l1tf: Fix off-by-one error when warning that system has too
      much RAM
    - x86/speculation/l1tf: Increase l1tf memory limit for Nehalem+

  * CVE-2018-15594
    - x86/paravirt: Fix spectre-v2 mitigations for paravirt guests

  * i40e NIC not recognized (LP: #1789215)
    - SAUCE: i40e_bpo: Import the i40e driver from Xenial 4.4
    - SAUCE: i40e_bpo: Add a compatibility layer
    - SAUCE: i40e_bpo: Don't probe for NICs supported by the in-tree driver
    - SAUCE: i40e_bpo: Rename the driver to i40e_bpo
    - SAUCE: i40e_bpo: Hook the driver into the kernel tree
    - [Config] Add CONFIG_I40E_BPO=m

  * Probable regression with EXT3 file systems and CVE-2018-1093 patches
    (LP: #1789131)
    - ext4: fix bitmap position validation

  * CVE-2018-3620 // CVE-2018-3646
    - mm: x86 pgtable: drop unneeded preprocessor ifdef
    - x86/asm: Move PUD_PAGE macros to page_types.h
    - x86/asm: Add pud/pmd mask interfaces to handle large PAT bit
    - x86/asm: Fix pud/pmd interfaces to handle large PAT bit
    - x86/mm: Fix regression with huge pages on PAE
    - SAUCE: x86/speculation/l1tf: Protect NUMA hinting PTEs against speculation
    - Revert "UBUNTU: [Config] disable NUMA_BALANCING"

  * CVE-2018-15572
    - x86/retpoline: Fill RSB on context switch for affected CPUs
    - x86/speculation: Protect against userspace-userspace spectreRSB

  * CVE-2018-6555
    - SAUCE: irda: Only insert new objects into the global database via setsockopt

  * CVE-2018-6554
    - SAUCE: irda: Fix memory leak caused by repeated binds of irda socket

  * BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:1119] (LP: #1788817)
    - drm/ast: Fixed system hanged if disable P2A

  * errors when scanning partition table of corrupted AIX disk (LP: #1787281)
    - partitions/aix: fix usage of uninitialized lv_info and lvname structures
    - partitions/aix: append null character to print data from disk

 -- Stefan Bader <email address hidden> Mon, 24 Sep 2018 19:38:31 +0200

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Juerg Haefliger (juergh) on 2019-06-13
Changed in linux (Ubuntu):
status: Triaged → Invalid
Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc