System hang with kernel general protection fault due to dell-wmi-sysman sysman_init failure

Bug #1931509 reported by AceLan Kao
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
HWE Next | Fix Released | Undecided | Unassigned |
linux-oem-5.10 (Ubuntu) | Invalid | Undecided | Unassigned |
Focal | Fix Released | Undecided | AceLan Kao |

Bug Description

[Impact]
The general protection fault below has been seen during reboot tests since Ubuntu-oem-5.10-5.10.0-1028.29. Ubuntu Hirsute and Impish already have all the fixes.

Jun 9 09:06:02 ubuntu kernel: [ 6.216367] general protection fault, probably for non-canonical address 0x213146a124f901ea: 0000 [#1] SMP NOPTI
Jun 9 09:06:02 ubuntu kernel: [ 6.216371] CPU: 3 PID: 447 Comm: systemd-udevd Not tainted 5.10.0-1030-oem #31-Ubuntu
Jun 9 09:06:02 ubuntu kernel: [ 6.216372] Hardware name: Dell Inc. Latitude 5300/, BIOS 1.13.1 01/22/2021
Jun 9 09:06:02 ubuntu kernel: [ 6.216376] RIP: 0010:kobject_put+0xd/0x60
Jun 9 09:06:02 ubuntu kernel: [ 6.216378] Code: 02 0f 85 64 ff ff ff 45 31 ff e9 6e ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 85 ff 74 3e 55 48 89 e5 53 48 89 fb <f6> 47 3c 01 74 1a 48 8d 7b 38 b8 ff ff ff ff f0 0f c1 43 38 83 f8
Jun 9 09:06:02 ubuntu kernel: [ 6.216379] RSP: 0018:ffffb233406a7c38 EFLAGS: 00010202
Jun 9 09:06:02 ubuntu kernel: [ 6.216381] RAX: 0000000000000000 RBX: 213146a124f901ea RCX: 0000000000000000
Jun 9 09:06:02 ubuntu kernel: [ 6.216383] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 213146a124f901ea
Jun 9 09:06:02 ubuntu kernel: [ 6.216384] RBP: ffffb233406a7c40 R08: 0000000000000001 R09: ffffffff9ac66400
Jun 9 09:06:02 ubuntu kernel: [ 6.216384] R10: ffff9c41106d7fa0 R11: 0000000000000001 R12: ffff9c4114011438
Jun 9 09:06:02 ubuntu kernel: [ 6.216385] R13: 213146a124f901ea R14: 0000000000000000 R15: ffffb233406a7e70
Jun 9 09:06:02 ubuntu kernel: [ 6.216387] FS: 00007f9e33410880(0000) GS:ffff9c485e4c0000(0000) knlGS:0000000000000000
Jun 9 09:06:02 ubuntu kernel: [ 6.216388] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 9 09:06:02 ubuntu kernel: [ 6.216389] CR2: 00007ffe51396080 CR3: 000000011211e003 CR4: 00000000003706e0
Jun 9 09:06:02 ubuntu kernel: [ 6.216390] Call Trace:
Jun 9 09:06:02 ubuntu kernel: [ 6.216394] kset_unregister+0x2a/0x40
Jun 9 09:06:02 ubuntu kernel: [ 6.216399] sysman_init+0x20e/0x1000 [dell_wmi_sysman]
Jun 9 09:06:02 ubuntu kernel: [ 6.216401] ? 0xffffffffc0ce4000
Jun 9 09:06:02 ubuntu kernel: [ 6.216404] do_one_initcall+0x48/0x1d0
Jun 9 09:06:02 ubuntu kernel: [ 6.216406] ? _cond_resched+0x19/0x30
Jun 9 09:06:02 ubuntu kernel: [ 6.216408] ? kmem_cache_alloc_trace+0x37a/0x430
Jun 9 09:06:02 ubuntu kernel: [ 6.216410] ? do_init_module+0x28/0x250
Jun 9 09:06:02 ubuntu kernel: [ 6.216412] do_init_module+0x62/0x250
Jun 9 09:06:02 ubuntu kernel: [ 6.216414] load_module+0x11ac/0x1370
Jun 9 09:06:02 ubuntu kernel: [ 6.216417] ? security_kernel_post_read_file+0x5c/0x70
Jun 9 09:06:02 ubuntu kernel: [ 6.216419] ? security_kernel_post_read_file+0x5c/0x70
Jun 9 09:06:02 ubuntu kernel: [ 6.216421] __do_sys_finit_module+0xc2/0x120
Jun 9 09:06:02 ubuntu kernel: [ 6.216423] ? __do_sys_finit_module+0xc2/0x120
Jun 9 09:06:02 ubuntu kernel: [ 6.216427] __x64_sys_finit_module+0x1a/0x20
Jun 9 09:06:02 ubuntu kernel: [ 6.216429] do_syscall_64+0x38/0x90
Jun 9 09:06:02 ubuntu kernel: [ 6.216431] entry_SYSCALL_64_after_hwframe+0x44/0xa9
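
The "non-canonical address" wording in the first line of the trace can be checked by hand. The sketch below assumes 48-bit virtual addressing (4-level paging), which the log does not state explicitly; `is_canonical` is an illustrative helper, not a kernel API. On x86-64, RDI carries the first function argument, so the value shared by RDI, RBX and R13 is the pointer that kobject_put received.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 address.

    An address is canonical when bits 63 down to (va_bits - 1) are all
    equal, i.e. the upper bits are a sign extension of the top used bit.
    va_bits=48 is an assumption (4-level paging).
    """
    upper = addr >> (va_bits - 1)                 # bits 63..47 for va_bits=48
    all_ones = (1 << (64 - va_bits + 1)) - 1      # 17 one-bits for va_bits=48
    return upper == 0 or upper == all_ones


# Value held in RDI/RBX/R13 in the trace: upper bits are neither all 0 nor all 1.
bad_pointer = 0x213146A124F901EA
# Stack pointer (RSP) from the same trace: a normal kernel address.
stack_pointer = 0xFFFFB233406A7C38

print(hex(bad_pointer), is_canonical(bad_pointer))
print(hex(stack_pointer), is_canonical(stack_pointer))
```

A pointer that fails this check cannot be a valid kernel object address, which is consistent with kset_unregister being handed stale or already-released data.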

[Fix]
The first commit fixes the issue; the other commits are also cherry-picked to guard against other potential regressions.

bdda39673fde platform/x86: dell-wmi-sysman: Fix crash caused by calling kset_unregister twice
eaa1dcc79694 platform/x86: dell-wmi-sysman: Cleanup sysman_init() error-exit handling
f4c4e9ad1523 platform/x86: dell-wmi-sysman: Fix release_attributes_data() getting called twice on init_bios_attributes() failure
ececdb898376 platform/x86: dell-wmi-sysman: Fix possible NULL pointer deref on exit
9b95665a83ec platform/x86: dell-wmi-sysman: Make sysman_init() return -ENODEV of the interfaces are not found
5e3f5973c8df platform/x86: dell-wmi-sysman: Make init_bios_attributes() ACPI object parsing more robust
42f38dcccfb3 platform/x86: dell-wmi-sysman: Cleanup create_attributes_level_sysfs_files()
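
The title of the first commit attributes the crash to kset_unregister being called twice on the same kset. The toy model below illustrates that class of bug and one common remedy: clearing the pointer after release so a repeated call hits the NULL check. It is a hypothetical sketch in Python, not the driver's code, and every class and function name in it is invented for illustration.

```python
class Kset:
    """Toy stand-in for a kernel kset; 'freed' marks released memory."""

    def __init__(self, name):
        self.name = name
        self.freed = False


def kset_unregister(kset):
    """Model of an unregister call that ignores NULL but not stale pointers."""
    if kset is None:
        return  # NULL check: nothing to do
    if kset.freed:
        # In the kernel this would be a use-after-free, not a clean error.
        raise RuntimeError(f"kset '{kset.name}' unregistered twice")
    kset.freed = True


class Driver:
    """Hypothetical driver state holding one kset pointer."""

    def __init__(self):
        self.main_dir_kset = Kset("attributes")

    def release_buggy(self):
        kset_unregister(self.main_dir_kset)  # pointer left dangling

    def release_fixed(self):
        kset_unregister(self.main_dir_kset)
        self.main_dir_kset = None  # a later unregister becomes a no-op


def init_error_path(driver, release):
    """Model of an init failure: release data, then unregister again on exit."""
    release()
    kset_unregister(driver.main_dir_kset)


buggy = Driver()
try:
    init_error_path(buggy, buggy.release_buggy)
except RuntimeError as err:
    print("buggy path:", err)

fixed = Driver()
init_error_path(fixed, fixed.release_fixed)
print("fixed path: completed without a second release")
```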

[Test]
It passed 300-cycle and 400-cycle reboot tests on 2 different platforms.

[Where problems could occur]
Those commits are pretty straightforward, and we have already done some stress testing to make sure the system continues to work well.

AceLan Kao (acelankao)
no longer affects: linux (Ubuntu Focal)
no longer affects: linux-oem-5.10 (Ubuntu Hirsute)
no longer affects: linux-oem-5.10 (Ubuntu Impish)
Changed in linux-oem-5.10 (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
Changed in linux (Ubuntu Impish):
status: New → In Progress
Changed in linux-oem-5.10 (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Hirsute):
assignee: nobody → AceLan Kao (acelankao)
Changed in linux (Ubuntu Impish):
assignee: nobody → AceLan Kao (acelankao)
Changed in linux-oem-5.10 (Ubuntu Focal):
assignee: nobody → AceLan Kao (acelankao)
AceLan Kao (acelankao)
description: updated
AceLan Kao (acelankao)
no longer affects: linux (Ubuntu)
no longer affects: linux (Ubuntu Hirsute)
no longer affects: linux (Ubuntu Impish)
description: updated
description: updated
AceLan Kao (acelankao)
tags: added: oem-priority originate-from-1931125 somerville
AceLan Kao (acelankao)
description: updated
description: updated
Timo Aaltonen (tjaalton) wrote :

how can this get past regression testing?

Kai-Chuan Hsieh (kchsieh) wrote :

Can reproduce with 5.10.0-1031-oem kernel.

SKU: ANDW-DVT2-C1 (202012-28526)
BIOS: 1.3.0
kernel: 5.10.0-1031-oem

Failed on the 9th iteration of a 200-iteration warm boot stress test.

AceLan Kao (acelankao) wrote :

Yes, this looks like another issue in the Dell WMI driver, not the one we fixed during init.
We should file a separate bug for it.

Jun 11 19:56:14 u-Latitude kernel: general protection fault, probably for non-canonical address 0x3ab228c7157e8ec0: 0000 [#1] SMP NOPTI
Jun 11 19:56:14 u-Latitude kernel: CPU: 3 PID: 819 Comm: NetworkManager Tainted: G W O 5.10.0-1031-oem #32-Ubuntu
Jun 11 19:56:14 u-Latitude kernel: Hardware name: Dell Inc. Latitude 9420/, BIOS 1.3.0 05/06/2021
Jun 11 19:56:14 u-Latitude kernel: RIP: 0010:kmem_cache_alloc+0xfd/0x440
Jun 11 19:56:14 u-Latitude kernel: Code: 4c 03 05 de 3d d4 4a 49 83 78 10 00 4d 8b 28 0f 84 f4 02 00 00 4d 85 ed 0f 84 eb 02 00 00 41 8b 44 24 28 49 8b 3c 24 4c 01 e8 <48> 8b 18 48 89 c1 49 33 9c 24 b8 00 00 00 4c 89 e8 48 0f c9 48 31
Jun 11 19:56:14 u-Latitude kernel: RSP: 0018:ffffb60200e63a70 EFLAGS: 00010216
Jun 11 19:56:14 u-Latitude kernel: RAX: 3ab228c7157e8ec0 RBX: 0000000000000001 RCX: 0000000000000001
Jun 11 19:56:14 u-Latitude kernel: RDX: 000000000000215f RSI: 0000000000000dc0 RDI: 0000000000032270
Jun 11 19:56:14 u-Latitude kernel: RBP: ffffb60200e63aa0 R08: ffff8f4e7f6f2270 R09: 0000000000000000
Jun 11 19:56:14 u-Latitude kernel: R10: 0000000000000000 R11: ffffdab984425600 R12: ffff8f4b00047200
Jun 11 19:56:14 u-Latitude kernel: R13: 3ab228c7157e8e98 R14: 0000000000000dc0 R15: 0000000000000000
Jun 11 19:56:14 u-Latitude kernel: FS: 00007efc69b44380(0000) GS:ffff8f4e7f6c0000(0000) knlGS:0000000000000000
Jun 11 19:56:14 u-Latitude kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 11 19:56:14 u-Latitude kernel: CR2: 00007f3ceae3b000 CR3: 0000000110c52001 CR4: 0000000000770ee0
Jun 11 19:56:14 u-Latitude kernel: PKRU: 55555554
Jun 11 19:56:14 u-Latitude kernel: Call Trace:
Jun 11 19:56:14 u-Latitude kernel: ? acpi_ut_allocate_object_desc_dbg+0x63/0x10e
Jun 11 19:56:14 u-Latitude kernel: acpi_ut_allocate_object_desc_dbg+0x63/0x10e
Jun 11 19:56:14 u-Latitude kernel: acpi_ut_create_internal_object_dbg+0x51/0x119
Jun 11 19:56:14 u-Latitude kernel: acpi_ut_create_integer_object+0x47/0x98
Jun 11 19:56:14 u-Latitude kernel: acpi_ex_opcode_1A_0T_1R+0x3de/0x556
Jun 11 19:56:14 u-Latitude kernel: acpi_ds_exec_end_op+0x166/0x76b
Jun 11 19:56:14 u-Latitude kernel: acpi_ps_parse_loop+0x84b/0x924
Jun 11 19:56:14 u-Latitude kernel: acpi_ps_parse_aml+0x1af/0x550
Jun 11 19:56:14 u-Latitude kernel: acpi_ps_execute_method+0x208/0x2ca
Jun 11 19:56:14 u-Latitude kernel: acpi_ns_evaluate+0x34e/0x4f0
Jun 11 19:56:14 u-Latitude kernel: acpi_evaluate_object+0x18e/0x3b4
Jun 11 19:56:14 u-Latitude kernel: wmidev_evaluate_method+0x10f/0x140 [wmi]
Jun 11 19:56:14 u-Latitude kernel: run_smbios_call+0x6a/0x190 [dell_smbios]
Jun 11 19:56:14 u-Latitude kernel: dell_smbios_wmi_call+0x8c/0xe0 [dell_smbios]
Jun 11 19:56:14 u-Latitude kernel: dell_smbios_call+0x73/0xb0 [dell_smbios]
Jun 11 19:56:14 u-Latitude kernel: dell_rfkill_query+0x46/0xf0 [dell_laptop]
Jun 11 19:56:14 u-Latitude kernel: rfkill_set_block+0x36/0x150
Jun 11 19:56:14 u-Latitude kernel: rfkill_fop_write+0x136/0x1e0
Jun 11 19:56:14 u-Latitude kernel: vfs_write+0xca/0x280
Jun 11 19:56:1...

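One quick way to compare two oopses is to list the modules named in their call traces. The sketch below is an illustrative helper (the function name and regular expression are assumptions, not part of any kernel tooling) that collects the bracketed module suffix from each frame. Applied to the two traces in this report, the first names only dell_wmi_sysman, while the second goes through wmi, dell_smbios and dell_laptop.

```python
import re

# Matches a frame such as "sysman_init+0x20e/0x1000 [dell_wmi_sysman]"
# and captures the module name inside the trailing brackets.
FRAME_WITH_MODULE = re.compile(r"\+0x[0-9a-f]+/0x[0-9a-f]+ \[(\w+)\]\s*$")


def modules_in_trace(lines):
    """Return module names from call-trace lines, in first-seen order."""
    seen = []
    for line in lines:
        match = FRAME_WITH_MODULE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


first_trace = [
    "kset_unregister+0x2a/0x40",
    "sysman_init+0x20e/0x1000 [dell_wmi_sysman]",
    "do_one_initcall+0x48/0x1d0",
]
second_trace = [
    "acpi_evaluate_object+0x18e/0x3b4",
    "wmidev_evaluate_method+0x10f/0x140 [wmi]",
    "run_smbios_call+0x6a/0x190 [dell_smbios]",
    "dell_smbios_call+0x73/0xb0 [dell_smbios]",
    "dell_rfkill_query+0x46/0xf0 [dell_laptop]",
]

print(modules_in_trace(first_trace))
print(modules_in_trace(second_trace))
```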

perry_yuan (perry-yuan) wrote :

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/platform/x86/dell/dell-wmi-sysman?h=v5.13-rc6

There are some upstream fixes for the dell-wmi-sysman driver that need to be backported.
I would suggest that we test on 5.13-rc6 to confirm whether upstream already has the fix.

Yuan-Chen Cheng (ycheng-twn) wrote :

Tested kernel 1031 on a Dell Latitude 5300.
Rebooted 489 times and could not reproduce this issue.

Previously, the reboot test failed at around the 200th reboot.
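
As a rough sanity check on how much a clean run says, one can ask how likely 489 passing reboots would be if the bug were still present. The sketch below assumes independent reboots and a per-boot failure rate of 1 in 200, a loose reading of "around the 200th reboot" rather than a measured figure. Under those assumptions a clean run of 489 reboots would happen by chance less than 10% of the time.

```python
def prob_clean_run(per_boot_failure_rate: float, boots: int) -> float:
    """Probability that `boots` independent reboots all pass."""
    return (1.0 - per_boot_failure_rate) ** boots


# Assumed failure rate: one failure per 200 reboots (not a measured value).
assumed_rate = 1 / 200
p = prob_clean_run(assumed_rate, 489)
print(f"chance of 489 clean reboots if still broken: {p:.3f}")
```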

Kai-Chuan Hsieh (kchsieh) wrote :

I tested ANDW-DVT2-C1 (202012-28526) with kernel v5.13-rc6 from https://kernel.ubuntu.com/~kernel-ppa/mainline/ with secure boot disabled.

I can't reproduce #2. I'll check whether the 5.10-oem kernel contains all the commits mentioned in #4.
My check script is attached for reference.
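
The attached script is not reproduced on this page. A minimal sketch of such a check might compare commit subjects rather than hashes, since a backported commit gets a new hash (in this report the subject "Fix possible NULL pointer deref on exit" appears under two different abbreviated hashes). The function names, the repository path and the revision name below are hypothetical; the only external command used is `git log --format=%s`.

```python
import subprocess


def subjects_in_tree(repo, rev, paths=None):
    """Return the set of commit subjects reachable from `rev` in `repo`.

    `paths` optionally limits the log to some files or directories.
    """
    cmd = ["git", "-C", repo, "log", "--format=%s", rev]
    if paths:
        cmd += ["--", *paths]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return set(out.splitlines())


def missing_subjects(wanted, present):
    """Return the wanted subjects that are absent from `present`, in order."""
    return [subject for subject in wanted if subject not in present]


wanted = [
    "platform/x86: wmi: Make remove callback return void",
    "platform/x86: dell-wmi-sysman: Fix possible NULL pointer deref on exit",
]

# Example usage (hypothetical repository path and branch name):
# present = subjects_in_tree("/path/to/focal-oem-5.10", "oem-5.10")
# print(missing_subjects(wanted, present))
```

Matching on the full subject line is a simplification: it misses commits whose subject was edited during the backport.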

Scott Hu (huntu207) wrote :

Cannot reproduce on the QA side with the 1031 kernel in more than 350 attempts.

SKU: ANDW-DVT2-C3
Image: canonical-oem-somerville-focal-amd64-20200502-85+fossa-mewtwo+X101
bios-version: 1.3.0
kernel:5.10.0-1031-oem

Kai-Chuan Hsieh (kchsieh) wrote :

@acelankao

I compared the 5.10.0-1031-oem kernel with 5.13-rc6. Three commits from the log linked in #4 are not in 5.10.0-1031-oem. Perry from Dell suggests we sync with the latest Dell WMI driver.

Could you help check whether we need these changes?

platform/x86: wmi: Make remove callback return void
platform/x86: dell-wmi-sysman: Fix possible NULL pointer deref on exit
platform/x86: dell-wmi-sysman: Make it safe to call exit_foo_attributes() multiple times

Thanks,

Kai-Chuan Hsieh (kchsieh) wrote :

I created a separate bug for #2: https://bugs.launchpad.net/somerville/+bug/1932099.
The stress test I performed was run with a WD19TB connected, which is not the original test scenario of this bug.

AceLan Kao (acelankao) wrote :

KC,

The 5.13-oem kernel already has the 3 commits you mentioned in comment #8.
Reviewing your log again, it looks more like a hardware (memory) issue while allocating resources. It fails in different drivers.

AceLan Kao (acelankao) wrote :

2d0c418c91d8c platform/x86: dell-wmi-sysman: Make it safe to call exit_foo_attributes() multiple times
c59ab4cedab70 platform/x86: dell-wmi-sysman: Fix possible NULL pointer deref on exit
2b329f5694aec platform/x86: wmi: Make remove callback return void

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Timo Aaltonen (tjaalton) wrote :

marking verified according to #7

tags: added: verification-done-focal
removed: verification-needed-focal
Changed in linux-oem-5.10 (Ubuntu Focal):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.10 - 5.10.0-1032.33

---------------
linux-oem-5.10 (5.10.0-1032.33) focal; urgency=medium

  * focal/linux-oem-5.10: 5.10.0-1032.33 -proposed tracker (LP: #1932138)

  * Mute/Mic mute LEDs and right speaker are not work on HP platforms
    (LP: #1932055)
    - ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Elite Dragonfly
      G2
    - ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP EliteBook x360
      1040 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 840 Aero G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ZBook Power G8

 -- Timo Aaltonen <email address hidden> Wed, 16 Jun 2021 15:27:58 +0300

Changed in linux-oem-5.10 (Ubuntu Focal):
status: Fix Committed → Fix Released
Changed in hwe-next:
status: New → Fix Released