System hang with kernel general protection fault due to dell-wmi-sysman sysman_init failure

Bug #1931509 reported by AceLan Kao
This bug affects 1 person
Affects                  Status        Importance  Assigned to  Milestone
HWE Next                 Fix Released  Undecided   Unassigned
linux-oem-5.10 (Ubuntu)  Invalid       Undecided   Unassigned
  Focal                  Fix Released  Undecided   AceLan Kao

Bug Description

[Impact]
Encountered the protection fault below during reboot testing since Ubuntu-oem-5.10-5.10.0-1028.29. Ubuntu Hirsute and Impish already have all the fixes.

Jun 9 09:06:02 ubuntu kernel: [ 6.216367] general protection fault, probably for non-canonical address 0x213146a124f901ea: 0000 [#1] SMP NOPTI
Jun 9 09:06:02 ubuntu kernel: [ 6.216371] CPU: 3 PID: 447 Comm: systemd-udevd Not tainted 5.10.0-1030-oem #31-Ubuntu
Jun 9 09:06:02 ubuntu kernel: [ 6.216372] Hardware name: Dell Inc. Latitude 5300/, BIOS 1.13.1 01/22/2021
Jun 9 09:06:02 ubuntu kernel: [ 6.216376] RIP: 0010:kobject_put+0xd/0x60
Jun 9 09:06:02 ubuntu kernel: [ 6.216378] Code: 02 0f 85 64 ff ff ff 45 31 ff e9 6e ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 85 ff 74 3e 55 48 89 e5 53 48 89 fb <f6> 47 3c 01 74 1a 48 8d 7b 38 b8 ff ff ff ff f0 0f c1 43 38 83 f8
Jun 9 09:06:02 ubuntu kernel: [ 6.216379] RSP: 0018:ffffb233406a7c38 EFLAGS: 00010202
Jun 9 09:06:02 ubuntu kernel: [ 6.216381] RAX: 0000000000000000 RBX: 213146a124f901ea RCX: 0000000000000000
Jun 9 09:06:02 ubuntu kernel: [ 6.216383] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 213146a124f901ea
Jun 9 09:06:02 ubuntu kernel: [ 6.216384] RBP: ffffb233406a7c40 R08: 0000000000000001 R09: ffffffff9ac66400
Jun 9 09:06:02 ubuntu kernel: [ 6.216384] R10: ffff9c41106d7fa0 R11: 0000000000000001 R12: ffff9c4114011438
Jun 9 09:06:02 ubuntu kernel: [ 6.216385] R13: 213146a124f901ea R14: 0000000000000000 R15: ffffb233406a7e70
Jun 9 09:06:02 ubuntu kernel: [ 6.216387] FS: 00007f9e33410880(0000) GS:ffff9c485e4c0000(0000) knlGS:0000000000000000
Jun 9 09:06:02 ubuntu kernel: [ 6.216388] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 9 09:06:02 ubuntu kernel: [ 6.216389] CR2: 00007ffe51396080 CR3: 000000011211e003 CR4: 00000000003706e0
Jun 9 09:06:02 ubuntu kernel: [ 6.216390] Call Trace:
Jun 9 09:06:02 ubuntu kernel: [ 6.216394] kset_unregister+0x2a/0x40
Jun 9 09:06:02 ubuntu kernel: [ 6.216399] sysman_init+0x20e/0x1000 [dell_wmi_sysman]
Jun 9 09:06:02 ubuntu kernel: [ 6.216401] ? 0xffffffffc0ce4000
Jun 9 09:06:02 ubuntu kernel: [ 6.216404] do_one_initcall+0x48/0x1d0
Jun 9 09:06:02 ubuntu kernel: [ 6.216406] ? _cond_resched+0x19/0x30
Jun 9 09:06:02 ubuntu kernel: [ 6.216408] ? kmem_cache_alloc_trace+0x37a/0x430
Jun 9 09:06:02 ubuntu kernel: [ 6.216410] ? do_init_module+0x28/0x250
Jun 9 09:06:02 ubuntu kernel: [ 6.216412] do_init_module+0x62/0x250
Jun 9 09:06:02 ubuntu kernel: [ 6.216414] load_module+0x11ac/0x1370
Jun 9 09:06:02 ubuntu kernel: [ 6.216417] ? security_kernel_post_read_file+0x5c/0x70
Jun 9 09:06:02 ubuntu kernel: [ 6.216419] ? security_kernel_post_read_file+0x5c/0x70
Jun 9 09:06:02 ubuntu kernel: [ 6.216421] __do_sys_finit_module+0xc2/0x120
Jun 9 09:06:02 ubuntu kernel: [ 6.216423] ? __do_sys_finit_module+0xc2/0x120
Jun 9 09:06:02 ubuntu kernel: [ 6.216427] __x64_sys_finit_module+0x1a/0x20
Jun 9 09:06:02 ubuntu kernel: [ 6.216429] do_syscall_64+0x38/0x90
Jun 9 09:06:02 ubuntu kernel: [ 6.216431] entry_SYSCALL_64_after_hwframe+0x44/0xa9
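
For readers wondering why the address in the trace counts as "non-canonical": on x86-64 with 48-bit virtual addresses, bits 63..47 must all be equal. The sketch below checks a few register values from the dump above. It assumes 4-level paging (CR4 in the dump, 0x3706e0, appears to have the LA57 bit clear, which would be consistent with that); `is_canonical` is a hypothetical helper name, not a kernel function.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63..(va_bits-1) of a 64-bit address are all equal."""
    upper = addr >> (va_bits - 1)              # keep bits 63..va_bits-1
    all_ones = (1 << (64 - va_bits + 1)) - 1   # those bits all set
    return upper in (0, all_ones)


# Values copied from the register dump in this report.
for name, value in [
    ("RDI (pointer passed to kobject_put)", 0x213146A124F901EA),
    ("R12", 0xFFFF9C4114011438),
    ("RSP", 0xFFFFB233406A7C38),
]:
    print(f"{name}: {value:#018x} canonical={is_canonical(value)}")
```

RDI, RBX and R13 all hold the same garbage value, so kobject_put() looks to have been handed a pointer that no longer refers to a live object, while the other kernel pointers in the dump are well formed.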

[Fix]
The first commit fixes the issue; the other commits are cherry-picked as well to cover other potential regressions.

bdda39673fde platform/x86: dell-wmi-sysman: Fix crash caused by calling kset_unregister twice
eaa1dcc79694 platform/x86: dell-wmi-sysman: Cleanup sysman_init() error-exit handling
f4c4e9ad1523 platform/x86: dell-wmi-sysman: Fix release_attributes_data() getting called twice on init_bios_attributes() failure
ececdb898376 platform/x86: dell-wmi-sysman: Fix possible NULL pointer deref on exit
9b95665a83ec platform/x86: dell-wmi-sysman: Make sysman_init() return -ENODEV of the interfaces are not found
5e3f5973c8df platform/x86: dell-wmi-sysman: Make init_bios_attributes() ACPI object parsing more robust
42f38dcccfb3 platform/x86: dell-wmi-sysman: Cleanup create_attributes_level_sysfs_files()
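
As a rough user-space analogy for the "called twice" class of bug named in the first commit, the sketch below shows one common way to make teardown safe to run more than once: forget the reference after releasing it. This is an illustration only, not the driver's code and not necessarily how the commits above do it; `Kset`, `Driver` and `release_attributes` are hypothetical names. In the kernel the second call touches freed memory instead of raising an exception.

```python
class Kset:
    """Hypothetical stand-in for a registered kernel object set."""

    def __init__(self, name):
        self.name = name
        self.registered = True

    def unregister(self):
        # Models the bug: a second unregister on the same object is an error.
        if not self.registered:
            raise RuntimeError(f"{self.name}: unregistered twice")
        self.registered = False


class Driver:
    """Hypothetical driver state holding one kset reference."""

    def __init__(self):
        self.main_dir = Kset("attributes")

    def release_attributes(self):
        # Release once, then drop the reference so later calls are no-ops.
        if self.main_dir is not None:
            self.main_dir.unregister()
            self.main_dir = None

    def init(self, fail=True):
        try:
            if fail:
                raise OSError("attribute setup failed")
            return 0
        except OSError:
            self.release_attributes()  # inner error path cleans up
            self.release_attributes()  # outer error path runs again, harmlessly
            return -1


driver = Driver()
print(driver.init(fail=True))  # -1, and no exception from the second cleanup
```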

[Test]
It passed reboot tests of 300 and 400 iterations on 2 different platforms.

[Where problems could occur]
Those commits are pretty straightforward, and we have already done some stress testing to make sure the system continues to work well.

AceLan Kao (acelankao)
no longer affects: linux (Ubuntu Focal)
no longer affects: linux-oem-5.10 (Ubuntu Hirsute)
no longer affects: linux-oem-5.10 (Ubuntu Impish)
Changed in linux-oem-5.10 (Ubuntu):
status: New → Invalid
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
Changed in linux (Ubuntu Impish):
status: New → In Progress
Changed in linux-oem-5.10 (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu Hirsute):
assignee: nobody → AceLan Kao (acelankao)
Changed in linux (Ubuntu Impish):
assignee: nobody → AceLan Kao (acelankao)
Changed in linux-oem-5.10 (Ubuntu Focal):
assignee: nobody → AceLan Kao (acelankao)
AceLan Kao (acelankao)
description: updated
AceLan Kao (acelankao)
no longer affects: linux (Ubuntu)
no longer affects: linux (Ubuntu Hirsute)
no longer affects: linux (Ubuntu Impish)
description: updated
description: updated
AceLan Kao (acelankao)
tags: added: oem-priority originate-from-1931125 somerville
AceLan Kao (acelankao)
description: updated
description: updated
Timo Aaltonen (tjaalton) wrote :

how can this get past regression testing?

Kai-Chuan Hsieh (kchsieh) wrote :

Can reproduce with 5.10.0-1031-oem kernel.

SKU: ANDW-DVT2-C1 (202012-28526)
BIOS: 1.3.0
kernel: 5.10.0-1031-oem

Failed on the 9th iteration of a 200-iteration warm boot stress test.

AceLan Kao (acelankao) wrote :

Yes, this should be another issue in the Dell WMI driver, not the one we fixed during init.
We should file another bug for this one.

Jun 11 19:56:14 u-Latitude kernel: general protection fault, probably for non-canonical address 0x3ab228c7157e8ec0: 0000 [#1] SMP NOPTI
Jun 11 19:56:14 u-Latitude kernel: CPU: 3 PID: 819 Comm: NetworkManager Tainted: G W O 5.10.0-1031-oem #32-Ubuntu
Jun 11 19:56:14 u-Latitude kernel: Hardware name: Dell Inc. Latitude 9420/, BIOS 1.3.0 05/06/2021
Jun 11 19:56:14 u-Latitude kernel: RIP: 0010:kmem_cache_alloc+0xfd/0x440
Jun 11 19:56:14 u-Latitude kernel: Code: 4c 03 05 de 3d d4 4a 49 83 78 10 00 4d 8b 28 0f 84 f4 02 00 00 4d 85 ed 0f 84 eb 02 00 00 41 8b 44 24 28 49 8b 3c 24 4c 01 e8 <48> 8b 18 48 89 c1 49 33 9c 24 b8 00 00 00 4c 89 e8 48 0f c9 48 31
Jun 11 19:56:14 u-Latitude kernel: RSP: 0018:ffffb60200e63a70 EFLAGS: 00010216
Jun 11 19:56:14 u-Latitude kernel: RAX: 3ab228c7157e8ec0 RBX: 0000000000000001 RCX: 0000000000000001
Jun 11 19:56:14 u-Latitude kernel: RDX: 000000000000215f RSI: 0000000000000dc0 RDI: 0000000000032270
Jun 11 19:56:14 u-Latitude kernel: RBP: ffffb60200e63aa0 R08: ffff8f4e7f6f2270 R09: 0000000000000000
Jun 11 19:56:14 u-Latitude kernel: R10: 0000000000000000 R11: ffffdab984425600 R12: ffff8f4b00047200
Jun 11 19:56:14 u-Latitude kernel: R13: 3ab228c7157e8e98 R14: 0000000000000dc0 R15: 0000000000000000
Jun 11 19:56:14 u-Latitude kernel: FS: 00007efc69b44380(0000) GS:ffff8f4e7f6c0000(0000) knlGS:0000000000000000
Jun 11 19:56:14 u-Latitude kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 11 19:56:14 u-Latitude kernel: CR2: 00007f3ceae3b000 CR3: 0000000110c52001 CR4: 0000000000770ee0
Jun 11 19:56:14 u-Latitude kernel: PKRU: 55555554
Jun 11 19:56:14 u-Latitude kernel: Call Trace:
Jun 11 19:56:14 u-Latitude kernel: ? acpi_ut_allocate_object_desc_dbg+0x63/0x10e
Jun 11 19:56:14 u-Latitude kernel: acpi_ut_allocate_object_desc_dbg+0x63/0x10e
Jun 11 19:56:14 u-Latitude kernel: acpi_ut_create_internal_object_dbg+0x51/0x119
Jun 11 19:56:14 u-Latitude kernel: acpi_ut_create_integer_object+0x47/0x98
Jun 11 19:56:14 u-Latitude kernel: acpi_ex_opcode_1A_0T_1R+0x3de/0x556
Jun 11 19:56:14 u-Latitude kernel: acpi_ds_exec_end_op+0x166/0x76b
Jun 11 19:56:14 u-Latitude kernel: acpi_ps_parse_loop+0x84b/0x924
Jun 11 19:56:14 u-Latitude kernel: acpi_ps_parse_aml+0x1af/0x550
Jun 11 19:56:14 u-Latitude kernel: acpi_ps_execute_method+0x208/0x2ca
Jun 11 19:56:14 u-Latitude kernel: acpi_ns_evaluate+0x34e/0x4f0
Jun 11 19:56:14 u-Latitude kernel: acpi_evaluate_object+0x18e/0x3b4
Jun 11 19:56:14 u-Latitude kernel: wmidev_evaluate_method+0x10f/0x140 [wmi]
Jun 11 19:56:14 u-Latitude kernel: run_smbios_call+0x6a/0x190 [dell_smbios]
Jun 11 19:56:14 u-Latitude kernel: dell_smbios_wmi_call+0x8c/0xe0 [dell_smbios]
Jun 11 19:56:14 u-Latitude kernel: dell_smbios_call+0x73/0xb0 [dell_smbios]
Jun 11 19:56:14 u-Latitude kernel: dell_rfkill_query+0x46/0xf0 [dell_laptop]
Jun 11 19:56:14 u-Latitude kernel: rfkill_set_block+0x36/0x150
Jun 11 19:56:14 u-Latitude kernel: rfkill_fop_write+0x136/0x1e0
Jun 11 19:56:14 u-Latitude kernel: vfs_write+0xca/0x280
Jun 11 19:56:1...


perry_yuan (perry-yuan) wrote :

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/platform/x86/dell/dell-wmi-sysman?h=v5.13-rc6

There are some upstream fixes for the dell-wmi-sysman driver that need to be backported.
I would suggest that we test 5.13-rc6 to confirm whether upstream already has the fix.

Yuan-Chen Cheng (ycheng-twn) wrote :

Tested kernel 1031 on a Dell Latitude 5300.
After 489 reboots, I can't reproduce this issue.

Previously the reboot failed at around the 200th reboot.

Kai-Chuan Hsieh (kchsieh) wrote :

I tested ANDW-DVT2-C1 (202012-28526) with kernel v5.13-rc6 from https://kernel.ubuntu.com/~kernel-ppa/mainline/ with secure boot disabled.

I can't reproduce #2. I'll check whether the 5.10-oem kernel contains all the commits mentioned in #4.
My check script is attached for reference.

Scott Hu (huntu207) wrote :

Cannot reproduce on the QA side with the 1031 kernel in more than 350 runs.

SKU: ANDW-DVT2-C3
Image: canonical-oem-somerville-focal-amd64-20200502-85+fossa-mewtwo+X101
bios-version: 1.3.0
kernel:5.10.0-1031-oem

Kai-Chuan Hsieh (kchsieh) wrote :

@acelankao

I compared the 5.10.0-1031-oem kernel with 5.13-rc6. Three of the commits referred to in #4 are not in 5.10.0-1031-oem. Perry from Dell suggests that we sync with the latest Dell WMI driver.

Could you help check whether we need those changes?

platform/x86: wmi: Make remove callback return void
platform/x86: dell-wmi-sysman: Fix possible NULL pointer deref on exit
platform/x86: dell-wmi-sysman: Make it safe to call exit_foo_attributes() multiple times

Thanks,
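
A comparison like the one described above can be sketched as a match on commit subjects. This is only a heuristic under stated assumptions: it assumes backports keep the upstream subject line unchanged, and `log_subjects` and `missing_subjects` are hypothetical helper names (this is not the script attached to this bug). The `have` list in the example is made-up sample data, not the real contents of any kernel tree.

```python
import subprocess


def log_subjects(repo, rev, path):
    """Return commit subject lines touching `path` in `rev` of the git tree `repo`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%s", rev, "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()


def missing_subjects(have, wanted):
    """Return the entries of `wanted` that do not appear in `have`, in order."""
    present = {subject.strip() for subject in have}
    return [subject for subject in wanted if subject.strip() not in present]


wanted = [
    "platform/x86: wmi: Make remove callback return void",
    "platform/x86: dell-wmi-sysman: Fix possible NULL pointer deref on exit",
    "platform/x86: dell-wmi-sysman: Make it safe to call exit_foo_attributes() multiple times",
]

# Sample data only; in practice this would come from log_subjects().
have = [
    "platform/x86: wmi: Make remove callback return void",
    "platform/x86: dell-wmi-sysman: Fix possible NULL pointer deref on exit",
]

for subject in missing_subjects(have, wanted):
    print("missing:", subject)
```

Matching on subjects rather than hashes is deliberate, because cherry-picked commits get new hashes in the distro tree.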

Kai-Chuan Hsieh (kchsieh) wrote :

I created a separate bug for #2, https://bugs.launchpad.net/somerville/+bug/1932099.
The reason is that the stress test I performed was run with a WD19TB connected, which is not the original test scenario of this bug.

AceLan Kao (acelankao) wrote :

KC,

The 5.13-oem kernel already has the 3 commits you mentioned in comment #8.
Looking at your log again, I feel it's more likely a h/w issue (memory) hit while allocating resources. It fails in different drivers.
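
To compare where different oopses fault, the faulting symbol can be pulled from the `RIP:` line of each trace. A minimal sketch, assuming the `RIP: 0010:symbol+0xoff/0xsize` layout seen in the two traces in this report; `faulting_symbol` is a hypothetical helper name.

```python
import re

# Matches e.g. "RIP: 0010:kobject_put+0xd/0x60" and captures the symbol name.
RIP_RE = re.compile(r"RIP: 0010:(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def faulting_symbol(oops_text):
    """Return the symbol named on the first RIP line, or None if there is none."""
    match = RIP_RE.search(oops_text)
    return match.group("symbol") if match else None


# RIP lines copied from the two traces in this report.
first = "kernel: [ 6.216376] RIP: 0010:kobject_put+0xd/0x60"
second = "kernel: RIP: 0010:kmem_cache_alloc+0xfd/0x440"

print(faulting_symbol(first))   # kobject_put
print(faulting_symbol(second))  # kmem_cache_alloc
```

Differing symbols only show that the faults surface in different places; they do not by themselves establish the root cause.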

AceLan Kao (acelankao) wrote :

2d0c418c91d8c platform/x86: dell-wmi-sysman: Make it safe to call exit_foo_attributes() multiple times
c59ab4cedab70 platform/x86: dell-wmi-sysman: Fix possible NULL pointer deref on exit
2b329f5694aec platform/x86: wmi: Make remove callback return void

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Timo Aaltonen (tjaalton) wrote :

marking verified according to #7

tags: added: verification-done-focal
removed: verification-needed-focal
Changed in linux-oem-5.10 (Ubuntu Focal):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.10 - 5.10.0-1032.33

---------------
linux-oem-5.10 (5.10.0-1032.33) focal; urgency=medium

  * focal/linux-oem-5.10: 5.10.0-1032.33 -proposed tracker (LP: #1932138)

  * Mute/Mic mute LEDs and right speaker are not work on HP platforms
    (LP: #1932055)
    - ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Elite Dragonfly
      G2
    - ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP EliteBook x360
      1040 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP EliteBook 840 Aero G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP ZBook Power G8

 -- Timo Aaltonen <email address hidden> Wed, 16 Jun 2021 15:27:58 +0300

Changed in linux-oem-5.10 (Ubuntu Focal):
status: Fix Committed → Fix Released
Changed in hwe-next:
status: New → Fix Released