Comment 11 for bug 1956519

Revision history for this message
Matthew Ruffell (mruffell) wrote : Re: kernel panic after upgrading to kernel 5.13.0-23

Hi Vadik, Oliver,

Thanks for reporting, and sorry that 5.13.0-24-generic in -proposed didn't solve the issue.

Let's do some analysis:

[ 1.381250] BUG: kernel NULL pointer dereference, address: 000000000000000c
[ 1.381270] RIP: 0010:amd_sfh_hid_client_init+0x47/0x350 [amd_sfh]
[ 1.381299] Call Trace:
[ 1.381302] ? __pci_set_master+0x5f/0xe0
[ 1.381310] amd_mp2_pci_probe+0xad/0x160 [amd_sfh]
[ 1.381314] local_pci_probe+0x48/0x80
...

Okay, so a null pointer dereference in the amd_sfh module. The c in 000000000000000c probably means offset +12 in the struct we are trying to access.

Let's see where this is:

$ eu-addr2line -ifae ./usr/lib/debug/lib/modules/5.13.0-23-generic/kernel/drivers/hid/amd-sfh-hid/amd_sfh.ko amd_sfh_hid_client_init+0x47
0x0000000000000767
amd_sfh_hid_client_init
/build/linux-k2e9CH/linux-5.13.0/drivers/hid/amd-sfh-hid/amd_sfh_client.c:147:27

Let's have a look:

134 int amd_sfh_hid_client_init(struct amd_mp2_dev *privdata)
135 {
...
146
147 cl_data->num_hid_devices = amd_mp2_get_sensor_num(privdata, &cl_data->sensor_idx[0]);
148
...

Okay, so we are dereferencing either cl_data->num_hid_devices or &cl_data->sensor_idx[0], but they are both in cl_data, so cl_data will be NULL.

Since you mentioned that it worked in 5.13.0-22-generic, and broke in 5.13.0-23-generic, lets see if this changed in 5.13.0-23-generic:

$ git log --grep "amd_sfh" Ubuntu-5.13.0-22.22..Ubuntu-5.13.0-23.23
commit d46ef750ed58cbeeba2d9a55c99231c30a172764
commit-impish 56559d7910e704470ad72da58469b5588e8cbf85
Author: Evgeny Novikov <email address hidden>
Date: Tue Jun 1 19:38:01 2021 +0300
Subject:HID: amd_sfh: Fix potential NULL pointer dereference
Link: https://github.com/torvalds/linux/commit/d46ef750ed58cbeeba2d9a55c99231c30a172764

Okay, so this patch changes the parent function to amd_sfh_hid_client_init(), which is amd_mp2_pci_probe().

+ rc = amd_sfh_hid_client_init(privdata);
+ if (rc)
+ return rc;
+
        privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL);
        if (!privdata->cl_data)
                return -ENOMEM;
...
- return amd_sfh_hid_client_init(privdata);
+ return 0;

So it seems we are moving the call to amd_sfh_hid_client_init(privdata) from the end of the function up a bit, and interestingly, before the call to privdata->cl_data = devm_kzalloc().

So... we are using privdata->cl_data before it is being allocated? Looks like we have found our NULL pointer dereference.

I suppose the commit to "fix" the null pointer dereference actually introduced another one.

Looking at this commit in the upstream tree, I came across:

commit 88a04049c08cd62e698bc1b1af2d09574b9e0aee
Author: Basavaraj Natikar <email address hidden>
Date: Thu Sep 23 17:59:27 2021 +0530
Subject: HID: amd_sfh: Fix potential NULL pointer dereference
Link: https://github.com/torvalds/linux/commit/88a04049c08cd62e698bc1b1af2d09574b9e0aee

This patch seems to move the call to after cl_data is allocated, which should fix this.

- rc = amd_sfh_hid_client_init(privdata);
- if (rc)
- return rc;
-
        privdata->cl_data = devm_kzalloc(&pdev->dev, sizeof(struct amdtp_cl_data), GFP_KERNEL);
        if (!privdata->cl_data)
                return -ENOMEM;

- rc = devm_add_action_or_reset(&pdev->dev, amd_mp2_pci_remove, privdata);
+ mp2_select_ops(privdata);
+
+ rc = amd_sfh_hid_client_init(privdata);

This commit landed in 5.15-rc4:

$ git describe --contains 88a04049c08cd62e698bc1b1af2d09574b9e0aee
v5.15-rc4~40^2

It seems it was backported to 5.14.10:

https://lwn.net/Articles/872195/

Impish should have gotten 5.14.10 during its regular upstream -stable patches:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1950388

The commit is listed there, but when I search the Impish git tree, it is missing?

I think what has happened is the two commits have the same name, and Kamal must have gotten confused and thought it was a duplicate, and dropped it.

Here's what we are going to do.

I will build you a test kernel based on 5.13.0-23-generic, that includes Basavaraj Natikar's fix, and I will provide instructions on how to install it. You can test it to make sure it fixes the issue, and if it does, I will submit the patch for SRU to the 5.13 kernel.

I will write back once the test kernel has finished building, probably tomorrow.

Thanks,
Matthew