[EHL] CRB FAB-B Rev-201, QWW1: kernel NULL pointer dereference in intel_pinctrl_get_soc_data

Bug #1928328 reported by You-Sheng Yang
This bug affects 2 people
Affects: intel
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (blank)

Bug Description

This happens on every boot. It is reproducible with the Canonical linux-intel kernel (from ppa:canonical-kernel-team/ppa, focal pocket) and with mainline v5.13-rc1. Hardware: Elkhart Lake CRB FAB-B Rev-201, CPU QWW1, BIOS EHLSFWI1.R00.3044.A01.2101210945 01/21/2021.

It also causes a pause of a few seconds during boot and blocks the reboot process.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP NOPTI
 CPU: 2 PID: 168 Comm: systemd-udevd Not tainted 5.11.0-1003-intel #3-Ubuntu
 Hardware name: Intel Corporation Elkhart Lake Embedded Platform/ElkhartLake LPDDR4x T3 CRB, BIOS EHLSFWI1.R00.3044.A01.2101210945 01/21/2021
 RIP: 0010:strcmp+0xc/0x20
 Code: 06 49 89 f8 48 83 c6 01 48 83 c7 01 88 47 ff 84 c0 75 eb 4c 89 c0 c3 0f 1f 80 00 00 00 00 31 c0 eb 08 48 83 c0 01 84 d2 74 0f <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 31 c0 c3 66 90 48 85
 RSP: 0018:ffffa09e0050ba90 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa09e0050ba18
 RDX: 0000000000000000 RSI: ffffffffc00dfc7b RDI: 0000000000000000
 RBP: ffffa09e0050bab8 R08: 0000000000000000 R09: ffffa09e0050b808
 R10: ffff8ebf0c3d33ff R11: 0000000000000000 R12: ffffffffc00e1c20
 R13: ffffffffc00e30e0 R14: 0000000000000000 R15: 0000000000000002
 FS: 00007ff28eb16880(0000) GS:ffff8ebfe4300000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 0000000102658000 CR4: 0000000000350ee0
 Call Trace:
  ? intel_pinctrl_get_soc_data+0x67/0xc0
  intel_pinctrl_probe_by_uid+0x13/0x30
  platform_probe+0x45/0xa0
  really_probe+0xfb/0x420
  driver_probe_device+0xe9/0x160
  device_driver_attach+0x5d/0x70
  __driver_attach+0x8f/0x150
  ? device_driver_attach+0x70/0x70
  bus_for_each_dev+0x7e/0xc0
  driver_attach+0x1e/0x20
  bus_add_driver+0x152/0x1f0
  driver_register+0x74/0xd0
  ? 0xffffffffc00e6000
  __platform_driver_register+0x1e/0x20
  ehl_pinctrl_driver_init+0x1c/0x1000 [pinctrl_elkhartlake]
  do_one_initcall+0x48/0x1d0
  ? _cond_resched+0x19/0x30
  ? kmem_cache_alloc_trace+0x380/0x430
  ? do_init_module+0x28/0x250
  do_init_module+0x62/0x250
  load_module+0x11aa/0x1370
  ? security_kernel_post_read_file+0x5c/0x70
  ? security_kernel_post_read_file+0x5c/0x70
  __do_sys_finit_module+0xc2/0x120
  ? __do_sys_finit_module+0xc2/0x120
  __x64_sys_finit_module+0x1a/0x20
  do_syscall_64+0x38/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7ff28f09889d
 Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 f5 0c 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffd618ed1a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
 RAX: ffffffffffffffda RBX: 0000563511b5c430 RCX: 00007ff28f09889d
 RDX: 0000000000000000 RSI: 00007ff28ef75ded RDI: 0000000000000005
 RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000005 R11: 0000000000000246 R12: 00007ff28ef75ded
 R13: 0000000000000000 R14: 0000563511b603c0 R15: 0000563511b5c430
 Modules linked in: pinctrl_elkhartlake(+)
 CR2: 0000000000000000
 ---[ end trace 2004c11402add586 ]---
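Reading the register dump above: on x86-64 the first function argument is passed in RDI, and here RDI and CR2 are both zero with RIP in strcmp, so the first string compared on behalf of intel_pinctrl_get_soc_data was most likely a NULL pointer. The following is a minimal userspace sketch in C of a NULL-safe table lookup. It assumes the missing string is a firmware-provided unique ID (suggested by the name intel_pinctrl_probe_by_uid, not confirmed in this report). `struct soc_data`, `uid_matches`, and `find_soc_data` are hypothetical names used for illustration only; this is not the kernel's code and not the fix that was applied.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of a driver's SoC data table. */
struct soc_data {
    const char *uid;   /* unique ID this entry applies to */
    const char *name;  /* illustrative label only */
};

/* NULL-safe comparison: a missing ID on either side never matches. */
static bool uid_matches(const char *firmware_uid, const char *table_uid)
{
    if (firmware_uid == NULL || table_uid == NULL)
        return false;
    return strcmp(firmware_uid, table_uid) == 0;
}

/* Walk a NULL-terminated table; return the matching entry or NULL. */
static const struct soc_data *find_soc_data(const struct soc_data *const *table,
                                            const char *firmware_uid)
{
    for (size_t i = 0; table[i] != NULL; i++) {
        if (uid_matches(firmware_uid, table[i]->uid))
            return table[i];
    }
    return NULL;
}
```

With a guard of this kind, a missing ID would produce "no match" rather than a fault inside strcmp; the caller still has to handle the NULL result.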
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu27.17
Architecture: amd64
CasperMD5CheckResult: skip
DistroRelease: Ubuntu 20.04
InstallationDate: Installed on 2021-05-13 (0 days ago)
InstallationMedia: Ubuntu 20.04.2 LTS "Focal Fossa" - Release amd64 (20210204)
Package: linux (not installed)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
Tags: focal
Uname: Linux 5.13.0-051300rc1-generic x86_64
UnreportableReason: The running kernel is not an Ubuntu kernel
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True

X-HWE-Bug: Bug #1950721

You-Sheng Yang (vicamo) wrote : ProcCpuinfoMinimal.txt

apport information

tags: added: apport-collected focal
description: updated
You-Sheng Yang (vicamo) wrote :
tags: added: ehl linux-intel
pragyansri.pathi@intel.com (pragyan) wrote :

This is a known BIOS issue. There is currently no planned timeline for resolving it.

Anthony and Canonical team: please file an IPS ticket through the system. It will go to IoTG.

How to file IPS tickets?
1. Go to https://premiersupport.intel.com/IPS/home/home.jsp
2. Click on Premier Support Projects (On top)
3. Click on Create NEW on the left
4. Be sure to set the field “Premier Support Project Name” to “Canonical EHL” (for EHL)

Anthony Wong (anthonywong) wrote :

I will report this to IPS once the EHL project there is created.

My questions are:
1. Is this a known issue on all EHL board revisions?
2. How should this problem be dealt with on production systems? Should we disable the pinctrl driver, or do ODMs/OEMs need to modify their BIOS?
3. Are there any patches that we should apply to our IOTG kernel to avoid this issue?

pragyansri.pathi@intel.com (pragyan) wrote :

@anthony - Has an IPS ticket been filed?

Anthony Wong (anthonywong) wrote :

Shafin Intel Technical Specialist 9/30/2021 3:22 PM
Hi Anthony,

We are able to provide you with an updated IFWI image that incorporates the new GPIO/PinCtrl patch: cdrdv2.intel.com/v1/dl/getContent/655454/655453?filename=655453.zip. This image is intended only for testing and confirming a fix for this issue, and is not to be shared. The official public IFWI image will be available in the MR2 release.

To enable the patch:
- Flash the EHL_GPIO_Patched_TSN_EXT.bin file onto the EHL CRB
- Enter the BIOS and set the PinCtrl Driver GPIO Scheme to <Enabled>
    BIOS -> Intel Advanced Menu -> PCH IO Configuration -> PinCtrl Driver GPIO Scheme (located at the bottom of the menu)

Please let us know if you have any questions or require any assistance.

Thanks,
Shafin

Changed in intel:
status: New → Triaged
Anthony Wong (anthonywong) wrote :

[2021-10-27]

The pinctrl feature will be integrated into the Intel-released BIOS at MR2. You can refer to the Elkhart Lake Gold Deck (Doc ID: 606615) for the software release schedule; the Yocto (and BIOS) MR2 release is trending WW51 - WW02'22.

Thanks.
Kim Hoe

Anthony Wong (anthonywong) wrote :

Please try the test kernel at https://people.canonical.com/~ypwong/lp1928328/ after the BIOS is updated.

You-Sheng Yang (vicamo) wrote :

That BIOS setting + Anthony's test kernel works for me. No more kernel NULL ptr dereference in dmesg, and checkbox hw info test passes normally.
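A check such as "no more NULL pointer dereference in dmesg" could be automated roughly as follows. This is only a sketch in C: the function names are hypothetical, the signature string is copied from the first line of the oops in the bug description, and the log lines are assumed to have already been read into memory (for example from saved dmesg output).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Signature copied from the first line of the oops in this report. */
static const char OOPS_SIGNATURE[] = "BUG: kernel NULL pointer dereference";

/* True if a single log line contains the oops signature. */
static bool line_has_null_deref(const char *line)
{
    return line != NULL && strstr(line, OOPS_SIGNATURE) != NULL;
}

/* Count how many of the first n log lines contain the signature. */
static size_t count_null_derefs(const char *const *lines, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (line_has_null_deref(lines[i]))
            hits++;
    }
    return hits;
}
```

A count of zero over the full boot log would correspond to the result reported here; it says nothing about other kinds of oops.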

description: updated
Ana Lasprilla (anamlt)
Changed in intel:
milestone: none → sprint4
Ana Lasprilla (anamlt)
Changed in intel:
status: Triaged → Fix Committed
Pierre Equoy (pieq) wrote :

Tested with the following HW and images:

SKU: Aaeon EHL
CID: 202109-29496
BIOS: UNEHAM11 (v5.19, 2021-10-29)
Images:
- 20211202.2 (desktop 20.04 with kernel 5.13.0-1008-intel)
- 20211201.4 (core 20 with kernel 5.13.0-1008-intel)

Cannot reproduce the issue.

Moreover, the GPIOs are working as expected. I used a new Checkbox test to successfully check a loopback connection between two GPIO pins: https://code.launchpad.net/~pieq/plainbox-provider-checkbox/+git/plainbox-provider-checkbox/+merge/412924

→ marking as `cqa-verified`.

tags: added: cqa-verified
Ana Lasprilla (anamlt)
Changed in intel:
status: Fix Committed → Fix Released
information type: Private → Public
Kent Lin (kent-jclin) wrote :

EHL boards need to include the Firmware BKC MR2 Release (includes IFWI v3441_01), ID: 685308, in the BIOS.
