[SRU][Zesty] acpi: apei: check for pending errors when probing GHES entries

Bug #1698448 reported by Manoj Iyer
This bug affects 1 person
Affects          Status      Importance  Assigned to  Milestone
linux (Ubuntu)   Incomplete  Critical    Manoj Iyer

Bug Description

[Impact]
In addition to the RAS patches submitted for SRU under bug https://launchpad.net/bugs/1696570, we also require the following patch:
f561618d9b80 acpi: apei: check for pending errors when probing GHES entries

Without this patch, a RAS error that is already pending at boot may not be handled correctly on QDF2400 platforms. This could cause the platform to reboot or prevent future RAS errors from being reported.

[Test]
Run the mce-tests suite. To reproduce the issue, inject an error before the kernel boots.

[Fix]
The fix is available in the Linux arm64 git repository maintained by Will Deacon:
f561618d9b80 acpi: apei: check for pending errors when probing GHES entries
This patch must be applied on top of the RAS patches that were already submitted for SRU.

[Regression Potential]
Small fix that handles pending errors immediately when GHES entries are probed. Low risk of regression.

Tags: qdf2400 zesty
Joseph Salisbury (jsalisbury) wrote: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel), please enter the following command in a terminal window:

apport-collect 1698448

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: zesty
Manoj Iyer (manjo)
description: updated
Manoj Iyer (manjo) wrote: could you please help test kernel for bug #1698448

Jeff,

I have a kernel in a PPA:
https://launchpad.net/~centriq-team/+archive/ubuntu/lp1698448/ that
includes the patch for bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1698448. The PPA
description has instructions for installing this kernel.

Tyler suggests that we will need to inject an error prior to kernel
boot to reproduce this issue (I guess pulling a PCIe card after firmware
initialization and before kernel boot might do it?). I would need
physical access to the system for that kind of testing, and the
QDF2400 systems we have are in a data center, which limits my ability
to test this patch. Could someone from your team please test the patch
using the test kernel in the PPA and post comments in the Ubuntu bug
referenced above? FYI, June 23rd is the deadline for submitting patches
for the current SRU cycle.

Thanks a ton!
Manoj Iyer

Jeffrey Hugo (jhugo-o) wrote:

Kernel verified.

Jeffrey Hugo
Senior Engineer
Qualcomm Datacenter Technologies, Inc.
1-303-247-5002

From: Manoj Iyer [mailto:<email address hidden>]
Sent: Monday, June 19, 2017 11:42 AM
To: Jeff Hugo <email address hidden>
Cc: <email address hidden>; Timur Tabi <email address hidden>; Tyler Baicar <email address hidden>
Subject: could you please help test kernel for bug #1698448

Manoj Iyer (manjo) wrote:

Boot tested on ARM64 QDF2400:
ubuntu@ubuntu:~$ uname -a
Linux ubuntu 4.10.0-22-generic #24~lp1698448+checkerrors.1-Ubuntu SMP Fri Jun 16 20:16:10 UTC 2 aarch64 aarch64 aarch64 GNU/Linux

Boot tested on Power8:
ubuntu@manjo-srutest:~$ uname -a
Linux manjo-srutest 4.10.0-22-generic #24~lp1698448+checkerrors.1-Ubuntu SMP Fri Jun 16 20:15:17 UTC 2 ppc64le ppc64le ppc64le GNU/Linux

Boot tested on AMD64:
ubuntu@adib:~$ uname -a
Linux adib 4.10.0-22-generic #24~lp1698448+checkerrors.1-Ubuntu SMP Fri Jun 16 20:15:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
