[Artful/Zesty] ACPI APEI error handling bug fixes

Bug #1732990 reported by Manoj Iyer on 2017-11-17
This bug affects 1 person
Affects         Status         Importance  Assigned to
linux (Ubuntu)  Fix Committed  Critical    Manoj Iyer
Zesty           Won't Fix      Critical    Manoj Iyer
Artful          Fix Released   Critical    Manoj Iyer

Bug Description

[Impact]
Error records that contain multiple errors are reported incorrectly for every error after the first one. This causes garbage non-standard error trace events to be generated, and for AER and MC errors the AER and EDAC drivers take no kernel action to help recover from these errors.

[Fix]
The following patches in Linus's tree fix this issue:
aaf2c2fb0f51 ACPI / APEI: clear error status before acknowledging the error
c4335fdd3822 ACPI: APEI: fix the wrong iteration of generic error status block
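
To illustrate what a wrong iteration over a generic error status block can look like, here is a minimal user-space sketch in Python. It is not the kernel code. It assumes a simplified data-entry layout in which revision 0x300 entries carry an extra 8-byte timestamp after the 64-byte header, and it uses short text tags where real entries carry section-type GUIDs. The names make_entry and iter_sections are hypothetical. A walker that assumes the older 64-byte header for every entry reads the first section correctly and then lands 8 bytes short of the second one.

```python
import struct

# Assumed, simplified data-entry header (little-endian):
# section_type[16], error_severity u32, revision u16, validation_bits u8,
# flags u8, error_data_length u32, fru_id[16], fru_text[20]  -> 64 bytes
BASE_HEADER = struct.Struct("<16sIHBBI16s20s")
# Assumption: revision 0x300 entries append an 8-byte timestamp to the header.
V3_EXTRA = struct.Struct("<Q")


def make_entry(tag, payload, revision=0x300):
    """Build one data entry. 'tag' is a stand-in for the section-type GUID."""
    header = BASE_HEADER.pack(tag, 0, revision, 0, 0, len(payload), b"", b"")
    if revision >= 0x300:
        header += V3_EXTRA.pack(0)
    return header + payload


def iter_sections(buf, revision_aware=True):
    """Yield (tag, payload_length) for each entry found in buf.

    With revision_aware=False the walker always assumes a 64-byte header,
    which models the kind of mis-stepping described above.
    """
    offset = 0
    while offset + BASE_HEADER.size <= len(buf):
        fields = BASE_HEADER.unpack_from(buf, offset)
        tag, revision, length = fields[0], fields[2], fields[5]
        header_size = BASE_HEADER.size
        if revision_aware and revision >= 0x300:
            header_size += V3_EXTRA.size
        yield tag.rstrip(b"\0"), length
        offset += header_size + length


# Two revision 0x300 entries back to back.
block = make_entry(b"ARM", b"\x01" * 8) + make_entry(b"PCIE", b"\x02" * 4)

fixed = list(iter_sections(block, revision_aware=True))
buggy = list(iter_sections(block, revision_aware=False))

print("revision-aware:", fixed)
print("fixed 64-byte stride:", buggy)
```

In this sketch the revision-aware walk returns both sections, while the fixed-stride walk returns the first section followed by an entry whose tag is a mix of payload bytes and the next header, which is the user-space analogue of the garbage events described under [Impact].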

[Testing]
Insert an e1000 PCIe card into the system and run the following command, which should generate PCIe correctable errors. Without the fix, only the first error in each GHES report goes to the AER driver, rather than all errors from the GHES reports.

$ sudo setpci -s 0002:00:00.0 0x70c.l=0x00808000;sudo setpci -s 0002:00:00.0 CAP_EXP+0x10.B=0x4b;sleep 1;sudo setpci -s 0002:00:00.0 CAP_EXP+0x10.B=0x48

Here "0002:00:00.0" is the root hub for the card.
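
For readers unfamiliar with the register being written, CAP_EXP+0x10 is the Link Control register in the PCI Express capability, and the `.B` suffix writes only its low byte. The sketch below decodes the two byte values used in the command according to the standard bit layout of that register. The function name decode_link_control is hypothetical, and the sketch says nothing about the 0x70c write, which is outside the standard capability layout and is not explained in this report.

```python
def decode_link_control(value):
    """Decode the low byte of the PCIe Link Control register."""
    aspm_modes = {0: "disabled", 1: "L0s", 2: "L1", 3: "L0s+L1"}
    return {
        "aspm": aspm_modes[value & 0x03],          # bits 1:0, ASPM control
        "rcb_128_bytes": bool(value & 0x08),       # bit 3, read completion boundary
        "link_disable": bool(value & 0x10),        # bit 4
        "retrain_link": bool(value & 0x20),        # bit 5
        "common_clock": bool(value & 0x40),        # bit 6
        "extended_synch": bool(value & 0x80),      # bit 7
    }


for value in (0x4B, 0x48):
    print(hex(value), decode_link_control(value))
```

Read this way, the command first sets the ASPM control bits (0x4b), waits one second, and then clears them again (0x48), leaving the other bits in that byte unchanged between the two writes.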

As mentioned in comment #3, JTAG was used to trigger multiple concurrent errors, and all errors were observed to be parsed instead of just the first one. The poster of comment #3 will therefore do the verification once the patch lands in -proposed.

[Regression Potential]
The two patches to the ACPI APEI driver were cleanly cherry-picked from Linus's tree and applied to Artful and Zesty. The patches were tested on the QDF2400 platform, where they were found to fix the issue without introducing any regressions.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1732990

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: artful
Jeffrey Hugo (jhugo-o) wrote :

Validated the test kernel in the PPA by installing it onto a QDF2400 device running 16.04.3.

Used JTAG to trigger multiple concurrent errors, and observed that all errors were parsed, instead of just the first one:

root@ubuntu:/home/ubuntu# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          <idle>-0 [024] .ns. 462.341863: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510fc001; running state: 1; PSCI state: 0
  97-overlayroot-2510 [024] ..s. 1010.705078: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510fc001; running state: 1; PSCI state: 0
  97-overlayroot-2510 [024] ..s. 1010.705081: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510fc001; running state: 1; PSCI state: 0
  97-overlayroot-2510 [024] ..s. 1010.705082: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510fc001; running state: 1; PSCI state: 0
  97-overlayroot-2510 [024] ..s. 1010.705083: arm_event: affinity level: 2; MPIDR: 0000000000000000; MIDR: 00000000510fc001; running state: 1; PSCI state: 0

The fixes to the PPA kernel address the issue.
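
One way to check a trace like the one above without reading it by eye is to group events into bursts by timestamp. The following is a rough sketch, not a tool used in this report. The function burst_sizes, the regular expression, and the 1 ms gap threshold are assumptions chosen for illustration, and the sample lines are abbreviated copies of the trace lines above.

```python
import re

# Matches "[cpu] flags timestamp: event:" in an ftrace text line.
TRACE_LINE = re.compile(r"\[(\d+)\]\s+\S+\s+(\d+\.\d+):\s+(\w+):")


def burst_sizes(lines, event="arm_event", gap=0.001):
    """Count how many 'event' entries arrive within 'gap' seconds of each other."""
    sizes = []
    last = None
    for line in lines:
        match = TRACE_LINE.search(line)
        if not match or match.group(3) != event:
            continue  # skip header comments and unrelated events
        timestamp = float(match.group(2))
        if last is not None and timestamp - last <= gap:
            sizes[-1] += 1
        else:
            sizes.append(1)
        last = timestamp
    return sizes


sample = [
    "# tracer: nop",
    "          <idle>-0 [024] .ns. 462.341863: arm_event: affinity level: 2",
    "  97-overlayroot-2510 [024] ..s. 1010.705078: arm_event: affinity level: 2",
    "  97-overlayroot-2510 [024] ..s. 1010.705081: arm_event: affinity level: 2",
    "  97-overlayroot-2510 [024] ..s. 1010.705082: arm_event: affinity level: 2",
    "  97-overlayroot-2510 [024] ..s. 1010.705083: arm_event: affinity level: 2",
]

print(burst_sizes(sample))  # [1, 4]
```

For the abbreviated sample this reports one isolated event followed by a burst of four, which matches the four near-simultaneous arm_event lines in the trace.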

Manoj Iyer (manjo) on 2017-11-20
Changed in linux (Ubuntu):
assignee: Manoj Iyer (manjo) → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
status: Incomplete → In Progress
Changed in linux (Ubuntu Zesty):
status: New → In Progress
Changed in linux (Ubuntu Artful):
status: New → In Progress
Changed in linux (Ubuntu Zesty):
importance: Undecided → Critical
Changed in linux (Ubuntu Artful):
importance: Undecided → Critical
Changed in linux (Ubuntu Zesty):
assignee: nobody → Manoj Iyer (manjo)
Changed in linux (Ubuntu Artful):
assignee: nobody → Manoj Iyer (manjo)
Changed in linux (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → Manoj Iyer (manjo)
Manoj Iyer (manjo) on 2017-11-24
description: updated
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Manoj Iyer (manjo) on 2018-02-13
Changed in linux (Ubuntu Zesty):
status: In Progress → Won't Fix
Changed in linux (Ubuntu):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-artful' to 'verification-done-artful'. If the problem still exists, change the tag 'verification-needed-artful' to 'verification-failed-artful'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-artful
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.13.0-36.40

---------------
linux (4.13.0-36.40) artful; urgency=medium

  * linux: 4.13.0-36.40 -proposed tracker (LP: #1750010)

  * Rebuild without "CVE-2017-5754 ARM64 KPTI fixes" patch set

linux (4.13.0-35.39) artful; urgency=medium

  * linux: 4.13.0-35.39 -proposed tracker (LP: #1748743)

  * CVE-2017-5715 (Spectre v2 Intel)
    - Revert "UBUNTU: SAUCE: turn off IBPB when full retpoline is present"
    - SAUCE: turn off IBRS when full retpoline is present
    - [Packaging] retpoline files must be sorted
    - [Packaging] pull in retpoline files

linux (4.13.0-34.37) artful; urgency=medium

  * linux: 4.13.0-34.37 -proposed tracker (LP: #1748475)

  * libata: apply MAX_SEC_1024 to all LITEON EP1 series devices (LP: #1743053)
    - libata: apply MAX_SEC_1024 to all LITEON EP1 series devices

  * KVM patches for s390x to provide facility bits 81 (ppa15) and 82 (bpb)
    (LP: #1747090)
    - KVM: s390: wire up bpb feature

  * artful 4.13 i386 kernels crash after memory hotplug remove (LP: #1747069)
    - Revert "mm, memory_hotplug: do not associate hotadded memory to zones until
      online"

  * CVE-2017-5715 (Spectre v2 Intel)
    - x86/feature: Enable the x86 feature to control Speculation
    - x86/feature: Report presence of IBPB and IBRS control
    - x86/enter: MACROS to set/clear IBRS and set IBPB
    - x86/enter: Use IBRS on syscall and interrupts
    - x86/idle: Disable IBRS entering idle and enable it on wakeup
    - x86/idle: Disable IBRS when offlining cpu and re-enable on wakeup
    - x86/mm: Set IBPB upon context switch
    - x86/mm: Only set IBPB when the new thread cannot ptrace current thread
    - x86/entry: Stuff RSB for entry to kernel for non-SMEP platform
    - x86/kvm: add MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD to kvm
    - x86/kvm: Set IBPB when switching VM
    - x86/kvm: Toggle IBRS on VM entry and exit
    - x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature
    - x86/spec_ctrl: Add lock to serialize changes to ibrs and ibpb control
    - x86/cpu/AMD: Add speculative control support for AMD
    - x86/microcode: Extend post microcode reload to support IBPB feature
    - KVM: SVM: Do not intercept new speculative control MSRs
    - x86/svm: Set IBRS value on VM entry and exit
    - x86/svm: Set IBPB when running a different VCPU
    - KVM: x86: Add speculative control CPUID support for guests
    - SAUCE: turn off IBPB when full retpoline is present

  * Artful 4.13 fixes for tun (LP: #1748846)
    - tun: call dev_get_valid_name() before register_netdevice()
    - tun: allow positive return values on dev_get_valid_name() call
    - tun/tap: sanitize TUNSETSNDBUF input

  * boot failure on AMD Raven + WestonXT (LP: #1742759)
    - SAUCE: drm/amdgpu: add atpx quirk handling (v2)

linux (4.13.0-33.36) artful; urgency=low

  * linux: 4.13.0-33.36 -proposed tracker (LP: #1746903)

  [ Stefan Bader ]
  * starting VMs causing retpoline4 to reboot (LP: #1747507) // CVE-2017-5715
    (Spectre v2 retpoline)
    - x86/retpoline: Fill RSB on context switch for affected CPUs
    - x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
    - x86/retpol...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released