[SRU][Zesty] Add UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64

Bug #1696570 reported by Manoj Iyer
This bug affects 1 person
Affects         Status        Importance  Assigned to            Milestone
linux (Ubuntu)  Fix Released  Critical    Manoj Iyer
Zesty           Fix Released  Critical    Canonical Kernel Team

Bug Description

[Impact]
Adds UEFI 2.6 and ACPI 6.1 updates for RAS on ARM64.

[Test]
Run mce-test for testing RAS features.

[Fix]
The patches are in the maintainer's (Will Deacon's) tree: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/ras-apei

[V17,01/11] acpi: apei: read ack upon ghes record consumption
[V17,02/11] ras: acpi/apei: cper: add support for generic data v3 structure
[V17,03/11] cper: add timestamp print to CPER status printing
[V17,04/11] efi: parse ARM processor error
[V17,05/11] arm64: exception: handle Synchronous External Abort
[V17,06/11] acpi: apei: handle SEA notification type for ARMv8
[V17,07/11] acpi: apei: panic OS with fatal error status block
[V17,08/11] efi: print unrecognized CPER section
[V17,09/11] ras: acpi / apei: generate trace event for unrecognized CPER section
[V17,10/11] trace, ras: add ARM processor error trace event
[V17,11/11] arm/arm64: KVM: add guest SEA support

[Regression Potential]
The patches update RAS features on ARM64, with minor impact on generic code. The kernel was boot tested on ARM64, AMD64 and Power8, and no regressions were found.

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1696570

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Manoj Iyer (manjo)
description: updated
Manoj Iyer (manjo) wrote :

A test kernel with the RAS patches is available at https://launchpad.net/~centriq-team/+archive/ubuntu/lp1696570

Manoj Iyer (manjo) wrote :

Kernel was boot tested on QDF2400 ARM64 server.

ubuntu@ubuntu:~$ uname -a
Linux ubuntu 4.10.0-22-generic #24~lp1696570+ras.1-Ubuntu SMP Thu Jun 8 20:38:16 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@ubuntu:~$

Manoj Iyer (manjo) wrote :

Kernel was boot tested on Dell PowerEdge T710 server.

ubuntu@adib:~$ uname -a
Linux adib 4.10.0-22-generic #24~lp1696570+ras.1-Ubuntu SMP Thu Jun 8 20:38:28 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@adib:~$

Manoj Iyer (manjo) wrote :

Kernel was boot tested on Power8 server.

ubuntu@manjo-srutest:~$ uname -a
Linux manjo-srutest 4.10.0-22-generic #24~lp1696570+ras.1-Ubuntu SMP Thu Jun 8 20:36:57 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux
ubuntu@manjo-srutest:~$

description: updated
Seth Forshee (sforshee)
Changed in linux (Ubuntu):
status: Incomplete → Fix Committed
Manoj Iyer (manjo)
Changed in linux (Ubuntu Zesty):
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Tyler Baicar (tbaicar) wrote :

Testing results on QDF2400 showing a recoverable DDR error, correctable vendor specific error, correctable ARM cache error, and fatal vendor specific error. All functionality appears to be working properly.

ubuntu@null-8cfdf006a3ef:~$ uname -a
Linux null-8cfdf006a3ef 4.10.0-29-generic #33~lp1706141+build.2-Ubuntu SMP Tue Jul 25 19:12:22 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux

ubuntu@null-8cfdf006a3ef:~$ dmesg | grep -i -E 'hest|ghes|edac|hardware'
[ 0.000000] ACPI: HEST 0x0000000008A60000 000288 (v01 QCOM QDF2400 00000001 INTL 20150515)
[ 0.538984] HEST: Table parsing has been initialized.
[ 3.854385] EDAC MC: Ver: 3.0.0
[ 5.537078] ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
[ 5.545952] ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
[ 5.554123] ghes_edac: So, the end result of using this driver varies from vendor to vendor.
[ 5.562555] ghes_edac: If you find incorrect reports, please contact your hardware vendor
[ 5.570727] ghes_edac: to correct its BIOS.
[ 5.574905] ghes_edac: This system has 6 DIMM sockets.
[ 5.580205] EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.589763] EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.599319] EDAC MC2: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.608867] EDAC MC3: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.618416] EDAC MC4: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.628018] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
[ 6.573372] qcom-emac QCOM8070:00 eth0: hardware id 64.1, hardware version 1.3.0
[ 224.669058] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 224.677330] {1}[Hardware Error]: event severity: recoverable
[ 224.682992] {1}[Hardware Error]: precise tstamp: 2017-07-26 15:58:19
[ 224.689437] {1}[Hardware Error]: Error 0, type: recoverable
[ 224.695097] {1}[Hardware Error]: section_type: memory error
[ 224.700846] {1}[Hardware Error]: error_status: 0x00000000000c0400
[ 224.707113] {1}[Hardware Error]: physical_address: 0x0000000000204e10
[ 224.713726] {1}[Hardware Error]: physical_address_mask: 0x00000fffffffffff
[ 224.720776] {1}[Hardware Error]: node: 0 card: 1 module: 0 rank: 0 bank: 0 device: 0 row: 4 column: 306
[ 224.730427] {1}[Hardware Error]: error_type: 3, multi-bit ECC
[ 224.736356] EDAC MC0: 1 UE Multi-bit ECC on unknown label (node:0 card:1 module:0 rank:0 bank:0 row:4 col:306 page:0x204 offset:0xe10 grain:-4096 - status(0x00000000000c0400): Storage error in DRAM memory)
[ 224.736358] [Firmware Warn]: GHES: Invalid address in generic error data: 0x204e10
[ 251.685322] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
[ 251.685324] {2}[Hardware Error]: It has been corrected by h/w and requires no further action
[ 251.685336] {2}[Hardware Error]: event severity: corrected
[ 251.685341] {2}[Hardware Error]: precise ts...
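The `[Hardware Error]` lines in the log above carry a record number in braces, so they can be grouped per record when checking test results. What follows is a minimal sketch, assuming the dmesg line format shown in this comment; the function name `summarize_ghes` and the `SAMPLE` text (a few lines copied from the log above) are illustrative only and are not part of mce-test or the kernel.

```python
import re
from collections import OrderedDict

# Matches lines such as:
# [ 224.677330] {1}[Hardware Error]: event severity: recoverable
LINE_RE = re.compile(r"\{(\d+)\}\[Hardware Error\]:\s*(.*)")


def summarize_ghes(dmesg_text):
    """Group '[Hardware Error]' lines by record number and extract the
    event severity and section type when they are present."""
    records = OrderedDict()
    for line in dmesg_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        rec_id = int(match.group(1))
        body = match.group(2).strip()
        rec = records.setdefault(
            rec_id, {"severity": None, "section_type": None}
        )
        if body.startswith("event severity:"):
            rec["severity"] = body.split(":", 1)[1].strip()
        elif body.startswith("section_type:"):
            rec["section_type"] = body.split(":", 1)[1].strip()
    return records


# A few lines copied from the log above (hypothetical test input).
SAMPLE = """\
[ 224.669058] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 224.677330] {1}[Hardware Error]: event severity: recoverable
[ 224.695097] {1}[Hardware Error]: section_type: memory error
[ 251.685336] {2}[Hardware Error]: event severity: corrected
"""

if __name__ == "__main__":
    for rec_id, info in summarize_ghes(SAMPLE).items():
        print(rec_id, info["severity"], info["section_type"])
```

Fields that do not appear for a record (for example, the section type of a truncated record) are left as `None` rather than guessed.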

Changed in linux (Ubuntu Zesty):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.12.0-11.12

---------------
linux (4.12.0-11.12) artful; urgency=low

  * linux: 4.12.0-11.12 -proposed tracker (LP: #1709929)

  * CVE-2017-1000111
    - packet: fix tp_reserve race in packet_set_ring

  * CVE-2017-1000112
    - udp: consistently apply ufo or fragmentation

  * Please only recommend or suggest initramfs-tools | linux-initramfs-tool for
    kernels able to boot without initramfs (LP: #1700972)
    - Revert "UBUNTU: [Debian] Don't depend on initramfs-tools"
    - [Debian] Don't depend on initramfs-tools

  * Miscellaneous Ubuntu changes
    - SAUCE: (noup) Update spl to 0.6.5.11-ubuntu1, zfs to 0.6.5.11-1ubuntu3
    - SAUCE: powerpc: Always initialize input array when calling epapr_hypercall()

  * Miscellaneous upstream changes
    - selftests: typo correction for memory-hotplug test
    - selftests: check hot-pluggagble memory for memory-hotplug test
    - selftests: check percentage range for memory-hotplug test
    - selftests: add missing test name in memory-hotplug test
    - selftests: fix memory-hotplug test

 -- Seth Forshee <email address hidden> Thu, 10 Aug 2017 13:37:00 -0500

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
Manoj Iyer (manjo)
tags: added: verification-done-zesty
removed: verification-needed-zesty
Manoj Iyer (manjo) wrote :

ubuntu@awrep4:~$ dmesg | grep -i -E 'hest|ghes|edac|hardware'
[ 0.000000] ACPI: HEST 0x0000000008EC0000 000288 (v01 QCOM QDF2400 00000001 INTL 20150515)
[ 0.620910] HEST: Table parsing has been initialized.
[ 4.178101] EDAC MC: Ver: 3.0.0
[ 5.636210] ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
[ 5.644811] ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
[ 5.652965] ghes_edac: So, the end result of using this driver varies from vendor to vendor.
[ 5.661385] ghes_edac: If you find incorrect reports, please contact your hardware vendor
[ 5.669545] ghes_edac: to correct its BIOS.
[ 5.673711] ghes_edac: This system has 12 DIMM sockets.
[ 5.679105] EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.688460] EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.697918] EDAC MC2: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.707375] EDAC MC3: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.716838] EDAC MC4: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.726342] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
[ 7.020635] qcom-emac QCOM8070:00 eth0: hardware id 64.1, hardware version 1.3.0
ubuntu@awrep4:~$
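The verification output above shows the two messages that indicate the HEST table was parsed and firmware-first GHES handling was enabled. A quick check for them might look like the sketch below. It assumes the exact message strings seen in this report; the names `EXPECTED_MARKERS`, `check_apei_init` and `missing_markers` are hypothetical, and other kernel versions may word these messages differently.

```python
# Init messages copied verbatim from the dmesg output in this report.
EXPECTED_MARKERS = (
    "HEST: Table parsing has been initialized.",
    "GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.",
)


def check_apei_init(dmesg_text):
    """Map each expected init message to True if it occurs in the text."""
    return {marker: marker in dmesg_text for marker in EXPECTED_MARKERS}


def missing_markers(dmesg_text):
    """Return the expected init messages that were not found."""
    return [m for m, found in check_apei_init(dmesg_text).items() if not found]


# Two lines copied from the output above (hypothetical test input).
SAMPLE = """\
[ 0.620910] HEST: Table parsing has been initialized.
[ 5.726342] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
"""

if __name__ == "__main__":
    missing = missing_markers(SAMPLE)
    print("all init messages present" if not missing else missing)
```

On a live system the text could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, which may require elevated privileges depending on the system configuration.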

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-33.37

---------------
linux (4.10.0-33.37) zesty; urgency=low

  * linux: 4.10.0-33.37 -proposed tracker (LP: #1709303)

  * CVE-2017-1000112
    - Revert "udp: consistently apply ufo or fragmentation"
    - udp: consistently apply ufo or fragmentation

  * CVE-2017-1000111
    - Revert "net-packet: fix race in packet_set_ring on PACKET_RESERVE"
    - packet: fix tp_reserve race in packet_set_ring

  * ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on
    (LP: #1673564)
    - irqchip/gic-v3: Add missing system register definitions
    - arm64: KVM: Do not use stack-protector to compile EL2 code
    - KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2
      registers
    - KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
    - arm64: Add a facility to turn an ESR syndrome into a sysreg encoding
    - KVM: arm/arm64: vgic-v3: Add accessors for the ICH_APxRn_EL2 registers
    - KVM: arm64: Make kvm_condition_valid32() accessible from EL2
    - KVM: arm64: vgic-v3: Add hook to handle guest GICv3 sysreg accesses at EL2
    - KVM: arm64: vgic-v3: Add ICV_BPR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGRPEN1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IAR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_EOIR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_AP1Rn_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_HPPIR1_EL1 handler
    - KVM: arm64: vgic-v3: Enable trapping of Group-1 system registers
    - KVM: arm64: Enable GICv3 Group-1 sysreg trapping via command-line
    - KVM: arm64: vgic-v3: Add ICV_BPR0_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGNREN0_EL1 handler
    - KVM: arm64: vgic-v3: Add misc Group-0 handlers
    - KVM: arm64: vgic-v3: Enable trapping of Group-0 system registers
    - KVM: arm64: Enable GICv3 Group-0 sysreg trapping via command-line
    - arm64: Add MIDR values for Cavium cn83XX SoCs
    - [Config] CONFIG_CAVIUM_ERRATUM_30115=y
    - arm64: Add workaround for Cavium Thunder erratum 30115
    - KVM: arm64: vgic-v3: Add ICV_DIR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_RPR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_CTLR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_PMR_EL1 handler
    - KVM: arm64: Enable GICv3 common sysreg trapping via command-line
    - KVM: arm64: vgic-v3: Log which GICv3 system registers are trapped
    - arm64: KVM: Make unexpected reads from WO registers inject an undef
    - KVM: arm64: Log an error if trapping a read-from-write-only GICv3 access
    - KVM: arm64: Log an error if trapping a write-to-read-only GICv3 access

  * ibmvscsis: Do not send aborted task response (LP: #1689365)
    - target: Fix unknown fabric callback queue-full errors
    - ibmvscsis: Do not send aborted task response
    - ibmvscsis: Clear left-over abort_cmd pointers
    - ibmvscsis: Fix the incorrect req_lim_delta

  * hisi_sas performance improvements (LP: #1708734)
    - scsi: hisi_sas: define hisi_sas_device.device_id as int
    - scsi: hisi_sas: optimise the usage of hisi_hba.lock
    - scsi: hisi_sas: relocate sata_done_v2_hw()
    - scsi: hisi_sas: optimise DMA slot memory

  * hisi_sas...

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released