[SRU][Zesty] arm64: Add support for handling memory corruption

Bug #1696852 reported by Manoj Iyer on 2017-06-08
This bug affects 1 person
Affects           Status         Importance  Assigned to            Milestone
linux (Ubuntu)    Fix Released   High        Manoj Iyer
  Zesty           Fix Released   High        Canonical Kernel Team

Bug Description

[Impact]
Enable handling of hardware memory corruption (memory failure/hwpoison) on arm64.

[Test]
Run the hwpoison functional tests from mce-test (mce-test/cases/function/hwpoison).

[Fix]
[0] https://<email address hidden>/msg1376052.html
[1] https://www.spinics.net/lists/arm-kernel/msg581657.html
[2] https://lkml.org/lkml/2017/4/7/486
[3] https://lkml.org/lkml/2017/4/5/402

Jonathan (Zhixiong) Zhang (2):
  arm64: hwpoison: add VM_FAULT_HWPOISON[_LARGE] handling
  arm64: kconfig: allow support for memory failure handling

Punit Agrawal (2):
  arm64: hugetlb: Fix huge_pte_offset to return poisoned page table
    entries
  arm64: mm: Update perf accounting to handle poison faults

 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/arm64/mm/fault.c            | 90 ++++++++++++++++++++++++----------------
 arch/arm64/mm/hugetlbpage.c      | 29 +++++--------
 4 files changed, 67 insertions(+), 55 deletions(-)
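The first patch adds VM_FAULT_HWPOISON[_LARGE] handling to the arm64 fault path, so a task touching a poisoned page receives SIGBUS (the "Bus error" lines in the test logs on this bug). The general Linux convention, documented in sigaction(2), is that such a SIGBUS carries si_code BUS_MCEERR_AR or BUS_MCEERR_AO, and that si_addr_lsb gives the least significant bit of the reported address, and therefore the extent of the corruption. The following is a minimal user-space sketch of how a handler might interpret those fields. The helpers is_memory_corruption() and poison_range() are hypothetical; in a real SA_SIGINFO handler their inputs would come from info->si_code, info->si_addr and info->si_addr_lsb.

```c
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Fallbacks matching the Linux UAPI values, in case the libc headers
 * in use do not expose these si_code constants. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4 /* hardware memory error: action required */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5 /* hardware memory error: action optional */
#endif

/* Hypothetical helper: return 1 if a SIGBUS si_code reports
 * hardware memory corruption, 0 otherwise. */
int is_memory_corruption(int si_code)
{
    return si_code == BUS_MCEERR_AR || si_code == BUS_MCEERR_AO;
}

/* Hypothetical helper: given the faulting address (si_addr) and
 * si_addr_lsb, compute the start and length of the poisoned region.
 * The region is 2^addr_lsb bytes, aligned to its own size. */
void poison_range(uintptr_t addr, unsigned addr_lsb,
                  uintptr_t *start, size_t *len)
{
    size_t size = (size_t)1 << addr_lsb;

    *start = addr & ~(uintptr_t)(size - 1);
    *len = size;
}
```

For example, with addr_lsb equal to 12 (a 4 KB page) the region is one page; with 21 it would cover a 2 MB huge page.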

[Regression Potential]
Changes are confined to the arm64 architecture. Detailed test results are posted as comments on this bug.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1696852

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Manoj Iyer (manjo) wrote :

The patches were applied to Zesty along with the RAS patch series, and the mce-test hwpoison tests were run.

ubuntu@ubuntu:~/testing/mce-test/cases/function/hwpoison$ uname -a
Linux ubuntu 4.10.0-22-generic #24~lp1696852+mmcorruption.1 SMP Tue Jun 13 20:53:25 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux

Testing on QDF2400 SDP with mce-test:

ubuntu@ubuntu:~/testing/mce-test/cases/function/hwpoison$ sudo ./run_hugepage.sh
[sudo] password for ubuntu:
hwpoison-inject module is loaded.

***************************************************************************
Pay attention:

This is the functional test for huge page support of HWPoison.
***************************************************************************

-------------------------------------
TestCase head early file fork_shared killed
./thugetlb -x -m 2 -o 512 -e -f 1 -F ../../../work/hugepage
[ 403.325672] Memory failure: 0x171c200: Killing thugetlb:2912 due to hardware memory corruption
[ 403.333355] Memory failure: 0x171c200: Killing thugetlb:2913 due to hardware memory corruption
[ 403.341939] Memory failure: 0x171c200: huge page still referenced by 1 users
[ 403.348956] Memory failure: 0x171c200: recovery action for huge page: Failed
./run_hugepage.sh: line 61: 2912 Bus error (core dumped) ./thugetlb -x -m 2 -o 512 -e -f 1 -F ../../../work/hugepage
thugetlb was killed.
PASS
-------------------------------------
TestCase head early file fork_private_nocow killed
./thugetlb -x -m 2 -o 512 -e -f 2 -Fp ../../../work/hugepage
[ 403.739988] Memory failure: 0x171de00: Killing thugetlb:2916 due to hardware memory corruption
[ 403.747668] Memory failure: 0x171de00: Killing thugetlb:2917 due to hardware memory corruption
[ 403.756254] Memory failure: 0x171de00: recovery action for huge page: Delayed
./run_hugepage.sh: line 61: 2916 Bus error (core dumped) ./thugetlb -x -m 2 -o 512 -e -f 2 -Fp ../../../work/hugepage
thugetlb was killed.
PASS
-------------------------------------
TestCase head early file fork_private_cow killed
./thugetlb -x -m 2 -o 512 -e -f 3 -Fpc ../../../work/hugepage
[ 404.239395] Memory failure: 0x171dc00: Killing thugetlb:2920 due to hardware memory corruption
[ 404.247074] Memory failure: 0x171dc00: recovery action for huge page: Delayed
./run_hugepage.sh: line 61: 2920 Bus error (core dumped) ./thugetlb -x -m 2 -o 512 -e -f 3 -Fpc ../../../work/hugepage
thugetlb was killed.
PASS
-------------------------------------
TestCase head early shm fork_shared killed
./thugetlb -x -m 2 -o 512 -e -S -F ../../../work/hugepage
[ 404.625458] Memory failure: 0x171da00: Killing thugetlb:2923 due to hardware memory corruption
[ 404.633134] Memory failure: 0x171da00: Killing thugetlb:2924 due to hardware memory corruption
[ 404.641716] Memory failure: 0x171da00: huge page still referenced by 1 users
[ 404.648741] Memory failure: 0x171da00: recovery action for huge page: Failed
./run_hugepage.sh: line 61: 2923 Bus error (core dumped) ./thugetlb -x -m 2 -o 512 -e -S -F ../../../work/hugepage
thugetlb was killed.
PASS
-------------------------------------
TestCase head early anonymous fork_shared killed
./thugetlb -x -m 2 -o...
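Each "Memory failure:" line in the huge page log above reports a page frame number (PFN), and every test case name starts with "head", which suggests these cases poison the head page of a huge page. Assuming 4 KB base pages and 2 MB huge pages (512 base pages each), every PFN shown (0x171c200, 0x171de00, 0x171dc00, 0x171da00) is a multiple of 512, as a head page would be. A minimal sketch of extracting the PFN from such a line and checking that alignment, using the hypothetical helpers parse_memory_failure_pfn() and is_2mb_head_pfn():

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the PFN from a kernel log line such as
 *   "[ 403.325672] Memory failure: 0x171c200: Killing thugetlb:2912 ..."
 * Returns 1 and stores the PFN on success, 0 if the line does not match. */
int parse_memory_failure_pfn(const char *line, unsigned long *pfn)
{
    static const char tag[] = "Memory failure: ";
    const char *p = strstr(line, tag);
    char *end;

    if (p == NULL)
        return 0;
    p += sizeof(tag) - 1;
    *pfn = strtoul(p, &end, 16); /* base 16 also accepts a "0x" prefix */
    return end != p && *end == ':';
}

/* Hypothetical helper, assuming 4 KB base pages and 2 MB huge pages:
 * a huge page head PFN is a multiple of 512. */
int is_2mb_head_pfn(unsigned long pfn)
{
    return (pfn % 512) == 0;
}
```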

description: updated
Manoj Iyer (manjo) wrote :

Kernel available in PPA: https://launchpad.net/~centriq-team/+archive/ubuntu/lp1696852/

Boot tested on Power8:
ubuntu@manjo-srutest:~$ uname -a
Linux manjo-srutest 4.10.0-22-generic #24~lp1696852+mmcorruption.1-Ubuntu SMP Tue Jun 13 21:31:10 UTC ppc64le ppc64le ppc64le GNU/Linux

Boot tested on AMD64:
ubuntu@adib:~$ uname -a
Linux adib 4.10.0-22-generic #24~lp1696852+mmcorruption.1-Ubuntu SMP Tue Jun 13 21:31:31 UTC x86_64 x86_64 x86_64 GNU/Linux

Seth Forshee (sforshee) on 2017-07-20
Changed in linux (Ubuntu):
status: Incomplete → Fix Committed
Manoj Iyer (manjo) on 2017-07-25
Changed in linux (Ubuntu Zesty):
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Zesty):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.12.0-11.12

---------------
linux (4.12.0-11.12) artful; urgency=low

  * linux: 4.12.0-11.12 -proposed tracker (LP: #1709929)

  * CVE-2017-1000111
    - packet: fix tp_reserve race in packet_set_ring

  * CVE-2017-1000112
    - udp: consistently apply ufo or fragmentation

  * Please only recommend or suggest initramfs-tools | linux-initramfs-tool for
    kernels able to boot without initramfs (LP: #1700972)
    - Revert "UBUNTU: [Debian] Don't depend on initramfs-tools"
    - [Debian] Don't depend on initramfs-tools

  * Miscellaneous Ubuntu changes
    - SAUCE: (noup) Update spl to 0.6.5.11-ubuntu1, zfs to 0.6.5.11-1ubuntu3
    - SAUCE: powerpc: Always initialize input array when calling epapr_hypercall()

  * Miscellaneous upstream changes
    - selftests: typo correction for memory-hotplug test
    - selftests: check hot-pluggagble memory for memory-hotplug test
    - selftests: check percentage range for memory-hotplug test
    - selftests: add missing test name in memory-hotplug test
    - selftests: fix memory-hotplug test

 -- Seth Forshee <email address hidden> Thu, 10 Aug 2017 13:37:00 -0500

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
Manoj Iyer (manjo) wrote :

ubuntu@awrep4:~/mce-test/cases/function/hwpoison$ sudo ./run_hugepage.sh

***************************************************************************
Pay attention:

This is the functional test for huge page support of HWPoison.
***************************************************************************

 Num of Executed Test Case: 42 Num of Failed Case: 14

Manoj Iyer (manjo) wrote :

ubuntu@awrep4:~/mce-test/cases/function/hwpoison$ sudo ./run_soft.sh

***************************************************************************
Pay attention:

This test is soft mode of HWPoison functional test.
***************************************************************************

------------------------------------------------------------------------
Running tsoft (simple soft offline test)
PASS: ./tsoft
------------------------------------------------------------------------
Running tsoftinj (soft offline test on various types of pages)
anonymous
anonymous
private, diskbacked
./helpers.sh: line 28: 12471 Bus error (core dumped) ./tsoftinj
FAIL: ./tsoftinj returned with failure.
------------------------------------------------------------------------
Running random_offline (random soft offline test for 60 seconds)
ERROR: No soft offlining support in kernel
PASS: ./random_offline -t 60
Unpoisoning.
WARNING: hwpoison page counter is broken.
HardwareCorrupted: 8 kB

 Num of Executed Test Case: 3 Num of Failed Case: 1
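The soft-offline run ends with "HardwareCorrupted: 8 kB", which is the HardwareCorrupted field of /proc/meminfo (the amount of memory the kernel currently accounts as poisoned). Assuming 4 KB pages, 8 kB is two pages still accounted after "Unpoisoning.", which is presumably what triggers the "hwpoison page counter is broken" warning. A minimal sketch of reading that field from meminfo-formatted text, using a hypothetical helper hardware_corrupted_kb():

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the "HardwareCorrupted:" value in kB from
 * /proc/meminfo-style text, or -1 if the field is absent or malformed. */
long hardware_corrupted_kb(const char *meminfo)
{
    static const char key[] = "HardwareCorrupted:";
    const char *p = meminfo;

    while (p != NULL && *p != '\0') {
        /* Only match the key at the start of a line. */
        if (strncmp(p, key, sizeof(key) - 1) == 0) {
            long kb;

            if (sscanf(p + sizeof(key) - 1, "%ld kB", &kb) == 1)
                return kb;
            return -1;
        }
        /* Advance to the start of the next line, if any. */
        p = strchr(p, '\n');
        if (p != NULL)
            p++;
    }
    return -1;
}
```

On a live system, /proc/meminfo would be read into a buffer and passed in; a non-zero result after every test page has been unpoisoned is the condition worth investigating.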

tags: added: verification-done-zesty
removed: verification-needed-zesty
Manoj Iyer (manjo) wrote :

ubuntu@awrep4:~/mce-test/cases/function/hwpoison$ uname -a
Linux awrep4 4.10.0-33-generic #37-Ubuntu SMP Fri Aug 11 10:55:04 UTC 2017 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@awrep4:~/mce-test/cases/function/hwpoison$

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-33.37

---------------
linux (4.10.0-33.37) zesty; urgency=low

  * linux: 4.10.0-33.37 -proposed tracker (LP: #1709303)

  * CVE-2017-1000112
    - Revert "udp: consistently apply ufo or fragmentation"
    - udp: consistently apply ufo or fragmentation

  * CVE-2017-1000111
    - Revert "net-packet: fix race in packet_set_ring on PACKET_RESERVE"
    - packet: fix tp_reserve race in packet_set_ring

  * ThunderX: soft lockup on 4.8+ kernels when running qemu-efi with vhost=on
    (LP: #1673564)
    - irqchip/gic-v3: Add missing system register definitions
    - arm64: KVM: Do not use stack-protector to compile EL2 code
    - KVM: arm/arm64: vgic-v3: Use PREbits to infer the number of ICH_APxRn_EL2
      registers
    - KVM: arm/arm64: vgic-v3: Fix nr_pre_bits bitfield extraction
    - arm64: Add a facility to turn an ESR syndrome into a sysreg encoding
    - KVM: arm/arm64: vgic-v3: Add accessors for the ICH_APxRn_EL2 registers
    - KVM: arm64: Make kvm_condition_valid32() accessible from EL2
    - KVM: arm64: vgic-v3: Add hook to handle guest GICv3 sysreg accesses at EL2
    - KVM: arm64: vgic-v3: Add ICV_BPR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGRPEN1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IAR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_EOIR1_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_AP1Rn_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_HPPIR1_EL1 handler
    - KVM: arm64: vgic-v3: Enable trapping of Group-1 system registers
    - KVM: arm64: Enable GICv3 Group-1 sysreg trapping via command-line
    - KVM: arm64: vgic-v3: Add ICV_BPR0_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_IGNREN0_EL1 handler
    - KVM: arm64: vgic-v3: Add misc Group-0 handlers
    - KVM: arm64: vgic-v3: Enable trapping of Group-0 system registers
    - KVM: arm64: Enable GICv3 Group-0 sysreg trapping via command-line
    - arm64: Add MIDR values for Cavium cn83XX SoCs
    - [Config] CONFIG_CAVIUM_ERRATUM_30115=y
    - arm64: Add workaround for Cavium Thunder erratum 30115
    - KVM: arm64: vgic-v3: Add ICV_DIR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_RPR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_CTLR_EL1 handler
    - KVM: arm64: vgic-v3: Add ICV_PMR_EL1 handler
    - KVM: arm64: Enable GICv3 common sysreg trapping via command-line
    - KVM: arm64: vgic-v3: Log which GICv3 system registers are trapped
    - arm64: KVM: Make unexpected reads from WO registers inject an undef
    - KVM: arm64: Log an error if trapping a read-from-write-only GICv3 access
    - KVM: arm64: Log an error if trapping a write-to-read-only GICv3 access

  * ibmvscsis: Do not send aborted task response (LP: #1689365)
    - target: Fix unknown fabric callback queue-full errors
    - ibmvscsis: Do not send aborted task response
    - ibmvscsis: Clear left-over abort_cmd pointers
    - ibmvscsis: Fix the incorrect req_lim_delta

  * hisi_sas performance improvements (LP: #1708734)
    - scsi: hisi_sas: define hisi_sas_device.device_id as int
    - scsi: hisi_sas: optimise the usage of hisi_hba.lock
    - scsi: hisi_sas: relocate sata_done_v2_hw()
    - scsi: hisi_sas: optimise DMA slot memory

  * hisi_sas...


Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released