Activity log for bug #1706141

Date Who What changed Old value New value Message
2017-07-24 18:23:31 Manoj Iyer bug added bug
2017-07-24 18:30:05 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2017-07-25 16:18:51 Manoj Iyer description
Old value:
[Impact]
In configs Zesty and Artful, EDAC_MM_EDAC is set to =m, this disables EDAC_GHES. Customers using RAS on ARM64 may want this functionality. According to RAS expert at QTI. EDAC_GHES is essential for ARMv8.0 Server systems, as it enables firmware-first error handling of memory and CPU errors. Due to a lack of standard RAS architecture (or machine check architecture equivalent) on ARMv8.0 systems, APEI/GHES is the only mechanism available for reporting hardware errors (e.g. memory and CPU errors). This enables reporting of hardware errors, and also helps enable memory fault recovery mechanisms to extend the life of the system by offlining pages when recoverable uncorrected errors are encountered. Note that other ARM vendors will be going in this direction for hardware error handling.
[Test]
[Fix]
Set config option EDAC_MM_EDAC=y for ARM64, this will automatically set EDAC_GHES=y
[Regression Potential]
The config change is limited to ARM64 architecture, and does not impact any other architecture. Potential for regressions is low.
New value:
[Impact]
In configs Zesty and Artful, EDAC_MM_EDAC is set to =m, this disables EDAC_GHES. Customers using RAS on ARM64 may want this functionality. According to RAS expert at QTI. EDAC_GHES is essential for ARMv8.0 Server systems, as it enables firmware-first error handling of memory and CPU errors. Due to a lack of standard RAS architecture (or machine check architecture equivalent) on ARMv8.0 systems, APEI/GHES is the only mechanism available for reporting hardware errors (e.g. memory and CPU errors). This enables reporting of hardware errors, and also helps enable memory fault recovery mechanisms to extend the life of the system by offlining pages when recoverable uncorrected errors are encountered. Note that other ARM vendors will be going in this direction for hardware error handling.
[Test]
Boot the kernel and check dmesg for the following:
$ dmesg | grep -i -E "edac|hest|ghes"
[ 0.000000] ACPI: HEST 0x0000000009160000 000288 (v01 QCOM QDF2400 00000001 INTL 20150515)
[ 0.620278] HEST: Table parsing has been initialized.
[ 4.178298] EDAC MC: Ver: 3.0.0
[ 5.664499] ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
[ 5.673371] ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
[ 5.681542] ghes_edac: So, the end result of using this driver varies from vendor to vendor.
[ 5.689972] ghes_edac: If you find incorrect reports, please contact your hardware vendor
[ 5.698142] ghes_edac: to correct its BIOS.
[ 5.702320] ghes_edac: This system has 12 DIMM sockets.
[ 5.707717] EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.717264] EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.726806] EDAC MC2: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.736344] EDAC MC3: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.745883] EDAC MC4: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.755469] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
[Fix]
1. Apply RAS patch series submitted for SRU in Bug #1696570
2. Set config option EDAC_MM_EDAC=y for ARM64, this will automatically set EDAC_GHES=y
[Regression Potential]
The config change is limited to ARM64 architecture, and does not impact any other architecture. Potential for regressions is low.
2017-07-25 16:21:03 Manoj Iyer description
Old value:
[Impact]
In configs Zesty and Artful, EDAC_MM_EDAC is set to =m, this disables EDAC_GHES. Customers using RAS on ARM64 may want this functionality. According to RAS expert at QTI. EDAC_GHES is essential for ARMv8.0 Server systems, as it enables firmware-first error handling of memory and CPU errors. Due to a lack of standard RAS architecture (or machine check architecture equivalent) on ARMv8.0 systems, APEI/GHES is the only mechanism available for reporting hardware errors (e.g. memory and CPU errors). This enables reporting of hardware errors, and also helps enable memory fault recovery mechanisms to extend the life of the system by offlining pages when recoverable uncorrected errors are encountered. Note that other ARM vendors will be going in this direction for hardware error handling.
[Test]
Boot the kernel and check dmesg for the following:
$ dmesg | grep -i -E "edac|hest|ghes"
[ 0.000000] ACPI: HEST 0x0000000009160000 000288 (v01 QCOM QDF2400 00000001 INTL 20150515)
[ 0.620278] HEST: Table parsing has been initialized.
[ 4.178298] EDAC MC: Ver: 3.0.0
[ 5.664499] ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
[ 5.673371] ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
[ 5.681542] ghes_edac: So, the end result of using this driver varies from vendor to vendor.
[ 5.689972] ghes_edac: If you find incorrect reports, please contact your hardware vendor
[ 5.698142] ghes_edac: to correct its BIOS.
[ 5.702320] ghes_edac: This system has 12 DIMM sockets.
[ 5.707717] EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.717264] EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.726806] EDAC MC2: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.736344] EDAC MC3: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.745883] EDAC MC4: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.755469] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
[Fix]
1. Apply RAS patch series submitted for SRU in Bug #1696570
2. Set config option EDAC_MM_EDAC=y for ARM64, this will automatically set EDAC_GHES=y
[Regression Potential]
The config change is limited to ARM64 architecture, and does not impact any other architecture. Potential for regressions is low.
New value:
[Impact]
In configs Zesty and Artful, EDAC_MM_EDAC is set to =m, this disables EDAC_GHES. Customers using RAS on ARM64 may want this functionality. According to RAS expert at QTI. EDAC_GHES is essential for ARMv8.0 Server systems, as it enables firmware-first error handling of memory and CPU errors. Due to a lack of standard RAS architecture (or machine check architecture equivalent) on ARMv8.0 systems, APEI/GHES is the only mechanism available for reporting hardware errors (e.g. memory and CPU errors). This enables reporting of hardware errors, and also helps enable memory fault recovery mechanisms to extend the life of the system by offlining pages when recoverable uncorrected errors are encountered. Note that other ARM vendors will be going in this direction for hardware error handling.
[Test]
Boot the kernel and check dmesg for the following:
$ dmesg | grep -i -E "edac|hest|ghes"
[ 0.000000] ACPI: HEST 0x0000000009160000 000288 (v01 QCOM QDF2400 00000001 INTL 20150515)
[ 0.620278] HEST: Table parsing has been initialized.
[ 4.178298] EDAC MC: Ver: 3.0.0
[ 5.664499] ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
[ 5.673371] ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
[ 5.681542] ghes_edac: So, the end result of using this driver varies from vendor to vendor.
[ 5.689972] ghes_edac: If you find incorrect reports, please contact your hardware vendor
[ 5.698142] ghes_edac: to correct its BIOS.
[ 5.702320] ghes_edac: This system has 12 DIMM sockets.
[ 5.707717] EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.717264] EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.726806] EDAC MC2: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.736344] EDAC MC3: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.745883] EDAC MC4: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.755469] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
[Fix]
1. Apply RAS patch series submitted for SRU in Bug #1696570
2. Set config option EDAC_MM_EDAC=y for ARM64, this will automatically set EDAC_GHES=y
3. Remove edac_core from debian.master/abi/<ver>/arm64/generic.modules
[Regression Potential]
The config change is limited to ARM64 architecture, and does not impact any other architecture. Potential for regressions is low.
2017-07-26 02:35:45 Manoj Iyer description
Old value:
[Impact]
In configs Zesty and Artful, EDAC_MM_EDAC is set to =m, this disables EDAC_GHES. Customers using RAS on ARM64 may want this functionality. According to RAS expert at QTI. EDAC_GHES is essential for ARMv8.0 Server systems, as it enables firmware-first error handling of memory and CPU errors. Due to a lack of standard RAS architecture (or machine check architecture equivalent) on ARMv8.0 systems, APEI/GHES is the only mechanism available for reporting hardware errors (e.g. memory and CPU errors). This enables reporting of hardware errors, and also helps enable memory fault recovery mechanisms to extend the life of the system by offlining pages when recoverable uncorrected errors are encountered. Note that other ARM vendors will be going in this direction for hardware error handling.
[Test]
Boot the kernel and check dmesg for the following:
$ dmesg | grep -i -E "edac|hest|ghes"
[ 0.000000] ACPI: HEST 0x0000000009160000 000288 (v01 QCOM QDF2400 00000001 INTL 20150515)
[ 0.620278] HEST: Table parsing has been initialized.
[ 4.178298] EDAC MC: Ver: 3.0.0
[ 5.664499] ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
[ 5.673371] ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
[ 5.681542] ghes_edac: So, the end result of using this driver varies from vendor to vendor.
[ 5.689972] ghes_edac: If you find incorrect reports, please contact your hardware vendor
[ 5.698142] ghes_edac: to correct its BIOS.
[ 5.702320] ghes_edac: This system has 12 DIMM sockets.
[ 5.707717] EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.717264] EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.726806] EDAC MC2: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.736344] EDAC MC3: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.745883] EDAC MC4: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.755469] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
[Fix]
1. Apply RAS patch series submitted for SRU in Bug #1696570
2. Set config option EDAC_MM_EDAC=y for ARM64, this will automatically set EDAC_GHES=y
3. Remove edac_core from debian.master/abi/<ver>/arm64/generic.modules
[Regression Potential]
The config change is limited to ARM64 architecture, and does not impact any other architecture. Potential for regressions is low.
New value:
[Impact]
In configs Zesty and Artful, EDAC_MM_EDAC is set to =m, this disables EDAC_GHES. Customers using RAS on ARM64 may want this functionality. According to RAS expert at QTI. EDAC_GHES is essential for ARMv8.0 Server systems, as it enables firmware-first error handling of memory and CPU errors. Due to a lack of standard RAS architecture (or machine check architecture equivalent) on ARMv8.0 systems, APEI/GHES is the only mechanism available for reporting hardware errors (e.g. memory and CPU errors). This enables reporting of hardware errors, and also helps enable memory fault recovery mechanisms to extend the life of the system by offlining pages when recoverable uncorrected errors are encountered. Note that other ARM vendors will be going in this direction for hardware error handling.
[Test]
Test kernel available in https://launchpad.net/~centriq-team/+archive/ubuntu/lp1706141
Boot the kernel and check dmesg for the following:
$ dmesg | grep -i -E "edac|hest|ghes"
[ 0.000000] ACPI: HEST 0x0000000009160000 000288 (v01 QCOM QDF2400 00000001 INTL 20150515)
[ 0.620278] HEST: Table parsing has been initialized.
[ 4.178298] EDAC MC: Ver: 3.0.0
[ 5.664499] ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
[ 5.673371] ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
[ 5.681542] ghes_edac: So, the end result of using this driver varies from vendor to vendor.
[ 5.689972] ghes_edac: If you find incorrect reports, please contact your hardware vendor
[ 5.698142] ghes_edac: to correct its BIOS.
[ 5.702320] ghes_edac: This system has 12 DIMM sockets.
[ 5.707717] EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.717264] EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.726806] EDAC MC2: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.736344] EDAC MC3: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.745883] EDAC MC4: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
[ 5.755469] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
[Fix]
1. Apply RAS patch series submitted for SRU in Bug #1696570
2. Set config option EDAC_MM_EDAC=y for ARM64, this will automatically set EDAC_GHES=y
3. Remove edac_core from debian.master/abi/<ver>/arm64/generic.modules
[Regression Potential]
The config change is limited to ARM64 architecture, and does not impact any other architecture. Potential for regressions is low.
2017-07-27 14:46:20 Stefan Bader nominated for series Ubuntu Zesty
2017-07-27 14:46:20 Stefan Bader bug task added linux (Ubuntu Zesty)
2017-07-27 20:01:43 Seth Forshee linux (Ubuntu): status Incomplete Fix Committed
2017-08-07 13:44:28 Kleber Sacilotto de Souza linux (Ubuntu Zesty): status New Fix Committed
2017-08-10 05:56:17 Launchpad Janitor linux (Ubuntu): status Fix Committed Fix Released
2017-08-10 05:56:17 Launchpad Janitor cve linked 2017-1000364
2017-08-10 05:56:17 Launchpad Janitor cve linked 2017-10810
2017-08-10 05:56:17 Launchpad Janitor cve linked 2017-7533
2017-08-16 16:34:44 Kleber Sacilotto de Souza tags qdf2400 qdf2400 verification-needed-zesty
2017-08-22 13:25:25 Manoj Iyer tags qdf2400 verification-needed-zesty qdf2400 verification-done-zesty
2017-08-28 10:14:27 Launchpad Janitor linux (Ubuntu Zesty): status Fix Committed Fix Released
2017-08-28 10:14:27 Launchpad Janitor cve linked 2017-1000111
2017-08-28 10:14:27 Launchpad Janitor cve linked 2017-1000112
2017-08-28 10:14:27 Launchpad Janitor cve linked 2017-7487
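
Verification sketch for the [Test] and [Fix] steps recorded in the description above. This is not part of the bug report; it is a minimal illustration, assuming the kernel config is available as text and that the dmesg output has already been captured. The function names and the choice of dmesg substrings are hypothetical; the option names and log messages are taken from the description. The CONFIG_ prefix is the standard kernel config file spelling of the EDAC_MM_EDAC and EDAC_GHES options.

```python
import re

# Options the [Fix] section expects to be built in (=y) on ARM64.
REQUIRED_BUILTIN = ("CONFIG_EDAC_MM_EDAC", "CONFIG_EDAC_GHES")

# Substrings picked from the dmesg sample in the [Test] section.
# Which lines to treat as mandatory is an assumption of this sketch.
EXPECTED_DMESG_MARKERS = (
    "HEST: Table parsing has been initialized",
    "ghes_edac:",
    "GHES: APEI firmware first mode is enabled",
)


def parse_kernel_config(text):
    """Return {option: value} for every CONFIG_*=value line in a kernel config."""
    options = {}
    for line in text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def missing_builtin_options(config_text, required=REQUIRED_BUILTIN):
    """List the required options that are not set to 'y' (built in)."""
    options = parse_kernel_config(config_text)
    return [name for name in required if options.get(name) != "y"]


def missing_dmesg_markers(dmesg_text, markers=EXPECTED_DMESG_MARKERS):
    """List the expected substrings that never appear in the dmesg output."""
    return [marker for marker in markers if marker not in dmesg_text]


if __name__ == "__main__":
    # The situation described in [Impact]: EDAC_MM_EDAC=m, EDAC_GHES disabled.
    sample_config = "CONFIG_EDAC_MM_EDAC=m\n# CONFIG_EDAC_GHES is not set\n"
    print(missing_builtin_options(sample_config))
```

On an Ubuntu system the config text would typically come from /boot/config-$(uname -r), and the dmesg text from the command shown in [Test]. An empty list from both checks corresponds to the expected post-fix state.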