[Bug] Harrisonville: pnd2_edac always fail to load on B1 stepping Harrisonville SDP

Bug #1709257 reported by quanxian
This bug affects 1 person
Affects         Status        Importance  Assigned to   Milestone
intel           Fix Released  Undecided   Unassigned
linux (Ubuntu)  Fix Released  Medium      Seth Forshee
Artful          Fix Released  Medium      Seth Forshee

Bug Description

Description:
During our EDAC validation on Wind River Linux 9, we found that manually loading pnd2_edac always fails on the B1 stepping Harrisonville SDP.
The error output is as follows:
root@intel-x86-64:~# uname -r
4.8.20-WR9.0.0.5_standard
root@intel-x86-64:~# dmesg |grep -i edac
[ 11.949838] EDAC MC: Ver: 3.0.0
[ 11.954312] EDAC DEBUG: edac_mc_sysfs_init: device mc created
[ 12.060157] EDAC DEBUG: pnd2_init:
[ 12.060160] EDAC DEBUG: pnd2_probe:
[ 12.060167] EDAC DEBUG: dnv_rd_reg: Read b_cr_tolud_pci=00000000_80000000
[ 12.060169] EDAC DEBUG: dnv_rd_reg: Read b_cr_touud_lo_pci=00000000_80000000
[ 12.060172] EDAC DEBUG: dnv_rd_reg: Read b_cr_touud_hi_pci=00000000_00000004
[ 12.060228] EDAC DEBUG: dnv_rd_reg: Read b_cr_asym_mem_region0_mchbar=00000000_00000000
[ 12.060239] EDAC DEBUG: dnv_rd_reg: Read b_cr_asym_mem_region1_mchbar=00000000_00000000
[ 12.060247] EDAC DEBUG: dnv_rd_reg: Read b_cr_mot_out_base_mchbar=00000000_00000000
[ 12.060255] EDAC DEBUG: dnv_rd_reg: Read b_cr_mot_out_mask_mchbar=00000000_00000000
[ 12.078301] Modules linked in: pnd2_edac(+) edac_core x86_pkg_temp_thermal i2c_i801 intel_powerclamp matroxfb_base matroxfb_g450 matroxfb_accel matroxfb_DAC1064 g450_pll matroxfb_misc i2c_ismt i2c_smbus coretemp crct10dif_pclmul crct10dif_common aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd efi_pstore efivars acpi_cpufreq tpm_tis tpm_tis_core softdog efivarfs
[ 12.143390] [<ffffffffa09f80d8>] ? dnv_rd_reg+0x128/0x220 [pnd2_edac]
[ 12.148890] [<ffffffffa09f80d8>] dnv_rd_reg+0x128/0x220 [pnd2_edac]
[ 12.148896] [<ffffffffa0a2722b>] pnd2_init+0x22b/0x809 [pnd2_edac]
[ 12.148961] EDAC pnd2: Failed to register device with error -19.
[ 12.180813] EDAC DEBUG: pnd2_init:
[ 12.180814] EDAC DEBUG: pnd2_probe:
[ 12.180841] EDAC DEBUG: dnv_rd_reg: Read b_cr_tolud_pci=00000000_80000000
[ 12.180937] EDAC DEBUG: dnv_rd_reg: Read b_cr_touud_lo_pci=00000000_80000000
[ 12.180969] EDAC DEBUG: dnv_rd_reg: Read b_cr_touud_hi_pci=00000000_00000004
[ 12.181129] EDAC DEBUG: dnv_rd_reg: Read b_cr_asym_mem_region0_mchbar=00000000_00000000
[ 12.181299] EDAC DEBUG: dnv_rd_reg: Read b_cr_asym_mem_region1_mchbar=00000000_00000000
[ 12.181425] EDAC DEBUG: dnv_rd_reg: Read b_cr_mot_out_base_mchbar=00000000_00000000
[ 12.181671] EDAC DEBUG: dnv_rd_reg: Read b_cr_mot_out_mask_mchbar=00000000_00000000
[ 12.181810] EDAC pnd2: Failed to register device with error -19.
root@intel-x86-64:~# dmesg -c > dmesg.clear_14.D91.log
root@intel-x86-64:~# modprobe edac_core
root@intel-x86-64:~# modprobe pnd2_edac
modprobe: ERROR: could not insert 'pnd2_edac': No such device
root@intel-x86-64:~# dmesg
[ 194.524122] EDAC DEBUG: pnd2_init:
[ 194.524126] EDAC DEBUG: pnd2_probe:
[ 194.524135] EDAC DEBUG: dnv_rd_reg: Read b_cr_tolud_pci=00000000_80000000
[ 194.524139] EDAC DEBUG: dnv_rd_reg: Read b_cr_touud_lo_pci=00000000_80000000
[ 194.524143] EDAC DEBUG: dnv_rd_reg: Read b_cr_touud_hi_pci=00000000_00000004
[ 194.524211] EDAC DEBUG: dnv_rd_reg: Read b_cr_asym_mem_region0_mchbar=00000000_00000000
[ 194.524226] EDAC DEBUG: dnv_rd_reg: Read b_cr_asym_mem_region1_mchbar=00000000_00000000
[ 194.524239] EDAC DEBUG: dnv_rd_reg: Read b_cr_mot_out_base_mchbar=00000000_00000000
[ 194.524252] EDAC DEBUG: dnv_rd_reg: Read b_cr_mot_out_mask_mchbar=00000000_00000000
[ 194.524264] EDAC pnd2: Failed to register device with error -19.
root@intel-x86-64:~# dmidecode -t bios | grep Version
        Version: HAVLCRB1.X64.0014.D91.1704200405
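
For anyone matching up the two messages above: the "error -19" in the EDAC log line and the "No such device" from modprobe are the same errno, ENODEV. A minimal sketch to confirm the mapping, assuming a Linux host with Python 3 (this is an illustration, not part of the original validation run):

```python
import errno
import os

# The kernel log reports "error -19"; kernel functions return negated errno values.
kernel_rc = -19

code = -kernel_rc
name = errno.errorcode[code]   # symbolic name for this errno number
message = os.strerror(code)    # human-readable text for this errno

print(f"{kernel_rc} -> {name}: {message}")
```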

Target Kernel: 4.14
Target Release: 18.04
If 17.10 is targeted, the fix will need to be backported.

quanxian (quanxian-wang) wrote :

Upstream is working on this. Stay tuned.

quanxian (quanxian-wang) wrote :

1) The commit IDs on Boris' 'for-next' branch are:
bc8f10babcc2 EDAC, pnd2: Properly toggle hidden state for P2SB PCI device
5fd77cb3bac7 EDAC, pnd2: Conditionally unhide/hide the P2SB PCI device to read BAR
d84676a9e128 EDAC, pnd2: Mask off the lower four bits of a BAR
They will be merged in the next merge window, v4.14-rc1 at the latest.
2) The patch 'i2c: i801: Restore the presence state of P2SB PCI device after reading BAR' has been ACK-ed by Wolfram Sang but hasn't been pushed to his 'for-next' branch yet. Wolfram Sang said he'd like to give Jean a chance to comment.

v4.14-rc1 (or v4.14) should be the target kernel.
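
As background on d84676a9e128 ("Mask off the lower four bits of a BAR"): in a PCI memory BAR, bits 3:0 are flag bits (memory/IO space indicator, type, prefetchable) rather than address bits, so they are cleared before the value is used as a base address. A small sketch with a made-up BAR value; the helper name and the raw value are illustrative assumptions, not taken from the driver:

```python
PCI_BAR_MEM_FLAG_MASK = 0xF  # bits 3:0 of a memory BAR: space indicator, type, prefetchable


def bar_base_address(raw_bar: int) -> int:
    """Return the base address encoded in a 32-bit memory BAR (hypothetical helper)."""
    return raw_bar & ~PCI_BAR_MEM_FLAG_MASK & 0xFFFFFFFF


raw = 0xFD000008  # hypothetical raw BAR: 32-bit memory BAR with the prefetchable bit set
base = bar_base_address(raw)
print(hex(raw), "->", hex(base))
```

Using the raw value without the mask would give an address that is off by the flag bits, which is the kind of error the commit title describes.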

description: updated
description: updated
information type: Proprietary → Private
Changed in linux (Ubuntu Artful):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Seth Forshee (sforshee)
Seth Forshee (sforshee) wrote :

bc8f10babcc2 "EDAC, pnd2: Properly toggle hidden state for P2SB PCI device" doesn't apply cleanly without 3e5d2bd19138 "EDAC, pnd2: Build in a minimal sideband driver for Apollo Lake". The backport is not straightforward, and I'm not sure how to validate if I did attempt a backport. Can you please advise if we should also take that commit, or else provide us with a tested backport of bc8f10babcc2 to 4.13? Thanks!

For future reference, this appears to be the extra patch which was requested - https://patchwork.ozlabs.org/patch/801447/.

Changed in linux (Ubuntu Artful):
status: Triaged → Incomplete
information type: Private → Public
quanxian (quanxian-wang) wrote :

OK, I will contact upstream for suggestions. Stay tuned.

quanxian (quanxian-wang) wrote :

Commit "bc8f10babcc2" fixes an issue in commit "3e5d2bd19138", which also unconditionally hides the P2SB device on Apollo Lake. To backport the 'pnd2_edac' driver from 4.14 to 4.13 (or an older version), please apply all four of the following patches (on Boris' for-next branch):
bc8f10babcc2 EDAC, pnd2: Properly toggle hidden state for P2SB PCI device
5fd77cb3bac7 EDAC, pnd2: Conditionally unhide/hide the P2SB PCI device to read BAR
d84676a9e128 EDAC, pnd2: Mask off the lower four bits of a BAR
3e5d2bd19138 EDAC, pnd2: Build in a minimal sideband driver for Apollo Lake
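
A sketch of the apply order, assuming the list above is newest-first (as `git log` prints it), so that 3e5d2bd19138 goes in first. That ordering is an assumption, and the `-x` flag is only there to record the upstream commit ID in each backported commit message:

```python
# Commit IDs exactly as listed in the comment above (assumed to be newest-first).
listed = ["bc8f10babcc2", "5fd77cb3bac7", "d84676a9e128", "3e5d2bd19138"]

# Cherry-pick oldest-first so each patch finds the context it was written against.
commands = [f"git cherry-pick -x {commit}" for commit in reversed(listed)]

for command in commands:
    print(command)
```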

Seth Forshee (sforshee) wrote :

Applied the indicated patches to 4.13. With 4.12 there were still problems cherry picking, but there were just a few small patches in 4.13 so I just went ahead and pulled in all of these as well:

ee514c7a2379 EDAC, pnd2: Return proper error value from apl_rd_reg()
77641dacead2 EDAC, pnd2: Make function sbi_send() static
164c29244d4b EDAC, pnd2: Fix Apollo Lake DIMM detection

Changed in linux (Ubuntu Artful):
status: Incomplete → Fix Committed
quanxian (quanxian-wang) wrote :

As of comment #2, patch bfd4473b850c had not yet been pushed upstream. It is upstream now.

Comment from upstream:
"The patch "bfd4473b850c i2c: i801: Restore the presence state of P2SB PCI device after reading BAR" has been pushed on the 'for-next' branch of I2C maintainer Wolfram Sang."

Seth Forshee (sforshee) wrote : Re: [Bug 1709257] Re: [Bug] Harrisonville: pnd2_edac always fail to load on B1 stepping Harrisonville SDP

On Tue, Aug 29, 2017 at 02:08:52AM -0000, quanxian wrote:
> From comment#2, patch bfd4473b850c was not pushed into upstream.
> Currently it has been in upstream now.
>
> Comment from upstream
> "
> The patch "bfd4473b850c i2c: i801: Restore the presence state of P2SB PCI device after reading BAR" has been pushed on the 'for-next' branch of I2C maintainer Wolfram Sang.
> '

@quanxian: Yes I noticed that, in fact I cherry-picked the commit from
the i2c tree when applying to artful (hadn't yet hit linux-next).

quanxian (quanxian-wang) wrote :

Appreciate it. Thanks @Seth Forshee

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.12.0-13.14

---------------
linux (4.12.0-13.14) artful; urgency=low

  * linux: 4.12.0-13.14 -proposed tracker (LP: #1714687)

  * vhost guest network randomly drops under stress (kvm) (LP: #1711251)
    - Revert "vhost: cache used event for better performance"

  * EDAC sbridge: Failed to register device with error -22. (LP: #1714112)
    - [Config] CONFIG_EDAC_GHES=n

  * Artful update to v4.12.10 stable release (LP: #1714525)
    - sparc64: remove unnecessary log message
    - bonding: require speed/duplex only for 802.3ad, alb and tlb
    - bonding: ratelimit failed speed/duplex update warning
    - af_key: do not use GFP_KERNEL in atomic contexts
    - dccp: purge write queue in dccp_destroy_sock()
    - dccp: defer ccid_hc_tx_delete() at dismantle time
    - ipv4: fix NULL dereference in free_fib_info_rcu()
    - net_sched/sfq: update hierarchical backlog when drop packet
    - net_sched: remove warning from qdisc_hash_add
    - bpf: fix bpf_trace_printk on 32 bit archs
    - net: igmp: Use ingress interface rather than vrf device
    - openvswitch: fix skb_panic due to the incorrect actions attrlen
    - ptr_ring: use kmalloc_array()
    - ipv4: better IP_MAX_MTU enforcement
    - nfp: fix infinite loop on umapping cleanup
    - tun: handle register_netdevice() failures properly
    - sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
    - tipc: fix use-after-free
    - ipv6: reset fn->rr_ptr when replacing route
    - ipv6: repair fib6 tree in failure case
    - tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
    - net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
    - irda: do not leak initialized list.dev to userspace
    - net: sched: fix NULL pointer dereference when action calls some targets
    - net_sched: fix order of queue length updates in qdisc_replace()
    - bpf, verifier: add additional patterns to evaluate_reg_imm_alu
    - bpf: fix mixed signed/unsigned derived min/max value bounds
    - bpf/verifier: fix min/max handling in BPF_SUB
    - Input: trackpoint - add new trackpoint firmware ID
    - Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310
    - Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad
    - KVM: s390: sthyi: fix sthyi inline assembly
    - KVM: s390: sthyi: fix specification exception detection
    - KVM: x86: simplify handling of PKRU
    - KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state
    - KVM: x86: block guest protection keys unless the host has them enabled
    - ALSA: usb-audio: Add delay quirk for H650e/Jabra 550a USB headsets
    - ALSA: core: Fix unexpected error at replacing user TLV
    - ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)
    - ALSA: firewire: fix NULL pointer dereference when releasing uninitialized
      data of iso-resource
    - ALSA: firewire-motu: destroy stream data surely at failure of card
      initialization
    - ARCv2: SLC: Make sure busy bit is set properly for region ops
    - ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
    - ARCv2: PAE40: set MSB even if !CONFIG_ARC_HAS_...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Changed in intel:
status: New → Fix Released