EDAC sbridge: Failed to register device with error -22.

Bug #1714112 reported by Vinson Lee on 2017-08-30
This bug affects 7 people
Affects                                           Importance  Assigned to
Canonical Hardware Enablement Team Release Cycle  Undecided   Unassigned
linux (Ubuntu)                                    Medium      Seth Forshee
Artful                                            Medium      Seth Forshee

Bug Description

These kernel error messages appear with artful kernel 4.12.0-11-generic.

EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Failed to register device with error -22.

EDAC_GHES is enabled in 4.12.0-11-generic, but according to https://lkml.org/lkml/2017/7/9/258 it should be disabled.

$ grep CONFIG_EDAC /boot/config-4.12.0-11-generic
CONFIG_EDAC_ATOMIC_SCRUB=y
CONFIG_EDAC_SUPPORT=y
CONFIG_EDAC=y
# CONFIG_EDAC_LEGACY_SYSFS is not set
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_DECODE_MCE=m
CONFIG_EDAC_GHES=y
CONFIG_EDAC_AMD64=m
# CONFIG_EDAC_AMD64_ERROR_INJECTION is not set
CONFIG_EDAC_E752X=m
CONFIG_EDAC_I82975X=m
CONFIG_EDAC_I3000=m
CONFIG_EDAC_I3200=m
CONFIG_EDAC_IE31200=m
CONFIG_EDAC_X38=m
CONFIG_EDAC_I5400=m
CONFIG_EDAC_I7CORE=m
CONFIG_EDAC_I5000=m
CONFIG_EDAC_I5100=m
CONFIG_EDAC_I7300=m
CONFIG_EDAC_SBRIDGE=m
CONFIG_EDAC_SKX=m
CONFIG_EDAC_PND2=m
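
To look at a single option rather than scanning the whole list, a small helper along these lines could be used. This is only a sketch: the function name `kconfig_state` is hypothetical, and it assumes the config file uses the same `OPTION=value` / `# OPTION is not set` layout shown in the grep output above.

```shell
# Hypothetical helper: print the state of one Kconfig option in a config file.
# Prints the assigned value (for example "y" or "m"), or "n" when the option
# is commented out as "is not set" or does not appear at all.
kconfig_state() {
    option="$1"   # e.g. CONFIG_EDAC_GHES
    config="$2"   # e.g. /boot/config-4.12.0-11-generic
    value=$(sed -n "s/^${option}=//p" "$config" | head -n 1)
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf 'n\n'
    fi
}

# Example: kconfig_state CONFIG_EDAC_GHES "/boot/config-$(uname -r)"
```

For the config listed above this would be expected to report `y` for CONFIG_EDAC_GHES and `m` for CONFIG_EDAC_SBRIDGE.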

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1714112

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Vinson Lee (vlee) on 2017-08-30
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
tags: added: kernel-da-key
Seth Forshee (sforshee) wrote :

Funny, the Kconfig help text says it should be enabled. They need to make up their minds :-P

I found a branch which I think has the fixes, but it looks to me like they are still testing it. So for now I'll disable that option.

Changed in linux (Ubuntu Artful):
assignee: nobody → Seth Forshee (sforshee)
status: Triaged → In Progress
Seth Forshee (sforshee) on 2017-09-01
Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.12.0-13.14

---------------
linux (4.12.0-13.14) artful; urgency=low

  * linux: 4.12.0-13.14 -proposed tracker (LP: #1714687)

  * vhost guest network randomly drops under stress (kvm) (LP: #1711251)
    - Revert "vhost: cache used event for better performance"

  * EDAC sbridge: Failed to register device with error -22. (LP: #1714112)
    - [Config] CONFIG_EDAC_GHES=n

  * Artful update to v4.12.10 stable release (LP: #1714525)
    - sparc64: remove unnecessary log message
    - bonding: require speed/duplex only for 802.3ad, alb and tlb
    - bonding: ratelimit failed speed/duplex update warning
    - af_key: do not use GFP_KERNEL in atomic contexts
    - dccp: purge write queue in dccp_destroy_sock()
    - dccp: defer ccid_hc_tx_delete() at dismantle time
    - ipv4: fix NULL dereference in free_fib_info_rcu()
    - net_sched/sfq: update hierarchical backlog when drop packet
    - net_sched: remove warning from qdisc_hash_add
    - bpf: fix bpf_trace_printk on 32 bit archs
    - net: igmp: Use ingress interface rather than vrf device
    - openvswitch: fix skb_panic due to the incorrect actions attrlen
    - ptr_ring: use kmalloc_array()
    - ipv4: better IP_MAX_MTU enforcement
    - nfp: fix infinite loop on umapping cleanup
    - tun: handle register_netdevice() failures properly
    - sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
    - tipc: fix use-after-free
    - ipv6: reset fn->rr_ptr when replacing route
    - ipv6: repair fib6 tree in failure case
    - tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
    - net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
    - irda: do not leak initialized list.dev to userspace
    - net: sched: fix NULL pointer dereference when action calls some targets
    - net_sched: fix order of queue length updates in qdisc_replace()
    - bpf, verifier: add additional patterns to evaluate_reg_imm_alu
    - bpf: fix mixed signed/unsigned derived min/max value bounds
    - bpf/verifier: fix min/max handling in BPF_SUB
    - Input: trackpoint - add new trackpoint firmware ID
    - Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310
    - Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad
    - KVM: s390: sthyi: fix sthyi inline assembly
    - KVM: s390: sthyi: fix specification exception detection
    - KVM: x86: simplify handling of PKRU
    - KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state
    - KVM: x86: block guest protection keys unless the host has them enabled
    - ALSA: usb-audio: Add delay quirk for H650e/Jabra 550a USB headsets
    - ALSA: core: Fix unexpected error at replacing user TLV
    - ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)
    - ALSA: firewire: fix NULL pointer dereference when releasing uninitialized
      data of iso-resource
    - ALSA: firewire-motu: destroy stream data surely at failure of card
      initialization
    - ARCv2: SLC: Make sure busy bit is set properly for region ops
    - ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
    - ARCv2: PAE40: set MSB even if !CONFIG_ARC_HAS_...

Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Vinson Lee (vlee) wrote :

This bug has reappeared in 4.13.0-11.

Changed in linux (Ubuntu Artful):
status: Fix Released → Confirmed
Josh Coyle (blackoutwnct) wrote :

I'm also seeing this under kernel version 4.13.0-16-generic on artful (17.10).

I'm not too sure on what info you're after, but I've tried to follow what's been submitted above.

$ dmesg
[ 106.176439] EDAC sbridge: Seeking for: PCI ID 8086:6fa0
[ 106.176442] EDAC sbridge: Seeking for: PCI ID 8086:6fa0
[ 106.176446] EDAC sbridge: Seeking for: PCI ID 8086:6fa0
[ 106.176450] EDAC sbridge: Seeking for: PCI ID 8086:6f60
[ 106.176457] EDAC sbridge: Seeking for: PCI ID 8086:6fa8
[ 106.176459] EDAC sbridge: Seeking for: PCI ID 8086:6fa8
[ 106.176462] EDAC sbridge: Seeking for: PCI ID 8086:6fa8
[ 106.176466] EDAC sbridge: Seeking for: PCI ID 8086:6f71
[ 106.176468] EDAC sbridge: Seeking for: PCI ID 8086:6f71
[ 106.176471] EDAC sbridge: Seeking for: PCI ID 8086:6f71
[ 106.176474] EDAC sbridge: Seeking for: PCI ID 8086:6faa
[ 106.176476] EDAC sbridge: Seeking for: PCI ID 8086:6faa
[ 106.176479] EDAC sbridge: Seeking for: PCI ID 8086:6faa
[ 106.176483] EDAC sbridge: Seeking for: PCI ID 8086:6fab
[ 106.176485] EDAC sbridge: Seeking for: PCI ID 8086:6fab
[ 106.176487] EDAC sbridge: Seeking for: PCI ID 8086:6fab
[ 106.176491] EDAC sbridge: Seeking for: PCI ID 8086:6fac
[ 106.176493] EDAC sbridge: Seeking for: PCI ID 8086:6fac
[ 106.176496] EDAC sbridge: Seeking for: PCI ID 8086:6fac
[ 106.176499] EDAC sbridge: Seeking for: PCI ID 8086:6fad
[ 106.176501] EDAC sbridge: Seeking for: PCI ID 8086:6fad
[ 106.176504] EDAC sbridge: Seeking for: PCI ID 8086:6fad
[ 106.176507] EDAC sbridge: Seeking for: PCI ID 8086:6f68
[ 106.176510] EDAC sbridge: Seeking for: PCI ID 8086:6f68
[ 106.176513] EDAC sbridge: Seeking for: PCI ID 8086:6f68
[ 106.176516] EDAC sbridge: Seeking for: PCI ID 8086:6f79
[ 106.176523] EDAC sbridge: Seeking for: PCI ID 8086:6f6a
[ 106.176530] EDAC sbridge: Seeking for: PCI ID 8086:6f6b
[ 106.176537] EDAC sbridge: Seeking for: PCI ID 8086:6f6c
[ 106.176544] EDAC sbridge: Seeking for: PCI ID 8086:6f6d
[ 106.176551] EDAC sbridge: Seeking for: PCI ID 8086:6ffc
[ 106.176552] EDAC sbridge: Seeking for: PCI ID 8086:6ffc
[ 106.176555] EDAC sbridge: Seeking for: PCI ID 8086:6ffc
[ 106.176559] EDAC sbridge: Seeking for: PCI ID 8086:6ffd
[ 106.176561] EDAC sbridge: Seeking for: PCI ID 8086:6ffd
[ 106.176564] EDAC sbridge: Seeking for: PCI ID 8086:6ffd
[ 106.176568] EDAC sbridge: Seeking for: PCI ID 8086:6faf
[ 106.176570] EDAC sbridge: Seeking for: PCI ID 8086:6faf
[ 106.176573] EDAC sbridge: Seeking for: PCI ID 8086:6faf
[ 106.176752] EDAC MC0: Giving out device to module sb_edac.c controller Broadwell SrcID#1_Ha#0: DEV 0000:ff:12.0 (INTERRUPT)
[ 106.176947] EDAC MC1: Giving out device to module sb_edac.c controller Broadwell SrcID#0_Ha#0: DEV 0000:7f:12.0 (INTERRUPT)
[ 106.176966] EDAC sbridge: Some needed devices are missing
[ 106.192161] EDAC MC: Removed device 0 for sb_edac.c Broadwell SrcID#1_Ha#0: DEV 0000:ff:12.0
[ 106.212157] EDAC MC: Removed device 1 for sb_edac.c Broadwell SrcID#0_Ha#0: DEV 0000:7f:12.0
[ 106.212184] EDAC sbridge: Couldn't find mci handler
[ 106.212850] EDAC sbridge: Couldn't find mci handler
[ 106.213534] EDAC sbridge: Failed to register device with error -19.
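
Note that the code here is -19, whereas the original report shows -22. On Linux these are the negated errno values ENODEV ("No such device") and EINVAL ("Invalid argument") respectively. A rough sketch for pulling the code out of a log and naming it follows; the function name `sbridge_errno` is made up for illustration, and only the two codes seen in this report are mapped.

```shell
# Hypothetical helper: read kernel log lines on stdin, find the sb_edac
# "Failed to register device" messages, and print each error code with the
# matching errno name. Codes other than 19 and 22 are printed unmapped.
sbridge_errno() {
    sed -n 's/.*EDAC sbridge: Failed to register device with error -\([0-9][0-9]*\).*/\1/p' |
    while read -r code; do
        case "$code" in
            19) echo "-19 ENODEV (No such device)" ;;
            22) echo "-22 EINVAL (Invalid argument)" ;;
            *)  echo "-$code (not mapped here)" ;;
        esac
    done
}

# Example: dmesg | sbridge_errno
```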

$ grep CONFIG_EDAC /bo...


Chang Liu (cchliu) wrote :

I am seeing this re-appearing in kernel 4.13.0-21-generic:

EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Failed to register device with error -19

Does anybody have an idea how to solve it?

I am still seeing this problem in kernel 4.13.0-26-generic:

EDAC MC0: Giving out device to module sb_edac.c controller Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0 (INTERRUPT)
EDAC MC1: Giving out device to module sb_edac.c controller Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0 (INTERRUPT)
EDAC sbridge: Some needed devices are missing
EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
Removed device 1 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Failed to register device with error -19.

We can't apply the current HWE kernel for Ubuntu 16.04 Server, since this bug breaks our bridge networking for KVM, because of:

  * vhost guest network randomly drops under stress (kvm) (LP: #1711251)
    - Revert "vhost: cache used event for better performance"

Will there be a fix for HWE kernels?

A fix for HWE kernels would be most welcome. While some of our systems (for example Ivy Bridge-based) work fine, the newer ones (Broadwell) fail with:
MC0: Giving out device to module sb_edac.c controller Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0 (INTERRUPT)
EDAC sbridge: Some needed devices are missing
EDAC MC: Removed device 0 for sb_edac.c Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0
EDAC sbridge: Couldn't find mci handler
EDAC sbridge: Failed to register device with error -19.

Stephen Hill (steve-d-hill) wrote :

I think I have the same problem on my Intel(R) Xeon(R) CPU E3-1226 v3 on kernel 4.13.0-36-lowlatency:

 Feb 27 19:04:21 hill systemd[1]: Starting LSB: Initialize EDAC...
 Feb 27 19:04:21 hill kernel: [ 0.120006] EDAC MC: Ver: 3.0.0
 Feb 27 19:04:21 hill kernel: [ 8.882863] EDAC MC0: Giving out device to module ie31200_edac controller IE31200: DEV 0000:00:00.0 (P$
 Feb 27 19:04:22 hill edac[2257]: * Enabling Memory Error Detection an_edacd Correction edac
 Feb 27 19:04:22 hill edac[2257]: modprobe: ERROR: could not insert 'sb_edac': No such device
 Feb 27 19:04:22 hill edac[2257]: * failure with exit code 1
 Feb 27 19:04:22 hill edac[2257]: ...fail!
 Feb 27 19:04:22 hill edac[2257]: * Loading DIMM labels for Memory Error Detection and Correction edac
 Feb 27 19:04:23 hill systemd[1]: Started LSB: Initialize EDAC.
 Feb 27 19:04:23 hill edac[2257]: ...done.

Module exists - /lib/modules/4.13.0-36-lowlatency/kernel/drivers/edac/sb_edac.ko.
Used to work fine on 14.04 with 4.4.x kernel.
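
The .ko file being present only shows that the module is installed; modprobe's "No such device" is the ENODEV error returned when the module fails to initialise. To see which EDAC driver actually ended up loaded, the module list can be checked. The sketch below uses a hypothetical function name, `module_loaded`, and assumes the usual /proc/modules layout with the module name in the first column.

```shell
# Hypothetical helper: succeed (exit 0) if the named module is listed in a
# /proc/modules-style file, fail (exit 1) otherwise.
module_loaded() {
    name="$1"                          # e.g. sb_edac or ie31200_edac
    modules_file="${2:-/proc/modules}" # second argument is optional
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Example:
# for m in sb_edac ie31200_edac; do
#     if module_loaded "$m"; then echo "$m: loaded"; else echo "$m: not loaded"; fi
# done
```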

Stephen Hill (steve-d-hill) wrote :

My problem above might be slightly different. I found a reference to something similar here: https://www.spinics.net/lists/linux-edac/msg08263.html. I'm not sure.

This bug was nominated against a series that is no longer supported, i.e. artful. The bug task representing the artful nomination is being closed as Won't Fix.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu Artful):
status: Confirmed → Won't Fix