hardware errors every 300s

Bug #993758 reported by Charles Lindsay on 2012-05-03
This bug affects 3 people
Affects          Status    Importance   Assigned to   Milestone
Linux            Invalid   Medium
linux (Ubuntu)   Invalid   Medium       Unassigned

Bug Description

Every 300s on the dot I get this in dmesg:

[ 300.804131] [Hardware Error]: CPU:1 MC0_STATUS[-|CE|-|-|AddrV|CECC]: 0x9467400000000136
[ 300.804147] [Hardware Error]: MC0_ADDR: 0x00000003f2f7c5c0
[ 300.804152] [Hardware Error]: Data Cache Error: during L1 linefill from L2.
[ 300.804160] [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD
[ 300.804171] [Hardware Error]: CPU:1 MC1_STATUS[Over|CE|-|-|-]: 0xd000000000000171
[ 300.804178] [Hardware Error]: Instruction Cache Error: Copyback Parity/Victim error.
[ 300.804184] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: EV
[ 300.804194] [Hardware Error]: CPU:1 MC2_STATUS[Over|CE|-|-|AddrV|CECC]: 0xd40040000000018a
[ 300.804202] [Hardware Error]: MC2_ADDR: 0x00000003e7d745c0
[ 300.804207] [Hardware Error]: Bus Unit Error: SNP error during data copyback.
[ 300.804213] [Hardware Error]: cache level: L2, tx: GEN, mem-tx: SNP

Same again at 600.804..., 900.804..., etc. for as long as my computer is on.
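
For readers who want to check the bracketed summaries against the raw register values, here is a minimal Python sketch that decodes the flag bits of an MCi_STATUS value. The bit positions (Val 63, Over 62, UC 61, En 60, MiscV 59, AddrV 58, PCC 57, CECC 46, UECC 45) are an assumption based on the commonly documented AMD machine-check layout, and the names `STATUS_BITS`, `decode_status` and `summarize` are made up for this illustration. It is not the kernel's decoder and it ignores the low error-code bits entirely.

```python
# Flag bits of an AMD MCi_STATUS register (assumed layout, see the text above).
STATUS_BITS = {
    "Val": 63,    # register holds a valid error
    "Over": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected (UE) if set, corrected (CE) if clear
    "En": 60,     # reporting was enabled for this error
    "MiscV": 59,  # MCi_MISC holds valid data
    "AddrV": 58,  # MCi_ADDR holds a valid address
    "PCC": 57,    # processor context corrupt
    "CECC": 46,   # error was corrected by ECC
    "UECC": 45,   # uncorrectable ECC error
}


def decode_status(status: int) -> dict:
    """Return a name -> bool map of the flag bits in an MCi_STATUS value."""
    return {name: bool((status >> bit) & 1) for name, bit in STATUS_BITS.items()}


def summarize(status: int) -> str:
    """Render the flags in the same order as the bracketed dmesg summary."""
    flags = decode_status(status)
    parts = ["Over" if flags["Over"] else "-",
             "UE" if flags["UC"] else "CE"]
    parts += [name if flags[name] else "-" for name in ("MiscV", "PCC", "AddrV")]
    parts += [name for name in ("CECC", "UECC") if flags[name]]
    return "|".join(parts)


if __name__ == "__main__":
    # The three status values from the dmesg output in this report.
    for value in (0x9467400000000136, 0xD000000000000171, 0xD40040000000018A):
        print(f"{value:#018x} -> {summarize(value)}")
```

All three values decode with UC clear, i.e. as corrected errors, which agrees with the CE markers in the log.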

cat /proc/cpuinfo says:

processor : 0
vendor_id : AuthenticAMD
cpu family : 16
model : 4
model name : AMD Phenom(tm) II X4 955 Processor
stepping : 2
microcode : 0x1000086
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt npt lbrv svm_lock nrip_save
bogomips : 6428.47
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

(...and repeated for the next 3 cores.)

This only started happening with my recent upgrade to Ubuntu 12.04. I'm running linux-image-3.2.0-24-generic 3.2.0-24.37.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-24-generic 3.2.0-24.37
ProcVersionSignature: Ubuntu 3.2.0-24.37-generic 3.2.14
Uname: Linux 3.2.0-24-generic x86_64
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 2.0.1-0ubuntu7
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1c', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D2c', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'SB'/'HDA ATI SB at 0xfe024000 irq 16'
   Mixer name : 'Realtek ALC889'
   Components : 'HDA:10ec0889,1458a102,00100004'
   Controls : 48
   Simple ctrls : 22
Date: Wed May 2 22:20:15 2012
HibernationDevice: RESUME=UUID=078f1193-c57b-4c6f-a307-7eeb09d49fff
InstallationMedia: Ubuntu 12.04 LTS "Precise Pangolin" - Release amd64 (20120425)
IwConfig:
 lo no wireless extensions.

 eth1 no wireless extensions.

 eth0 no wireless extensions.
MachineType: Gigabyte Technology Co., Ltd. GA-790FXTA-UD5
ProcEnviron:
 TERM=xterm
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.2.0-24-generic root=UUID=cde6d387-0099-461a-9b23-cfe68018e3f7 ro quiet splash irqpoll vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-24-generic N/A
 linux-backports-modules-3.2.0-24-generic N/A
 linux-firmware 1.79
RfKill:
 0: hci0: Bluetooth
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/03/2009
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F2
dmi.board.name: GA-790FXTA-UD5
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF2:bd12/03/2009:svnGigabyteTechnologyCo.,Ltd.:pnGA-790FXTA-UD5:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-790FXTA-UD5:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: GA-790FXTA-UD5
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Brad Figg (brad-figg) on 2012-05-03
Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.4 kernel [1] (not a kernel in the daily directory). Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.4-rc5-precise/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: needs-upstream-testing
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Charles Lindsay (chazomaticus) wrote :

$ uname -rv
3.4.0-030400rc5-generic #201205011817 SMP Tue May 1 22:18:19 UTC 2012
$ dmesg | tail
[ 599.832672] [Hardware Error]: MC0_ADDR: 0x000000042fc8c5c0
[ 599.832678] [Hardware Error]: Data Cache Error: during L1 linefill from L2.
[ 599.832686] [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD
[ 599.832697] [Hardware Error]: CPU:1 MC1_STATUS[Over|CE|-|-|-]: 0xd000000000000171
[ 599.832704] [Hardware Error]: Instruction Cache Error: Copyback Parity/Victim error.
[ 599.832711] [Hardware Error]: cache level: L1, tx: INSN, mem-tx: EV
[ 599.832721] [Hardware Error]: CPU:1 MC2_STATUS[Over|CE|-|-|AddrV|CECC]: 0xd40040000000018a
[ 599.832730] [Hardware Error]: MC2_ADDR: 0x000000042fc8c5c0
[ 599.832734] [Hardware Error]: Bus Unit Error: SNP error during data copyback.
[ 599.832741] [Hardware Error]: cache level: L2, tx: GEN, mem-tx: SNP

Still happens with linux-image-3.4.0-030400rc5-generic_3.4.0-030400rc5.201205011817_amd64!
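
The traces in this report land close to multiples of 300 s of uptime (300.80 s on 3.2, 599.83 s here). That is consistent with the kernel polling the machine-check banks for corrected errors on a timer whose default interval is, as far as I know, 300 s (`check_interval` under /sys/devices/system/machinecheck/), though nothing in this thread confirms that this is the cause. The Python sketch below is one way to check the spacing in a saved dmesg dump. The helper names `burst_starts` and `offset_from_period` and the 1 s grouping gap are assumptions made for this example.

```python
import re

# Matches lines such as "[ 599.832672] [Hardware Error]: ..."
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+\[Hardware Error\]")


def burst_starts(dmesg_text: str, max_gap: float = 1.0) -> list:
    """Return the first timestamp of each burst of [Hardware Error] lines.

    Lines less than `max_gap` seconds after the previous error line are
    treated as part of the same burst.
    """
    starts = []
    previous = None
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp = float(match.group(1))
        if previous is None or stamp - previous > max_gap:
            starts.append(stamp)
        previous = stamp
    return starts


def offset_from_period(stamp: float, period: float = 300.0) -> float:
    """Signed distance in seconds from the nearest multiple of `period`."""
    nearest = round(stamp / period) * period
    return stamp - nearest


if __name__ == "__main__":
    # One line from each of the two traces quoted in this report.
    for sample in (
        "[ 300.804131] [Hardware Error]: CPU:1 MC0_STATUS[-|CE|-|-|AddrV|CECC]: 0x9467400000000136",
        "[ 599.832672] [Hardware Error]: MC0_ADDR: 0x000000042fc8c5c0",
    ):
        for start in burst_starts(sample):
            print(f"burst at {start:.3f}s, offset {offset_from_period(start):+.3f}s")
```

Feeding it the full output of `dmesg` should give one entry per burst, which makes it easy to see whether the offsets stay small for the whole uptime.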

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
removed: needs-upstream-testing
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since it still occurs with the latest upstream kernel. Would it be possible for you to open an upstream bug report at bugzilla.kernel.org [1]? That will allow the upstream developers to examine the issue, and may provide a quicker resolution to the bug.

If you are comfortable with opening a bug upstream, it would be great if you could report back the upstream bug number in this bug report. That will allow us to link this bug to the upstream report.

[1] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Maarten Bezemer (veger) wrote :

Thanks for taking the time to report this bug in the upstream bug tracking system; this is a tremendous help. Launchpad has the ability to watch lots of upstream bug trackers, and this can be done by following the procedure documented at https://wiki.ubuntu.com/Bugs/Watches. I've added the bug watch for this bug report.

Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Changed in linux:
status: Confirmed → Invalid
mlindeblom (mlindeblom) wrote :

I just upgraded from 12.10 to 13.04.
I now get this error message in dmesg:

[ 3297.587238] [Hardware Error]: MC2_ADDR: 0x00000001d31aa180
[ 3297.588899] [Hardware Error]: Bus Unit Error: EV error during data copyback.
[ 3297.590563] [Hardware Error]: cache level: L2, tx: GEN, mem-tx: EV

mlindeblom (mlindeblom) wrote :

I have 2 disks in the same machine, both with 13.04 installed.
The install on the 1 TB hard drive has no problems.
The install on the 120 GB SSD produces the errors listed in my previous comment.
The SSD has the discard option set in fstab.

mlindeblom (mlindeblom) wrote :

I replaced the CPU, only to find the problem still existed.
A BIOS upgrade fixed the problem for me.

Not sure if this is still relevant and/or active, but I just ran into this error too. If someone could suggest any troubleshooting methods, that would be great.

[ 18.460750] [Hardware Error]: MC0 Error: Internal error condition type 2.
[ 18.460784] [Hardware Error]: Error Status: Uncorrected, software containable error.
[ 18.460802] [Hardware Error]: CPU:2 (15:1:2) MC0_STATUS[-|UE|MiscV|-|-|-|-]: 0xb880000000020f0f
[ 18.460832] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)

$ uname -a
Linux mytux 3.8.0-25-generic #37-Ubuntu SMP Thu Jun 6 20:47:07 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 21
model : 1
model name : AMD FX(tm)-8150 Eight-Core Processor
stepping : 2
microcode : 0x6000629
cpu MHz : 1400.000
cache size : 2048 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
bogomips : 7248.07
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb

The same output is repeated for the remaining 7 cores.
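
The "(15:1:2)" in the MC0_STATUS line above appears to be the CPU family, model and stepping printed in hexadecimal: family 21 from /proc/cpuinfo is 0x15. A small Python sketch of that mapping follows, assuming the usual "key : value" layout of /proc/cpuinfo. The function name `cpu_signature` is made up for this example.

```python
def cpu_signature(cpuinfo_text: str) -> str:
    """Build a hex 'family:model:stepping' string from /proc/cpuinfo text.

    Only the first processor block is used; later blocks repeat the same
    values on a single-socket machine.
    """
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        # setdefault keeps the value from the first processor block.
        fields.setdefault(key.strip(), value.strip())
    family = int(fields["cpu family"])
    model = int(fields["model"])
    stepping = int(fields["stepping"])
    return f"{family:x}:{model:x}:{stepping:x}"


if __name__ == "__main__":
    sample = """\
cpu family : 21
model : 1
model name : AMD FX(tm)-8150 Eight-Core Processor
stepping : 2
"""
    print(cpu_signature(sample))
```

On a live system the same function can be fed the contents of /proc/cpuinfo read from disk.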

MOBO: GA-970A-D3 rev 1.1 Latest BIOS
RAM: G.Skill Sniper DDR3 4Gx2 1600 F3-12800CL9D-8GBSR
HDD: 1TB, 500GB, 320GB, 160GB All Seagate

Any help is appreciated.

Charles Lindsay, this bug report is being closed due to your last comment https://bugzilla.kernel.org/show_bug.cgi?id=43205#c7 regarding this being due to a faulty CPU. For future reference, you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop-down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status. Thank you again for taking the time to report this bug and helping to make Ubuntu better. Please submit any future bugs you may find.

Changed in linux (Ubuntu):
status: Triaged → Invalid