mcelog errors and server freeze with qemu-kvm 0.12.3 and linux-image-2.6.32-32-server

Bug #809313 reported by David Mayr
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | Fix Released | High | Unassigned |

Bug Description

Hello,

We run several hundred servers with Ubuntu 10.04 as virtualisation nodes (about 20-30 virtual machines each) with qemu-kvm 0.12.3, and had to find out the hard way that a kernel regression introduced in linux-image-2.6.32-32-server made our servers quite unstable.

Once every few days they just froze randomly and showed nothing but a black screen. We then changed back to linux-image-2.6.32-31-server and got our systems stable again. Unfortunately we were not able to reproduce this behaviour.

After booting a crashed machine, we found entries like the following in syslog. Contrary to what the message says, I'm quite sure that it's not a hardware bug, as the same machines run fine with the 2.6.32-31 kernel.

We also tried several upstream kernels from 2.6.36, 2.6.37, 2.6.38 and even 2.6.39 series - all with the same problem.

------------
mcelog: failed to prefill DIMM database from DMI data
mcelog: Kernel does not support page offline interface
mcelog: HARDWARE ERROR. This is *NOT* a software problem!
mcelog: Please contact your hardware vendor
mcelog: MCE 0
mcelog: CPU 0 BANK 5
mcelog: MISC 7fff ADDR 3fff81024ae8
mcelog: TIME 1310033176 Thu Jul 7 12:06:16 2011
mcelog: MCG status:
mcelog: MCi status:
mcelog: Error overflow
mcelog: Uncorrected error
mcelog: Error enabled
mcelog: MCi_MISC register valid
mcelog: MCi_ADDR register valid
mcelog: Processor context corrupt
mcelog: MCA: Internal Timer error
mcelog: STATUS fe00000000800400 MCGSTATUS 0
mcelog: MCGCAP 1c09 APICID 0 SOCKETID 0
mcelog: CPUID Vendor Intel Family 6 Model 26
mcelog: HARDWARE ERROR. This is *NOT* a software problem!
mcelog: Please contact your hardware vendor
mcelog: MCE 1
mcelog: CPU 1 BANK 5
mcelog: MISC 7fff ADDR 3fffa003b652
mcelog: TIME 1310033176 Thu Jul 7 12:06:16 2011
mcelog: MCG status:
mcelog: MCi status:
mcelog: Error overflow
mcelog: Uncorrected error
mcelog: Error enabled
mcelog: MCi_MISC register valid
mcelog: MCi_ADDR register valid
mcelog: Processor context corrupt
mcelog: MCA: Internal Timer error
mcelog: STATUS fe00000000800400 MCGSTATUS 0
mcelog: MCGCAP 1c09 APICID 2 SOCKETID 0
mcelog: CPUID Vendor Intel Family 6 Model 26
------------

Cheers,
David

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-32-server (not installed)
Regression: Yes
Reproducible: No
ProcVersionSignature: Ubuntu 2.6.32-32.62-server 2.6.32.38+drm33.16
Uname: Linux 2.6.32-32-server x86_64
NonfreeKernelModules: sch_htb xt_physdev xt_mac ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ebtable_nat ebtables snd_hda_codec_atihdmi fbcon tileblit font bitblit softcursor vga16fb vgastate kvm_intel kvm ip6table_filter ip6_tables xt_tcpudp bridge nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables snd_hda_intel x_tables radeon snd_hda_codec stp ttm snd_hwdep drm_kms_helper snd_pcm snd_timer snd drm i2c_algo_bit soundcore snd_page_alloc lp parport multipath linear 3w_9xxx 3w_xxxx raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 e1000 sata_nv ahci aacraid r8169 mii sata_sil sata_via
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
AplayDevices: Error: [Errno 2] No such file or directory
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D3p', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info: Error: [Errno 2] No such file or directory
Card0.Amixer.values: Error: [Errno 2] No such file or directory
Date: Tue Jul 12 14:02:14 2011
Frequency: Once every few days.
HibernationDevice: RESUME=UUID=27b2f2d3-0ec2-4f22-9f6d-c857b6830ab6
InstallationMedia:

IwConfig: Error: [Errno 2] No such file or directory
MachineType: MSI MS-7522
ProcCmdLine: root=/dev/mapper/vg0-root ro
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.34.7
RfKill: Error: [Errno 2] No such file or directory
SourcePackage: linux
WifiSyslog:

dmi.bios.date: 11/02/2010
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: V8.14
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: MSI X58 Pro-E (MS-7522)
dmi.board.vendor: MSI
dmi.board.version: 3.0
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: MICRO-STAR INTERNATIONAL CO.,LTD
dmi.chassis.version: 3.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrV8.14:bd11/02/2010:svnMSI:pnMS-7522:pvr3.0:rvnMSI:rnMSIX58Pro-E(MS-7522):rvr3.0:cvnMICRO-STARINTERNATIONALCO.,LTD:ct3:cvr3.0:
dmi.product.name: MS-7522
dmi.product.version: 3.0
dmi.sys.vendor: MSI

David Mayr (n-launchpad-davey-de) wrote :
Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Thibaut (t-britz) wrote :

We are also affected by this:

In a pool of 100 servers running

2.6.38-10-server #46-Ubuntu SMP Tue Jun 28 16:31:00 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

we constantly see random machines crashing (always different machines) with a black screen and no entries in the kern.log.

Mcelog shows the following error:

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 0
CPU 0 BANK 5
MISC 7fff ADDR 3fff81035b70
TIME 1313834489 Sat Aug 20 12:01:29 2011
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS fe00000000800400 MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 26
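
The TIME field in each record is a Unix timestamp followed by mcelog's human-readable rendering. A minimal Python sketch for converting it to UTC, which may help when correlating crashes across hosts; the helper name `mce_time_utc` is only illustrative and not part of mcelog:

```python
from datetime import datetime, timezone

def mce_time_utc(epoch_seconds):
    """Convert the numeric TIME field of an mcelog record to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# TIME value taken from the record above.
print(mce_time_utc(1313834489).isoformat())
```

For the record above this yields 10:01:29 UTC on 20 August 2011, so the printed 12:01:29 would be consistent with a host clock two hours ahead of UTC.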

Thibaut (t-britz) wrote :

We don't run any virtualisation, but heavy IO.

Sebastian Nickel (sebastian-nickel) wrote :

This issue is still present in "linux-image-2.6.32-34-server". Is there any update to this bug?

Vladimir Popovski (vladimir.p) wrote :

Has anybody found the root cause of this issue? We are running Natty 2.6.38-8 and experienced it as well.
Any applicable workarounds? We tried to force Linux to generate a core dump in such situations, but were unsuccessful.

Dan Kegel (dank) wrote :

I can repeat a very similar panic at will by running a tiny test program that just retrieves the manufacturer field of the inserted DVD using a SCSI passthrough command in wine (with either win32 or win64). This is on an updated Ubuntu 11.10 x86_64.
I only have a photo of my kernel panic, but here's the full mcelog:

mcelog: failed to prefill DIMM database from DMI data
Kernel does not support page offline interface
mcelog: mcelog read: No such device
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 5
MISC 7fff ADDR 3fff812fa524
TIME 1328052557 Tue Jan 31 15:29:17 2012
MCG status:
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS be00000000800400 MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 26
Hardware event. This is not a software error.
MCE 1
CPU 2 BANK 5
MISC 7fff ADDR 3fff8102fbe0
TIME 1328052557 Tue Jan 31 15:29:17 2012
MCG status:
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS be00000000800400 MCGSTATUS 0
MCGCAP 1c09 APICID 4 SOCKETID 0
CPUID Vendor Intel Family 6 Model 26
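
The STATUS values in these dumps can be checked against the flag lines mcelog prints. The sketch below assumes the architectural IA32_MCi_STATUS layout from the Intel SDM (flag bits 63 down to 57, MCA error code in bits 15:0, model-specific code in bits 31:16); the function and table names are made up for illustration:

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed from the Intel SDM layout).
FLAGS = [
    (63, "VAL"),    # register contents valid
    (62, "OVER"),   # error overflow
    (61, "UC"),     # uncorrected error
    (60, "EN"),     # error enabled
    (59, "MISCV"),  # MCi_MISC register valid
    (58, "ADDRV"),  # MCi_ADDR register valid
    (57, "PCC"),    # processor context corrupt
]

def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flag names and error-code fields."""
    return {
        "flags": [name for bit, name in FLAGS if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }

# The two STATUS values that appear in this report.
for raw in ("fe00000000800400", "be00000000800400"):
    decoded = decode_mci_status(int(raw, 16))
    print(raw, decoded["flags"],
          hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))
```

The two values differ only in bit 62, which matches the "Error overflow" line that is present in the earlier dumps and absent from this one. The low 16 bits are 0x0400 in both cases, which mcelog reports as "Internal Timer error".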

Dan Kegel (dank) wrote :

See also bug 924596, which is somebody else's similar but less reproducible bug.

Dan Kegel (dank) wrote :

Sigh. Bug 924596 is my bug report with more info about the mcelog I pasted above. I was distracted in my previous post.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.3 kernel[1] (not a kernel in the daily directory). Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed by the mainline kernel, please add the following tag 'kernel-fixed-upstream-KERNEL-VERSION'. For example, if kernel version 3.3-rc2 fixed the issue, the tag would be: 'kernel-fixed-upstream-v3.3-rc2'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.

Thanks in advance.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.3-rc2-precise/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
David Mayr (n-launchpad-davey-de) wrote :

Thanks for your time, Joseph. Since yesterday we have been running kernel 3.3-rc2 on four of our nodes for testing. I will report any instability here.

Markus Schade (lp-markusschade) wrote :

Hi Joseph,

David asked me to follow up on this bug.
Previously we had also tested the oneiric-lts-backports-kernel (3.0.0-15.26~lucid1-server). This had the same issue, crashing within a day or two.

We have been running 3.3-rc2 on these four nodes for 13 days now without any incident, so it seems this kernel is stable. The downside is that after a managedsave/restore cycle, the VMs are dead. Although the VM is restored from the saved state, it no longer responds to any input from the console or via the network.
There was an issue where a restored VM crashed immediately after being restored, but that was fixed in 3.0. Now the VM does not crash, but is simply stuck.

http://permalink.gmane.org/gmane.linux.kernel.commits.head/304494

Markus Schade (lp-markusschade) wrote :

Just to follow up on this.

The 3.3-rc2 ran for almost 2 months without any issues.
We have also re-tested the more recent lucid kernels on a couple of machines, and as of 2.6.32-38-server #83-Ubuntu the crashes have not occurred anymore. We will give it some more time on more machines, but right now it looks like this bug could be closed as fixed.

Joseph Salisbury (jsalisbury) wrote :

Hi Markus,

So does the issue not happen with the officially supported Ubuntu Linux kernels? Or just the 3.3-rc2 upstream kernel?

Markus Schade (lp-markusschade) wrote :

We had re-tested the regular Ubuntu kernels. With so many changes since -32, we may never know which commit triggered this and was reverted or fixed in later versions. It also may or may not have been an Ubuntu change.

Joseph Salisbury (jsalisbury) wrote :

Are the Ubuntu kernels not exhibiting this bug now?

Markus Schade (lp-markusschade) wrote :

They were. But as of -40, the crashes are back. -38 was very stable. -39 did not get much field testing, but was showing some crashes.
I have attached the dmesg error we have been getting.

In parallel, we are also testing the oneiric-lts-backport kernel. That is also stable in version 3.0.0-15-server #26~lucid1-Ubuntu. But as of 3.0.0-17-server #30~lucid1-Ubuntu the crashes and MCE errors are back. The systems barely run for a couple of hours.

HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 0
CPU 0 BANK 5
MISC 7fff ADDR 3fff81031d60
TIME 1333609850 Thu Apr 5 09:10:50 2012
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS fe00000000800400 MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 26
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 1
CPU 1 BANK 5
MISC 7fff ADDR 3fff81031d60
TIME 1333609850 Thu Apr 5 09:10:50 2012
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS fe00000000800400 MCGSTATUS 0
MCGCAP 1c09 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 26
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 2
CPU 2 BANK 5
MISC 7fff ADDR 3fff81031d60
TIME 1333609850 Thu Apr 5 09:10:50 2012
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS fe00000000800400 MCGSTATUS 0
MCGCAP 1c09 APICID 4 SOCKETID 0
CPUID Vendor Intel Family 6 Model 26
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 3
CPU 3 BANK 5
MISC 7fff ADDR 3fff81031d60
TIME 1333609850 Thu Apr 5 09:10:50 2012
MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS fe00000000800400 MCGSTATUS 0
MCGCAP 1c09 APICID 6 SOCKETID 0
CPUID Vendor Intel Family 6 Model 26

Joseph Salisbury (jsalisbury) wrote :

Can you run a memory check on that system?

Markus Schade (lp-markusschade) wrote :

We always do a burn-in and memory test before putting systems into production. Should a system exhibit multiple crashes, we revert to a known stable kernel version. If that doesn't solve the problem, we move the virtual machines off and re-test the hardware. We are also talking about dozens of identical systems which exhibit the same behaviour, which is the reason why it is so frustrating. We never know if the new version will be stable or not.
I know that memory errors are a common cause of such problems and I would completely agree to do a memtest86 again, but the point is that before we upgraded to the kernels mentioned above, the systems had been stable, some for more than a year (if you discount the necessary reboot after 208 days because of the overflow of the sched clock).

I haven't given up on -40 yet, but the best uptime we have seen so far is 8 days, and we can only have so many crashes before we have to go back to a stable version (currently 2.6.32-38 or 3.0.0-15) in order to keep customers happy.

Markus Schade (lp-markusschade) wrote :

The overflow bug I mentioned is this one: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/805341
which is fixed in 2.6.32-38.
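
As a rough sanity check on the 208-day figure: the report itself only says that the sched clock overflows. The sketch below adds an assumption that is not stated here, namely that the nanosecond value wraps once it reaches 2^54 because a 64-bit intermediate is scaled by 10 fractional bits:

```python
# Assumption (not from this report): the clock value wraps at 2**54 nanoseconds,
# i.e. a 64-bit intermediate shifted by a 10-bit scale factor.
NS_PER_DAY = 86_400 * 10**9

overflow_ns = 2**54
days_until_overflow = overflow_ns / NS_PER_DAY

print(round(days_until_overflow, 2))
```

Under that assumption the wrap would occur after roughly 208.5 days, which lines up with the reboot interval mentioned above.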

Changed in linux (Ubuntu):
importance: Medium → High
Markus Schade (lp-markusschade) wrote :

We have traced most of the 2.6.32-40 crashes to memory errors, but I will wait a couple of days to be sure.
Also, we are slowly moving towards the oneiric backports kernel in preparation for the precise release. The 3.0.0-17 kernel has almost definitely introduced a regression (MCE errors in comment #16). I think the uptimes of the systems speak for themselves:

     # Uptime | System Boot up
----------------------------+---------------------------------------------------
     1 60 days, 09:08:35 | Linux 3.0.0-15-server Tue Jan 31 09:43:58 2012
-> 2 5 days, 05:23:23 | Linux 3.0.0-15-server Thu Apr 5 09:22:03 2012
     3 3 days, 14:32:26 | Linux 3.0.0-17-server Sun Apr 1 18:47:57 2012
     4 0 days, 10:05:16 | Linux 3.0.0-17-server Sat Mar 31 21:12:03 2012
     5 0 days, 09:48:18 | Linux 3.0.0-17-server Sun Apr 1 07:57:30 2012

Similarly, on one of the systems where we had tested the generic kernel:

     # Uptime | System Boot up
----------------------------+---------------------------------------------------
     1 56 days, 01:37:30 | Linux 3.3.0-030300rc2-ge Fri Feb 3 10:59:03 2012
     2 5 days, 23:50:19 | Linux 3.0.0-17-server Fri Mar 30 13:37:49 2012
     3 3 days, 13:07:19 | Linux 3.0.0-17-server Thu Apr 5 15:05:29 2012
-> 4 1 day , 00:14:27 | Linux 3.0.0-15-server Mon Apr 9 14:34:48 2012
     5 0 days, 09:40:52 | Linux 3.0.0-17-server Mon Apr 9 04:52:40 2012
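
To compare kernels across hosts, listings like the two above can be reduced to the longest recorded uptime per kernel. A minimal Python sketch, assuming the row layout shown here (an optional `->` marker, rank, uptime, then `| Linux <kernel>`); the function name is illustrative only:

```python
import re
from datetime import timedelta

# Rows copied from the two listings above.
UPRECORDS = """\
     1 60 days, 09:08:35 | Linux 3.0.0-15-server Tue Jan 31 09:43:58 2012
-> 2 5 days, 05:23:23 | Linux 3.0.0-15-server Thu Apr 5 09:22:03 2012
     3 3 days, 14:32:26 | Linux 3.0.0-17-server Sun Apr 1 18:47:57 2012
     4 0 days, 10:05:16 | Linux 3.0.0-17-server Sat Mar 31 21:12:03 2012
     5 0 days, 09:48:18 | Linux 3.0.0-17-server Sun Apr 1 07:57:30 2012
     1 56 days, 01:37:30 | Linux 3.3.0-030300rc2-ge Fri Feb 3 10:59:03 2012
     2 5 days, 23:50:19 | Linux 3.0.0-17-server Fri Mar 30 13:37:49 2012
     3 3 days, 13:07:19 | Linux 3.0.0-17-server Thu Apr 5 15:05:29 2012
-> 4 1 day , 00:14:27 | Linux 3.0.0-15-server Mon Apr 9 14:34:48 2012
     5 0 days, 09:40:52 | Linux 3.0.0-17-server Mon Apr 9 04:52:40 2012
"""

# Optional "->" marker, rank, "<n> day(s), hh:mm:ss", then the kernel name.
ROW = re.compile(
    r"^(?:->)?\s*\d+\s+(\d+) days?\s*,\s*(\d+):(\d+):(\d+)\s*\|\s*Linux (\S+)"
)

def longest_uptime_per_kernel(text):
    """Return {kernel: longest uptime as timedelta} for every parsable row."""
    best = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        days, hours, minutes, seconds = map(int, match.group(1, 2, 3, 4))
        uptime = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
        kernel = match.group(5)
        if uptime > best.get(kernel, timedelta(0)):
            best[kernel] = uptime
    return best

for kernel, uptime in sorted(longest_uptime_per_kernel(UPRECORDS).items()):
    print(kernel, uptime)
```

Rows marked with `->` are the currently running boot, so their uptime is a lower bound rather than a crash interval.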

To give a better impression of the kernel versions and number of affected systems, I have put together some stats:

Kernel           | max uptime (days) | #hosts
-----------------+-------------------+-------
2.6.32-31-server | 208               | 100++
2.6.32-39-server | 28                | >10
3.0.0-15-server  | 70                | <50
3.0.0-17-server  | 12                | <10
2.6.32-38-server | 48                | <50
2.6.32-40-server | 15                | >10

Markus Schade (lp-markusschade) wrote :

Although we have moved to the oneiric backports kernel, we are still seeing these MCE errors. In some cases replacing RAM seems to solve the issue. But we are still not quite certain that this is actually the cause, because we have found the following:

May 13 15:03:39 node8 kernel: [260532.465745] BUG: unable to handle kernel paging request at ffff8806514603d0
May 13 15:03:39 node8 kernel: [260532.465803] IP: [<ffffffff81156327>] remove_rmap_item_from_tree+0xe7/0x150
May 13 15:03:39 node8 kernel: [260532.465845] PGD 1c04063 PUD 0
May 13 15:03:39 node8 kernel: [260532.465875] Oops: 0002 [#1] SMP
May 13 15:03:39 node8 kernel: [260532.465906] CPU 1
May 13 15:03:39 node8 kernel: [260532.465913] Modules linked in: cls_u32 sch_sfq sch_htb xt_physdev xt_mac ip6table_filter ip6_tables ib_iser rdma_cm ib_cm ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 iw_cm ib_sa xt_state nf_conntrack ib_mad ib_core ipt_REJECT xt_tcpudp ib_addr iptable_filter ip_tables iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi x_tables nouveau ttm drm_kms_helper drm i2c_algo_bit mxm_wmi lp kvm_intel wmi kvm speedstep_lib serio_raw parport i7core_edac edac_core video bridge stp multipath linear 3w_9xxx 3w_xxxx raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 e1000 r8169 aacraid ahci libahci sata_nv sata_sil sata_via
May 13 15:03:39 node8 kernel: [260532.466384]
May 13 15:03:39 node8 kernel: [260532.466407] Pid: 50, comm: ksmd Not tainted 3.0.0-19-server #33~lucid1-Ubuntu MSI MS-7522/MSI X58 Pro-E (MS-7522)
May 13 15:03:39 node8 kernel: [260532.466470] RIP: 0010:[<ffffffff81156327>] [<ffffffff81156327>] remove_rmap_item_from_tree+0xe7/0x150
May 13 15:03:39 node8 kernel: [260532.466527] RSP: 0018:ffff88061aaafdd0 EFLAGS: 00010202
May 13 15:03:39 node8 kernel: [260532.466557] RAX: 0000000000022200 RBX: ffff88034830a6c0 RCX: 0000000000000034
May 13 15:03:39 node8 kernel: [260532.466607] RDX: 0000000000000000 RSI: ffffea001523e498 RDI: ffff8806514603a8
May 13 15:03:39 node8 kernel: [260532.466659] RBP: ffff88061aaafdf0 R08: 0000000000000000 R09: 24c0000000000000
May 13 15:03:39 node8 kernel: [260532.466710] R10: ffff8805fc06f000 R11: 0000000000000001 R12: ffff8806173c79d8
May 13 15:03:39 node8 kernel: [260532.466758] R13: ffffea001523e498 R14: ffff880617775b68 R15: 0000000000000000
May 13 15:03:39 node8 kernel: [260532.466807] FS: 0000000000000000(0000) GS:ffff88063fc20000(0000) knlGS:0000000000000000
May 13 15:03:39 node8 kernel: [260532.466858] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 13 15:03:39 node8 kernel: [260532.466889] CR2: ffff8806514603d0 CR3: 0000000001c03000 CR4: 00000000000026e0
May 13 15:03:39 node8 kernel: [260532.466938] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 13 15:03:39 node8 kernel: [260532.466986] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 13 15:03:39 node8 kernel: [260532.467036] Process ksmd (pid: 50, threadinfo ffff88061aaae000, task ffff8806174eae40)
May 13 15:03:39 node8 kernel: [260532.467085] Stack:
May 13 15:03:39 node8 kernel: [260532.467107] ffffea001512c9b8 ffff8802f6f7c180 ffff88034830a6c0 ffff88...
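
The "Oops: 0002" value in the trace above is the x86 page-fault error code, printed in hex. A small Python sketch decoding its three low bits, assuming the standard x86 bit meanings (bit 0 present/protection, bit 1 write, bit 2 user mode); the names are illustrative:

```python
# x86 page-fault error code bits: (bit, meaning if set, meaning if clear).
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "kernel mode"),
]

def decode_page_fault_code(code):
    """Describe the three low bits of an x86 page-fault error code."""
    return [set_text if (code >> bit) & 1 else clear_text
            for bit, set_text, clear_text in PF_BITS]

# Error code from the "Oops: 0002 [#1] SMP" line.
print(decode_page_fault_code(int("0002", 16)))
```

For 0002 this describes a kernel-mode write to a page that is not mapped, which agrees with the "unable to handle kernel paging request" message and the empty PUD entry.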


Markus Schade (lp-markusschade) wrote :

It doesn't get much better. Here is a recent call trace with 3.0.0-19 from oneiric backports:

Jun 11 02:10:13 node2 kernel: [2387197.859644] ------------[ cut here ]------------
Jun 11 02:10:13 node2 kernel: [2387197.859653] WARNING: at /build/buildd/linux-lts-backport-oneiric-3.0.0/net/sched/sch_generic.c:255 dev_watchdog+0x24d/0x260()
Jun 11 02:10:13 node2 kernel: [2387197.859656] Hardware name: MS-7522
Jun 11 02:10:13 node2 kernel: [2387197.859658] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Jun 11 02:10:13 node2 kernel: [2387197.859660] Modules linked in: cls_u32 sch_sfq sch_htb xt_physdev xt_mac ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp nouveau iptable_filter ip_tables x_tables kvm_intel kvm ib_iser speedstep_lib rdma_cm ttm ib_cm iw_cm ib_sa ib_mad ib_core drm_kms_helper ib_addr drm iscsi_tcp libiscsi_tcp psmouse libiscsi i2c_algo_bit scsi_transport_iscsi mxm_wmi wmi serio_raw i7core_edac edac_core lp parport video bridge stp multipath linear 3w_9xxx 3w_xxxx raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 e1000 r8169 aacraid ahci libahci sata_nv sata_sil sata_via
Jun 11 02:10:13 node2 kernel: [2387197.859708] Pid: 2285, comm: kvm Not tainted 3.0.0-19-server #33~lucid1-Ubuntu
Jun 11 02:10:13 node2 kernel: [2387197.859710] Call Trace:
Jun 11 02:10:13 node2 kernel: [2387197.859712] <IRQ> [<ffffffff81061ccf>] warn_slowpath_common+0x7f/0xc0
Jun 11 02:10:13 node2 kernel: [2387197.859720] [<ffffffff81061dc6>] warn_slowpath_fmt+0x46/0x50
Jun 11 02:10:13 node2 kernel: [2387197.859724] [<ffffffff81049e89>] ? sched_slice+0x59/0xa0
Jun 11 02:10:13 node2 kernel: [2387197.859728] [<ffffffff8152376d>] dev_watchdog+0x24d/0x260
Jun 11 02:10:13 node2 kernel: [2387197.859731] [<ffffffff81523520>] ? __netdev_watchdog_up+0x80/0x80
Jun 11 02:10:13 node2 kernel: [2387197.859735] [<ffffffff81071bd9>] call_timer_fn+0x49/0x130
Jun 11 02:10:13 node2 kernel: [2387197.859738] [<ffffffff81523520>] ? __netdev_watchdog_up+0x80/0x80
Jun 11 02:10:13 node2 kernel: [2387197.859741] [<ffffffff81071ff9>] run_timer_softirq+0x149/0x280
Jun 11 02:10:13 node2 kernel: [2387197.859745] [<ffffffff812fd9f0>] ? timerqueue_add+0x60/0xb0
Jun 11 02:10:13 node2 kernel: [2387197.859749] [<ffffffff8102911d>] ? lapic_next_event+0x1d/0x30
Jun 11 02:10:13 node2 kernel: [2387197.859753] [<ffffffff81068c0f>] __do_softirq+0xbf/0x200
Jun 11 02:10:13 node2 kernel: [2387197.859757] [<ffffffff81089167>] ? hrtimer_interrupt+0x127/0x210
Jun 11 02:10:13 node2 kernel: [2387197.859761] [<ffffffff8160e15c>] call_softirq+0x1c/0x30
Jun 11 02:10:13 node2 kernel: [2387197.859764] [<ffffffff8100d415>] do_softirq+0x65/0xa0
Jun 11 02:10:13 node2 kernel: [2387197.859767] [<ffffffff81068a0d>] irq_exit+0xbd/0xe0
Jun 11 02:10:13 node2 kernel: [2387197.859770] [<ffffffff8160ea9e>] smp_apic_timer_interrupt+0x6e/0x99
Jun 11 02:10:13 node2 kernel: [2387197.859772] [<ffffffff8160d913>] apic_timer_interrupt+0x13/0x20
Jun 11 02:10:13 node2 kernel: [2387197.859774] <EOI> [<ffffffffa02a1e97>] ? start_apic_timer+0x57/0x80 [kvm]
Jun 11 02:10:13 node2 ker...


Markus Schade (lp-markusschade) wrote :

Since moving to the oneiric backports kernel, most systems have been running stably for a long time. We have moved on to precise, or at least to using the precise kernel (while still keeping the system at lucid), and have not encountered this error since.

While it would be nice to know which change exactly fixed this, we are just happy to have stable systems and up-to-date kernels.

So I would close this as fixed in oneiric/precise.

Changed in linux (Ubuntu):
status: Confirmed → Fix Released