Bug #333201 “Virtual machine soft lockup - CPU gets stuck for XX...” : Bugs : linux package : Ubuntu

Revision history for this message

Stephan (stephan-fishycam) wrote on 2009-02-23:

#1

kvm_crash.txt Edit (13.7 KiB, text/plain)

Revision history for this message

Stephan (stephan-fishycam) wrote on 2009-02-23:

#2

dmesg.log Edit (20.2 KiB, text/plain)

Revision history for this message

Stephan (stephan-fishycam) wrote on 2009-02-23:

#3

lspci-vnvn.log Edit (3.1 KiB, text/plain)

Revision history for this message

Stephan (stephan-fishycam) wrote on 2009-02-23:

#4

uname-a.log Edit (90 bytes, text/plain)

Revision history for this message

Stephan (stephan-fishycam) wrote on 2009-02-23:

#5

version.log Edit (27 bytes, text/plain)

Revision history for this message

Stephan (stephan-fishycam) wrote on 2009-02-23:

#6

Sorry if you got lots of e-mails there, I couldn't see how to add multiple attachments.

I have more information on this.

For the virtual server host where none of our virtual machines suffer from this, we are running the 2.6.27-7-server kernel.
For the virtual server host where just one of our virtual machines suffer from this, we are running the 2.6.27-11-server kernel.

Are there any changes between the two versions that could cause something like this? Would you recommend I try the older kernel?

Revision history for this message

Stephan (stephan-fishycam) wrote on 2009-03-04:

#7

soft_lockup.png Edit (19.9 KiB, image/png)

Download full text (10.9 KiB)

This problem just happened again. Here is an extract from /var/log/syslog on the machine affected.
As before, the other virtual machine on this host was ok.

Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] BUG: soft lockup - CPU#1 stuck for 61s! [fcheck:21720]
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] Modules linked in: ipv6 evdev psmouse serio_raw button ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif sg ata_generic uhci_
hcd ata_piix e1000 usbcore libata scsi_mod dock thermal processor fan
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] CPU 1:
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] Modules linked in: ipv6 evdev psmouse serio_raw button ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif sg ata_generic uhci_
hcd ata_piix e1000 usbcore libata scsi_mod dock thermal processor fan
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] Pid: 21720, comm: fcheck Not tainted 2.6.27-11-server #1
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] RIP: 0010:[<ffffffff802abeb7>] [<ffffffff802abeb7>] find_get_pages+0x77/0x110
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] RSP: 0000:ffff880003d37948 EFLAGS: 00000293
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] RAX: ffff880006aaa5e8 RBX: ffff880003d37988 RCX: ffff880006aaa5e8
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe20000126900
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] RBP: ffff880003d378f8 R08: 0000000000000002 R09: 0000000000000001
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] R10: 0000000000000002 R11: ffff880003d37a78 R12: ffffffff802b6d84
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] R13: ffff880003d37988 R14: 0000000000000001 R15: 0000000000000800
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] FS: 00007f1991ff36e0(0000) GS:ffff88000f495180(0000) knlGS:0000000000000000
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] CR2: 0000000000dfb190 CR3: 000000000f15e000 CR4: 00000000000006e0
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322]
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] Call Trace:
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] [<ffffffff802abe83>] ? find_get_pages+0x43/0x110
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] [<ffffffff802b6a04>] ? pagevec_lookup+0x24/0x30
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] [<ffffffff802b825b>] ? __invalidate_mapping_pages+0x8b/0x1a0
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] [<ffffffff802b4874>] ? get_dirty_limits+0x14/0x2b0
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] [<ffffffff803024ae>] ? generic_forget_inode+0x4e/0x190
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] [<ffffffff802b8380>] ? invalidate_mapping_pages+0x10/0x20
Mar 3 22:32:25 gla1-mailman1 kernel: [740722.950322] [<ffffffff...

This problem just happened again. Here is an extract from /var/log/syslog on the machine affected.
As before, the other virtual machine on this host was ok.

Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] BUG: soft lockup - CPU#1 stuck for 61s! [fcheck:21720]
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] Modules linked in: ipv6 evdev psmouse serio_raw button ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif sg ata_generic uhci_
hcd ata_piix e1000 usbcore libata scsi_mod dock thermal processor fan
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] CPU 1:
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] Modules linked in: ipv6 evdev psmouse serio_raw button ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif sg ata_generic uhci_
hcd ata_piix e1000 usbcore libata scsi_mod dock thermal processor fan
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] Pid: 21720, comm: fcheck Not tainted 2.6.27-11-server #1
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] RIP: 0010:[<ffffffff802abeb7>]  [<ffffffff802abeb7>] find_get_pages+0x77/0x110
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] RSP: 0000:ffff880003d37948  EFLAGS: 00000293
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] RAX: ffff880006aaa5e8 RBX: ffff880003d37988 RCX: ffff880006aaa5e8
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe20000126900
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] RBP: ffff880003d378f8 R08: 0000000000000002 R09: 0000000000000001
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] R10: 0000000000000002 R11: ffff880003d37a78 R12: ffffffff802b6d84
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] R13: ffff880003d37988 R14: 0000000000000001 R15: 0000000000000800
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] FS:  00007f1991ff36e0(0000) GS:ffff88000f495180(0000) knlGS:0000000000000000
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] CR2: 0000000000dfb190 CR3: 000000000f15e000 CR4: 00000000000006e0
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] 
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] Call Trace:
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802abe83>] ? find_get_pages+0x43/0x110
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802b6a04>] ? pagevec_lookup+0x24/0x30
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802b825b>] ? __invalidate_mapping_pages+0x8b/0x1a0
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802b4874>] ? get_dirty_limits+0x14/0x2b0
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff803024ae>] ? generic_forget_inode+0x4e/0x190
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802b8380>] ? invalidate_mapping_pages+0x10/0x20
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff803022ed>] ? prune_icache+0x27d/0x290
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff8030233f>] ? shrink_icache_memory+0x3f/0x50
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802b8d45>] ? shrink_slab+0x135/0x190
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802ba3af>] ? do_try_to_free_pages+0x17f/0x2e0
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802ba607>] ? try_to_free_pages+0x67/0x70
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802b9120>] ? isolate_pages_global+0x0/0x50
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802b2931>] ? __alloc_pages_internal+0x241/0x510
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802d6679>] ? alloc_page_vma+0x79/0xf0
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802c2557>] ? do_wp_page+0x187/0x640
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802c3160>] ? handle_mm_fault+0x3b0/0x470
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff80505e4f>] ? do_page_fault+0x34f/0x750
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff80246463>] ? task_new_fair+0xb3/0x100
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff80245e5d>] ? resched_task+0x2d/0x90
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802fc3d1>] ? locks_remove_posix+0x11/0xc0
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff80240cc8>] ? wake_up_new_task+0xc8/0x140
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802f0dc2>] ? pipe_ioctl+0x12/0xb0
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802f86b6>] ? vfs_ioctl+0x36/0xb0
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff805030be>] ? _spin_lock+0xe/0x20
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff802f8135>] ? set_close_on_exec+0x75/0xb0
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322]  [<ffffffff805034fa>] ? error_exit+0x0/0x70
Mar  3 22:32:25 gla1-mailman1 kernel: [740722.950322] 
Mar  3 22:33:13 gla1-mailman1 postfix/anvil[21708]: statistics: max connection rate 2/60s for (smtp:10.2.8.16) at Mar  3 22:29:51
Mar  3 22:33:13 gla1-mailman1 postfix/anvil[21708]: statistics: max connection count 1 for (smtp:10.2.8.16) at Mar  3 22:29:44
Mar  3 22:33:13 gla1-mailman1 postfix/anvil[21708]: statistics: max cache size 1 at Mar  3 22:29:44
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450326] BUG: soft lockup - CPU#1 stuck for 61s! [fcheck:21720]
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] Modules linked in: ipv6 evdev psmouse serio_raw button ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif sg ata_generic uhci_hcd ata_piix e1000 usbcore libata scsi_mod dock thermal processor fan
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] CPU 1:
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] Modules linked in: ipv6 evdev psmouse serio_raw button ext3 jbd mbcache sr_mod cdrom sd_mod crc_t10dif sg ata_generic uhci_hcd ata_piix e1000 usbcore libata scsi_mod dock thermal processor fan
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] Pid: 21720, comm: fcheck Not tainted 2.6.27-11-server #1
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] RIP: 0010:[<ffffffff802abeac>]  [<ffffffff802abeac>] find_get_pages+0x6c/0x110
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] RSP: 0000:ffff880003d37948  EFLAGS: 00000246
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] RAX: ffff880006aaa5e8 RBX: ffff880003d37988 RCX: ffff880006aaa5e8
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe20000126900
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] RBP: ffff880003d378f8 R08: 0000000000000002 R09: 0000000000000001
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] R10: 0000000000000002 R11: ffff880003d37a78 R12: ffffffff802b6d84
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] R13: ffff880003d37988 R14: 0000000000000001 R15: 0000000000000800
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] FS:  00007f1991ff36e0(0000) GS:ffff88000f495180(0000) knlGS:0000000000000000
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] CR2: 0000000000dfb190 CR3: 000000000f15e000 CR4: 00000000000006e0
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] 
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367] Call Trace:
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802abe83>] ? find_get_pages+0x43/0x110
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802b6a04>] ? pagevec_lookup+0x24/0x30
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802b825b>] ? __invalidate_mapping_pages+0x8b/0x1a0
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802b4874>] ? get_dirty_limits+0x14/0x2b0
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff803024ae>] ? generic_forget_inode+0x4e/0x190
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802b8380>] ? invalidate_mapping_pages+0x10/0x20
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff803022ed>] ? prune_icache+0x27d/0x290
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff8030233f>] ? shrink_icache_memory+0x3f/0x50
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802b8d45>] ? shrink_slab+0x135/0x190
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802ba3af>] ? do_try_to_free_pages+0x17f/0x2e0
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802ba607>] ? try_to_free_pages+0x67/0x70
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802b9120>] ? isolate_pages_global+0x0/0x50
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802b2931>] ? __alloc_pages_internal+0x241/0x510
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802d6679>] ? alloc_page_vma+0x79/0xf0
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802c2557>] ? do_wp_page+0x187/0x640
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802c3160>] ? handle_mm_fault+0x3b0/0x470
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff80505e4f>] ? do_page_fault+0x34f/0x750
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff80246463>] ? task_new_fair+0xb3/0x100
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff80245e5d>] ? resched_task+0x2d/0x90
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802fc3d1>] ? locks_remove_posix+0x11/0xc0
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff80240cc8>] ? wake_up_new_task+0xc8/0x140
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802f0dc2>] ? pipe_ioctl+0x12/0xb0
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802f86b6>] ? vfs_ioctl+0x36/0xb0
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff805030be>] ? _spin_lock+0xe/0x20
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff802f8135>] ? set_close_on_exec+0x75/0xb0
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]  [<ffffffff805034fa>] ? error_exit+0x0/0x70
Mar  3 22:33:30 gla1-mailman1 kernel: [740788.450367]

The text above is repeated until 23:54:17, where I rebooted the virtual machine. A reboot makes the problem go away for a while.

I have attached a screenshot too.

I'm really keen to get this fixed. I'm happy to go through all sorts of debugging, this is a really important issue for my company. If you need any more details, please just asked and I'll do my best to gather the information needed.

Revision history for this message

Stephan (stephan-fishycam) wrote on 2009-03-11:

#8

Hi, sorry to be a pain, but this is a big deal for us.

It hasn't happened since switching to the other kernel. (so it did happen with 2.6.27-11-server but hasn't happened yet with 2.6.27-7-server)

Can I get any more information to help you out? Please let me know.

Revision history for this message

Ali Ross (gnu2tux) wrote on 2009-03-12:

#9

confirm error. Seems to be just 2.6.27-7.

Perhaps hardware specific? Dell Poweredge server here.

Changed in linux:
status:	New → Confirmed

Revision history for this message

Ali Ross (gnu2tux) wrote on 2009-03-12:

#10

I meant 2.6.27-11.

Revision history for this message

Stephan (stephan-fishycam) wrote on 2009-04-15:

#11

The "soft lockup" has now also happened with the previous kernel (2.6.27-7-server).
So it's happening with 2.6.27-11-server and also 2.6.27-7-server now.

I will upgrade the kernel on the host and virtual machine and wait.

Revision history for this message

Stephan (stephan-fishycam) wrote on 2009-04-27:

#12

This is now fixed, we upgraded the BIOS on the server (Dell Poweredge 1950).

Revision history for this message

Bryan McLellan (btm) wrote on 2010-03-12:

#13

I occasionally experience this error on a 9.10 guest running 2.6.31-14-server on a 9.10 host with 2.6.31-14-generic and kvm=1:84+dfsg-0ubuntu16+0.11.0+0ubuntu6.3 on an HP DL360 G6.

Revision history for this message

Bryan McLellan (btm) wrote on 2010-03-25:

#14

Still seeing this issue.

Host:
  Updated to the latest BIOS / Firmware on the DL360 G6 to date
  2.6.31-20-generic
  qemu-kvm=0.11.0-0ubuntu6.3
Guest:
  2.6.31-20-server

Installing cpuburn and running two instances of BurnP6 on the guest produced cpu soft lockups within 24 hours.

Revision history for this message

Marcus Bointon (marcus-synchromedia) wrote on 2010-04-09:

#15

I'm getting this on 10.04 beta2 with 2.6.32-19-virtual (in the VM, built with vmbuilder) and 2.6.32-19-server on the host on boot of a VM, rendering virtualization completely inoperable, 100% failure rate. I'm running qemu-kvm 0.12.3+noroms-0ubuntu5

Revision history for this message

Bryan McLellan (btm) wrote on 2010-04-09:

#16

Marcus, what hardware are you experiencing this on?

Revision history for this message

Marcus Bointon (marcus-synchromedia) wrote on 2010-05-04:

#17

kvm failing to run Edit (30.9 KiB, image/png)

Sorry I didn't see your question earlier.
I'm now running the release version of 10.04:
Linux 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux
This particular machine isn't very high powered, but it should be at least usable. It has a single quad-core L5320 Xeon (with vmx), 10Gb RAM, SATA soft raid-1, no other major processes (not even apache), load average < 0.1. FWIW the host OS reboots in under a minute.

Here's CPUinfo on one core:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU L5320 @ 1.86GHz
stepping : 7
cpu MHz : 1866.966
cache size : 4096 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips : 3733.93
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

I've attached a screenshot of the kind of thing it's doing. It takes about 20 minutes to even get as far as this screen though! It's completely unusable - after 50 mins it reached a login prompt, but typed input eitehr doesn't work or is too slow to tell.

This is the command I gave to vmbuilder (which can't get much more vanilla!)

vmbuilder kvm ubuntu \
--suite lucid \
--flavour virtual \
--arch amd64 \
--libvirt qemu:///system \
--hostname vm1 \
--user user \
--name user \
--pass default \
--ip 192.168.0.100 \
--dest /root/vm1

and kvm is run from its generated run.sh file with this:

exec kvm -m 128 -smp 1 -drive file=tmpkJv9OP.qcow2 "$@"

I don't know if this problem is because vmbuilder built a bad image, the config is bad, or because kvm isn't working right.

Sorry I didn't see your question earlier.
I'm now running the release version of 10.04:
Linux 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 2010 x86_64 GNU/Linux
This particular machine isn't very high powered, but it should be at least usable. It has a single quad-core L5320 Xeon (with vmx), 10Gb RAM, SATA soft raid-1, no other major processes (not even apache), load average < 0.1. FWIW the host OS reboots in under a minute.

Here's CPUinfo on one core:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           L5320  @ 1.86GHz
stepping        : 7
cpu MHz         : 1866.966
cache size      : 4096 KB
physical id     : 1
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 4
initial apicid  : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips        : 3733.93
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:

I've attached a screenshot of the kind of thing it's doing. It takes about 20 minutes to even get as far as this screen though! It's completely unusable - after 50 mins it reached a login prompt, but typed input eitehr doesn't work or is too slow to tell.

This is the command I gave to vmbuilder (which can't get much more vanilla!)

vmbuilder kvm ubuntu \
--suite lucid \
--flavour virtual \
--arch amd64 \
--libvirt qemu:///system \
--hostname vm1 \
--user user \
--name user \
--pass default \
--ip 192.168.0.100 \
--dest /root/vm1

and kvm is run from its generated run.sh file with this:

exec kvm -m 128 -smp 1 -drive file=tmpkJv9OP.qcow2 "$@"

I don't know if this problem is because vmbuilder built a bad image, the config is bad, or because kvm isn't working right.

Revision history for this message

Marcus Bointon (marcus-synchromedia) wrote on 2010-05-05:

#18

I managed to find a KVM appliance image (there don't seem to be many around) here:
http://ica-atom.org/docs/index.php?title=ICA-AtoM_virtual_appliance
This VM works with respectable performance and no errors on my server, so it looks like kvm is in the clear. The only thing that was different in the config is RAM allocation (128 vs 256m) in the run script, so I increased it in my generated vm and it still had the same problems, so it looks like vmbuilder is at fault (or JeOS itself).

Revision history for this message

Stephan (stephan-fishycam) wrote on 2010-05-12:

#19

soft_lockup.txt Edit (3.6 KiB, text/plain)

I still get this error from time to time. I'm not on the servers I originally reported the issue with as that company went into administration, so I can't say if they are still doing it.

I get the error on Ubuntu 9.10 and also 10.04. The guest and the host are both 10.04 now. The guests are JeOS.
The host server is a quad core 2.4 Ghz with 8 GB or RAM and tons of spare capacity. It's an ASUS P5K-VM motherboard.

I've attached the latest log file.

Revision history for this message

Bryan McLellan (btm) wrote on 2010-07-29:

#20

10.04.01 Guests built on 9.10 hosts using vmbuilder=0.12.4-0ubuntu1 (from maverick, I had issues with 0.12.3-0ubuntu1, I see there is a newer version in -proposed) still present soft lock ups. It's only one type of server, and tends to not be purely CPU driven although the guests with higher load do present much more often.

Has anyone tried this with a 10.04 (lucid) host or at least kvm=1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9 backported to 9.10 (karmic)?

I can't imagine it would be vmbuilder or JeOS, as the former is mostly a convenience script around deb-bootstrap, libvirt, and such. The latter shouldn't be able to have any essential packages missing because by way of the way it is built, dependencies would be enforced. Thus toolchain issues _should_ effect a greater number of people.

I'm going back to betting on KVM or the kernel. I'll try to narrow it down more, as this is effecting my production systems gravely now.

Revision history for this message

Sergey Svishchev (svs) wrote on 2010-08-10:

#21

This is a long shot, but try another clocksource (normally kvm-clock). I've had to disable kvm-clock altogether for another reason, and since that time "soft lockups" did not happen on 9.10 systems.

Revision history for this message

Steven Wagner (stevenwagner) wrote on 2010-08-20:

#22

I am getting the same error, using Ubuntu Server 10.04 64 bit, with a Ubuntu Server 10.04 64 bit guest. I am trying first to turn off cpu frequency scaling and see if that makes the issue go away. Right now I can't reproduce, but occurs consistently after about 2 weeks of uptime. Sergey- What are the steps to switch off of kvm-clock?

Revision history for this message

Stephan (stephan-fishycam) wrote on 2010-08-21:

#23

I haven't had this problem for months. I don't think anything changed on my side. This is a tricky one!

Revision history for this message

Martin (martin00) wrote on 2010-09-06:

#24

Download full text (4.0 KiB)

I've the same problem with a INTEL P55 Mainboard. Very sad its a productive machine.
I get it randomly all 2-4 weeks on one VM of 6. Always "cpu stuck" and the websites on it are down. I hate it very badly.

Host: 2.6.32-24-server #41-Ubuntu (now installing #42)
VM: 2.6.32-24-server #41-Ubuntu (now installing #42)

syslog crashed VM:
------------------------------------------------------------------------------------------
Sep 6 08:47:51 cluster qmail-smtpd: qmail-smtpd/VC started
Sep 6 08:49:03 cluster kernel: [538123.970006] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
Sep 6 08:49:03 cluster kernel: [538123.993618] Modules linked in: fbcon psmouse tileblit font bitblit i2c_piix4 serio_raw softcursor lp vga16fb vgastate parport floppy
Sep 6 08:49:03 cluster kernel: [538123.993618] CPU 0:
Sep 6 08:49:03 cluster kernel: [538123.993618] Modules linked in: fbcon psmouse tileblit font bitblit i2c_piix4 serio_raw softcursor lp vga16fb vgastate parport floppy
Sep 6 08:49:03 cluster kernel: [538123.993618] Pid: 0, comm: swapper Not tainted 2.6.32-24-server #41-Ubuntu Bochs
Sep 6 08:49:03 cluster kernel: [538123.993618] RIP: 0010:[<ffffffff814abd6b>] [<ffffffff814abd6b>] __inet_lookup_established+0x1ab/0x2c0
Sep 6 08:49:03 cluster kernel: [538123.993618] RSP: 0018:ffff880001c03ba0 EFLAGS: 00000202
Sep 6 08:49:03 cluster kernel: [538123.993618] RAX: 000000000001cb93 RBX: ffff880001c03be0 RCX: 0000000091a53275
Sep 6 08:49:03 cluster kernel: [538123.993618] RDX: ffffc900002c8cb0 RSI: ffffffff81a49000 RDI: 000000000001cb93
Sep 6 08:49:03 cluster kernel: [538123.993618] RBP: ffffffff81013cb3 R08: 00000000e3a9806b R09: 0000000000509bf3
Sep 6 08:49:03 cluster kernel: [538123.993618] R10: 0000000000000002 R11: 0000000000000002 R12: ffff880001c03b20
Sep 6 08:49:03 cluster kernel: [538123.993618] R13: 46c0c8c363174858 R14: ffffffff81a46d80 R15: ffffffff8155fe2c
Sep 6 08:49:03 cluster kernel: [538123.993618] FS: 0000000000000000(0000) GS:ffff880001c00000(0000) knlGS:0000000000000000
Sep 6 08:49:03 cluster kernel: [538123.993618] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 6 08:49:03 cluster kernel: [538123.993618] CR2: 00000000f7056000 CR3: 00000000199f1000 CR4: 00000000000006f0
Sep 6 08:49:03 cluster kernel: [538123.993618] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 6 08:49:03 cluster kernel: [538123.993618] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 6 08:49:03 cluster kernel: [538123.993618] Call Trace:
Sep 6 08:49:03 cluster kernel: [538123.993618] <IRQ> [<ffffffff8146d35d>] ? __skb_checksum_complete_head+0x1d/0x70
Sep 6 08:49:03 cluster kernel: [538123.993618] [<ffffffff814c48bf>] ? tcp_v4_rcv+0x1cf/0x7e0
Sep 6 08:49:03 cluster kernel: [538123.993618] [<ffffffff8146906e>] ? consume_skb+0x1e/0x40
Sep 6 08:49:03 cluster kernel: [538123.993618] [<ffffffff814a2cdd>] ? ip_local_deliver_finish+0xdd/0x2d0
Sep 6 08:49:03 cluster kernel: [538123.993618] [<ffffffff814a2f60>] ? ip_local_deliver+0x90/0xa0
Sep 6 08:49:03 cluster kernel: [538123.993618] [<ffffffff814a241d>] ? ip_rcv_finish+0x12d/0x440
Sep 6 08:49:03 cluster kernel: [538123.993618] [<ffffffff814a29...

Ubuntu
linux package

Virtual machine soft lockup - CPU gets stuck for XX seconds

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Changed in linux (Ubuntu):
status:	Confirmed → Won't Fix

Ubuntulinux package

Virtual machine soft lockup - CPU gets stuck for XX seconds

Bug Description

Other bug subscribers

Bug attachments

Remote bug watches

Ubuntu
linux package