BUG: soft lockup - CPU#1 stuck for 61s! [kvm:19760]; RIP: 0010:[<ffffffffa0482ee0>] [<ffffffffa0482ee0>] kvm_write_guest_time+0x180/0x1b0 [kvm]

Bug #556919 reported by C de-Avillez on 2010-04-06
This bug affects 13 people
Affects: linux (Ubuntu)
Importance: High
Assigned to: John Johansen

Bug Description

This has been happening since some time after beta1 (I stopped running KVM for a while after the beta1 tests were done). I am running kvm (via testdrive) against the ISOs.

Almost always my laptop will hard-freeze. The screen is still painted under X, but no keys work. I cannot shift to a vTerm, and cannot use the magic keys to release from X. SSHing from another system fails (no route/response to/from host). Already established SSH sessions will not respond.

All in all, it really sounds like a bad kernel panic. The only option is to cycle power, wait for fsck to finish, and start again.

On the few times I do not get a hard-freeze, a running KVM will suddenly display on the terminal header "QEMU stopped". After that, ending this KVM run and starting a new one will have KVM unable to recognise the guest vdisk (just-created/formatted). Rebooting returns KVM to work -- until the hard-freeze.

This has happened today about 20 times. I have been unable to find anything in the logs pointing to an error, so far. Will now boot the -16 and -18 kernels to see if it is reproducible there; will also try a vanilla kernel.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-19-generic 2.6.32-19.28
Regression: Yes
Reproducible: No
ProcVersionSignature: Ubuntu 2.6.32-19.28-generic 2.6.32.10+drm33.1
Uname: Linux 2.6.32-19-generic x86_64
AlsaVersion:
 Advanced Linux Sound Architecture Driver Version 1.0.22.1.
 Compiled on Apr 1 2010 for kernel 2.6.32-19-generic (SMP).
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: cerdea 3476 F.... pulseaudio
 /dev/snd/controlC0: cerdea 3476 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'SB'/'HDA ATI SB at 0xfebfc000 irq 16'
   Mixer name : 'SigmaTel STAC9205'
   Components : 'HDA:838476a0,102801fd,00100204 HDA:14f12c06,14f1000f,00100000'
   Controls : 19
   Simple ctrls : 11
Card1.Amixer.info:
 Card hw:1 'Pro'/'Creative Labs VF0400 Live! Cam Notebook Pro at usb-0000:00:13.4-1, full speed'
   Mixer name : 'USB Mixer'
   Components : 'USB041e:4061'
   Controls : 0
   Simple ctrls : 0
Card1.Amixer.values:

CheckboxSubmission: 23f39d583e289591efa3692115f3f5f2
CheckboxSystem: d00f84de8a555815fa1c4660280da308
Date: Tue Apr 6 18:06:40 2010
EcryptfsInUse: Yes
Frequency: multiple times per day.
HibernationDevice: RESUME=UUID=4212d0ae-0265-44ed-ac84-7b7cd5669fe4
MachineType: Dell Inc. Inspiron 1721
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.32-19-generic root=/dev/mapper/sys-root ro crashkernel=384M-2G:64M,2G-:128M console=tty1 security=apparmor vbe_mode=0x176
ProcEnviron:
 LC_TIME=en_DK.utf8
 PATH=(custom, no user)
 LANG=en_US.utf8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.33
SourcePackage: linux
dmi.bios.date: 04/21/2008
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A07
dmi.board.name: 0RT951
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA07:bd04/21/2008:svnDellInc.:pnInspiron1721:pvr:rvnDellInc.:rn0RT951:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Inspiron 1721
dmi.sys.vendor: Dell Inc.

C de-Avillez (hggdh2) wrote :
Dustin Kirkland  (kirkland) wrote :

John-

This sounds like a fairly serious kvm issue. What more would you need from Carlos to triage this?

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kvm
description: updated
Changed in linux (Ubuntu):
assignee: nobody → John Johansen (jjohansen)
Dustin Kirkland  (kirkland) wrote :

Carlos,

Can you test each of the last few Lucid kernels and tell us which do, and don't exhibit this behavior?

C de-Avillez (hggdh2) wrote :

Unfortunately, I only have the -18 and -19 kernels nowadays (I had to clean up /boot two weeks ago). So far, both of them hard-freeze.

C de-Avillez (hggdh2) wrote :

Well, it turns out that /var/log/kern.log has the OOPSes, or at least some of them.

I remember restarting the system more times than that, but here are the start lines for the ones in kern.log:

2010-04-06T09:51:15.847238-05:00 xango kernel: [404958.512507] Pid: 19760, comm: kvm Not tainted 2.6.32-19-generic #28-Ubuntu Inspiron 1721
2010-04-06T09:52:21.325918-05:00 xango kernel: [405024.012504] Pid: 19760, comm: kvm Not tainted 2.6.32-19-generic #28-Ubuntu Inspiron 1721
2010-04-06T11:54:34.871798-05:00 xango kernel: [ 615.262504] Pid: 29049, comm: kvm Not tainted 2.6.32-19-generic #28-Ubuntu Inspiron 1721
2010-04-06T11:55:40.361466-05:00 xango kernel: [ 680.752505] Pid: 29049, comm: kvm Not tainted 2.6.32-19-generic #28-Ubuntu Inspiron 1721
2010-04-06T11:56:45.861619-05:00 xango kernel: [ 746.252505] Pid: 29049, comm: kvm Not tainted 2.6.32-19-generic #28-Ubuntu Inspiron 1721
2010-04-06T16:37:37.301945-05:00 xango kernel: [ 1378.692507] Pid: 3715, comm: kvm Not tainted 2.6.32-19-generic #28-Ubuntu Inspiron 1721
2010-04-06T18:36:40.964533-05:00 xango kernel: [ 1006.102504] Pid: 3853, comm: kvm Not tainted 2.6.32-18-generic #27-Ubuntu Inspiron 1721
2010-04-06T18:44:53.063287-05:00 xango kernel: [ 415.964548] Pid: 0, comm: swapper Not tainted 2.6.32-19-generic #28-Ubuntu Inspiron 1721
2010-04-06T18:44:53.063483-05:00 xango kernel: [ 415.964652] <#DB[1]> <<EOE>> Pid: 0, comm: swapper Not tainted 2.6.32-19-generic #28-Ubuntu
2010-04-06T18:44:53.064378-05:00 xango kernel: [ 415.965340] Pid: 2011, comm: hald-addon-inpu Not tainted 2.6.32-19-generic #28-Ubuntu Inspiron 1721
2010-04-06T18:44:53.064613-05:00 xango kernel: [ 415.965447] <#DB[1]> <<EOE>> Pid: 2011, comm: hald-addon-inpu Not tainted 2.6.32-19-generic #28-Ubuntu
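As a rough aid for reading header lines like the ones above, the following sketch tallies them by command name and kernel version. It is only an illustration: the regular expression and the name `tally_oops_headers` are assumptions made for this example, not part of any Ubuntu or kernel tooling, and the pattern is tuned to the line format shown in this comment.

```python
import re
from collections import Counter

# Assumed pattern for the "Pid: N, comm: NAME ... KERNEL" header lines quoted
# above; the kernel version is taken as the first x.y.z-looking token.
PID_LINE = re.compile(
    r"Pid: (?P<pid>\d+), comm: (?P<comm>\S+) .*?(?P<kernel>\d+\.\d+\.\d+\S*)"
)


def tally_oops_headers(lines):
    """Count header lines per (comm, kernel version) pair."""
    counts = Counter()
    for line in lines:
        match = PID_LINE.search(line)
        if match:
            counts[(match["comm"], match["kernel"])] += 1
    return counts


# A few of the lines from kern.log, copied from the comment above.
sample = [
    "2010-04-06T09:51:15.847238-05:00 xango kernel: [404958.512507] Pid: 19760, comm: kvm Not tainted 2.6.32-19-generic #28-Ubuntu Inspiron 1721",
    "2010-04-06T09:52:21.325918-05:00 xango kernel: [405024.012504] Pid: 19760, comm: kvm Not tainted 2.6.32-19-generic #28-Ubuntu Inspiron 1721",
    "2010-04-06T18:36:40.964533-05:00 xango kernel: [ 1006.102504] Pid: 3853, comm: kvm Not tainted 2.6.32-18-generic #27-Ubuntu Inspiron 1721",
    "2010-04-06T18:44:53.063287-05:00 xango kernel: [ 415.964548] Pid: 0, comm: swapper Not tainted 2.6.32-19-generic #28-Ubuntu Inspiron 1721",
]

if __name__ == "__main__":
    for (comm, kernel), n in sorted(tally_oops_headers(sample).items()):
        print(f"{comm:10s} {kernel:20s} {n}")
```

Run against the whole of kern.log, a tally like this would show at a glance whether the reports cluster on the kvm process and on which kernel builds.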

C de-Avillez (hggdh2) wrote :

I can install a mainline kernel if you wish, but I would like to know which one -- there are so many of them...

ar (aranenko-hotmail) on 2010-04-11
tags: added: intel32
ar (aranenko-hotmail) wrote :

Using KVM with 10.4 freezes all systems (keyboard, mouse) when switching between them no matter what system was the first AMD/Intel 32/64.
I think that system does not receive expected signals from KVM, only real devices are legal ones, even if comes through KVM.

ar (aranenko-hotmail) wrote :

All is needed: switch ports on KVM where keyboard and mouse are connected to, and then, correspondingly, switch cables on every computer that KVM serves. In other words, just reconnect keyboard and mouse, and cables on every computer.
Better have cables "colored" differently, say black for mouse, grey for keyboard, or vice versa ...
KVM somehow knows the difference between keyboard and mouse, and passes that knowledge to boxes.
ar

C de-Avillez (hggdh2) wrote :

@ar: this bug is about the kernel virtualisation, not about Keyboard/Video/Mouse switchers...

ar (aranenko-hotmail) wrote :

okay. sorry, about that ...

C de-Avillez (hggdh2) wrote :

Unfortunately, the mainline kernels do not finish booting.

Manish Regmi (mregmi) wrote :

I am also having the same problem. It is the same with the latest git pull from the kvm project.
For me the problem only seems to exist when running kvm as root (with sdl).
I will try to reproduce the problem tonight.

Carlos-

Can you confirm? Does this only happen when running as root?

I am not sure. All of my latest attempts were made while running the automated server tests (AST), which require me to run as root.

I also had problems while running testdrive, but now I am unsure whether that was caused by having run the AST before testdrive, in the *same* boot.

What I have just found is that I can usually (partially) run some 3 ASTs (about 3 KVM created each) before the kernel barfs. It is partially because I am adjusting the responses, and after one such adjustment I have to cancel the run, and start over again.

So: yes, I have had all my latest OOPS while running as root. I will have to confirm if this would also happen while running as a standard user.

The problem is very strange to me. I tested it on two kernels, 2.6.31-14-generic and 2.6.34-rc3 (from kvm git); the scenarios are below.

I did not find the problem with 2.6.31-14-generic (there might be a problem with the newer Ubuntu-shipped kernels, but I haven't updated to them because I use kvm git).

-no-kvm does not seem to have any effect (I also tried manually removing the kvm-intel and kvm modules), so it does not look like a problem with the kernel modules.

Running as a normal user or with sudo does not cause problems.

Running with -nodisplay or in VNC does not cause any problems.

Only running as root with SDL seems to be problematic.

If someone else can confirm this, that would be great. Let me know if you need any log files. I did not see any hints in the log files; maybe I am looking in the wrong place.

Thank you

Andrew Cowie (afcowie) wrote :

We just hit this, in the form of:

   CPU#1 stuck for 66s!

in dmesg. What's interesting is that the host pygtk virt-manager's "CPU graph" freezes (along with the rest of the UI). Not sure if it's the same bug, but it was a brand new lucid-server install as guest in a brand new lucid-server as host; I'd just finally gotten to the point where SSH was up in the guest, tried to do `apt-get update`, and it froze... and recovered, eventually, sometimes...

AfC
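For comparing reports like this one with the message in the bug title, here is a minimal sketch that pulls the CPU number, the stuck duration and (when present) the task name and PID out of a soft lockup line. The pattern and the function name `parse_soft_lockup` are assumptions for illustration, based only on the message forms quoted in this report.

```python
import re

# Assumed pattern: "CPU#<n> stuck for <s>s!" optionally followed by
# "[<comm>:<pid>]", as in the bug title.
SOFT_LOCKUP = re.compile(
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
    r"(?: \[(?P<comm>[^:\]]+):(?P<pid>\d+)\])?"
)


def parse_soft_lockup(line):
    """Return a dict describing a soft lockup line, or None if none is found."""
    match = SOFT_LOCKUP.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match["cpu"]),
        "seconds": int(match["secs"]),
        "comm": match["comm"],  # None when the bracketed part is absent
        "pid": int(match["pid"]) if match["pid"] else None,
    }


if __name__ == "__main__":
    print(parse_soft_lockup("BUG: soft lockup - CPU#1 stuck for 61s! [kvm:19760]"))
    print(parse_soft_lockup("   CPU#1 stuck for 66s!"))
```

Piping dmesg output through something like this makes it easier to see whether the same task keeps getting reported as stuck.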

tags: added: kernel-core kernel-reviewed
Brad Figg (brad-figg) on 2010-12-03
tags: added: acpi-bad-address
Brad Figg (brad-figg) on 2011-04-06
Changed in linux (Ubuntu):
status: New → Confirmed
tags: removed: regression-potential
HughDaniel (hugh-toad) wrote :

This bug is STILL active in current (2012/02feb) Ubuntu releases. I am getting bitten by this on two server-class machines; even running a single kvm client instance by hand can freeze the server. This is happening on 2.6.32-38 Ubuntu-SMP kernels with the server kernel. I can also note that it got worse as of the third week of November 2011 (not sure what was updated then, but something...). I might be able to provide remote access to a machine with this issue, but as best I can tell it's a TOTAL freeze of the system.

marko (markoschuetz) wrote :

+1

This is on an i7 laptop running with 8 cores, 4 of them in the VM. I'm running Oneiric and the 3.0.0-16-generic kernel.

The irony is that I switched to KVM because I suspected VirtualBox (4.1.8) of being responsible for stability issues (reboots) and wanted to compare the behavior using KVM.

summary: - running KVM almost always hard-freezes the host
+ BUG: soft lockup - CPU#1 stuck for 61s! [kvm:19760]; RIP:
+ 0010:[<ffffffffa0482ee0>] [<ffffffffa0482ee0>]
+ kvm_write_guest_time+0x180/0x1b0 [kvm]
tags: added: kernel-bug
removed: intel32

C de-Avillez, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? Can you try with the latest development release of Ubuntu? ISO CD images are available from http://cdimage.ubuntu.com/releases/ .

If it remains an issue, could you run the following command in the development release from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux <replace-with-bug-number>

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please do not test the kernel in the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. As well, please comment on which kernel version specifically you tested.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream', and comment as to why specifically you were unable to test it.

Please let us know your results. Thanks in advance.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Patrick La Fratta (plafratt) wrote :

I am having a very similar problem trying to boot Raring in testdrive. The host OS is Ubuntu 12.10. The error message I get says:

"BUG: soft lock - CPU#0 stuck for 22s! [modprobe 1428]"

Patrick La Fratta, if you have a bug in Ubuntu, could you please file a new report by executing the following in a terminal:
ubuntu-bug linux

For more on this, please see the Ubuntu Kernel team article:
https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports

the Ubuntu Bug Control team and Ubuntu Bug Squad team article:
https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue

and Ubuntu Community article:
https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Please note, not filing a new report may delay your problem being addressed as quickly as possible.

Thank you for your understanding.

Patrick La Fratta (plafratt) wrote :

Christopher,

I was able to get around this problem by using Virtual-Box instead of KVM. After talking with others, I get the impression that problems with KVM are somewhat common. Should I still file a bug report?

Thanks,
Patrick

Patrick La Fratta, if you would like the issue you are experiencing with KVM addressed, yes.

fmaste (fmaste) wrote :

Same here on 15.04

[ 9876.328007] perf interrupt took too long (3213 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[37825.813778] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [qemu-system-x86:1159]
[37825.813784] Modules linked in: vhost_net vhost macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables nls_iso8859_1 intel_rapl ppdev intel_powerclamp snd_hda_codec_hdmi coretemp kvm_intel snd_hda_codec_realtek snd_hda_codec_generic kvm joydev snd_intel_sst_acpi lpc_ich serio_raw snd_hda_intel snd_intel_sst_core snd_hda_controller snd_soc_rt5640 snd_soc_sst_mfld_platform snd_hda_codec snd_soc_rl6231 snd_soc_core snd_hwdep parport_pc snd_compress parport snd_pcm_dmaengine 8250_fintek snd_pcm dw_dmac dw_dmac_core i2c_hid snd_timer mei_txe snd spi_pxa2xx_platform rfkill_gpio mei mac_hid i2c_designware_platform i2c_designware_core soundcore snd_soc_sst_acpi iosf_mbi pwm_lpss_platform shpchp 8250_dw pwm_lpss autofs4 btrfs xor raid6_pq xts gf128mul dm_crypt hid_generic usbhid hid crct10dif_pclmul i915 crc32_pclmul ghash_clmulni_intel cryptd i2c_algo_bit drm_kms_helper psmouse r8169 drm ahci mii libahci video sdhci_acpi sdhci
[37825.813873] CPU: 0 PID: 1159 Comm: qemu-system-x86 Tainted: G W 3.19.0-30-generic #34-Ubuntu
[37825.813875] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q1900B-ITX, BIOS P1.70 11/04/2014
[37825.813878] task: ffff88013956c4b0 ti: ffff8800aa2f8000 task.ti: ffff8800aa2f8000
[37825.813881] RIP: 0010:[<ffffffff817cba92>] [<ffffffff817cba92>] _raw_spin_lock+0x32/0x80
[37825.813889] RSP: 0018:ffff8800aa2fbdc8 EFLAGS: 00000202
[37825.813891] RAX: 0000000000003346 RBX: ffff8800acb2a0d0 RCX: 0000000000000002
[37825.813894] RDX: 0000000000005600 RSI: 0000000000005602 RDI: ffffc9001074d144
[37825.813896] RBP: ffff8800aa2fbdc8 R08: 0000000000005602 R09: 0000000000000000
[37825.813898] R10: 0000000000000000 R11: 0000000000000000 R12: dead000000100100
[37825.813900] R13: ffffffff8120a2f0 R14: ffff8800aa2fbc58 R15: ffff880000000000
[37825.813903] FS: 00007f2ee3f02980(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[37825.813906] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[37825.813908] CR2: fffff8a0021db080 CR3: 00000000aa0ee000 CR4: 00000000001027f0
[37825.813910] Stack:
[37825.813912] ffff8800aa2fbe38 ffffffff810f0d4c ffffc9001074d140 ffffc9001074d144
[37825.813916] ffffffff00000004 00007f2ee40dc000 ffff8800933dc000 0000000000000800
[37825.813920] ffff880036176900 00007f2ee40dc800 0000000000000081 0000000000000000
[37825.813924] Call Trace:
[37825.813932] [<ffffffff810f0d4c>] futex_wake+0x9c/0x140
[37825.813937] [<ffffffff810f3a87>] do_futex+0x107/0x5d0
[37825.813943] [<ffffffff81356cf2>] ? common_file_perm+0x42/0xf0
[37825.813947] [<ffffffff810f3fc6>] SyS_futex+0x76/0x170
[37825.813953] [<ffffffff811f5b9a>] ? SyS_write+0x6a/0xb0
[37825.813957] [<ffffffff817cbe8d>] system_call_fastpath+0x16/0x1b
[37825.813959] Code: 8...
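To make a trace like the one above easier to skim, the sketch below extracts the symbol names from call trace lines and notes which frames carry the "?" marker (which the kernel uses for addresses found on the stack that are not verified as part of the actual call chain). The regular expression and the name `extract_frames` are assumptions for this example and only target the line format shown here.

```python
import re

# Assumed pattern for lines such as
#   [37825.813943] [<ffffffff81356cf2>] ? common_file_perm+0x42/0xf0
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\? )?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def extract_frames(lines):
    """Return (symbol, reliable) tuples for every call trace line found."""
    frames = []
    for line in lines:
        match = FRAME.search(line)
        if match:
            frames.append((match["symbol"], match["unreliable"] is None))
    return frames


# Call trace lines copied from the log above.
trace = [
    "[37825.813932] [<ffffffff810f0d4c>] futex_wake+0x9c/0x140",
    "[37825.813937] [<ffffffff810f3a87>] do_futex+0x107/0x5d0",
    "[37825.813943] [<ffffffff81356cf2>] ? common_file_perm+0x42/0xf0",
    "[37825.813947] [<ffffffff810f3fc6>] SyS_futex+0x76/0x170",
    "[37825.813953] [<ffffffff811f5b9a>] ? SyS_write+0x6a/0xb0",
    "[37825.813957] [<ffffffff817cbe8d>] system_call_fastpath+0x16/0x1b",
]

if __name__ == "__main__":
    for symbol, reliable in extract_frames(trace):
        print(symbol if reliable else f"{symbol} (unverified)")
```

Read this way, the trace shows the qemu task spinning in _raw_spin_lock beneath futex_wake, which is a different code path from the kvm_write_guest_time lockup in the title of this report.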


fmaste, it will help immensely if you filed a new report via a terminal:
ubuntu-bug linux

Please feel free to subscribe me to it.
