kvm/qemu server "freeze"

Bug #710114 reported by Samuel Wolf
This bug affects 1 person
Affects             Status    Importance   Assigned to    Milestone
qemu-kvm (Ubuntu)   Invalid   High         Serge Hallyn

Bug Description

I am running Ubuntu 10.04 Server on a Xeon CPU with kvm/qemu.
Since a kernel update a few weeks ago, my VMs sometimes die, and so does the VM host.

I can ping the VM host and get a reply, but I cannot log in via ssh.

Looking directly at the screen shows the messages I have attached.

qemu-kvm 0.12.3+noroms-0ubuntu9.3

Linux vmhost 2.6.32-28-server #55-Ubuntu SMP Mon Jan 10 23:57:16 UTC 2011 x86_64 GNU/Linux

Samuel Wolf (samuel-wolf) wrote :
Samuel Wolf (samuel-wolf) wrote :

It happened again today.

affects: ubuntu → qemu-kvm (Ubuntu)
Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug and helping to make Ubuntu better.

This looks to be a bug in the rmap_remove() function in the kernel kvm module. Unfortunately that part of the code has changed quite a bit so this could be tough to track down. To help narrow things down, can you tell us when was the first time you encountered this? Had you been running for a long time on lucid before that with no problems? And is there anything in particular which you can do in your guest to reliably reproduce this?

Changed in qemu-kvm (Ubuntu):
status: New → Incomplete
importance: Undecided → High
Samuel Wolf (samuel-wolf) wrote :

Yes, lucid ran from June 2010 to October 2010 without problems on this hardware.
In November or December 2010 the server crashed with this error for the first time.
I have not found a way to reliably reproduce the problem; it happened once during the night and the other two times in the middle of the day.

Samuel Wolf (samuel-wolf) wrote :

I triggered a crash today.

The server is running an older kernel at the moment (for testing):
Linux vmhost 2.6.32-26-server #48-Ubuntu SMP Wed Nov 24 10:28:32 UTC 2010 x86_64 GNU/Linux

What I did:
- ssh -X root@vmhost (from my client)
- started virt-manager
- opened the screen of a virtual WinXP Prof. guest (which was sitting at the Windows login screen)
- pressed shutdown on the Windows login screen; half a second later the vmhost server crashed

I do not want to try that again :-/

Serge Hallyn (serge-hallyn) wrote :

So a lucid host and a winxp professional guest. Installed using the defaults from virt-manager? I'll give that a shot when I can boot my host into lucid.

Changed in qemu-kvm (Ubuntu):
assignee: nobody → Serge Hallyn (serge-hallyn)
status: Incomplete → Fix Committed
Serge Hallyn (serge-hallyn) wrote :

(*&$% guis

Changed in qemu-kvm (Ubuntu):
status: Fix Committed → In Progress
Samuel Wolf (samuel-wolf) wrote :

Yes, installed using the defaults from virt-manager.

Serge Hallyn (serge-hallyn) wrote :

@slu

The next time this happens, could you please run

apport-collect 710114

so that we can hopefully get all the information we need to figure out what's going on?

Changed in qemu-kvm (Ubuntu):
status: In Progress → Incomplete
Samuel Wolf (samuel-wolf) wrote :

@ Serge,

The server crashed again today.
I ran "apport-collect 710114" in the terminal, but I could not send the bug report with my Launchpad login :-(

Serge Hallyn (serge-hallyn) wrote :

Hi,

can you look under /var/crash for the info collected by apport-collect? If there is a crash file, please log in on a terminal (after a reboot if need be) and run

  apport-bug /var/crash/_qemu-kvm.YYYY.crash

Serge Hallyn (serge-hallyn) wrote :

Am installing a winxp sp2 VM right now on a lucid host to test with.

Serge Hallyn (serge-hallyn) wrote :

Hm, with qemu-kvm 0.12.3+noroms-0ubuntu9.4 and kernel 2.6.32-30-generic #59-Ubuntu, I was not able to reproduce it. Although I was doing it over vnc. Let me try over ssh -X.

Serge Hallyn (serge-hallyn) wrote :

Still could not reproduce.

Since KSM shows up in the stack trace, could you try disabling KSM by doing

    echo 0 > /sys/kernel/mm/ksm/run

and then try again?
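
A minimal sketch for confirming that KSM really is stopped after the echo above. It assumes the standard sysfs file /sys/kernel/mm/ksm/run, where 0 means stopped, 1 means running, and 2 means stopped with all merged pages unmerged. The function name ksm_state is made up for illustration, and the optional path argument exists only so the check can be tried against an ordinary file.

```shell
# ksm_state [FILE]: report the KSM state recorded in FILE.
# FILE defaults to the kernel's sysfs control file; passing another
# path is an illustrative convenience for trying this without root.
ksm_state() {
    run_file="${1:-/sys/kernel/mm/ksm/run}"
    if [ ! -r "$run_file" ]; then
        # Kernel built without KSM, or the file is not readable.
        echo "unknown"
        return 1
    fi
    case "$(cat "$run_file")" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        2) echo "unmerging" ;;   # ksmd stopped, merged pages being split
        *) echo "unknown" ;;
    esac
}
```

Running `ksm_state` with no argument on the host before retrying the guest shutdown would show whether the test is actually being done with KSM off.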

Please also append your /etc/default/qemu-kvm and the result of

   virsh dumpxml (winxp_vm_name)

to this report.

Samuel Wolf (samuel-wolf) wrote :

Hi,

it doesn't look good:
root@vmhost:/var/crash# ls -la
total 8
drwxrwxrwt 2 root root 4096 2010-04-13 22:52 .
drwxr-xr-x 14 root root 4096 2010-06-04 01:37 ..
root@vmhost:/var/crash#

Linux vmhost 2.6.32-30-server #59-Ubuntu SMP Tue Mar 1 22:46:09 UTC 2011 x86_64 GNU/Linux
qemu-kvm 0.12.3+noroms-0ubuntu9.4

I have the server kernel and you tested with the generic one; maybe there is a difference?
The server has worked without problems for the last 6 weeks; maybe we have a hardware problem?

Serge Hallyn (serge-hallyn) wrote : Re: [Bug 710114] Re: kvm/qemu server "freeze"

Quoting slu (<email address hidden>):
> Hi,
>
> it doesn't look good:
> root@vmhost:/var/crash# ls -la
> total 8
> drwxrwxrwt 2 root root 4096 2010-04-13 22:52 .
> drwxr-xr-x 14 root root 4096 2010-06-04 01:37 ..
> root@vmhost:/var/crash#
>
> Linux vmhost 2.6.32-30-server #59-Ubuntu SMP Tue Mar 1 22:46:09 UTC 2011 x86_64 GNU/Linux
> qemu-kvm 0.12.3+noroms-0ubuntu9.4
>
> I have the server kernel and you tested with the generic one; maybe there is a difference?

Good thinking. I can pursue reproducing that.

> The server has worked without problems for the last 6 weeks; maybe we have a hardware problem?

That certainly is possible.

Samuel Wolf (samuel-wolf) wrote :

We have three Ubuntu 10.04 servers (64-bit) and two WinXP Prof. (32-bit) guests running on this qemu-kvm server.
Maybe the problem is not WinXP itself but its combination with the Ubuntu servers.

It looks like there is no way to reproduce the problem reliably, and I am the only person affected.

Samuel Wolf (samuel-wolf) wrote :

root@vmhost:~# cat /etc/default/qemu-kvm
# To disable qemu-kvm's page merging feature, set KSM_ENABLED=0 and
# sudo restart qemu-kvm

KSM_ENABLED=1
#SLEEP_MILLISECS=2000
root@vmhost:~#

Serge Hallyn (serge-hallyn) wrote :

Tried to reproduce twice on the server kernel. Still couldn't hit it.

Can you try and reproduce with ksm disabled?

Samuel Wolf (samuel-wolf) wrote :

KSM is disabled:
root@vmhost:~# cat /etc/default/qemu-kvm
# To disable qemu-kvm's page merging feature, set KSM_ENABLED=0 and
# sudo restart qemu-kvm

KSM_ENABLED=0
#SLEEP_MILLISECS=2000
root@vmhost:~#

I cannot really reproduce the problem; the server sometimes runs for four weeks without a crash.
My plan now is to reinstall the server with Debian 6.0; if we have the trouble again, it must be a hardware problem.
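
Regarding the KSM_ENABLED=0 setting shown above: the comment in that file says the setting takes effect only after restarting qemu-kvm, so it may be worth confirming that the running kernel agrees with the file. A rough sketch, assuming the file uses plain KSM_ENABLED=0 or KSM_ENABLED=1 as in the paste and that the kernel state is in /sys/kernel/mm/ksm/run; the function names and the optional path arguments are hypothetical helpers, not part of qemu-kvm.

```shell
# ksm_config_value [CONF]: print the value of the last uncommented
# KSM_ENABLED= assignment in CONF (default /etc/default/qemu-kvm).
ksm_config_value() {
    conf_file="${1:-/etc/default/qemu-kvm}"
    sed -n 's/^[[:space:]]*KSM_ENABLED=\([^[:space:]#]*\).*/\1/p' "$conf_file" | tail -n 1
}

# ksm_matches_config [CONF] [RUN]: compare the configured value with the
# kernel's current state and report whether they agree.
ksm_matches_config() {
    conf_file="${1:-/etc/default/qemu-kvm}"
    run_file="${2:-/sys/kernel/mm/ksm/run}"
    wanted="$(ksm_config_value "$conf_file")"
    actual="$(cat "$run_file")"
    if [ "$wanted" = "$actual" ]; then
        echo "match: KSM_ENABLED=$wanted, run=$actual"
    else
        # The file was edited but the running state has not caught up.
        echo "mismatch: KSM_ENABLED=$wanted, run=$actual"
        return 1
    fi
}
```

A "mismatch" result would suggest the file was changed without the restart mentioned in its comment, in which case a crash-free period would say nothing about KSM.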

Serge Hallyn (serge-hallyn) wrote :

@slu,

I'm confused. In comment #18 you said KSM was enabled, and I thought the crash was 100% reproducible. Is comment #20 about a different server? Or are you saying that with KSM disabled you cannot reproduce the crash?

If it's not 100% reproducible (or even 50%) then maybe I just need to spend more time reproducing?

Samuel Wolf (samuel-wolf) wrote :

@ Serge,
yes, KSM was enabled the whole time the crashes occurred; I disabled it yesterday for testing, as you wrote in comment #14.

And no, I cannot reproduce the crash, sorry. I have no idea where the problem is.

Because the server crashed every few weeks, my first idea was that the Windows automatic restart after Windows updates was the problem, so I disabled the automatic restart. But the server still crashed every few weeks.

To find out whether it is a KSM problem, I now have to wait some weeks with KSM disabled.
The hardware worked fine for 3 years, but maybe it really is a hardware problem now.

Samuel Wolf (samuel-wolf) wrote :

The server has been running for 10 weeks without any problem on Debian 6.0.

Serge Hallyn (serge-hallyn) wrote :

In comment #22, you say you are waiting to find out if disabling KSM solved it. Do I gather from comment #23 that you reproduced the crash with KSM disabled?

Samuel Wolf (samuel-wolf) wrote :

The server did not crash with KSM disabled, but I tested it for only 7 days, which was not enough to give an answer!
Because we had a lot of trouble with the crashes and did not know whether it might be a hardware failure, we switched to Debian on the same hardware.

Serge Hallyn (serge-hallyn) wrote :

Thanks for the info, slu. I'm sorry we couldn't nail this down in lucid. Since we have no other reporters to help track down the cause and have lost the only hardware on which it was found, I'm going to mark this invalid.

---
Ubuntu Bug Squad volunteer triager
http://wiki.ubuntu.com/BugSquad

Changed in qemu-kvm (Ubuntu):
status: Incomplete → Invalid