Ubuntu

32bit rhel and centos 5.(5|6) hangs on boot on natty

Reported by gdahlman on 2011-06-10
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | | Medium | Unassigned |
  Natty | | Medium | Stefan Bader |
qemu-kvm (Ubuntu) | | Medium | Unassigned |
  Natty | | Undecided | Unassigned |

Bug Description

SRU Justification:

Impact: An upstream change in the KVM code in 2.6.37 causes regressions when running 32-bit guests under 11.04 (Natty) KVM.

Fix: Cherry-picking a single upstream patch (which went into 2.6.39) fixes the issue.

Testcase: Booting 32-bit RHEL and CentOS 5.(5|6) guests under Natty used to hang, but has been verified to succeed with a test kernel that includes the upstream patch.

---

Binary package hint: qemu-kvm

I have several guest images that ran fine under 10.10. RHEL and/or CentOS 5.5 and 5.6 x86 guests all hang on boot, whether booted from ISO or from a disk image. Sometimes the guest finds the LVM volumes and mounts them before crashing; sometimes it crashes after nash starts.

Description: Ubuntu 11.04
Release: 11.04

root@usebskvm004:~# virsh version
Compiled against library: libvir 0.8.8
Using library: libvir 0.8.8
Using API: QEMU 0.8.8
Running hypervisor: QEMU 0.14.0

gdahlman (gdahlman) wrote :

64-bit CentOS and RHEL guests work fine. This system worked fine until I ran a dist-upgrade from 10.10 yesterday. I have also tried to boot CentOS 5.6 x86 on a clean 11.04 install, and it locks up.

I have tried all of the compatible CPU emulation types in virt-manager also.

from /var/log/libvirt/qemu/

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc-0.12 -cpu core2duo,+rdtscp,+pdpe1gb,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+pbe,+tm,+ht,+ss,+acpi,+ds -enable-kvm -m 8192 -smp 6,sockets=6,cores=1,threads=1 -name ebspatchapp -uuid 30078a69-9e54-ac48-de57-3e3d9111ea3f -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ebspatchapp.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=readline -rtc base=utc -boot order=dc,menu=off -drive file=/dev/usebskvm004/ebspatchapp_root,if=none,id=drive-virtio-disk0,boot=on,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive file=/var/lib/libvirt/images/LinuxISOs/ubuntu-8.04-server-i386.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive file=/dev/usebskvm004/ebspatchapp_a01,if=none,id=drive-virtio-disk1,format=raw -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1 -netdev tap,fd=18,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:79:09:29,bus=pci.0,addr=0x3 -usb -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
kvm: -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:79:09:29,bus=pci.0,addr=0x3: pci_add_option_rom: failed to find romfile "pxe-virtio.bin"

gdahlman (gdahlman) wrote :

As an update: I can boot when the guest is restricted to a single CPU, so this must be a regression with SMP.
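Since the hang only shows up with more than one vCPU, it helps to confirm the vCPU count a failing guest was actually started with. A minimal Python sketch that pulls every `-smp` option out of a kvm command line such as the one logged under /var/log/libvirt/qemu/ above; the function name `smp_settings` is hypothetical, and the sketch only splits the option text. It does not reproduce QEMU's own parsing rules (for example, which of two repeated `-smp` options takes effect).

```python
import shlex
from typing import Dict, List

def smp_settings(command_line: str) -> List[Dict[str, str]]:
    """Return every -smp option in a kvm/qemu command line, split into fields.

    A field without '=' (normally the leading number) is stored under "cpus".
    """
    args = shlex.split(command_line)
    settings = []
    # Stop one short of the end so args[i + 1] always exists.
    for i, arg in enumerate(args[:-1]):
        if arg != "-smp":
            continue
        fields = {}
        for part in args[i + 1].split(","):
            key, sep, value = part.partition("=")
            if sep:
                fields[key] = value
            else:
                fields["cpus"] = key
        settings.append(fields)
    return settings

line = ("/usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 8192 "
        "-smp 6,sockets=6,cores=1,threads=1 -name ebspatchapp")
print(smp_settings(line))
# [{'cpus': '6', 'sockets': '6', 'cores': '1', 'threads': '1'}]
```

Returning every occurrence, rather than picking one, keeps the sketch honest for command lines like `-smp 8 -smp 8,cores=4`, where both options are reported.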

gdahlman (gdahlman) on 2011-06-10
tags: added: kvm libvirt qemu-kvm virtualization
gdahlman (gdahlman) wrote :

apport-bug qemu-kvm report from a freshly installed host that has the issue.

gdahlman (gdahlman) wrote :

apport-bug qemu-kvm report from a system upgraded from 10.10 that could run the images before.

Scott Moser (smoser) wrote :

In your kvm command line above, it says "file=/var/lib/libvirt/images/LinuxISOs/ubuntu-8.04-server-i386.iso". So I'm guessing it's not the exact log from the failed boot (since you mention CentOS 5.5 and 5.6).

Also, when the guest crashes, could you get the dmesg output?

Scott Moser (smoser) wrote :

I tried to reproduce this using the CentOS 5.5 netboot ISO and also the live CD, but did not get any crashes on initial boot. I tried on a much smaller system, though (2 CPUs, 4 GB).

Were you using DVD image?


gdahlman (gdahlman) wrote :

I tried many ISOs, and the systems that worked before I did the dist-upgrade also failed to boot.

I just tried it with 2 CPUs and 2 GB of RAM, and it still failed.

The guest never gets far enough for me to collect dmesg output, but I have included a screenshot.

I have reproduced this issue on a Dell R710, 2950 and R610.

I installed with the 11.04 Server ISO image.

Reinstalling 10.10 Server allowed the guests to boot, as did dropping the number of vCPUs to 1.

Here are the ISO images I was using for the guests:

root@usmirror001:/export/isos/Operating_Systems/Linux# isoinfo -d -i CentOS-5.5-i386-bin-DVD.iso
CD-ROM is in ISO 9660 format
System id: LINUX
Volume id: CentOS_5.5_Final
Volume set id:
Publisher id: CentOS Project
Data preparer id: CentOS
Application id: CentOS_5.5_Final
Copyright File id:
Abstract File id:
Bibliographic File id:
Volume set size is: 1
Volume set sequence number is: 1
Logical block size is: 2048
Volume size is: 2043515
El Torito VD version 1 found, boot catalog is in sector 406
Joliet with UCS level 3 found
Rock Ridge signatures version 1 found
Eltorito validation header:
    Hid 1
    Arch 0 (x86)
    ID 'CentOS Project'
    Key 55 AA
    Eltorito defaultboot header:
        Bootid 88 (bootable)
        Boot media 0 (No Emulation Boot)
        Load segment 0
        Sys type 0
        Nsect 4
        Bootoff 197 407

root@usmirror001:/export/isos/Operating_Systems/Linux# isoinfo -d -i CentOS-5.6-i386-LiveCD.iso
CD-ROM is in ISO 9660 format
System id: LINUX
Volume id: CentOS-5.6-i386-LiveCD
Volume set id:
Publisher id:
Data preparer id:
Application id: MKISOFS ISO 9660/HFS FILESYSTEM BUILDER & CDRECORD CD-R/DVD CREATOR (C) 1993 E.YOUNGDALE (C) 1997 J.PEARSON/J.SCHILLING
Copyright File id:
Abstract File id:
Bibliographic File id:
Volume set size is: 1
Volume set sequence number is: 1
Logical block size is: 2048
Volume size is: 354800
El Torito VD version 1 found, boot catalog is in sector 36
Joliet with UCS level 3 found
Rock Ridge signatures version 1 found
Eltorito validation header:
    Hid 1
    Arch 0 (x86)
    ID ''
    Key 55 AA
    Eltorito defaultboot header:
        Bootid 88 (bootable)
        Boot media 0 (No Emulation Boot)
        Load segment 0
        Sys type 0
        Nsect 4
        Bootoff 25 37
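Scott's question about whether a DVD image was used can be answered from the isoinfo output: the volume size is given in logical blocks, so multiplying it by the logical block size gives the image size in bytes. A minimal Python sketch, assuming the `isoinfo -d` text format shown above; `iso_size_bytes` is a hypothetical helper name.

```python
import re

def iso_size_bytes(isoinfo_text: str) -> int:
    """Image size in bytes = 'Volume size' (in blocks) x 'Logical block size'."""
    block = re.search(r"Logical block size is:\s*(\d+)", isoinfo_text)
    # "Volume size is:" does not match the separate "Volume set size is:" line.
    blocks = re.search(r"Volume size is:\s*(\d+)", isoinfo_text)
    if block is None or blocks is None:
        raise ValueError("isoinfo output lacks a block size or volume size line")
    return int(block.group(1)) * int(blocks.group(1))

dvd = "Logical block size is: 2048\nVolume size is: 2043515\n"
live = "Logical block size is: 2048\nVolume size is: 354800\n"
print(iso_size_bytes(dvd))   # 4185118720, about 3.9 GiB
print(iso_size_bytes(live))  # 726630400, about 693 MiB
```

So the first image really is the full DVD, while the 5.6 image is a CD-sized LiveCD.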

Serge Hallyn (serge-hallyn) wrote :

Let's try to simplify. What happens if you run kvm by hand like this:

qemu-img create root.img 10G
kvm -m 8192 -smp 6 -vga cirrus -boot d -drive file=root.img,if=virtio,index=0 -cdrom /var/lib/libvirt/images/LinuxISOs/ubuntu-8.04-server-i386.iso

?

Dave Walker (davewalker) wrote :

Marking incomplete as we are awaiting feedback on how to reproduce.

Changed in qemu-kvm (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
Adam Jacob (adamhjk) wrote :

I'm seeing identical behavior: with more than one vCPU, I can't get a 32-bit CentOS system to boot. 64-bit seems fine.

Serge Hallyn (serge-hallyn) wrote :

@Adam,

Can you reproduce this booting from an ISO? If so, can you give us the exact kvm command line (or libvirt .xml) and the URL of the ISO image you used?

Dick Tump (dicktump) wrote :

If I boot from the ISO "CentOS-5.5-i386-bin-DVD.iso" from the CentOS FTP, the instance will hang at the same point as booting the instance from the virtual disk. It is solved by disabling SMP (using just 1 CPU).

This is my KVM commandline:
kvm -daemonize -enable-kvm -smp 8 -smp 8,cores=4 -m 2048 -vnc 0.0.0.0:4 -usb -usbdevice mouse -drive file=/dev/vservers/test10,index=0,media=disk,if=virtio -drive file=/root/iso/CentOS-5.5-i386-bin-DVD.iso,index=0,media=cdrom,if=ide -boot order=d -net nic,vlan=0,macaddr=00:77:95:ab:91:be,model=virtio -net tap,vlan=0,ifname=test10,script=/usr/share/vcluster/ifup.pl,downscript=/usr/share/vcluster/ifdown.pl -net nic,vlan=1,macaddr=00:77:95:e9:05:5c,model=virtio -net tap,vlan=1,ifname=test11,script=/usr/share/vcluster/ifup.pl,downscript=/usr/share/vcluster/ifdown.pl -monitor unix:/tmp/test1,server,nowait -pidfile /tmp/test1.pid

I am running KVM 0.14.0+noroms-0ubuntu4.3 (compiled on 10.04 using source debs) on Ubuntu 10.04 LTS with kernel 2.6.32-33-server. I know this is probably not a supported combination, but since it is the same problem, I thought it might still be useful to answer on this bug report.

Attached is a VNC screenshot. It has been hanging at this point for about 15 minutes.

Dick Tump (dicktump) wrote :

After some testing, I have found a possible cause. The instance with SMP and CentOS i386 seems to boot fine when using a different clocksource.

You can test this by booting the CD/DVD image with: linux clocksource=acpi_pm

I have tested the KVM clocksource and the acpi_pm clocksource several times and the KVM one always seems to crash. The acpi_pm one always works.

Might this be a bug in the CentOS kernel?
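When trying the `clocksource=acpi_pm` workaround, it is worth confirming inside the guest that the parameter actually reached the kernel. A minimal Python sketch, assuming a Linux guest where `/proc/cmdline` holds the running kernel's command line; the helper names are hypothetical, and when the parameter appears more than once this sketch simply reports the last occurrence.

```python
from pathlib import Path
from typing import Optional

def requested_clocksource(cmdline: str) -> Optional[str]:
    """Return the clocksource= value from a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("clocksource="):
            value = token.split("=", 1)[1]  # keep the last occurrence
    return value

def read_requested_clocksource(path: str = "/proc/cmdline") -> Optional[str]:
    """Read the running guest's kernel command line and extract clocksource=."""
    return requested_clocksource(Path(path).read_text())

print(requested_clocksource("ro root=LABEL=/ quiet clocksource=acpi_pm"))  # acpi_pm
```

A None result means the guest booted with its default clocksource choice, which on the affected 32-bit guests is kvm-clock.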

gdahlman (gdahlman) wrote :

I set clocksource=acpi_pm and the guest boots.

The regression is in the Ubuntu kernel: the guests work fine on FC12 and 10.10, but 11.04 will not let them boot.

It seems really odd, but IMHO the kvm-clock paravirt clocksource in the current kernel would be the prime suspect.

gdahlman (gdahlman) wrote :

I think I found the kernel bug that causes this, but I cannot find a good link to it that is not a spammy mailing-list aggregation site.

The subject is "KVM: fix kvmclock regression due to missing clock update" and it appears there is a fix in 2.6.39

I will try to find hardware to test this on.

gdahlman (gdahlman) wrote :

I have verified that downloading and upgrading to the following kernel resolves the issue; the issue still exists with 2.6.38-10.

http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.39-rc4-natty/

I successfully booted RHEL 5.4 and CentOS 5.5 from CD and installed them with no issues using 8 CPUs.

Here is the text of the commit I think fixes the issue.

commit 1aa8ceef0312a6aae7dd863a120a55f1637b361d
Author: Nikola Ciprich <email address hidden>
Date: Wed Mar 9 23:36:51 2011 +0100

    KVM: fix kvmclock regression due to missing clock update

    commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved kvm_request_guest_time_update(vcpu),
    breaking 32bit SMP guests using kvm-clock. Fix this by moving (new) clock update function
    to proper place.

    Signed-off-by: Nikola Ciprich <email address hidden>
    Acked-by: Zachary Amsden <email address hidden>
    Signed-off-by: Avi Kivity <email address hidden>

gdahlman (gdahlman) wrote :

Here is the fix for the current Natty kernel source; it appears to be a typo, with a closing brace one line too high.

Serge Hallyn (serge-hallyn) wrote :

Thanks very much for finding the fix!

Marked the bug as affecting the kernel. This patch is not yet in the Oneiric kernel.

Changed in qemu-kvm (Ubuntu):
status: Incomplete → Invalid
Changed in linux (Ubuntu):
importance: Undecided → Medium
Stefan Bader (smb) wrote :

As far as I can see the patch

commit 1aa8ceef0312a6aae7dd863a120a55f1637b361d
Author: Nikola Ciprich <email address hidden>
Date: Wed Mar 9 23:36:51 2011 +0100

    KVM: fix kvmclock regression due to missing clock update

    commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved kvm_request_guest_time
    breaking 32bit SMP guests using kvm-clock. Fix this by moving (new) clock up
    to proper place.

    Signed-off-by: Nikola Ciprich <email address hidden>
    Acked-by: Zachary Amsden <email address hidden>
    Signed-off-by: Avi Kivity <email address hidden>

is in Oneiric (it came in with 2.6.39), but the attached patch is the inverse of it. It has not been submitted to stable, so Natty would still be affected.

Stefan Bader (smb) wrote :

For Oneiric, this should be fixed already.

Changed in qemu-kvm (Ubuntu Natty):
status: New → Invalid
Changed in linux (Ubuntu):
status: New → Fix Released
Stefan Bader (smb) wrote :

To proceed with getting this into Natty I created test kernels at http://people.canonical.com/~smb/lp795717/
If somebody can confirm that using those works, then we can proceed with getting the fix SRUed.

Changed in linux (Ubuntu Natty):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
importance: Undecided → Medium
status: New → In Progress
gdahlman (gdahlman) wrote :

Thanks Stefan,

I have confirmed that the following kernels work.

linux-image-2.6.38-11-server_2.6.38-11.48+lp795717v1_amd64.deb
linux-image-2.6.38-11-generic_2.6.38-11.48+lp795717v1_amd64.deb

They were installed on three machines:

Dell PowerEdge R710 with 96 GB RAM and 2 * Xeon X5680
Dell PowerEdge 2950 with 32 GB RAM and 2 * Xeon E5420
Dell OptiPlex 960 with 8 GB RAM and 1 * Core2 Duo E8400

I can install and boot the following operating systems, all with 8 vCPUs:

Ubuntu 8.04 server i386 (current_clocksource was tsc)
Ubuntu 10.10 desktop i386
RHEL 5.4 i386
RHEL 5.5 i386 (boot existing image only)
CentOS 5.5 i386
CentOS 5.5 amd64
CentOS 5.6 i386 LiveCD (no install)

I also had no issues booting 64-bit Windows 7 guests with two vCPUs.

Thanks,

Greg

Stefan Bader (smb) on 2011-08-10
description: updated
Stefan Bader (smb) wrote :

Thanks Greg. I have gone ahead and submitted the patch for SRU.

Stefan Bader (smb) on 2011-08-11
Changed in linux (Ubuntu Natty):
status: In Progress → Fix Committed
Herton R. Krzesinski (herton) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-natty' to 'verification-done-natty'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-natty
Stefan Bader (smb) wrote :

Greg, could you check with the proposed kernel, too? Process requires it and I don't have the setup ready. Would also help to ensure that the new kernel brings no other problem. :) Thanks.

Herton R. Krzesinski (herton) wrote :

Verification that this bug is fixed has not been completed by the deadline for the current
stable kernel release cycle. The change was reverted and this bug is being set to
incomplete.

In order to have this fix considered for reapplication to the kernel, please follow the
process documented here:

https://wiki.ubuntu.com/Kernel/StableReleaseCadence

Discussions about the new process tend to take place in #ubuntu-kernel on IRC, so please
contribute to the discussion there if you would like.

Thank you!

tags: added: verification-reverted-natty
removed: verification-needed-natty
Changed in linux (Ubuntu Natty):
status: Fix Committed → Incomplete
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.38-11.50

---------------
linux (2.6.38-11.50) natty-proposed; urgency=low

  [Herton R. Krzesinski]

  * Release Tracking Bug
    - LP: #848246

  [ Upstream Kernel Changes ]

  * Revert "eCryptfs: Handle failed metadata read in lookup"
  * Revert "KVM: fix kvmclock regression due to missing clock update"
  * Revert "ath9k: use split rx buffers to get rid of order-1 skb
    allocations"

linux (2.6.38-11.49) natty-proposed; urgency=low

  [Herton R. Krzesinski]

  * Release Tracking Bug
    - LP: #836903

  [ Adam Jackson ]

  * SAUCE: drm/i915/pch: Fix integer math bugs in panel fitting
    - LP: #753994

  [ Keng-Yu Lin ]

  * SAUCE: Input: ALPS - Enable Intellimouse mode for Lenovo Zhaoyang E47
    - LP: #632884, #803005

  [ Stefan Bader ]

  * [Config] Force perf to use libiberty for demangling
    - LP: #783660

  [ Tim Gardner ]

  * [Config] Add enic/fnic to udebs
    - LP: #801610

  [ Upstream Kernel Changes ]

  * eeepc-wmi: add keys found on EeePC 1215T
    - LP: #812644
  * eCryptfs: Handle failed metadata read in lookup
    - LP: #509180
  * pagemap: close races with suid execve, CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020
  * report errors in /proc/*/*map* sanely, CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020
  * close race in /proc/*/environ, CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020
  * auxv: require the target to be tracable (or yourself), CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020
  * deal with races in /proc/*/{syscall, stack, personality}, CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020
  * vmscan: fix a livelock in kswapd
    - LP: #813797
  * mmc: Add PCI fixup quirks for Ricoh 1180:e823 reader
    - LP: #773524
  * mmc: Added quirks for Ricoh 1180:e823 lower base clock frequency
    - LP: #773524
  * rose: Add length checks to CALL_REQUEST parsing, CVE-2011-1493
    - LP: #816550
    - CVE-2011-1493
  * pata_marvell: Add support for 88SE91A0, 88SE91A4
    - LP: #777325
  * GFS2: make sure fallocate bytes is a multiple of blksize, CVE-2011-2689
    - LP: #819572
    - CVE-2011-2689
  * Bluetooth: l2cap and rfcomm: fix 1 byte infoleak to userspace.
    - LP: #819569
    - CVE-2011-2492
  * drm/nv50-nvc0: work around an evo channel hang that some people see
    - LP: #583760
  * KVM: fix kvmclock regression due to missing clock update
    - LP: #795717
  * Add mount option to check uid of device being mounted = expect uid,
    CVE-2011-1833
    - LP: #732628
    - CVE-2011-1833
  * proc: fix oops on invalid /proc/<pid>/maps access, CVE-2011-1020
    - LP: #813026
    - CVE-2011-1020
  * ipv6: make fragment identifications less predictable, CVE-2011-2699
    - LP: #827685
    - CVE-2011-2699
  * ath9k: use split rx buffers to get rid of order-1 skb allocations
    - LP: #728835
  * perf: Fix software event overflow, CVE-2011-2918
    - LP: #834121
    - CVE-2011-2918
 -- Herton Ronaldo Krzesinski <email address hidden> Mon, 12 Sep 2011 17:23:38 -0300

Changed in linux (Ubuntu Natty):
status: Incomplete → Fix Released
Stefan Bader (smb) wrote :

Note that the automatic statement of "released" is misleading. This patch has been reverted as it was not verified and the kernel without the change is now released.

Thilo Uttendorfer (t-lo) wrote :

I'm a bit confused, maybe someone can clarify this.

At https://lkml.org/lkml/2011/3/25/54 it is mentioned that an underlying bug with kvmclock still exists. I assume that this is not fixed in 2.6.38-11.50. Is it fixed in newer kernels? I assume that older kernels (like the one in Lucid) are probably also affected by this underlying bug?

Serge Hallyn (serge-hallyn) wrote :

@Stefan,

given comment #27, shouldn't the status of this bug for natty be 'incomplete'?

Stefan Bader (smb) wrote :

@Thilo, it seems that the discussion was about whether there should be a better fix but there does not seem to be any confirmation about any real problems found with either version. As of today only the patch as it was queued has made it upstream. So this would be the change that would go into natty. And no, there will be no fix before someone who is able to verify the change steps up and asks for the patch to be reconsidered. Please see comment #27.

@Serge, right. I'll change it back.

Changed in linux (Ubuntu Natty):
status: Fix Released → Incomplete
gdahlman (gdahlman) wrote :

Sorry, after installing Stefan's test kernel I did not have time to test the proposed version, but his test kernel did fix the issue.

Unfortunately it appears that the Ubuntu kernel in 11.04 also has this issue.

                if (tsc_delta < 0)
                        mark_tsc_unstable("KVM discovered backwards TSC");
                if (check_tsc_unstable()) {
                        kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta);
                        vcpu->arch.tsc_catchup = 1;
                }
                kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
                if (vcpu->cpu != cpu)
                        kvm_migrate_timers(vcpu);
                vcpu->cpu = cpu;
        }
}

I have made the same diff change I added above and the issue goes away.

gdahlman (gdahlman) wrote :

Please ignore #33... apparently I forgot how to run uname

The host had reverted to an older kernel on the upgrade to 11.10

This patch has yet to be put into the lucid-proposed 2.6.38 backport package, linux-lts-backport-natty.

I found this out while working on LP: #882579

I added the missing patch to linux-lts-backport-natty-2.6.38-12.51~lucid1
and uploaded the resulting linux-lts-backport-natty-2.6.38-12.51.1~ppa1~lucid1 to this PPA:
https://launchpad.net/~nutznboltz/+archive/kvm-clock-fix-for-2.6.38-on-lucid

Stefan Bader (smb) wrote :

If you look at comment #32 and comment #27, the patch got reverted in Natty because no one verified the change. It also gives a link that explains how to get it reconsidered for Natty. If that is done, it will be in the LTS backport automatically.

@Stefan I'm in a good position to test as soon as the PPA completes building. I've had all sorts of issues relating to this and a number of systems to try it on.

I built a 32-bit RHEL 5 compatible Qemu/KVM guest, really Scientific Linux 5.7. The guest was built from SL.57.090911.DVD.i386.disc1.iso plus today's updates.

The guest runs fine with 1 CPU defined and hangs trying to boot with 2 CPUs defined so I can reproduce this 100% with the following VM host:

$ lsb_release -ds
Ubuntu 10.04.3 LTS
$ uname -srvm
Linux 2.6.38-12-server #51~lucid1-Ubuntu SMP Thu Sep 29 20:09:53 UTC 2011 x86_64

Qemu/KVM/Libvirt is the one from Natty backported to Lucid via PPA:
https://launchpad.net/~nutznboltz/+archive/kvm-libvirt-lts

$ dpkg -l | grep 0.14.0+noroms-0ubuntu4.4~ppa1
ii qemu-common 0.14.0+noroms-0ubuntu4.4~ppa1 qemu common functionality (bios, documentati
ii qemu-kvm 0.14.0+noroms-0ubuntu4.4~ppa1 Full virtualization on i386 and amd64 hardwa

$ dpkg -l | grep 0.8.8-1ubuntu6.5~ppa1
ii libvirt-bin 0.8.8-1ubuntu6.5~ppa1 the programs for the libvirt library
ii libvirt0 0.8.8-1ubuntu6.5~ppa1 library for interfacing with different virtu
ii python-libvirt 0.8.8-1ubuntu6.5~ppa1 libvirt Python bindings

When the 2.6.38-12 backport kernel plus the kvm clock patch finishes compiling I will install it and test to see if the 32-bit RHELish guest still hangs or not.

I will also test on regular x86_64 Natty soon.

The same VM host described in comment #38 is now running 2.6.38-12.51.1~ppa1~lucid1, and the 32-bit RHEL-compatible VM guest no longer hangs while using 2 CPUs.

The VM host is also running two Ubuntu 10.04 and one Ubuntu 11.04 VM guests OK as well.

On Wed Nov 2, 2011 the two VM hosts described in
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/882579/comments/5
are going to be rebooted with linux 2.6.38-12.51.1~ppa1~lucid1

Since I promised to test on Natty too:

VM host details:
AMD Phenom(tm) II X6 1100T Processor
$ lsb_release -sd
Ubuntu 11.04
$ uname -srvm
Linux 2.6.38-12-generic #51-Ubuntu SMP Wed Sep 28 14:27:32 UTC 2011 x86_64
$ dpkg -l | egrep 'qemu|libvirt|bios'
ii libvirt-bin 0.8.8-1ubuntu6.5 the programs for the libvirt library
ii libvirt0 0.8.8-1ubuntu6.5 library for interfacing with different virtualization systems
ii python-libvirt 0.8.8-1ubuntu6.5 libvirt Python bindings
ii qemu-arm-static 0.12.5+noroms-0ubuntu7.2 dummy transitional package for qemu-kvm-extras-static
ii qemu-common 0.14.0+noroms-0ubuntu4.4 qemu common functionality (bios, documentation, etc)
ii qemu-keymaps 0.14.50-2011.03-1-0ubuntu2 QEMU keyboard maps
ii qemu-kvm 0.14.0+noroms-0ubuntu4.4 Full virtualization on i386 and amd64 hardware
ii qemu-kvm-extras 0.14.50-2011.03-1-0ubuntu2 QEMU system and user mode emulation (transitional package)
ii qemu-kvm-extras-static 0.14.50-2011.03-1-0ubuntu2 QEMU static user mode emulation binaries (transitional package)
ii qemu-system 0.14.50-2011.03-1-0ubuntu2 QEMU full system emulation binaries
ii qemu-user 0.14.50-2011.03-1-0ubuntu2 QEMU user mode emulation binaries
ii qemu-user-static 0.14.50-2011.03-1-0ubuntu2 QEMU user mode emulation binaries (static version)
ii qemulator 0.5-3.2ubuntu1 a solution for easy setup and management of qemu
ii seabios 0.6.1.2-0ubuntu1 legacy BIOS implementation which can be used as a coreboot payload
ii vgabios 0.6c-2ubuntu3 VGA BIOS software for the Bochs and Qemu emulated VGA card

VM guest details:
32-bit RHEL 5 compatible Qemu/KVM guest, really Scientific Linux 5.7. The guest was built from SL.57.090911.DVD.i386.disc1.iso plus today's updates.
With 1 CPU it boots fine.
With 2 CPUs it hangs.
See the attached screenshot of virt-manager plus virt-top. The disk and network activity in virt-top are always zero when the VM is hung, while the CPU activity is high. Note that the VM host has 6 CPU cores, so the 5.7% CPU figure is averaged across them. Regular /usr/bin/top shows /usr/bin/kvm getting 100% for this guest while it is hung.

linux_2.6.38-12.51.1~ppa1~natty1 (2.6.38-12 for Natty plus the patch this ticket is dealing with) has been uploaded to:
https://launchpad.net/~nutznboltz/+archive/kvm-clock-fix-for-2.6.38-on-lucid

When it compiles I'll repeat my test described in Comment #41 with the patch.

In comment #2 https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/795717/comments/2
gdahlman's observation:
{{ As an update I can boot when the guest is restricted to a single cpu ... }}

I think this means that if /usr/bin/kvm on the VM host is forced to run on only one CPU by setting its processor affinity (with /usr/bin/taskset from the util-linux package), this issue does not occur. I can and will test that soon.

If it's true, it partly explains why Ubuntu VM guests only sometimes hang. The /usr/bin/kvm process running them might only sometimes switch CPUs while the VM guest is rebooting, while the /usr/bin/kvm process running 32-bit RHEL 5 with 2 CPUs might always switch CPUs while the VM guest is rebooting.

Also of note, 64-bit RHEL 5 does not use kvm-clock as a clocksource. For example:

64-bit RHEL 5
$ sudo cat /sys/devices/system/clocksource/clocksource0/available_clocksource
jiffies

32-bit RHEL 5
$ sudo cat /sys/devices/system/clocksource/clocksource0/available_clocksource
acpi_pm jiffies hpet tsc pit kvm-clock

Both x86 and x86_64 RHEL 6 use kvm-clock, but they also use Linux kernel 2.6.32.

I think by not using kvm-clock 64-bit RHEL 5 is not 100% vulnerable the way 32-bit RHEL 5 is.
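The comparison above can be repeated on any guest. A minimal Python sketch that classifies a guest from the two standard sysfs files, `available_clocksource` and `current_clocksource`; the function names and the active/available/absent labels are hypothetical conveniences for this sketch, not kernel terminology. A 32-bit RHEL 5 guest booted with `clocksource=acpi_pm` would be expected to land in the "available" case.

```python
from pathlib import Path
from typing import Tuple

# Standard sysfs location of the clocksource files on a Linux guest.
CS_DIR = Path("/sys/devices/system/clocksource/clocksource0")

def kvm_clock_status(available_text: str, current_text: str) -> str:
    """Classify a guest as 'active', 'available' or 'absent' with respect to kvm-clock.

    available_text: contents of available_clocksource (space-separated names)
    current_text:   contents of current_clocksource (a single name)
    """
    if current_text.strip() == "kvm-clock":
        return "active"
    if "kvm-clock" in available_text.split():
        return "available"
    return "absent"

def read_clocksource_files(cs_dir: Path = CS_DIR) -> Tuple[str, str]:
    """Read (available, current) from sysfs; run this inside the guest."""
    return ((cs_dir / "available_clocksource").read_text(),
            (cs_dir / "current_clocksource").read_text())

print(kvm_clock_status("acpi_pm jiffies hpet tsc pit kvm-clock\n", "kvm-clock\n"))  # active
```

Inside a guest, `kvm_clock_status(*read_clocksource_files())` gives the same classification from the live files.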

I could not get the guest to boot by setting processor affinity to a single host CPU.

1. Started the 32-bit 2-CPU RHEL 5 compatible VM guest and paused it at the GRUB screen.
2. Locked it to the first host CPU (affinity mask 0x1):
$ sudo taskset -p 0x00000001 3127
pid 3127's current affinity mask: 3f
pid 3127's new affinity mask: 1
3. Let the guest continue booting from GRUB. It hung at a different point, right after displaying:
Time: kvm-clock clocksource has been installed.
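taskset prints affinity masks in hexadecimal, one bit per CPU with CPU numbering starting at 0, so `3f` above is CPUs 0-5 (all six cores of that host) and `1` is CPU 0 alone. A small Python sketch for decoding such masks; the function name is hypothetical.

```python
from typing import List

def affinity_mask_to_cpus(mask: str) -> List[int]:
    """Decode a hex affinity mask (as printed by taskset -p) into CPU numbers.

    Bit n set means the process may run on CPU n; CPUs are numbered from 0.
    """
    value = int(mask, 16)  # accepts both "3f" and "0x00000001"
    cpus = []
    cpu = 0
    while value:
        if value & 1:
            cpus.append(cpu)
        value >>= 1
        cpu += 1
    return cpus

print(affinity_mask_to_cpus("3f"))          # [0, 1, 2, 3, 4, 5]
print(affinity_mask_to_cpus("0x00000001"))  # [0]
```

On Linux, Python's standard library can also set affinity directly: `os.sched_setaffinity(3127, {0})` makes the same system call as the `taskset -p 0x00000001 3127` command above.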

gdahlman probably meant a single guest CPU, not a single host CPU.

32-bit 2-CPU RHEL-5 compatible VM guest boots successfully on Natty with the patch which I installed into a kernel in this PPA:
https://launchpad.net/~nutznboltz/+archive/kvm-clock-fix-for-2.6.38-on-lucid

$ uname -srvm
Linux 2.6.38-12-generic #1~ppa2~natty1-Ubuntu SMP Tue Nov 1 22:41:06 UTC 2011 x86_64

The history of this from my perspective is I first encountered issues with the libvirt virsh command getting stuck or not being able to reboot VM guests successfully. I chased the issue down to the bug in this ticket only to see it slipped QA testing. I went back and repeated all the tests myself to be certain. Now it's time for me to open a new bug report to petition for an SRU.

Opened LP: #885170 to resubmit SRU request.

The servers which had the large outage incident on Wed Oct 26, 2011 described in
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/882579/comments/5
were rebooted last night Wed Nov 2, 2011 off of the patched backport kernel in
https://launchpad.net/~nutznboltz/+archive/kvm-clock-fix-for-2.6.38-on-lucid
and are doing fine.

Adding link to pending SRU page for convenience
http://people.canonical.com/~ubuntu-archive/pending-sru.html

Changed in linux (Ubuntu Natty):
status: Incomplete → Confirmed
tags: added: patch testcase
Changed in linux (Ubuntu Natty):
status: Confirmed → Fix Committed
Herton R. Krzesinski (herton) wrote :

@nutznboltz: the natty kernel with the fix readded is available in -proposed now (2.6.38-13.52).

Please test the kernel from -proposed and update this bug with the results. If the problem is solved, change the tag 'verification-needed-natty' to 'verification-done-natty'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-natty
removed: verification-reverted-natty

How long does this take to show up?

$ curl http://archive.ubuntu.com/ubuntu/pool/main/l/linux/ 2> /dev/null | grep 38-13.52 | egrep -v 'diff|dsc' | sed 's/^.*deb">//' | sed 's;</a>.*$;;'
linux-doc_2.6.38-13.52_all.deb
linux-libc-dev_2.6.38-13.52_amd64.deb
linux-libc-dev_2.6.38-13.52_i386.deb
linux-source-2.6.38_2.6.38-13.52_all.deb
linux-tools-common_2.6.38-13.52_all.deb

Compare that to

$ curl http://archive.ubuntu.com/ubuntu/pool/main/l/linux/ 2> /dev/null | grep 38-12.51 | egrep -v 'diff|dsc' | sed 's/^.*deb">//' | sed 's;</a>.*$;;' | wc -l
166

Couldn't you at least start your "one week or die" countdown after it becomes possible to complete the task?

I also checked in universe

$ curl http://archive.ubuntu.com/ubuntu/pool/universe/l/linux/ 2> /dev/null | grep 38-12.51 | egrep -v 'diff|dsc' | sed 's/^.*deb">//' | sed 's;</a>.*$;;' | grep image.*generic
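The pool listings above can be filtered with a small reusable function instead of the grep/egrep/sed chain. This is a minimal sketch, assuming the pool directory index links each file as href="NAME" (as the sed expressions above imply); the function name list_pool_debs is hypothetical. Matching only hrefs ending in .deb drops the .dsc and .diff.gz entries that egrep -v removed, and grep -F treats the dots in the version literally (in grep 38-13.52 each dot matches any character).

```shell
#!/bin/sh
# list_pool_debs VERSION  (hypothetical helper name)
# Reads an archive pool directory index (HTML) on stdin and prints the
# .deb filenames containing VERSION, one per line, sorted and de-duplicated.
# Assumes each file is linked as href="NAME" in the index page.
list_pool_debs() {
    grep -o 'href="[^"]*\.deb"' |   # keep only links to .deb files
        sed 's/^href="//; s/"$//' | # strip the href="..." wrapper
        grep -F -- "$1" |           # fixed-string match: dots are literal
        sort -u
}

# Example usage (needs network access):
#   curl -s http://archive.ubuntu.com/ubuntu/pool/main/l/linux/ \
#       | list_pool_debs 2.6.38-13.52
#   curl -s http://archive.ubuntu.com/ubuntu/pool/universe/l/linux/ \
#       | list_pool_debs 2.6.38-13.52 | grep 'image.*generic'
```

Piping the output to wc -l gives the same kind of per-version count as the 166 shown above for 2.6.38-12.51.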

Clint Byrum (clint-fewbar) wrote :

Excerpts from nutznboltz's message of Fri Nov 11 16:53:52 UTC 2011:
> How long does this take to show up?
>
> $ curl http://archive.ubuntu.com/ubuntu/pool/main/l/linux/ 2> /dev/null | grep 38-13.52 | egrep -v 'diff|dsc' | sed 's/^.*deb">//' | sed 's;</a>.*$;;'
> linux-doc_2.6.38-13.52_all.deb
> linux-libc-dev_2.6.38-13.52_amd64.deb
> linux-libc-dev_2.6.38-13.52_i386.deb
> linux-source-2.6.38_2.6.38-13.52_all.deb
> linux-tools-common_2.6.38-13.52_all.deb

I'm not sure what you're waiting for; Herton suggested that 2.6.38-13.52
is the version with the fix.

@Clint

deb http://archive.ubuntu.com/ubuntu/ natty-proposed main restricted universe multiverse

$ apt-cache search 6.38-13
linux-backports-modules-cw-2.6.39-2.6.38-13-generic - compat-wireless Linux modules for version 2.6.38 on x86/x86_64
linux-backports-modules-cw-2.6.39-2.6.38-13-server - compat-wireless Linux modules for version 2.6.38 on x86_64
linux-backports-modules-cw-3.0.0-2.6.38-13-generic - compat-wireless Linux modules for version 2.6.38 on x86/x86_64
linux-backports-modules-cw-3.0.0-2.6.38-13-server - compat-wireless Linux modules for version 2.6.38 on x86_64
linux-backports-modules-net-2.6.38-13-generic - Linux ethernet modules for version 2.6.38 on x86/x86_64
linux-backports-modules-net-2.6.38-13-server - Linux ethernet modules for version 2.6.38 on x86_64
linux-headers-2.6.38-13 - Header files related to Linux kernel version 2.6.38
linux-headers-2.6.38-13-generic - Linux kernel headers for version 2.6.38 on x86/x86_64
linux-headers-2.6.38-13-server - Linux kernel headers for version 2.6.38 on x86_64
linux-headers-2.6.38-13-virtual - Linux kernel headers for version 2.6.38 on x86/x86_64
linux-headers-lbm-2.6.38-13-generic - Header files related to linux-backports-modules version 2.6.38
linux-headers-lbm-2.6.38-13-server - Header files related to linux-backports-modules version 2.6.38
linux-image-2.6.38-13-generic - Linux kernel image for version 2.6.38 on x86/x86_64
linux-image-2.6.38-13-server - Linux kernel image for version 2.6.38 on x86_64
linux-image-2.6.38-13-virtual - Linux kernel image for version 2.6.38 on x86/x86_64
linux-tools-2.6.38-13 - Linux kernel tools for version 2.6.38-13

$ sudo aptitude install linux-image-2.6.38-13-generic
The following NEW packages will be installed:
  linux-image-2.6.38-13-generic
0 packages upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 36.4 MB of archives. After unpacking 148 MB will be used.
Err http://archive.ubuntu.com/ubuntu/ natty-proposed/universe linux-image-2.6.38-13-generic amd64 2.6.38-13.52
  404 Not Found [IP: 91.189.92.176 80]
E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/universe/l/linux/linux-image-2.6.38-13-generic_2.6.38-13.52_amd64.deb: 404 Not Found [IP: 91.189.92.176 80]

I found a copy on 91.189.92.169

So I added to /etc/hosts this line:
91.189.92.169 archive.ubuntu.com

Now I can install it.
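Instead of editing /etc/hosts, the same test can be aimed at one mirror address per request with curl's --resolve option, which pins a host:port pair to a given IP for that call only and leaves no stale /etc/hosts entry behind. A minimal sketch under these assumptions: the helper names pool_url and probe_mirror are hypothetical, pool_url only reproduces the pool/<component>/<first letter>/<source>/ layout visible in the 404 URL above, and probe_mirror needs network access and a curl build that supports --resolve.

```shell
#!/bin/sh
# pool_url COMPONENT SOURCE BINARY VERSION ARCH  (hypothetical helper)
# Builds the archive.ubuntu.com pool URL of a binary .deb using the
# pool/<component>/<first letter of source>/<source>/ layout seen above.
# Sources named lib* live under a four-letter prefix; that is not handled.
pool_url() {
    printf 'http://archive.ubuntu.com/ubuntu/pool/%s/%s/%s/%s_%s_%s.deb\n' \
        "$1" "$(printf '%s' "$2" | cut -c1)" "$2" "$3" "$4" "$5"
}

# probe_mirror IP URL  (hypothetical helper)
# Sends a HEAD request for URL to IP while keeping archive.ubuntu.com as the
# host name, like the /etc/hosts override but limited to this one curl call.
# Prints the HTTP status code (200, 404, ...; 000 if no connection was made).
probe_mirror() {
    curl -s -I -o /dev/null -w '%{http_code}\n' \
        --resolve "archive.ubuntu.com:80:$1" "$2"
}

# Example usage (needs network access):
#   url=$(pool_url universe linux linux-image-2.6.38-13-generic 2.6.38-13.52 amd64)
#   for ip in 91.189.92.176 91.189.92.169; do
#       echo "$ip $(probe_mirror "$ip" "$url")"
#   done
```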

$ uname -srvi
Linux 2.6.38-13-generic #52-Ubuntu SMP Tue Nov 8 16:53:51 UTC 2011 x86_64
$ sudo virsh dumpxml opsi-1720 | grep cpu
  <vcpu>2</vcpu>
$ sudo virsh start opsi-1720
$ ssh root@opsi-1720
root@opsi-1720's password:
Last login: Fri Nov 11 17:30:14 2011
[root@opsi-1720 ~]# lsb_release -a
LSB Version: :core-4.0-ia32:core-4.0-noarch:graphics-4.0-ia32:graphics-4.0-noarch:printing-4.0-ia32:printing-4.0-noarch
Distributor ID: ScientificSL
Description: Scientific Linux SL release 5.7 (Boron)
Release: 5.7
Codename: Boron
[root@opsi-1720 ~]# grep QEM /proc/cpuinfo
model name : QEMU Virtual CPU version 0.14.0
model name : QEMU Virtual CPU version 0.14.0
[root@opsi-1720 ~]#

works for me
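Before switching the tag to verification-done-natty, it can help to confirm mechanically that the host really booted the -proposed build. A minimal sketch, assuming the mapping visible in the uname output above: package version 2.6.38-13.52 shows up as a uname -r of 2.6.38-13-<flavour> and a uname -v string starting with #52-Ubuntu. The function name is hypothetical.

```shell
#!/bin/sh
# is_running_pkg_version PKGVER RELEASE BUILD  (hypothetical helper)
# Succeeds if a kernel reporting RELEASE (uname -r) and BUILD (uname -v)
# matches the Ubuntu kernel package version PKGVER, e.g. 2.6.38-13.52.
# Assumed mapping: ABI "2.6.38-13" prefixes RELEASE, upload "52" gives "#52-".
is_running_pkg_version() {
    abi=${1%.*}       # 2.6.38-13.52 -> 2.6.38-13
    upload=${1##*.}   # 2.6.38-13.52 -> 52
    case $2 in "$abi"-*) ;; *) return 1 ;; esac
    case $3 in "#$upload"-*) return 0 ;; *) return 1 ;; esac
}

# Example usage on the host:
#   if is_running_pkg_version 2.6.38-13.52 "$(uname -r)" "$(uname -v)"; then
#       echo "running the -proposed kernel"
#   fi
```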

tags: added: verification-done-natty
removed: verification-needed-natty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.38-13.52

---------------
linux (2.6.38-13.52) natty-proposed; urgency=low

  [ Herton R. Krzesinski ]

  * Release Tracking Bug
    - LP: #887379

  [ Konrad Rzeszutek Wilk ]

  * SAUCE: x86/paravirt: Partially revert "remove lazy mode in interrupts"
    - LP: #854050

  [ Ming Lei ]

  * SAUCE: [media] uvcvideo: Set alternate setting 0 on resume if the bus
    has been reset
    - LP: #816484

  [ Seth Forshee ]

  * SAUCE: acer-wmi: Add wireless quirk for Lenovo 3000 N200
    - LP: #857297

  [ Upstream Kernel Changes ]

  * Make TASKSTATS require root access, CVE-2011-2494
    - LP: #866021
    - CVE-2011-2494
  * proc: restrict access to /proc/PID/io, CVE-2011-2495
    - LP: #866025
    - CVE-2011-2495
  * proc: fix a race in do_io_accounting(), CVE-2011-2495
    - LP: #866025
    - CVE-2011-2495
  * staging: comedi: fix infoleak to userspace, CVE-2011-2909
    - LP: #869261
    - CVE-2011-2909
  * perf tools: do not look at ./config for configuration, CVE-2011-2905
    - LP: #869259
    - CVE-2011-2905
  * e1000e: workaround for packet drop on 82579 at 100Mbps
    - LP: #870127
  * eCryptfs: Remove unnecessary grow_file() function
    - LP: #745836
  * eCryptfs: Remove ECRYPTFS_NEW_FILE crypt stat flag
    - LP: #745836
  * block: blkdev_get() should access ->bd_disk only after success
    - LP: #857170
  * ipv6: restore correct ECN handling on TCP xmit
    - LP: #872179
  * nl80211: fix overflow in ssid_len - CVE-2011-2517
    - LP: #869245
    - CVE-2011-2517
  * ksm: fix NULL pointer dereference in scan_get_next_rmap_item() -
    CVE-2011-2183
    - LP: #869227
    - CVE-2011-2183
  * NLM: Don't hang forever on NLM unlock requests - CVE-2011-2491
    - LP: #869237
    - CVE-2011-2491
  * KVM: fix kvmclock regression due to missing clock update
    - LP: #795717
  * drm/i915: don't enable plane, pipe and PLL prematurely
    - LP: #812638
  * drm/i915: add pipe/plane enable/disable functions
    - LP: #812638
 -- Herton Ronaldo Krzesinski <email address hidden> Mon, 07 Nov 2011 22:11:51 -0200

Changed in linux (Ubuntu Natty):
status: Fix Committed → Fix Released