cpu soft lockup in Centos guest using KVM 0.14.1

Bug #1011925 reported by Michael Chapman
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
kvm (Ubuntu)
Expired
Medium
Unassigned

Bug Description

I'm running Openstack Diablo, I'm not sure if this is an issue with KVM or something else. The VM launches and after a short time this happens:

[9223372036.854787] BUG: soft lockup - CPU#2 stuck for 8589930346s! [swapper:0]
[9223372036.855730] Modules linked in: acpiphp ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc video output input_polldev parport_pc lp parport nvram evbug psmouse serio_raw virtio_balloon snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer i2c_piix4 snd soundcore snd_page_alloc pcspkr virtio_blk 8139too virtio_pci virtio_ring virtio 8139cp mii floppy raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear fbcon tileblit font bitblit softcursor
[9223372036.858776] Modules linked in: acpiphp ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc video output input_polldev parport_pc lp parport nvram evbug psmouse serio_raw virtio_balloon snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer i2c_piix4 snd soundcore snd_page_alloc pcspkr virtio_blk 8139too virtio_pci virtio_ring virtio 8139cp mii floppy raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear fbcon tileblit font bitblit softcursor
[9223372036.858776] Call Trace:
[9223372036.858776] [<ffffffff8027561c>] ? tick_nohz_restart_sched_tick+0xec/0x140
[9223372036.858776] [<ffffffff80210eb5>] cpu_idle+0x95/0xc0
[9223372036.858776] [<ffffffff80698f23>] start_secondary+0x9e/0xcb
[9223372036.858776] BUG: soft lockup - CPU#1 stuck for 8589934442s! [gzip:2310]
[9223372036.858776] Modules linked in: acpiphp ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc video output input_polldev parport_pc lp parport nvram evbug psmouse serio_raw virtio_balloon snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer i2c_piix4 snd soundcore snd_page_alloc pcspkr virtio_blk 8139too virtio_pci virtio_ring virtio 8139cp mii floppy raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear fbcon tileblit font bitblit softcursor
[9223372036.858776] Modules linked in: acpiphp ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc video output input_polldev parport_pc lp parport nvram evbug psmouse serio_raw virtio_balloon snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer i2c_piix4 snd soundcore snd_page_alloc pcspkr virtio_blk 8139too virtio_pci virtio_ring virtio 8139cp mii floppy raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear fbcon tileblit font bitblit softcursor
[9223372036.858776] Call Trace:
[9223372036.858776] [<ffffffff8024a58d>] ? default_wake_function+0xd/0x10
[9223372036.858776] [<ffffffff80255f41>] current_fs_time+0x11/0x30
[9223372036.858776] [<ffffffff8069e569>] ? _spin_lock+0x9/0x10
[9223372036.858776] [<ffffffff8030019e>] ? mnt_want_write+0x7e/0xb0
[9223372036.858776] [<ffffffff802fc3cd>] touch_atime+0xad/0x140
[9223372036.858776] [<ffffffff80240495>] ? __wake_up_sync+0x55/0x70
[9223372036.858776] [<ffffffff802f0011>] pipe_read+0x4b1/0x4c0
[9223372036.858776] [<ffffffff802e7961>] do_sync_read+0xf1/0x140
[9223372036.858776] [<ffffffff80268930>] ? autoremove_wake_function+0x0/0x40
[9223372036.858776] [<ffffffff803f5ee3>] ? apparmor_file_permission+0x23/0x30
[9223372036.858776] [<ffffffff803d07d1>] ? security_file_permission+0x11/0x20
[9223372036.858776] [<ffffffff802e8148>] vfs_read+0xc8/0x130
[9223372036.858776] [<ffffffff802e82a0>] sys_read+0x50/0x90
[9223372036.858776] [<ffffffff8021253a>] system_call_fastpath+0x16/0x1b

This is on ubuntu 11.10 with libvirt 0.9.2, kvm 0.14.1.

The guest is Centos 5 running 2.6.28-11-generic.

Robie Basak (racb)
Changed in kvm (Ubuntu):
importance: Undecided → Medium
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug.

To be sure, the kernel messages you are showing are on the host, not in the guest? (I don't see kvm in the list of modules)

Can you provide the kvm command line and/or libvirt xml for your guest?

Changed in kvm (Ubuntu):
status: New → Incomplete
Revision history for this message
Michael Chapman (michael-chapman-anu) wrote :

Serge,

I wasn't clear, sorry. The kernel messages are appearing on the guest and it becomes unresponsive.

Here is the libvirt.xml:

<domain type='kvm'>
    <name>instance-00000092</name>
    <memory>8388608</memory>
    <os>
            <type>hvm</type>
            <kernel>/var/lib/nova/instances/instance-00000092/kernel</kernel>
                <cmdline>root=/dev/vda console=ttyS0</cmdline>
                <initrd>/var/lib/nova/instances/instance-00000092/ramdisk</initrd>
    </os>
    <features>
        <acpi/>
    </features>
    <vcpu>4</vcpu>
    <devices>
        <disk type='file'>
            <driver type='qcow2'/>
            <source file='/var/lib/nova/instances/instance-00000092/disk'/>
            <target dev='vda' bus='virtio'/>
        </disk>
            <disk type='file'>
                <driver type='qcow2'/>
                <source file='/var/lib/nova/instances/instance-00000092/disk.local'/>
                <target dev='vdb' bus='virtio'/>
            </disk>

        <interface type='bridge'>
            <source bridge='br100'/>
            <mac address='02:16:3e:72:4f:bb'/>
            <filterref filter="nova-instance-instance-00000092-02163e724fbb">
                <parameter name="IP" value="10.0.0.27" />
                <parameter name="DHCPSERVER" value="10.0.0.44" />
            </filterref>
        </interface>

        <!-- The order is significant here. File must be defined first -->
        <serial type="file">
            <source path='/var/lib/nova/instances/instance-00000092/console.log'/>
            <target port='1'/>
        </serial>

        <console type='pty' tty='/dev/pts/2'>
            <source path='/dev/pts/2'/>
            <target port='0'/>
        </console>

        <serial type='pty'>
            <source path='/dev/pts/2'/>
            <target port='0'/>
        </serial>

        <graphics type='vnc' port='-1' autoport='yes' keymap='en-us' listen='0.0.0.0'/>
    </devices>
</domain>

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks for the clarification. In that case, while it could be a bug in virtio (since gzip is locking up) or the qemu timer, it would seem most likely to be a bug in the centos kernel. lockup messages like that are quite common on bare metal as well. Is the centos image uptodate? Is this very reproducible - any particular command you can issue on the guest to always make it happen?

Changed in kvm (Ubuntu):
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Michael Chapman (michael-chapman-anu) wrote :

It's not reproducable, though I have seen it occur on multiple occasions on different machines. It was occurring when trying to run a euca-bundle command in the guest this time, which I guess was calling bzip at some point. Predictably it only occurs when the guest is under any load.

The CentOS kernel is almost certainly out-of-date so I'll update the image and get back to you if that doesn't fix it.

Thanks for your help.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for kvm (Ubuntu) because there has been no activity for 60 days.]

Changed in kvm (Ubuntu):
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.