Seabios causing hang when adding a virtio disk

Bug #931371 reported by Richard W.M. Jones
This bug affects 3 people
Affects           Status    Importance  Assigned to  Milestone
Linux             Invalid   Medium
seabios (Ubuntu)  Triaged   Medium      Unassigned

Bug Description

Attaching a completely blank disk image to a virtual machine causes the following stack trace when loading the virtio block driver:

[ 1.106728] loop: module loaded
[ 1.125680] vda: unknown partition table
[ 1.789721] Switching to clocksource tsc
[ 8.373409] Clocksource tsc unstable (delta = 87849991 ns)
[ 8.374642] Switching to clocksource jiffies
[ 241.037694] INFO: task swapper/0:1 blocked for more than 120 seconds.
[ 241.037966] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 241.038424] swapper/0 D ffffffff81806240 0 1 0 0x00000000
[ 241.039028] ffff88001ed7d870 0000000000000046 0000000000000000 000000004dbb1e54
[ 241.039460] ffff88001ed7dfd8 ffff88001ed7dfd8 ffff88001ed7dfd8 0000000000013780
[ 241.039839] ffffffff81c0d020 ffff88001ed80000 ffff88001ed7d850 ffff88001f014040
[ 241.040220] Call Trace:
[ 241.041349] [<ffffffff811157a0>] ? __lock_page+0x70/0x70
[ 241.041712] [<ffffffff8165574f>] schedule+0x3f/0x60
[ 241.041798] [<ffffffff816557ff>] io_schedule+0x8f/0xd0
[ 241.041943] [<ffffffff811157ae>] sleep_on_page+0xe/0x20
[ 241.045386] [<ffffffff81655eca>] __wait_on_bit_lock+0x5a/0xc0
[ 241.045467] [<ffffffff81115797>] __lock_page+0x67/0x70
[ 241.049105] [<ffffffff81089e60>] ? autoremove_wake_function+0x40/0x40
[ 241.049416] [<ffffffff81116a20>] do_read_cache_page+0x160/0x180
[ 241.049656] [<ffffffff811ad630>] ? blkdev_write_begin+0x30/0x30
[ 241.049783] [<ffffffff81116a89>] read_cache_page_async+0x19/0x20
[ 241.049931] [<ffffffff81116a9e>] read_cache_page+0xe/0x20
[ 241.053417] [<ffffffff811e14bd>] read_dev_sector+0x2d/0x90
[ 241.053629] [<ffffffff811e2604>] adfspart_check_ICS+0x74/0x2d0
[ 241.053834] [<ffffffff81311f64>] ? snprintf+0x34/0x40
[ 241.053987] [<ffffffff811e2590>] ? rescan_partitions+0x300/0x300
[ 241.054210] [<ffffffff811e1c38>] check_partition+0xf8/0x200
[ 241.057501] [<ffffffff811e236a>] rescan_partitions+0xda/0x300
[ 241.057746] [<ffffffff811ae68c>] __blkdev_get+0x2bc/0x420
[ 241.057888] [<ffffffff810899f7>] ? bit_waitqueue+0x17/0xc0
[ 241.058000] [<ffffffff811ae84e>] blkdev_get+0x5e/0x1e0
[ 241.058080] [<ffffffff812f6b62>] register_disk+0x162/0x180
[ 241.058177] [<ffffffff812f6c24>] add_disk+0xa4/0x1b0
[ 241.061902] [<ffffffff8162925a>] virtblk_probe+0x43d/0x4e2
[ 241.064195] [<ffffffff814083f0>] ? virtblk_config_changed+0x30/0x30
[ 241.065436] [<ffffffff813a3020>] ? vp_find_vqs+0xc0/0xc0
[ 241.065529] [<ffffffff813a15e3>] virtio_dev_probe+0xe3/0x140
[ 241.065617] [<ffffffff813f08d8>] really_probe+0x68/0x190
[ 241.065701] [<ffffffff813f0b65>] driver_probe_device+0x45/0x70
[ 241.065789] [<ffffffff813f0c3b>] __driver_attach+0xab/0xb0
[ 241.065873] [<ffffffff813f0b90>] ? driver_probe_device+0x70/0x70
[ 241.065962] [<ffffffff813f0b90>] ? driver_probe_device+0x70/0x70
[ 241.066051] [<ffffffff813ef9cc>] bus_for_each_dev+0x5c/0x90
[ 241.066137] [<ffffffff813f069e>] driver_attach+0x1e/0x20
[ 241.069422] [<ffffffff813f02f0>] bus_add_driver+0x1a0/0x270
[ 241.073426] [<ffffffff81d2fcb8>] ? loop_init+0x12f/0x12f
[ 241.073641] [<ffffffff813f11a6>] driver_register+0x76/0x140
[ 241.073801] [<ffffffff81d2fcb8>] ? loop_init+0x12f/0x12f
[ 241.073951] [<ffffffff813a1840>] register_virtio_driver+0x20/0x30
[ 241.077413] [<ffffffff81d2fd0a>] init+0x52/0x7c
[ 241.077611] [<ffffffff81002040>] do_one_initcall+0x40/0x180
[ 241.077760] [<ffffffff81cf9ce9>] kernel_init+0xcf/0x14e
[ 241.077913] [<ffffffff81661d74>] kernel_thread_helper+0x4/0x10
[ 241.078061] [<ffffffff81cf9c1a>] ? start_kernel+0x3c7/0x3c7
[ 241.081352] [<ffffffff81661d70>] ? gs_change+0x13/0x13
[ 361.089003] INFO: task swapper/0:1 blocked for more than 120 seconds.
[ 361.089312] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 361.090036] swapper/0 D ffffffff81806240 0 1 0 0x00000000
[ 361.090322] ffff88001ed7d870 0000000000000046 0000000000000000 000000004dbb1e54
[ 361.090716] ffff88001ed7dfd8 ffff88001ed7dfd8 ffff88001ed7dfd8 0000000000013780
[ 361.091024] ffffffff81c0d020 ffff88001ed80000 ffff88001ed7d850 ffff88001f014040
[ 361.091332] Call Trace:
[ 361.091570] [<ffffffff811157a0>] ? __lock_page+0x70/0x70
[ 361.091797] [<ffffffff8165574f>] schedule+0x3f/0x60
[ 361.091990] [<ffffffff816557ff>] io_schedule+0x8f/0xd0
[ 361.092189] [<ffffffff811157ae>] sleep_on_page+0xe/0x20
[ 361.092392] [<ffffffff81655eca>] __wait_on_bit_lock+0x5a/0xc0
[ 361.092702] [<ffffffff81115797>] __lock_page+0x67/0x70
[ 361.092852] [<ffffffff81089e60>] ? autoremove_wake_function+0x40/0x40
[ 361.092852] [<ffffffff81116a20>] do_read_cache_page+0x160/0x180
[ 361.092852] [<ffffffff811ad630>] ? blkdev_write_begin+0x30/0x30
[ 361.092945] [<ffffffff81116a89>] read_cache_page_async+0x19/0x20
[ 361.093390] [<ffffffff81116a9e>] read_cache_page+0xe/0x20
[ 361.093600] [<ffffffff811e14bd>] read_dev_sector+0x2d/0x90
[ 361.097020] [<ffffffff811e2604>] adfspart_check_ICS+0x74/0x2d0
[ 361.097194] [<ffffffff81311f64>] ? snprintf+0x34/0x40
[ 361.097328] [<ffffffff811e2590>] ? rescan_partitions+0x300/0x300
[ 361.097476] [<ffffffff811e1c38>] check_partition+0xf8/0x200
[ 361.097617] [<ffffffff811e236a>] rescan_partitions+0xda/0x300
[ 361.100852] [<ffffffff811ae68c>] __blkdev_get+0x2bc/0x420
[ 361.100956] [<ffffffff810899f7>] ? bit_waitqueue+0x17/0xc0
[ 361.101103] [<ffffffff811ae84e>] blkdev_get+0x5e/0x1e0
[ 361.101226] [<ffffffff812f6b62>] register_disk+0x162/0x180
[ 361.101328] [<ffffffff812f6c24>] add_disk+0xa4/0x1b0
[ 361.101461] [<ffffffff8162925a>] virtblk_probe+0x43d/0x4e2
[ 361.101677] [<ffffffff814083f0>] ? virtblk_config_changed+0x30/0x30
[ 361.108947] [<ffffffff813a3020>] ? vp_find_vqs+0xc0/0xc0
[ 361.109136] [<ffffffff813a15e3>] virtio_dev_probe+0xe3/0x140
[ 361.109238] [<ffffffff813f08d8>] really_probe+0x68/0x190
[ 361.109373] [<ffffffff813f0b65>] driver_probe_device+0x45/0x70
[ 361.109519] [<ffffffff813f0c3b>] __driver_attach+0xab/0xb0
[ 361.112917] [<ffffffff813f0b90>] ? driver_probe_device+0x70/0x70
[ 361.113131] [<ffffffff813f0b90>] ? driver_probe_device+0x70/0x70
[ 361.113290] [<ffffffff813ef9cc>] bus_for_each_dev+0x5c/0x90
[ 361.113604] [<ffffffff813f069e>] driver_attach+0x1e/0x20
[ 361.113733] [<ffffffff813f02f0>] bus_add_driver+0x1a0/0x270
[ 361.117064] [<ffffffff81d2fcb8>] ? loop_init+0x12f/0x12f
[ 361.117238] [<ffffffff813f11a6>] driver_register+0x76/0x140
[ 361.117398] [<ffffffff81d2fcb8>] ? loop_init+0x12f/0x12f
[ 361.117537] [<ffffffff813a1840>] register_virtio_driver+0x20/0x30
[ 361.117689] [<ffffffff81d2fd0a>] init+0x52/0x7c
[ 361.121097] [<ffffffff81002040>] do_one_initcall+0x40/0x180
[ 361.121325] [<ffffffff81cf9ce9>] kernel_init+0xcf/0x14e
[ 361.121538] [<ffffffff81661d74>] kernel_thread_helper+0x4/0x10
[ 361.121762] [<ffffffff81cf9c1a>] ? start_kernel+0x3c7/0x3c7
[ 361.124914] [<ffffffff81661d70>] ? gs_change+0x13/0x13

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Richard W.M. Jones (rich-annexia) wrote :

Kernel version is:
Linux tmpubuntu1204 3.2.0-12-generic #21-Ubuntu SMP Tue Jan 31 18:48:57 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

This is Ubuntu 12.04 alpha 2.

Fred van Zwieten (fvzwieten) wrote :
Joseph Salisbury (jsalisbury) wrote :

Do you know if this issue happened in a previous version of Ubuntu, or is this a new issue?

Would it be possible for you to test the latest upstream kernel? It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.3 kernel[1] (not a kernel in the daily directory). Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag (only that one tag; please leave the other tags). This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text.

If this bug is fixed by the mainline kernel, please add the following tag 'kernel-fixed-upstream-KERNEL-VERSION'. For example, if kernel version 3.3-rc2 fixed the issue, the tag would be: 'kernel-fixed-upstream-v3.3-rc2'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example because it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.3-rc2-precise/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key needs-upstream-testing precise
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Richard W.M. Jones (rich-annexia) wrote :

This didn't happen on older versions of Ubuntu (e.g. 11.10).

I installed the following kernel:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.3-rc3-precise/
and it *does* happen on this kernel, so the tag should be
'kernel-bug-exists-upstream'.

tags: added: kernel-bug-exists-upstream
removed: needs-upstream-testing
Richard W.M. Jones (rich-annexia) wrote :

FYI, here is the call stack from the mainline kernel. It's basically the same
as for the precise kernel.

[ 241.283078] INFO: task swapper/0:1 blocked for more than 120 seconds.
[ 241.283078] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 241.283078] swapper/0 D ffffffff8180c6c0 0 1 0 0x00000000
[ 241.283078] ffff88001ed69880 0000000000000046 ffff88001ed69fd8 0000000000013600
[ 241.283078] ffff88001ed68010 0000000000013600 0000000000013600 0000000000013600
[ 241.287078] ffff88001ed69fd8 0000000000013600 ffffffff81c0d020 ffff88001ed60000
[ 241.291079] Call Trace:
[ 241.291079] [<ffffffff8111e090>] ? __lock_page+0x70/0x70
[ 241.291079] [<ffffffff8165e82f>] schedule+0x3f/0x60
[ 241.291079] [<ffffffff8165e8dc>] io_schedule+0x8c/0xd0
[ 241.291079] [<ffffffff8111e09e>] sleep_on_page+0xe/0x20
[ 241.291079] [<ffffffff8165cc0a>] __wait_on_bit_lock+0x5a/0xc0
[ 241.291079] [<ffffffff8111e087>] __lock_page+0x67/0x70
[ 241.291079] [<ffffffff81075970>] ? autoremove_wake_function+0x40/0x40
[ 241.295079] [<ffffffff811b8630>] ? blkdev_write_begin+0x30/0x30
[ 241.295079] [<ffffffff811202f4>] do_read_cache_page+0x94/0x110
[ 241.295079] [<ffffffff811203b9>] read_cache_page_async+0x19/0x20
[ 241.295079] [<ffffffff811203ce>] read_cache_page+0xe/0x20
[ 241.299079] [<ffffffff8130230d>] read_dev_sector+0x2d/0x90
[ 241.299079] [<ffffffff813033aa>] adfspart_check_ICS+0x7a/0x290
[ 241.299079] [<ffffffff813228d4>] ? snprintf+0x34/0x40
[ 241.299079] [<ffffffff81303330>] ? check_partition+0x210/0x210
[ 241.299079] [<ffffffff81303224>] check_partition+0x104/0x210
[ 241.299079] [<ffffffff81302aba>] rescan_partitions+0xda/0x310
[ 241.299079] [<ffffffff8165f54e>] ? _raw_spin_lock+0xe/0x20
[ 241.303079] [<ffffffff811b9184>] __blkdev_get+0x2d4/0x450
[ 241.303079] [<ffffffff811b935c>] blkdev_get+0x5c/0x210
[ 241.303079] [<ffffffff81301017>] register_disk+0x177/0x1a0
[ 241.303079] [<ffffffff813010e6>] add_disk+0xa6/0x1b0
[ 241.307080] [<ffffffff8164879a>] virtblk_probe+0x44c/0x4f1
[ 241.307080] [<ffffffff813b5423>] virtio_dev_probe+0xd3/0x120
[ 241.307080] [<ffffffff81407bb8>] really_probe+0x68/0x190
[ 241.307080] [<ffffffff81407d25>] driver_probe_device+0x45/0x70
[ 241.315080] [<ffffffff81407deb>] __driver_attach+0x9b/0xa0
[ 241.315080] [<ffffffff81407d50>] ? driver_probe_device+0x70/0x70
[ 241.315080] [<ffffffff81406308>] bus_for_each_dev+0x68/0x90
[ 241.319080] [<ffffffff81407a0e>] driver_attach+0x1e/0x20
[ 241.323081] [<ffffffff81407560>] bus_add_driver+0xd0/0x270
[ 241.323081] [<ffffffff81d200f5>] ? max_loop_setup+0x1a/0x1a
[ 241.323081] [<ffffffff814084a0>] driver_register+0x80/0x150
[ 241.323081] [<ffffffff81d200f5>] ? max_loop_setup+0x1a/0x1a
[ 241.327081] [<ffffffff813b5590>] register_virtio_driver+0x20/0x30
[ 241.327081] [<ffffffff81d2014c>] init+0x57/0x81
[ 241.327081] [<ffffffff81002042>] do_one_initcall+0x42/0x180
[ 241.331081] [<ffffffff81ce9681>] kernel_init+0xd2/0x156
[ 241.335081] [<ffffffff81668fe4>] kernel_thread_helper+0x4/0x10
[ 241.335081] [<ffffffff81ce95af>] ? parse_early_options+0x20/0x20
[ 241.335081] [<ffffffff81668fe0>] ...


Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report at bugzilla.kernel.org [1]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

If you are comfortable with opening a bug upstream, it would be great if you could report the upstream bug number back in this bug report. That will allow us to link this bug to the upstream report.

[1] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Richard W.M. Jones (rich-annexia) wrote :
Changed in linux:
importance: Unknown → Medium
status: Unknown → Confirmed
Richard W.M. Jones (rich-annexia) wrote : Re: QEMU 1.0 causing hang when adding a virtio disk

I think this is a bug in the Ubuntu qemu-kvm package, not in the kernel. (How do I change the package that this bug is reported against?)

I tried compiling the Ubuntu qemu-kvm package without any patches (except define_AT_EMPTY_PATH.patch, which is required for the package to compile). However, the bug was still reproducible, so it is not caused by any Ubuntu patch.

I compiled qemu from git tag 'v1.0' => bug was reproducible.

I compiled qemu from git HEAD (currently da12872a097) => bug was NOT reproducible.

This shows that some commit between 'v1.0' and HEAD fixes the problem, but I haven't yet worked out which one it is.

tags: removed: kernel-bug-exists-upstream kernel-da-key precise
summary: - ADFS partition checking code hangs on empty virtio disk
+ QEMU 1.0 causing hang when adding a virtio disk
Richard W.M. Jones (rich-annexia) wrote :

There are 1912 commits to check!

None of them is obviously a fix for a hanging bug in virtio/block code.

Richard W.M. Jones (rich-annexia) wrote :

I've identified the commit which fixes the bug, which is:

commit 41bd360325168b3c1db78eb7311420a1607d521f
Author: Jan Kiszka <email address hidden>
Date: Sun Jan 15 17:48:25 2012 +0100

    seabios: Update to release 1.6.3.1

    User visible changes in seabios:
     - Probe HPET existence (fix for -no-hpet)
     - Probe PCI existence (fix for -machine isapc)
     - usb: fix boot paths

    Signed-off-by: Jan Kiszka <email address hidden>

diff --git a/pc-bios/bios.bin b/pc-bios/bios.bin
index bd9ad0e..41e2b38 100644
Binary files a/pc-bios/bios.bin and b/pc-bios/bios.bin differ
diff --git a/roms/seabios b/roms/seabios
index 8e30147..80d11e8 160000
--- a/roms/seabios
+++ b/roms/seabios
@@ -1 +1 @@
-Subproject commit 8e301472e324b6d6496d8b4ffc66863e99d7a505
+Subproject commit 80d11e8577bf03e98f2eb1b0cb3a281ab2879c9e

So in fact the bug is in seabios, not in qemu-kvm.

summary: - QEMU 1.0 causing hang when adding a virtio disk
+ Seabios causing hang when adding a virtio disk
Richard W.M. Jones (rich-annexia) wrote :

Luckily my first guess was correct. The following patch needs to be applied to SeaBIOS:

commit 3c5fcec00ce1317cda56d549259550fcc018c834
Author: Kevin O'Connor <email address hidden>
Date: Sat Oct 1 12:35:32 2011 -0400

    Fix alignment bug in pci_bios_init_root_regions().

    If there are no memory allocations for a given type then the "max" bar
    size is zero. However, ALIGN_DOWN does not handle an alignment of
    zero properly. Catch and handle the zero case.

    Signed-off-by: Kevin O'Connor <email address hidden>

diff --git a/src/pciinit.c b/src/pciinit.c
index a857da0..0d8758e 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -536,7 +536,7 @@ static void pci_bios_init_bus_bases(struct pci_bus *bus)
     }
 }

-#define ROOT_BASE(top, sum, align) ALIGN_DOWN((top)-(sum),(align))
+#define ROOT_BASE(top, sum, max) ALIGN_DOWN((top)-(sum),(max) ?: 1)

 static int pci_bios_init_root_regions(u32 start, u32 end)
 {

I tested this by applying this patch to seabios 0.6.2-0ubuntu2 and it completely cures the problem.
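
For anyone wondering why an alignment of zero is a problem, here is a minimal sketch in C. It assumes ALIGN_DOWN is the usual power-of-two mask idiom; the definition below is written for this illustration and is not copied from the SeaBIOS tree. The helper names root_base_old and root_base_new are hypothetical and only mirror the two versions of the ROOT_BASE macro in the patch. The patch uses the GNU "?:" shorthand; the sketch spells it out in standard C.

```c
#include <stdint.h>

/* Assumed definition for illustration: round x down to a multiple of a,
 * where a is expected to be a power of two. */
#define ALIGN_DOWN(x, a) ((x) & ~((uint32_t)(a) - 1))

/* ROOT_BASE as it was before the patch: the alignment is used as-is.
 * With align == 0, (0 - 1) wraps to 0xffffffff, the inverted mask is 0,
 * and the result is always 0 regardless of top and sum. */
static uint32_t root_base_old(uint32_t top, uint32_t sum, uint32_t align)
{
    return ALIGN_DOWN(top - sum, align);
}

/* ROOT_BASE after the patch: a zero "max" BAR size falls back to an
 * alignment of 1, which leaves (top - sum) unchanged. */
static uint32_t root_base_new(uint32_t top, uint32_t sum, uint32_t max)
{
    return ALIGN_DOWN(top - sum, max ? max : 1);
}
```

Under that assumption, root_base_old(0xfec00000, 0, 0) evaluates to 0 while root_base_new(0xfec00000, 0, 0) evaluates to 0xfec00000. For a non-zero power-of-two alignment the two versions agree.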

Changed in linux:
status: Confirmed → Invalid
affects: linux (Ubuntu) → seabios (Ubuntu)
Richard W.M. Jones (rich-annexia) wrote :

Can we fix this? It's causing lots of people to have problems with libguestfs, and will cause random failures for *anyone* using virtio disks. I would say this bug should be urgent.
