Sparc guest assert error

Bug #671831 reported by Nigel Horne
This bug affects 1 person
Affects: QEMU
Status: Fix Released
Importance: Undecided
Assigned to: Stefan Hajnoczi
Milestone: (none)

Bug Description

The latest version in git (d33ea50a958b2e050d2b28e5f17e3b55e91c6d74) crashes with an assert error when booting a Sparc/Linux guest.

The last time I tried it (about a week ago) it worked fine. Yesterday I did a git pull and make clean, reran configure, and compiled.

Host OS: Debian Linux/x86_64 5.0
C Compiler: 4.4.5
Guest OS: Linux/Sparc (2.4)
Command Line: qemu-system-sparc -hda ~njh/qemu/sparc/debian.img -nographic -m 128
Build Configure: ./configure --enable-linux-aio --enable-io-thread --enable-kvm
GIT commit: d33ea50a958b2e050d2b28e5f17e3b55e91c6d74

Output:

Adding Swap: 122532k swap-space (priority -1)
.
Will now check root file system:fsck 1.40-WIP (14-Nov-2006)
[/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/sda2
qemu-system-sparc: /home/njh/src/qemu/hw/scsi-disk.c:201: scsi_read_data: Assertion `r->req.aiocb == ((void *)0)' failed.

It crashes in the same place every time.

(gdb) thread apply all bt:

Thread 3 (Thread 17643):
#0 0x00007f4db21bc8d3 in select () at ../sysdeps/unix/syscall-template.S:82
#1 0x00000000004d02c4 in main_loop_wait (nonblocking=<value optimized out>)
    at /home/njh/src/qemu/vl.c:1246
#2 0x00000000004d0e57 in main_loop (argc=<value optimized out>,
    argv=<value optimized out>, envp=<value optimized out>)
    at /home/njh/src/qemu/vl.c:1309
#3 main (argc=<value optimized out>, argv=<value optimized out>,
    envp=<value optimized out>) at /home/njh/src/qemu/vl.c:2999

Thread 2 (Thread 17645):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:211
#1 0x000000000042450b in cond_timedwait (unused=<value optimized out>)
    at posix-aio-compat.c:104
#2 aio_thread (unused=<value optimized out>) at posix-aio-compat.c:325
#3 0x00007f4db3b818ba in start_thread (arg=<value optimized out>)
    at pthread_create.c:300
#4 0x00007f4db21c302d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#5 0x0000000000000000 in ?? ()
Current language: auto
The current source language is "auto; currently asm".

Thread 1 (Thread 17644):
#0 0x00007f4db2126165 in *__GI_raise (sig=<value optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007f4db2128f70 in *__GI_abort () at abort.c:92
#2 0x00007f4db211f2b1 in *__GI___assert_fail (
    assertion=0x52690a "r->req.aiocb == ((void *)0)",
    file=<value optimized out>, line=201, function=0x527480 "scsi_read_data")
    at assert.c:81
#3 0x000000000044f363 in scsi_read_data (d=<value optimized out>, tag=0)
    at /home/njh/src/qemu/hw/scsi-disk.c:201
#4 0x00000000004ebd6c in esp_do_dma (s=0x20679d0)
    at /home/njh/src/qemu/hw/esp.c:377
#5 0x00000000004ec781 in handle_ti (opaque=0x20679d0,
    addr=<value optimized out>, val=<value optimized out>)
    at /home/njh/src/qemu/hw/esp.c:443
#6 esp_mem_writeb (opaque=0x20679d0, addr=<value optimized out>,
    val=<value optimized out>) at /home/njh/src/qemu/hw/esp.c:595
#7 0x0000000041b2d971 in ?? ()
#8 0xffffffffffffffff in ?? ()
#9 0x00000000031ad000 in ?? ()
#10 0x0000000301adfa20 in ?? ()
#11 0x0000100000000007 in ?? ()
#12 0x00007f4daf80e8a0 in ?? ()
#13 0x0000000000000001 in ?? ()
#14 0x0000000000000000 in ?? ()

Revision history for this message
Stefan Hajnoczi (stefanha) wrote : Re: [Qemu-devel] [Bug 671831] [NEW] Sparc guest assert error

On Sat, Nov 6, 2010 at 1:24 PM, Nigel Horne <email address hidden> wrote:
> Public bug reported:
>
> The latest version in git (d33ea50a958b2e050d2b28e5f17e3b55e91c6d74)
> crashes with an assert error when booting a Sparc/Linux guest.
[...]
> Output:
>
> Adding Swap: 122532k swap-space (priority -1)
> .
> Will now check root file system:fsck 1.40-WIP (14-Nov-2006)
> [/sbin/fsck.ext3 (1) -- /] fsck.ext3 -a -C0 /dev/sda2
> qemu-system-sparc: /home/njh/src/qemu/hw/scsi-disk.c:201: scsi_read_data: Assertion `r->req.aiocb == ((void *)0)' failed.

Kevin,
The assert I suggested in your recent scsi-disk patch series has
triggered. I need to study the scsi-disk.c code more to understand
how to solve this.

Stefan

Stefan Hajnoczi (stefanha) wrote :

Hi Nigel,
Is there a disk image available to reproduce this bug? I searched for
Debian SPARC 2.4-based disk images but wasn't able to find one.

If it's not easy to share your disk image, could you please test this
QEMU tree which backports the assert:

http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/scsi_assert

You can grab the code like this:
git clone -b scsi_assert git://repo.or.cz/qemu/stefanha.git

If the assert triggers in that world then data corruption was
previously possible but hidden (a SCSI request has one data buffer
and concurrent reads are being issued to the same buffer!). This
would mean an existing bug that needs to be fixed has been exposed.
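
To make the invariant concrete, here is a minimal C sketch of the situation the assert guards against. It is not QEMU's code: `ToyRequest`, `toy_start_read`, `toy_complete_read` and `toy_read_in_flight` are hypothetical names, and the only thing borrowed from the report is the shape of the check, `r->aiocb == NULL`. The assumption is that a request owns a single buffer, so a second read must not start until the first one has completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI request (not QEMU's structure):
 * one data buffer and one slot for an in-flight async read. */
typedef struct ToyRequest {
    unsigned char buf[512]; /* the single data buffer owned by the request */
    void *aiocb;            /* non-NULL while an async read is in flight */
} ToyRequest;

/* Dummy object whose address marks "a read is in flight". */
static int toy_token;

static bool toy_read_in_flight(const ToyRequest *r)
{
    return r->aiocb != NULL;
}

static void toy_start_read(ToyRequest *r)
{
    /* Same shape as the failing assertion: a second read into the
     * same buffer must not start while one is still outstanding. */
    assert(r->aiocb == NULL);
    r->aiocb = &toy_token;
}

static void toy_complete_read(ToyRequest *r)
{
    /* Completion frees the slot, so the buffer may be reused. */
    r->aiocb = NULL;
}
```

Calling `toy_start_read` twice without a `toy_complete_read` in between would abort, which is the toy equivalent of the failure reported here.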

If the assert doesn't trigger, then the issue was introduced recently
and we can dig more into that.

Thanks!
Stefan

Nigel Horne (njh-bandsman) wrote :

On 11/08/2010 05:42 AM, Stefan Hajnoczi wrote:
> Hi Nigel,
> Is there a disk image available to reproduce this bug? I searched for
> Debian SPARC 2.4-based disk images but wasn't able to find one.
>
I got the image from http://wiki.qemu.org/Download. It was some time ago and
it may no longer be there. The image on that site now mentions Sparc
2.6, so I guess 2.4 has been removed.

I have no means for you to take a copy of my image. Sorry.
> If it's not easy to share your disk image, could you please test this
> QEMU tree which backports the assert:
>
> http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/scsi_assert
>

Will do - thanks for making it available. It may take me a bit of time
to get around to it, but I will do so as soon as I can.
> You can grab the code like this:
> git clone -b scsi_assert git://repo.or.cz/qemu/stefanha.git
>

-Nigel

Nigel Horne (njh-bandsman) wrote :

Stefan,
> If it's not easy to share your disk image, could you please test this
> QEMU tree which backports the assert:
>
> http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/scsi_assert
>
> You can grab the code like this:
> git clone -b scsi_assert git://repo.or.cz/qemu/stefanha.git
>
>
That retrieves a directory called scsi_assert which is empty. Are you
expecting that? Should I therefore use the directory in there called
stefanha?

Regards,

-Nigel

Stefan Hajnoczi (stefanha) wrote :

On Mon, Nov 8, 2010 at 2:04 PM, Nigel Horne <email address hidden> wrote:
>> You can grab the code like this:
>> git clone -b scsi_assert git://repo.or.cz/qemu/stefanha.git
>>
>>
> That retrieves a directory called scsi_assert which is empty.  Are you
> expecting that?  Should I therefore use the directory in there called
> stefanha?

Yes, 'stefanha' is the source tree directory.

When I do that, git clones into a directory called 'stefanha', which
contains the QEMU source tree from the scsi_assert branch. But you
can also use the following alternative if your git version does not
support -b <branch-name>:

git clone git://repo.or.cz/qemu/stefanha.git
cd stefanha
git checkout origin/scsi_assert

Stefan

Stefan Hajnoczi (stefanha) wrote :

I just gave the SPARC Linux 2.6 image from qemu.org a spin on sun4m but no luck:

sparc-softmmu/qemu-system-sparc -kernel
'/tmp/sparc-test/vmlinux-2.6.11+tcx' -initrd
'/tmp/sparc-test/linux.img' -append "root=/dev/ram" -drive
if=scsi,file=test.raw,cache=none

The kernel correctly notices the sda device and detects partitions (so
it is doing disk reads). There is no assertion error so this problem
may be specific to the Linux 2.4 ESP driver.

Stefan

blueswirl (blauwirbel) wrote :

On Mon, Nov 8, 2010 at 2:36 PM, Stefan Hajnoczi <email address hidden> wrote:
> I just gave the SPARC Linux 2.6 image from qemu.org a spin on sun4m but no luck:
>
> sparc-softmmu/qemu-system-sparc -kernel
> '/tmp/sparc-test/vmlinux-2.6.11+tcx' -initrd
> '/tmp/sparc-test/linux.img' -append "root=/dev/ram" -drive
> if=scsi,file=test.raw,cache=none
>
> The kernel correctly notices the sda device and detects partitions (so
> it is doing disk reads).  There is no assertion error so this problem
> may be specific to the Linux 2.4 ESP driver.

At least Gentoo 2004.1 (Linux 2.4.25-sparc-r1) live CD works.

Nigel Horne (njh-bandsman) wrote :

Stefan,
> You can grab the code like this:
> git clone -b scsi_assert git://repo.or.cz/qemu/stefanha.git
>
> If the assert triggers in that world then data corruption was
> previously possible but hidden (a SCSI request has one data buffer
> and concurrent reads are being issued to the same buffer!). This
> would mean an existing bug that needs to be fixed has been exposed
>

> If the assert doesn't trigger, then the issue was introduced recently
> and we can dig more into that.
>

I built your tree - it does not trigger the assertion. Linux/sparc/2.4
boots and seems to run fine.

Thanks for your help.

-Nigel

Stefan Hajnoczi (stefanha) wrote :

On Tue, Nov 9, 2010 at 12:56 AM, Nigel Horne <email address hidden> wrote:
> Stefan,
>> You can grab the code like this:
>> git clone -b scsi_assert git://repo.or.cz/qemu/stefanha.git
>>
>> If the assert triggers in that world then data corruption was
>> previously possible but hidden (a SCSI request has one data buffer
>> and concurrent reads are being issued to the same buffer!).  This
>> would mean an existing bug that needs to be fixed has been exposed
>>
>
>> If the assert doesn't trigger, then the issue was introduced recently
>> and we can dig more into that.
>>
>
> I built your tree - it does not trigger the assertion.  Linux/sparc/2.4
> boots and seems to run fine.

Thanks for testing. This narrows down the problem to just a few commits.

Would it be possible to share your kernel image (vmlinuz) and
initrd file? That's all we need to reproduce the bug.

Stefan

Nigel Horne (njh-bandsman) wrote :

Stefan,
> Would it be possible to share your kernel image (vmlinuz) and
> initrd file? That's all we need to reproduce the bug.
>

Sure - how do I create them from the Image file I have? (I don't use an
external kernel image and initrd file to boot)
> Stefan
>
>
-Nigel

Stefan Hajnoczi (stefanha) wrote :

On Tue, Nov 9, 2010 at 1:28 PM, Nigel Horne <email address hidden> wrote:
> Stefan,
>> Would it be possible to share your kernel image (vmlinuz) and
>> initrd file?  That's all we need to reproduce the bug.
>>
>
> Sure - how do I create them from the Image file I have?  (I don't use an
> external kernel image and initrd file to boot)

$ /sbin/fdisk -lu ~njh/qemu/sparc/debian.img
...hopefully this displays the slices/partitions...
   Device Boot      Start         End      Blocks   Id  System
test.raw1   *        2048     1026047      512000   83  Linux

$ mount -o loop,offset=$((2048 * 512)) ~njh/qemu/sparc/debian.img /mnt
(The offset is calculated by taking the start sector number from fdisk
and multiplying it by 512 bytes)

$ ls /mnt/boot
...there should be a vmlinuz and initrd, you could check the silo.conf
or other boot configuration if you don't know the exact kernel/initrd
filenames.

$ cp /mnt/boot/{vmlinuz,initrd} /tmp
$ umount /mnt
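
The offset arithmetic in the mount step can be written out as a small C helper. `partition_byte_offset` is a hypothetical name, and the 512-byte sector size is taken from the fdisk example above rather than read from the image. The multiplication is done in 64 bits so that large start sectors do not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of a partition inside a raw disk
 * image, given its start sector (as printed by fdisk -lu) and the
 * sector size in bytes. */
static uint64_t partition_byte_offset(uint64_t start_sector, uint32_t sector_size)
{
    /* A zero sector size would be a caller bug. */
    assert(sector_size != 0);
    /* Widen before multiplying so the product is computed in 64 bits. */
    return start_sector * (uint64_t)sector_size;
}
```

With the example values, `partition_byte_offset(2048, 512)` gives 1048576, the value that `$((2048 * 512))` expands to.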

Stefan

Nigel Horne (njh-bandsman) wrote :

Stefan,
> $ /sbin/fdisk -lu ~njh/qemu/sparc/debian.img
> ...hopefully this displays the slices/partitions...
>    Device Boot      Start         End      Blocks   Id  System
> test.raw1   *        2048     1026047      512000   83  Linux
>
> $ mount -o loop,offset=$((2048 * 512)) ~njh/qemu/sparc/debian.img /mnt
> (The offset is calculated by taking the start block number from fdisk
> and multiplying it by 512 bytes)
>
> $ ls /mnt/boot
> ...there should be a vmlinuz and initrd, you could check the silo.conf
> or other boot configuration if you don't know the exact kernel/initrd
> filenames.
>
> $ cp /mnt/boot/{vmlinuz,initrd} /tmp
> $ umount /mnt
>

Sadly that doesn't work:

njh@compaq:~/qemu/sparc$ /sbin/fdisk -lu debian.img

Disk debian.img: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders, total 0 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk debian.img doesn't contain a valid partition table

But it has occurred to me that I could boot using your code and get
/boot/vmlinuz and /boot/initrd.img that way. Would that be OK?

> Stefan
>
>
-Nigel

Stefan Hajnoczi (stefanha) wrote :

On Tue, Nov 9, 2010 at 5:07 PM, Nigel Horne <email address hidden> wrote:
> Stefan,
>> $ /sbin/fdisk -lu ~njh/qemu/sparc/debian.img
>> ...hopefully this displays the slices/partitions...
>>    Device Boot      Start         End      Blocks   Id  System
>> test.raw1   *        2048     1026047      512000   83  Linux
>>
>> $ mount -o loop,offset=$((2048 * 512)) ~njh/qemu/sparc/debian.img /mnt
>> (The offset is calculated by taking the start block number from fdisk
>> and multiplying it by 512 bytes)
>>
>> $ ls /mnt/boot
>> ...there should be a vmlinuz and initrd, you could check the silo.conf
>> or other boot configuration if you don't know the exact kernel/initrd
>> filenames.
>>
>> $ cp /mnt/boot/{vmlinuz,initrd} /tmp
>> $ umount /mnt
>>
>
> Sadly that doesn't work:
>
> njh@compaq:~/qemu/sparc$ /sbin/fdisk -lu debian.img
>
> Disk debian.img: 0 MB, 0 bytes
> 255 heads, 63 sectors/track, 0 cylinders, total 0 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 512 bytes
> Disk identifier: 0x00000000
>
> Disk debian.img doesn't contain a valid partition table

fdisk doesn't like the SPARC partition table. I used to have tools to
deal with this but have forgotten which ones they were.

> But it has occurred to me that I could boot using your code and get
> /boot/vmlinuz and /boot/initrd.img that way.  Would that be OK?

Sure :).

Stefan

Nigel Horne (njh-bandsman) wrote :

initrd

Nigel Horne (njh-bandsman) wrote :

vmlinuz

Nigel Horne (njh-bandsman) wrote :

>> But it has occurred to me that I could boot using your code and get
>> /boot/vmlinuz and /boot/initrd.img that way. Would that be OK?
>>
Done.

-Nigel

Stefan Hajnoczi (stefanha) wrote : Re: [Qemu-devel] [Bug 671831] Re: Sparc guest assert error

Thanks for providing the kernel and initrd. Unfortunately I wasn't
able to get them far enough to trigger the assert. More on that at
the bottom of this message but in the meantime I looked over the
relevant commits and spotted an issue with the assert.

Please try this branch:
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/scsi_assert_moved

That branch contains a small code change that moves the assert later
into the read code because we're doing the assert too early.
Hopefully this fixes the issue for you.
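
One way an assert can fire "too early" is sketched below in C. This is an illustration under assumptions, not the actual patch: the thread does not say which code the assert was moved past, so the early "nothing left to read" exit, along with `ToyRead`, `toy_read_data`, `TOY_DONE` and `TOY_ISSUED`, is hypothetical. The point is only that the check belongs immediately before a new asynchronous read is issued, so that calls which return without issuing one are not rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read state (not QEMU's structure). */
typedef struct ToyRead {
    unsigned sectors_left; /* sectors still to be transferred */
    void *aiocb;           /* non-NULL while an async read is in flight */
} ToyRead;

enum { TOY_DONE = 0, TOY_ISSUED = 1 };

/* Dummy object whose address marks "a read is in flight". */
static int toy_token;

static int toy_read_data(ToyRead *r)
{
    /* Assumed early exit: nothing left to transfer, so the call only
     * finishes the request and never touches the data buffer. */
    if (r->sectors_left == 0) {
        return TOY_DONE;
    }

    /* The check sits here, right before a new read is issued, rather
     * than at the top of the function where it would also reject the
     * early-exit path above. */
    assert(r->aiocb == NULL);
    r->aiocb = &toy_token;
    r->sectors_left--;
    return TOY_ISSUED;
}
```

With the assert at the top of `toy_read_data` instead, a call on the early-exit path would abort even though it never issues a second read into the buffer.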

Regarding the kernel and initrd, I'm unable to get it far enough to
fsck, which is where you hit the assertion. Either kernel parameters
need to be set or the image depends on a root filesystem (which I
don't have).

$ sparc-softmmu/qemu-system-sparc -m 128 -hda test.raw -kernel
~/sparc-images/nigel/vmlinuz -initrd ~/sparc-images/nigel/initrd.img
-nographic

Configuration device id QEMU version 1 machine id 32
CPUs: 1 x FMI,MB86904
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.0 built on Aug 26 2010 17:52
  Type 'help' for detailed information
[sparc] Kernel already loaded
switching to new context:
PROMLIB: obio_ranges 1
bootmem_init: Scan sp_banks, init_bootmem(spfn[20b],bpfn[20b],mlpfn[7f18])
free_bootmem: base[0] size[7f18000]
reserve_bootmem: base[800000] size[244000]
reserve_bootmem: base[0] size[20b000]
reserve_bootmem: base[20b000] size[fe4]
Booting Linux...
mem_init: Calling free_all_bootmem().
PROMLIB: Sun Boot Prom Version 3 Revision 2
Linux version 2.4.27-4-sparc32 (pbuilder@sparc) (gcc version 3.3.5
(Debian 1:3.3.5-13)) #1 Tue Mar 4 08:22:06 UTC 2008
ARCH: SUN4M
TYPE: SPARCstation 5
Ethernet address: 52:54:0:12:34:56
Boot time fixup v1.6. 4/Mar/98 Jakub Jelinek (<email address hidden>).
Patching kernel for srmmu[Fujitsu TurboSparc]/iommu
On node 0 totalpages: 31432
zone(0): 32536 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Found CPU 0 <node=ffd73e14,mid=0>
Found 1 CPU prom device tree node(s).
Power off control detected.
Kernel command line:
Calibrating delay loop... 222.82 BogoMIPS
Memory: 122120k available (1436k kernel code, 220k data, 128k init, 0k
highmem) [f0000000,07f18000]
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 4096 (order: 2, 16384 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
POSIX conformance testing by UNIFIX
IOMMU: impl 0 vers 5 page table at f05c0000 of size 262144 bytes
sbus0: Clock 21.1250 MHz
dma0: Revision 2
dma1: Revision 2
Sparc Zilog8530 serial driver version 1.68.2.2
Sun Mouse-Systems mouse driver version 1.00
tty00 at 0xffeab004 (irq = 44) is a Zilog8530
tty01 at 0xffeab000 (irq = 44) is a Zilog8530
tty02 at 0xffeac004 (irq = 44) is a Zilog8530
tty03 at 0xffeac000 (irq = 44) is a Zilog8530
keyboard: not present
Console: ttyS0 (Zilog8530)
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
ioremap: done with statics, switching to malloc
apc: power management initialized
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
d...


Nigel Horne (njh-bandsman) wrote :

On 11/11/2010 09:17 AM, Stefan Hajnoczi wrote:
> Thanks for providing the kernel and initrd. Unfortunately I wasn't
> able to get them far enough to trigger the assert. More on that at
> the bottom of this message but in the meantime I looked over the
> relevant commits and spotted an issue with the assert.
>
> Please try this branch:
> http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/scsi_assert_moved
>

I tried lots of different ways to get your code, but failed. The one
closest to the instructions that you gave me last time failed with this:

njh@compaq:~/tmp/foo2$ git clone -b scsi_assert_moved
http://repo.or.cz/w/qemu/stefanha.git/
Cloning into stefanha...
warning: Remote branch scsi_assert_moved not found in upstream origin,
using HEAD instead
warning: remote HEAD refers to nonexistent ref, unable to checkout.

Regards,

-Nigel

Stefan Hajnoczi (stefanha) wrote :

On Thu, Nov 11, 2010 at 5:00 PM, Nigel Horne <email address hidden> wrote:
> On 11/11/2010 09:17 AM, Stefan Hajnoczi wrote:
>> Thanks for providing the kernel and initrd.  Unfortunately I wasn't
>> able to get them far enough to trigger the assert.  More on that at
>> the bottom of this message but in the meantime I looked over the
>> relevant commits and spotted an issue with the assert.
>>
>> Please try this branch:
>> http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/scsi_assert_moved
>>
>
> I tried lots of different ways to get your code, but failed.  The one
> closest to the instructions that you gave me last time failed with this:
>
> njh@compaq:~/tmp/foo2$ git clone -b scsi_assert_moved
> http://repo.or.cz/w/qemu/stefanha.git/
> Cloning into stefanha...
> warning: Remote branch scsi_assert_moved not found in upstream origin,
> using HEAD instead
> warning: remote HEAD refers to nonexistent ref, unable to checkout.

Try:
git clone -b scsi_assert_moved git://repo.or.cz/qemu/stefanha.git

Sorry for that. You can get the repo clone info by clicking
"qemu/stefanha.git" at the top of the
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/scsi_assert_moved
page, the git:// URI is usually the best one to use.

Stefan

Nigel Horne (njh-bandsman) wrote :

Stefan,

> Try:
> git clone -b scsi_assert_moved git://repo.or.cz/qemu/stefanha.git
>

I have tried that branch. I get no assertion failure with it.

Thanks,

-Nigel Horne

Stefan Hajnoczi (stefanha) wrote :

On Thu, Nov 11, 2010 at 10:45 PM, Nigel Horne <email address hidden> wrote:
> Stefan,
>
>> Try:
>> git clone -b scsi_assert_moved git://repo.or.cz/qemu/stefanha.git
>>
>
> I have tried that branch.  I get no assertion failure with it.

Great, thank you. I will submit the patch for qemu.git.

Stefan

Changed in qemu:
status: New → In Progress
assignee: nobody → Stefan Hajnoczi (stefanha)
Nigel Horne (njh-bandsman) wrote :

It's been fixed now, thanks. There seems to be no way to close this bug, but you can assume, now, that it is closed.

Thanks for your help in tracking it down and fixing it quickly.

Changed in qemu:
status: In Progress → Fix Released
Nigel Horne (njh-bandsman) wrote :

Found out how to change the bug's status. I've marked it as fixed. Thanks again.
