Qemu crashes with chrooted m68k

Bug #1404690 reported by Michel Boaventura
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned

Bug Description

I'm using qemu-m68k 2.2.0 to chroot into a m68k coldfire linux, which works fine on the coldfire machine.

I've been able to use binfmt_msc and used the above code to use qemu with strace:

#include <unistd.h>
#include <string.h>

int main(int argc, char **argv, char **envp) {
        char *newargv[argc + 4];

        newargv[0] = argv[0];
        newargv[1] = "-cpu";
        newargv[2] = "cfv4e";
        newargv[3] = "-strace";

        memcpy(&newargv[4], &argv[1], sizeof(*argv) * (argc - 1));
        newargv[argc + 3] = NULL;
        return execve("/usr/bin/qemu-m68k", newargv, envp);
}

Everything works fine. I can run bash, busybox, ash, but when I try to run a ls or just type an invalid command, I got the attached sequence of messages, which end like so:

11351 waitpid(-1,0xf6fffa00,0x3) = -1 errno=10 (No child processes)
qemu: fatal: Illegal instruction: 0000 @ f6fffa30
D0 = ffffffff A0 = f67dcf50 F0 = 0000000000000000 ( 0)
D1 = 0000000a A1 = f66e0898 F1 = 0000000000000000 ( 0)
D2 = f6fffaa8 A2 = f67df268 F2 = 0000000000000000 ( 0)
D3 = 00000000 A3 = 00000000 F3 = 0000000000000000 ( 0)
D4 = 00000008 A4 = 800026c4 F4 = 0000000000000000 ( 0)
D5 = 00000000 A5 = f67d98e0 F5 = 0000000000000000 ( 0)
D6 = f6fffaa8 A6 = f6fffa7c F6 = 0000000000000000 ( 0)
D7 = 00000002 A7 = f6fffa24 F7 = 0000000000000000 ( 0)
PC = f6fffa30 SR = 0000 ----- FPRESULT = 0
Aborted

How can I debug it further to try to figure out if this is a qemu issue or not? Thanks

Revision history for this message
Michel Boaventura (michel-boaventura) wrote :
Revision history for this message
Peter Maydell (pmaydell) wrote : Re: [Qemu-devel] [Bug 1404690] [NEW] Qemu crashes with chrooted m68k

On 21 December 2014 at 16:21, Michel Boaventura
<email address hidden> wrote:
> Everything works fine. I can run bash, busybox, ash, but when I try to
> run a ls or just type an invalid command, I got the attached sequence of
> messages, which end like so:
>
> 11351 waitpid(-1,0xf6fffa00,0x3) = -1 errno=10 (No child processes)
> qemu: fatal: Illegal instruction: 0000 @ f6fffa30
> D0 = ffffffff A0 = f67dcf50 F0 = 0000000000000000 ( 0)
> D1 = 0000000a A1 = f66e0898 F1 = 0000000000000000 ( 0)
> D2 = f6fffaa8 A2 = f67df268 F2 = 0000000000000000 ( 0)
> D3 = 00000000 A3 = 00000000 F3 = 0000000000000000 ( 0)
> D4 = 00000008 A4 = 800026c4 F4 = 0000000000000000 ( 0)
> D5 = 00000000 A5 = f67d98e0 F5 = 0000000000000000 ( 0)
> D6 = f6fffaa8 A6 = f6fffa7c F6 = 0000000000000000 ( 0)
> D7 = 00000002 A7 = f6fffa24 F7 = 0000000000000000 ( 0)
> PC = f6fffa30 SR = 0000 ----- FPRESULT = 0
> Aborted
>
> How can I debug it further to try to figure out if this is a qemu issue
> or not? Thanks

This is almost certainly a QEMU bug -- m68k/Coldfire is
unmaintained upstream right now.

I would start debugging this with QEMU's -d and -D options:
you can use "-d in_asm,exec,cpu -D logfile.out" to write
a lot of logging information about the guest code QEMU executes.
(This can be a huge volume, so best to use the shortest possible
reproducing command. Take out 'exec' if it's really too slow.)
Then you can see what the code the guest executed was and figure
out what happened. Alternatively you can try using qemu's
debug stub: pass QEMU "-g 1234" and then connect a coldfire gdb
using gdb's "target remote :1234" command. Either way, what you
want to find out is what exactly happened in the instructions
between the waitpid and the crash.

My current guess is that either we've messed up the waitpid
(seems unlikely, it's not very complicated), or we're not
getting signal handling right (much more complicated and
likely to have bugs): your crash happens at about the point
where the shell is due to receive a SIGCHLD for the child
process it spawned. If you send the shell a signal in some
other way (eg type ctrl-C at it, or send it a SIGINT from
outside QEMU) does it die in a similar way?

PS: an easier way to enable strace for linux-user in a binfmt-misc
chroot is to use the QEMU_STRACE environment variable. (All the
linux-user command line options have environment variable versions;
check --help.)

You can also just use a command like
  'chroot my-chroot /usr/bin/qemu-m68k-static -strace /bin/sh -c /bin/ls'
for a one-off command run.

-- PMM

Revision history for this message
Peter Maydell (pmaydell) wrote :

Oh, if you're able to put your chroot up for download somewhere so I can reproduce the problem, I can have a look at it for you. (I don't otherwise have a coldfire chroot or toolchain.)

Revision history for this message
Michel Boaventura (michel-boaventura) wrote :

Hi Peter,

Thanks for you time! I've attached my mini chroot environment. As you can see, it is very minimal, but enough to be chrooted and to test this. I will try your suggestions in a couple of days, but if you could please try it before, I will really appreciate.

Michel

Revision history for this message
Peter Maydell (pmaydell) wrote :

I have a fix for this (our code for setting up the signal return trampoline used the wrong types and only worked on 32 bit hosts).

I notice that /bin/ls can't ls directories (it seems ok with single files) but that's a different bug.

Revision history for this message
Peter Maydell (pmaydell) wrote :
Revision history for this message
Peter Maydell (pmaydell) wrote :

I've identified the cause of "ls" not returning any output, but I don't think we can fix it in QEMU.

This happens if the host fs is ext3 or ext4 on a 64 bit system. Here the "d_off" entry in a linux_dirent64 is actually a hashtable hash, and so can be a full 64 bits. Unfortunately the guest binary here is trying to convert getdents64() syscall return information into a dirent with only a 32 bit offset field, and so it (guest libc, I think) just ignores dirents with offsets >4GB, which is all of them.

Sadly although ext3/4 support an f_mode bit for "make hash offsets fit in 32 bit", this is only for the benefit of kernel internal APIs (it's used by NFS) and AFAICT can't be set by userspace. So I can't at the moment think of any way for QEMU to deal with this...

Revision history for this message
Michel Boaventura (michel-boaventura) wrote :

Hi Peter,

Thank you very much for your help, I really appreciate it. I've tested both your patch and your workaround to make ls work (I've created a xfs partition to put my image) and everything works greatly.

Merry Xmas.

Michel

Revision history for this message
Thomas Huth (th-huth) wrote :

Peter's patch had been included here:
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=1669add752d9f2928
==> Fix released

Changed in qemu:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.