QEMU

Qemu crashes with chrooted m68k

Bug #1404690 reported by Michel Boaventura on 2014-12-21

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	QEMU	Fix Released	Undecided	Unassigned

Bug Description

I'm using qemu-m68k 2.2.0 to chroot into a m68k coldfire linux, which works fine on the coldfire machine.

I've been able to use binfmt_msc and used the above code to use qemu with strace:

#include <unistd.h>
#include <string.h>

int main(int argc, char **argv, char **envp) {
char *newargv[argc + 4];

        newargv[0] = argv[0];
        newargv[1] = "-cpu";
        newargv[2] = "cfv4e";
        newargv[3] = "-strace";

        memcpy(&newargv[4], &argv[1], sizeof(*argv) * (argc - 1));
        newargv[argc + 3] = NULL;
        return execve("/usr/bin/qemu-m68k", newargv, envp);
}

Everything works fine. I can run bash, busybox, ash, but when I try to run a ls or just type an invalid command, I got the attached sequence of messages, which end like so:

11351 waitpid(-1,0xf6fffa00,0x3) = -1 errno=10 (No child processes)
qemu: fatal: Illegal instruction: 0000 @ f6fffa30
D0 = ffffffff A0 = f67dcf50 F0 = 0000000000000000 ( 0)
D1 = 0000000a A1 = f66e0898 F1 = 0000000000000000 ( 0)
D2 = f6fffaa8 A2 = f67df268 F2 = 0000000000000000 ( 0)
D3 = 00000000 A3 = 00000000 F3 = 0000000000000000 ( 0)
D4 = 00000008 A4 = 800026c4 F4 = 0000000000000000 ( 0)
D5 = 00000000 A5 = f67d98e0 F5 = 0000000000000000 ( 0)
D6 = f6fffaa8 A6 = f6fffa7c F6 = 0000000000000000 ( 0)
D7 = 00000002 A7 = f6fffa24 F7 = 0000000000000000 ( 0)
PC = f6fffa30 SR = 0000 ----- FPRESULT = 0
Aborted

How can I debug it further to try to figure out if this is a qemu issue or not? Thanks

Revision history for this message

Michel Boaventura (michel-boaventura) wrote on 2014-12-21:

Error log Edit (5.7 KiB, text/plain)

Revision history for this message

Peter Maydell (pmaydell) wrote on 2014-12-21: Re: [Qemu-devel] [Bug 1404690] [NEW] Qemu crashes with chrooted m68k

On 21 December 2014 at 16:21, Michel Boaventura
<email address hidden> wrote:
> Everything works fine. I can run bash, busybox, ash, but when I try to
> run a ls or just type an invalid command, I got the attached sequence of
> messages, which end like so:
>
> 11351 waitpid(-1,0xf6fffa00,0x3) = -1 errno=10 (No child processes)
> qemu: fatal: Illegal instruction: 0000 @ f6fffa30
> D0 = ffffffff A0 = f67dcf50 F0 = 0000000000000000 ( 0)
> D1 = 0000000a A1 = f66e0898 F1 = 0000000000000000 ( 0)
> D2 = f6fffaa8 A2 = f67df268 F2 = 0000000000000000 ( 0)
> D3 = 00000000 A3 = 00000000 F3 = 0000000000000000 ( 0)
> D4 = 00000008 A4 = 800026c4 F4 = 0000000000000000 ( 0)
> D5 = 00000000 A5 = f67d98e0 F5 = 0000000000000000 ( 0)
> D6 = f6fffaa8 A6 = f6fffa7c F6 = 0000000000000000 ( 0)
> D7 = 00000002 A7 = f6fffa24 F7 = 0000000000000000 ( 0)
> PC = f6fffa30 SR = 0000 ----- FPRESULT = 0
> Aborted
>
> How can I debug it further to try to figure out if this is a qemu issue
> or not? Thanks

This is almost certainly a QEMU bug -- m68k/Coldfire is
unmaintained upstream right now.

I would start debugging this with QEMU's -d and -D options:
you can use "-d in_asm,exec,cpu -D logfile.out" to write
a lot of logging information about the guest code QEMU executes.
(This can be a huge volume, so best to use the shortest possible
reproducing command. Take out 'exec' if it's really too slow.)
Then you can see what the code the guest executed was and figure
out what happened. Alternatively you can try using qemu's
debug stub: pass QEMU "-g 1234" and then connect a coldfire gdb
using gdb's "target remote :1234" command. Either way, what you
want to find out is what exactly happened in the instructions
between the waitpid and the crash.

My current guess is that either we've messed up the waitpid
(seems unlikely, it's not very complicated), or we're not
getting signal handling right (much more complicated and
likely to have bugs): your crash happens at about the point
where the shell is due to receive a SIGCHLD for the child
process it spawned. If you send the shell a signal in some
other way (eg type ctrl-C at it, or send it a SIGINT from
outside QEMU) does it die in a similar way?

PS: an easier way to enable strace for linux-user in a binfmt-misc
chroot is to use the QEMU_STRACE environment variable. (All the
linux-user command line options have environment variable versions;
check --help.)

You can also just use a command like
'chroot my-chroot /usr/bin/qemu-m68k-static -strace /bin/sh -c /bin/ls'
for a one-off command run.

-- PMM

On 21 December 2014 at 16:21, Michel Boaventura
<michel.boaventura@gmail.com> wrote:
> Everything works fine. I can run bash, busybox, ash, but when I try to
> run a ls or just type an invalid command, I got the attached sequence of
> messages, which end like so:
>
> 11351 waitpid(-1,0xf6fffa00,0x3) = -1 errno=10 (No child processes)
> qemu: fatal: Illegal instruction: 0000 @ f6fffa30
> D0 = ffffffff   A0 = f67dcf50   F0 = 0000000000000000 (           0)
> D1 = 0000000a   A1 = f66e0898   F1 = 0000000000000000 (           0)
> D2 = f6fffaa8   A2 = f67df268   F2 = 0000000000000000 (           0)
> D3 = 00000000   A3 = 00000000   F3 = 0000000000000000 (           0)
> D4 = 00000008   A4 = 800026c4   F4 = 0000000000000000 (           0)
> D5 = 00000000   A5 = f67d98e0   F5 = 0000000000000000 (           0)
> D6 = f6fffaa8   A6 = f6fffa7c   F6 = 0000000000000000 (           0)
> D7 = 00000002   A7 = f6fffa24   F7 = 0000000000000000 (           0)
> PC = f6fffa30   SR = 0000 ----- FPRESULT =            0
> Aborted
>
> How can I debug it further to try to figure out if this is a qemu issue
> or not? Thanks

This is almost certainly a QEMU bug -- m68k/Coldfire is
unmaintained upstream right now.

You can also just use a command like
  'chroot my-chroot /usr/bin/qemu-m68k-static -strace /bin/sh -c /bin/ls'
for a one-off command run.

-- PMM

Revision history for this message

Peter Maydell (pmaydell) wrote on 2014-12-21:

Oh, if you're able to put your chroot up for download somewhere so I can reproduce the problem, I can have a look at it for you. (I don't otherwise have a coldfire chroot or toolchain.)

Revision history for this message

Michel Boaventura (michel-boaventura) wrote on 2014-12-22:

Coldfire mini linux Edit (1.7 MiB, application/x-tar)

Hi Peter,

Thanks for you time! I've attached my mini chroot environment. As you can see, it is very minimal, but enough to be chrooted and to test this. I will try your suggestions in a couple of days, but if you could please try it before, I will really appreciate.

Michel

Revision history for this message

Peter Maydell (pmaydell) wrote on 2014-12-22:

I have a fix for this (our code for setting up the signal return trampoline used the wrong types and only worked on 32 bit hosts).

I notice that /bin/ls can't ls directories (it seems ok with single files) but that's a different bug.

Revision history for this message

Peter Maydell (pmaydell) wrote on 2014-12-22:

Patch fixing this: https://patchwork.ozlabs.org/patch/423460/

Revision history for this message

Peter Maydell (pmaydell) wrote on 2014-12-22:

I've identified the cause of "ls" not returning any output, but I don't think we can fix it in QEMU.

This happens if the host fs is ext3 or ext4 on a 64 bit system. Here the "d_off" entry in a linux_dirent64 is actually a hashtable hash, and so can be a full 64 bits. Unfortunately the guest binary here is trying to convert getdents64() syscall return information into a dirent with only a 32 bit offset field, and so it (guest libc, I think) just ignores dirents with offsets >4GB, which is all of them.

Sadly although ext3/4 support an f_mode bit for "make hash offsets fit in 32 bit", this is only for the benefit of kernel internal APIs (it's used by NFS) and AFAICT can't be set by userspace. So I can't at the moment think of any way for QEMU to deal with this...

Revision history for this message

Michel Boaventura (michel-boaventura) wrote on 2014-12-24:

Hi Peter,

Thank you very much for your help, I really appreciate it. I've tested both your patch and your workaround to make ls work (I've created a xfs partition to put my image) and everything works greatly.

Merry Xmas.

Michel

Revision history for this message

Thomas Huth (th-huth) wrote on 2017-01-11:

Peter's patch had been included here:
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=1669add752d9f2928
==> Fix released