qemu-arm segfaults executing msgmerge (gettext)

Reported by Jan-Simon Möller on 2010-10-30
This bug affects 5 people
Affects: Linaro QEMU (Importance: Undecided, Assigned to: Unassigned)
Affects: QEMU (Importance: Undecided, Assigned to: Unassigned)

Bug Description

upstream qemu.git revision b45e9c05dbacba8e992f0bffeca04c6379c3ad45

Starting program: /usr/bin/qemu-arm msgmerge-static ar.po anjuta.pot

[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff4bc3ff0 (LWP 26108)]
[New Thread 0x7ffff4b8aff0 (LWP 26109)]
[New Thread 0x7ffff4b51ff0 (LWP 26110)]
[New Thread 0x7ffff4b18ff0 (LWP 26111)]
[New Thread 0x7ffff4adfff0 (LWP 26112)]
[New Thread 0x7ffff4aa6ff0 (LWP 26113)]
[New Thread 0x7ffff4a6dff0 (LWP 26114)]
[New Thread 0x7ffff4a34ff0 (LWP 26115)]
[New Thread 0x7ffff49fbff0 (LWP 26116)]
[New Thread 0x7ffff49c2ff0 (LWP 26117)]
[New Thread 0x7ffff4989ff0 (LWP 26118)]
[New Thread 0x7ffff4950ff0 (LWP 26119)]
[New Thread 0x7ffff4917ff0 (LWP 26120)]
[New Thread 0x7ffff48deff0 (LWP 26121)]
[New Thread 0x7ffff48a5ff0 (LWP 26122)]
[New Thread 0x7ffff486cff0 (LWP 26123)]
[New Thread 0x7ffff4833ff0 (LWP 26124)]
[New Thread 0x7ffff47faff0 (LWP 26125)]
[New Thread 0x7ffff47c1ff0 (LWP 26126)]
[New Thread 0x7ffff4788ff0 (LWP 26127)]
[New Thread 0x7ffff474fff0 (LWP 26128)]
[New Thread 0x7ffff4716ff0 (LWP 26129)]
[New Thread 0x7ffff46ddff0 (LWP 26130)]
.........................
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4aa6ff0 (LWP 26113)]
0x00000000600480d4 in tb_reset_jump_recursive2 (tb=0x7ffff4c63540, n=0)
    at /home/user/git/qemu/exec.c:1333
1333 tb1 = tb1->jmp_next[n1];

(gdb) bt
#0 0x00000000600480d4 in tb_reset_jump_recursive2 (tb=0x7ffff4c63540, n=0)
    at /home/user/git/qemu/exec.c:1333
#1 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63540)
    at /home/user/git/qemu/exec.c:1361
#2 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c634d8, n=0)
    at /home/user/git/qemu/exec.c:1355
#3 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c634d8)
    at /home/user/git/qemu/exec.c:1361
#4 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63470, n=0)
    at /home/user/git/qemu/exec.c:1355
#5 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63470)
    at /home/user/git/qemu/exec.c:1361
#6 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63408, n=1)
    at /home/user/git/qemu/exec.c:1355
#7 0x00000000600481d1 in tb_reset_jump_recursive (tb=0x7ffff4c63408)
    at /home/user/git/qemu/exec.c:1362
#8 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c633a0, n=0)
    at /home/user/git/qemu/exec.c:1355
#9 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c633a0)
    at /home/user/git/qemu/exec.c:1361
#10 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63338, n=0)
    at /home/user/git/qemu/exec.c:1355
#11 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63338)
    at /home/user/git/qemu/exec.c:1361
#12 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c632d0, n=0)
    at /home/user/git/qemu/exec.c:1355
---Type <return> to continue, or q <return> to quit---
#13 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c632d0)
    at /home/user/git/qemu/exec.c:1361
#14 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63268, n=1)
    at /home/user/git/qemu/exec.c:1355
#15 0x00000000600481d1 in tb_reset_jump_recursive (tb=0x7ffff4c63268)
    at /home/user/git/qemu/exec.c:1362
#16 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63200, n=0)
    at /home/user/git/qemu/exec.c:1355
#17 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63200)
    at /home/user/git/qemu/exec.c:1361
#18 0x00000000600487c5 in cpu_unlink_tb (env=0x62385400) at /home/user/git/qemu/exec.c:1617
#19 0x00000000600488e8 in cpu_exit (env=0x62385400) at /home/user/git/qemu/exec.c:1662
#20 0x0000000060000798 in start_exclusive () at /home/user/git/qemu/linux-user/main.c:152
#21 0x0000000060000a4b in do_kernel_trap (env=0x62359940)
    at /home/user/git/qemu/linux-user/main.c:493
#22 0x00000000600023f3 in cpu_loop (env=0x62359940) at /home/user/git/qemu/linux-user/main.c:797
#23 0x00000000600123df in clone_func (arg=0x7ffffffd76e0)
    at /home/user/git/qemu/linux-user/syscall.c:3561
#24 0x00000000600b382d in start_thread (arg=<value optimized out>) at pthread_create.c:297
#25 0x00000000600f1809 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#26 0x0000000000000000 in ?? ()
(gdb)

It's interesting to see this:
#0 0x00000000600480d4 in tb_reset_jump_recursive2 (tb=0x7ffff4c63540, n=0)
    at /home/user/git/qemu/exec.c:1333
        tb1 = 0x0 <<<<<<<<<<
        tb_next = 0xf4c63610 <<<<<<<<<<
        ptb = 0x60341c91 <<<<<<<<<<
        n1 = 0
#1 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63540)
    at /home/user/git/qemu/exec.c:1361
No locals.
#2 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c634d8, n=0)
    at /home/user/git/qemu/exec.c:1355
        tb1 = 0x7ffff4c634d8 <<<<<<<<<<<
        tb_next = 0x7ffff4c63540 <<<<<<<<<<<
        ptb = 0x7ffff4c63860 <<<<<<<<<<<
        n1 = 0
#3 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c634d8)
    at /home/user/git/qemu/exec.c:1361
No locals.
#4 0x0000000060048160 in tb_reset_jump_recursive2 (tb=0x7ffff4c63470, n=0)
    at /home/user/git/qemu/exec.c:1355
        tb1 = 0x7ffff4c63470
        tb_next = 0x7ffff4c634d8
        ptb = 0x7ffff4c63530
        n1 = 0
#5 0x00000000600481c0 in tb_reset_jump_recursive (tb=0x7ffff4c63470)
    at /home/user/git/qemu/exec.c:1361

Jan-Simon Möller (dl9pf) wrote :

We always see this in:

exec.c:1662:

void cpu_exit(CPUState *env)
{
    cpu_unlink_tb(env);
    env->exit_request = 1;
}

A quick test with the cpu_unlink_tb(env) call removed passed the test.

Jan-Simon Möller (dl9pf) wrote :

Alternative test case:
compile, then "export OMP_NUM_THREADS=6" before running.

/******************************************************************************
* FILE: omp_mm.c
* DESCRIPTION:
* OpenMp Example - Matrix Multiply - C Version
* Demonstrates a matrix multiply using OpenMP. Threads share row iterations
* according to a predefined chunk size.
* AUTHOR: Blaise Barney
* LAST REVISED: 06/28/05
******************************************************************************/
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define NRA 620 /* number of rows in matrix A */
#define NCA 150 /* number of columns in matrix A */
#define NCB 70 /* number of columns in matrix B */

int main (int argc, char *argv[])
{
int tid, nthreads, i, j, k, chunk;
double a[NRA][NCA], /* matrix A to be multiplied */
        b[NCA][NCB], /* matrix B to be multiplied */
        c[NRA][NCB]; /* result matrix C */

chunk = 10; /* set loop iteration chunk size */

/*** Spawn a parallel region explicitly scoping all variables ***/
#pragma omp parallel shared(a,b,c,nthreads,chunk) private(tid,i,j,k)
  {
  tid = omp_get_thread_num();
  if (tid == 0)
    {
    nthreads = omp_get_num_threads();
    printf("Starting matrix multiply example with %d threads\n",nthreads);
    printf("Initializing matrices...\n");
    }
  /*** Initialize matrices ***/
  #pragma omp for schedule (static, chunk)
  for (i=0; i<NRA; i++)
    for (j=0; j<NCA; j++)
      a[i][j]= i+j;
  #pragma omp for schedule (static, chunk)
  for (i=0; i<NCA; i++)
    for (j=0; j<NCB; j++)
      b[i][j]= i*j;
  #pragma omp for schedule (static, chunk)
  for (i=0; i<NRA; i++)
    for (j=0; j<NCB; j++)
      c[i][j]= 0;

  /*** Do matrix multiply sharing iterations on outer loop ***/
  /*** Display who does which iterations for demonstration purposes ***/
  printf("Thread %d starting matrix multiply...\n",tid);
  #pragma omp for schedule (static, chunk)
  for (i=0; i<NRA; i++)
    {
    printf("Thread=%d did row=%d\n",tid,i);
    for(j=0; j<NCB; j++)
      for (k=0; k<NCA; k++)
        c[i][j] += a[i][k] * b[k][j];
    }
  } /*** End of parallel region ***/

/*** Print results ***/
printf("******************************************************\n");
printf("Result Matrix:\n");
for (i=0; i<NRA; i++)
  {
  for (j=0; j<NCB; j++)
    printf("%6.2f ", c[i][j]);
  printf("\n");
  }
printf("******************************************************\n");
printf ("Done.\n");

}

Peter Maydell (pmaydell) wrote :

Thanks for the test case. I can confirm that I can reproduce this. (Detail: compiled with "gcc -fopenmp 668799.c -o 668799 -static" on an Ubuntu maverick ARM system. qemu-arm-user on x86-64 then segfaults when run with "OMP_NUM_THREADS=6 ./arm-linux-user/qemu-arm /tmp/668799".) The point when it segfaults varies, and occasionally it deadlocks instead for variety...
Looks like a fun bug; I'll have a deeper look at this later this week.

Jan-Simon Möller (dl9pf) wrote :

To me it looks like racy/double locking. We already take a lock before some functions further up the code path of cpu_unlink_tb. IMHO the spinlock in cpu_unlink_tb is now unnecessary, at least in this code path. Maybe we exit the cpu before the previous, outer lock is released. HTH, have phun!

Jan-Simon Möller (dl9pf) wrote :

See linux-user/main.c function start_exclusive.

Peter Maydell (pmaydell) wrote :

The following patch stops the segfault (which happens because cpu_unlink_tb() is fiddling with the links between tbs without taking the tb_lock, so another thread can come in via eg tb_add_jump() and cause corruption of the linked lists). However, there are a number of comments in the TB handling code about things being non-thread-safe or not SMP safe, so I need to have a more careful think about the whole thing.

diff --git a/exec.c b/exec.c
index db9ff55..5f4a50b 100644
--- a/exec.c
+++ b/exec.c
@@ -1606,9 +1606,8 @@ static void cpu_unlink_tb(CPUState *env)
        emulation this often isn't actually as bad as it sounds. Often
        signals are used primarily to interrupt blocking syscalls. */
     TranslationBlock *tb;
-    static spinlock_t interrupt_lock = SPIN_LOCK_UNLOCKED;
 
-    spin_lock(&interrupt_lock);
+    spin_lock(&tb_lock);
     tb = env->current_tb;
     /* if the cpu is currently executing code, we must unlink it and
        all the potentially executing TB */
@@ -1616,7 +1615,7 @@ static void cpu_unlink_tb(CPUState *env)
         env->current_tb = NULL;
         tb_reset_jump_recursive(tb);
     }
-    spin_unlock(&interrupt_lock);
+    spin_unlock(&tb_lock);
 }

 /* mask must never be zero, except for A20 change call */

Brian Harring (ferringb) wrote :

@Peter, that patch, against 0.13, results in some odd deadlocks; specifically a racy deadlock during signal handling, best I can tell.

Attached is an strace of the make process. Nothing special, just forking off some children and waiting on the results. If you look earlier in the log you'll see the fork/wait working fine (look for SIGCHLD delivery), then see the final one go out to lunch waiting on a futex.

At this point it takes me about 4-5 runs to trigger it, but it's proving fairly reproducible on the builds I've been attempting; cmake in particular seems to trigger it rather quickly.

Brian Harring (ferringb) wrote :

Additional note... it *looks* like the deadlock potential is there already in 0.13, it's just heavily exacerbated by this patch- out of about 600 builds I've seen 2 lockup in the same fashion (rate was far higher with the patch on this ticket).

2010/11/28 Brian Harring <email address hidden>:
> Additional note... it *looks* like the deadlock potential is there
> already in 0.13, it's just heavily exacerbated by this patch- out of
> about 600 builds I've seen 2 lockup in the same fashion (rate was far
> higher with the patch on this ticket).

Thanks for the testing. I had a nasty feeling this wouldn't
be the only problem, which is why I didn't propose that patch
as a real fix.

(I think that running multithreaded programs under user-mode
emulation is effectively hitting a lot of the same locking issues
that you would get for emulating an MP core in multiple threads.)

-- PMM

On 28 November 2010 11:24, Peter Maydell <email address hidden> wrote:
> 2010/11/28 Brian Harring <email address hidden>:
>> Additional note... it *looks* like the deadlock potential is there
>> already in 0.13, it's just heavily exacerbated by this patch- out of
>> about 600 builds I've seen 2 lockup in the same fashion (rate was far
>> higher with the patch on this ticket).

> (I think that running multithreaded programs under user-mode
> emulation is effectively hitting a lot of the same locking issues
> that you would get for emulating an MP core in multiple threads.)

Having looked in a bit more detail at the code, I definitely think
the locking is insufficient for the TCG TranslationBlock structures.
In particular:
 * cpu_exit() updates the linked lists in the TB ->jmp_next[] arrays
 * cpu_exit() is called:
   + by other threads, in linux-user mode
   + by signal handlers (both in linux user mode for taking signals
and in system mode via qemu_notify_event() when timers expire)
   + by normal generated code

At the moment nothing blocks signals when it is modifying the TB
jmp_next arrays (either via cpu_exit() or tb_add_jump()), so if you're
unlucky and you take a signal while you're in the middle of modifying
a jmp_next list you might end up with the list corrupted. This is
more likely to happen with multithreaded linux-user mode code I
think, but there's still a possibility there even in pure system mode.

I'm not sure what the best approach to solving this is. We could
add "block signals; take mutex" around all the places in the code
that touch the TB data structures. That seems a bit heavyweight
and it's also not totally clear to me what the best points in the
exec.c code to put the locking are; but it would fix the problem.

Alternatively we could try making cpu_exit() not have to actually
fiddle with the TB graph. To do that you'd need to do one of:
 * explicit checks for a "should we exit" flag at backwards and
indirect branches and every few hundred insns. This is extra
straight-line-code overhead, but on the other hand you get to
avoid having all your cached next-tb links trashed every time
something has to call cpu_exit(), so it's not totally obvious that
it would be a net loss
 * have cpu_exit() force itself to be running in the thread for that
virtual CPU by sending a signal, to avoid the "thread has executed
its way out of the TB" problem that currently requires us to trace
through the whole TB call graph. Then we could do something
simpler and atomic/reentrant to stop the cpu rather than chasing
and editing linked lists

I think on balance I maybe favour the last one, but I'm not
sure. Does anybody have an opinion?

-- PMM

Jan-Simon Möller (dl9pf) wrote :

On Wednesday, 1 December 2010 at 20:40:37, Peter Maydell wrote:
> On 28 November 2010 11:24, Peter Maydell <email address hidden> wrote:
> > 2010/11/28 Brian Harring <email address hidden>:
> >> Additional note... it *looks* like the deadlock potential is there
> >> already in 0.13, it's just heavily exacerbated by this patch- out of
> >> about 600 builds I've seen 2 lockup in the same fashion (rate was far
> >> higher with the patch on this ticket).
>
> > (I think that running multithreaded programs under user-mode
> > emulation is effectively hitting a lot of the same locking issues
> > that you would get for emulating an MP core in multiple threads.)
>
> Having looked in a bit more detail at the code, I definitely think
> the locking is insufficient for the TCG TranslationBlock structures.
> In particular:
> * cpu_exit() updates the linked lists in the TB ->jmp_next[] arrays
> * cpu_exit() is called:
> + by other threads, in linux-user mode
> + by signal handlers (both in linux user mode for taking signals
> and in system mode via qemu_notify_event() when timers expire)
> + by normal generated code

Thanks for investigating this further!

> At the moment nothing blocks signals when it is modifying the TB
> jmp_next arrays (either via cpu_exit() or tb_add_jump()), so if you're
> unlucky and you take a signal while you're in the middle of modifying
> a jmp_next list you might end up with the list corrupted. This is
> more likely to happen with multithreaded linux-user mode code I
> think, but there's still a possibility there even in pure system mode.
>
> I'm not sure what the best approach to solving this is. We could
> add "block signals; take mutex" around all the places in the code
> that touch the TB data structures. That seems a bit heavyweight
> and it's also not totally clear to me what the best points in the
> exec.c code to put the locking are; but it would fix the problem.

Adding locks everywhere is probably the "safe but horribly slow" solution.

> Alternatively we could try making cpu_exit() not have to actually
> fiddle with the TB graph. To do that you'd need to do one of:
> * explicit checks for a "should we exit" flag at backwards and
> indirect branches and every few hundred insns. This is extra
> straight-line-code overhead, but on the other hand you get to
> avoid having all your cached next-tb links trashed every time
> something has to call cpu_exit(), so it's not totally obvious that
> it would be a net loss
> * have cpu_exit() force itself to be running in the thread for that
> virtual CPU by sending a signal, to avoid the "thread has executed
> its way out of the TB" problem that currently requires us to trace
> through the whole TB call graph. Then we could do something
> simpler and atomic/reentrant to stop the cpu rather than chasing
> and editing linked lists
>
> I think on balance I maybe favour the last one, but I'm not
> sure. Does anybody have an opinion?

Sounds reasonable to me.

Brainstorming:
Would per-thread data structures make any sense?

-- JSM

For information, I'm *not* able to reproduce this problem with:
    * QEMU user-mode v1.0.1
    * host system: Slackware64-current
    * guest system: Slackware/ARM 13.37
    * build command: gcc -fopenmp 668799.c -o 668799 -static
    * run commands:
          * ./668799
          * OMP_NUM_THREADS=2 ./668799
          * OMP_NUM_THREADS=6 ./668799
          * OMP_NUM_THREADS=16 ./668799
          * msgmerge /dev/null /dev/null (initial test-case)

Cédric.

Ying-Chun Liu (paulliu) wrote :

Just a note.

We still have this issue when building Unity on ARM. It crashed when running msgmerge.

Erik de Castro Lopo (erikd) wrote :

The test I'm using in LP:1098729 hangs or segfaults nearly every single run.

Peter Maydell (pmaydell) wrote :

Patches have now been committed to QEMU which fix the subset of "multithreaded guests crash" that this bug covers [i.e. ones where there was a race between cpu_unlink_tb() and the cpu thread using or modifying the TB graph], so I'm closing this bug.

Note that there are still other classes of QEMU bug which also manifest as "my multithreaded guest crashes" -- those are covered by LP:1098729.

Changed in qemu:
status: New → Fix Committed
Peter Maydell (pmaydell) on 2013-03-07
Changed in qemu-linaro:
status: New → In Progress
Peter Maydell (pmaydell) on 2013-04-26
Changed in qemu-linaro:
status: In Progress → Fix Committed
Peter Maydell (pmaydell) on 2013-06-13
Changed in qemu-linaro:
status: Fix Committed → Fix Released
Changed in qemu:
status: Fix Committed → Fix Released
Changed in qemu-linaro:
milestone: none → 2013.06