Comment 0 for bug 1836913

Christian Ehrhardt (paelzer) wrote: crash (on ppc64) when restarting numad while huge guest is active

I found this by accident while verifying another fix for numad.
It seems (at least on a POWER9 box) that numad crashes if you restart it while a huge KVM guest is running.

The crash seems related to some re-init of a static structure:

--- stack trace ---
#0 tcache_get (tc_idx=<optimized out>) at malloc.c:2950
        e = 0x9a5ddc1950
        e = <optimized out>
        __PRETTY_FUNCTION__ = "tcache_get"
#1 __GI___libc_malloc (bytes=16) at malloc.c:3058
        ar_ptr = <optimized out>
        victim = <optimized out>
        hook = <optimized out>
        tbytes = <optimized out>
        tc_idx = <optimized out>
        __PRETTY_FUNCTION__ = "__libc_malloc"
#2 0x0000009a300279a0 in ?? ()
No symbol table info available.
#3 0x0000009a3002cad8 in ?? ()
No symbol table info available.
#4 0x0000009a30023794 in ?? ()
No symbol table info available.
#5 0x00007a6150998278 in generic_start_main (main=0x9a30022a00, argc=<optimized out>, argv=0x7fffe93a7828, auxvec=0x7fffe93a7880, init=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>, fini=<optimized out>) at ../csu/libc-start.c:308
        self = 0x7a6150dc38d0
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {8465053667230565969, 134558384812288, 8465057470262718529, 0 <repeats 13 times>, 134558387008032, 0, 134558387008040, 662230455376, 0, 2449962883098869759, 0 <repeats 42 times>}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x7fffe93a7700, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = -382044416}}}
        not_first_call = <optimized out>
#6 0x00007a6150998484 in __libc_start_main (argc=<optimized out>, argv=<optimized out>, ev=<optimized out>, auxvec=<optimized out>, rtld_fini=<optimized out>, stinfo=<optimized out>, stack_on_entry=<optimized out>) at ../sysdeps/unix/sysv/linux/powerpc/libc-start.c:116
No locals.
#7 0x0000000000000000 in ?? ()
No symbol table info available.
--- source code stack trace ---
#0 tcache_get (tc_idx=<optimized out>) at malloc.c:2950
  [Error: malloc.c was not found in source tree]
#1 __GI___libc_malloc (bytes=16) at malloc.c:3058
  [Error: malloc.c was not found in source tree]
#2 0x0000009a300279a0 in ?? ()
#3 0x0000009a3002cad8 in ?? ()
#4 0x0000009a30023794 in ?? ()
#5 0x00007a6150998278 in generic_start_main (main=0x9a30022a00, argc=<optimized out>, argv=0x7fffe93a7828, auxvec=0x7fffe93a7880, init=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>, fini=<optimized out>) at ../csu/libc-start.c:308
  [Error: libc-start.c was not found in source tree]
#6 0x00007a6150998484 in __libc_start_main (argc=<optimized out>, argv=<optimized out>, ev=<optimized out>, auxvec=<optimized out>, rtld_fini=<optimized out>, stinfo=<optimized out>, stack_on_entry=<optimized out>) at ../sysdeps/unix/sysv/linux/powerpc/libc-start.c:116
  [Error: libc-start.c was not found in source tree]
#7 0x0000000000000000 in ?? ()

At first I thought this was related to my debug rebuilds, but it seems to occur with the package as-is, too.