Alright, here is what is happening: Whenever program is stuck, thread #2 backtrace is this: (gdb) bt #0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38 #1 0x0000aaaaaabd41b0 in qemu_futex_wait (val=, f=) at ./util/qemu-thread-posix.c:438 #2 qemu_event_wait (ev=ev@entry=0xaaaaaac87ce8 ) at ./util/qemu-thread-posix.c:442 #3 0x0000aaaaaabee03c in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:261 #4 0x0000aaaaaabd34c8 in qemu_thread_start (args=) at ./util/qemu-thread-posix.c:498 #5 0x0000ffffbf26a880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486 #6 0x0000ffffbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 Meaning that code is waiting for a futex inside kernel. (gdb) print rcu_call_ready_event $4 = {value = 4294967295, initialized = true} The QemuEvent "rcu_call_ready_event->value" is set to INT_MAX and I don't know why yet. rcu_call_ready_event->value is only touched by: qemu_event_init() -> bool init ? EV_SET : EV_FREE qemu_event_reset() -> atomic_or(&ev->value, EV_FREE) qemu_event_set() -> atomic_xchg(&ev->value, EV_SET) qemu_event_wait() -> atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY)' And there should be no 0x7fff value for "ev->value". qemu_event_init() is the one initializing the global: static QemuEvent rcu_call_ready_event; and it is called by "rcu_init_complete()" which is called by "rcu_init()": static void __attribute__((__constructor__)) rcu_init(void) a constructor function. So, "fixing" this issue by: (gdb) print rcu_call_ready_event $8 = {value = 4294967295, initialized = true} (gdb) watch rcu_call_ready_event Hardware watchpoint 1: rcu_call_ready_event (gdb) set rcu_call_ready_event.initialized = 1 (gdb) set rcu_call_ready_event.value = 0 and note that I added a watchpoint to rcu_call_ready_event global: Thread 1 "qemu-img" received signal SIGINT, Interrupt. (gdb) thread 2 [Switching to thread 2 (Thread 0xffffbec61d90 (LWP 33625))] (gdb) bt #0 0x0000aaaaaabd4110 in qemu_event_reset (ev=ev@entry=0xaaaaaac87ce8 ) #1 0x0000aaaaaabedff8 in call_rcu_thread (opaque=opaque@entry=0x0) at ./util/rcu.c:255 #2 0x0000aaaaaabd34c8 in qemu_thread_start (args=) at ./util/qemu-thread-posix.c:498 #3 0x0000ffffbf26a880 in start_thread (arg=0xfffffffff5bf) at pthread_create.c:486 #4 0x0000ffffbf1c4b9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78 (gdb) print rcu_call_ready_event $9 = {value = 0, initialized = true} You can see I advanced in the qemu_event_{reset,set,wait} logic. (gdb) disassemble /m 0x0000aaaaaabd4110 Dump of assembler code for function qemu_event_reset: 408 in ./util/qemu-thread-posix.c 409 in ./util/qemu-thread-posix.c 410 in ./util/qemu-thread-posix.c 411 in ./util/qemu-thread-posix.c 0x0000aaaaaabd40f0 <+0>: ldrb w1, [x0, #4] 0x0000aaaaaabd40f4 <+4>: cbz w1, 0xaaaaaabd411c 0x0000aaaaaabd411c <+44>: stp x29, x30, [sp, #-16]! 0x0000aaaaaabd4120 <+48>: adrp x3, 0xaaaaaac20000 0x0000aaaaaabd4124 <+52>: add x3, x3, #0x908 0x0000aaaaaabd4128 <+56>: mov x29, sp 0x0000aaaaaabd412c <+60>: adrp x1, 0xaaaaaac20000 0x0000aaaaaabd4130 <+64>: adrp x0, 0xaaaaaac20000 0x0000aaaaaabd4134 <+68>: add x3, x3, #0x290 0x0000aaaaaabd4138 <+72>: add x1, x1, #0xc00 0x0000aaaaaabd413c <+76>: add x0, x0, #0xd40 0x0000aaaaaabd4140 <+80>: mov w2, #0x19b // #411 0x0000aaaaaabd4144 <+84>: bl 0xaaaaaaaff190 <__assert_fail@plt> 412 in ./util/qemu-thread-posix.c 0x0000aaaaaabd40f8 <+8>: ldr w1, [x0] 413 in ./util/qemu-thread-posix.c 0x0000aaaaaabd40fc <+12>: dmb ishld 414 in ./util/qemu-thread-posix.c 0x0000aaaaaabd4100 <+16>: cbz w1, 0xaaaaaabd4108 0x0000aaaaaabd4104 <+20>: ret 0x0000aaaaaabd4108 <+24>: ldaxr w1, [x0] 0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1 => 0x0000aaaaaabd4110 <+32>: stlxr w2, w1, [x0] 0x0000aaaaaabd4114 <+36>: cbnz w2, 0xaaaaaabd4108 0x0000aaaaaabd4118 <+40>: ret And I'm currently inside the STLXR and LDAXR logic. To make sure my program counter is advancing, I added a breakpoint at 0x0000aaaaaabd4108, so CBNZ instruction would branch indefinitely into LDXAR instruction again, until the LDAXR<->STLXR logic is satisfied (inside qemu_event_wait()). (gdb) break *(0x0000aaaaaabd4108) Breakpoint 2 at 0xaaaaaabd4108: file ./util/qemu-thread-posix.c, line 414. which is basically this: if (value == EV_SET) { EV_SET == 0 atomic_or(&ev->value, EV_FREE); EV_FREE = 1 } and we can see that this logic being called one time after another: (gdb) c Thread 2 "qemu-img" hit Breakpoint 3, 0x0000aaaaaabd4108 in qemu_event_reset ( ev=ev@entry=0xaaaaaac87ce8 ) at ./util/qemu-thread-posix.c:414 (gdb) c Thread 2 "qemu-img" hit Breakpoint 3, 0x0000aaaaaabd4108 in qemu_event_reset ( ev=ev@entry=0xaaaaaac87ce8 ) at ./util/qemu-thread-posix.c:414 (gdb) c Thread 2 "qemu-img" hit Breakpoint 3, 0x0000aaaaaabd4108 in qemu_event_reset ( ev=ev@entry=0xaaaaaac87ce8 ) at ./util/qemu-thread-posix.c:414 EVEN when rcu_call_ready_event->value is already EV_SET (0): (gdb) print rcu_call_ready_event $11 = {value = 0, initialized = true} (gdb) info break Num Type Disp Enb Address What 1 hw watchpoint keep y rcu_call_ready_event 3 breakpoint keep n 0x0000aaaaaabd4108 qemu-thread-posix.c:414 breakpoint already hit 23 times 4 breakpoint keep y 0x0000aaaaaabd4148 qemu-thread-posix.c:424 IF I enable only rcu_call_ready_event HW watchpoint, nothing is triggered. (gdb) watch *(rcu_call_ready_event->value) Hardware watchpoint 6: *(rcu_call_ready_event->value) not if I set it directly to QemuEvent->value... assert(ev->initialized); value = atomic_read(&ev->value); smp_mb_acquire(); if (value == EV_SET) { atomic_or(&ev->value, EV_FREE); } meaning that "value" and "ev->value" might have a diff value... is that so ? (gdb) print value $14 = can't say.. checking registers AND stack: 0x0000aaaaaabd4100 <+16>: cbz w1, 0xaaaaaabd4108 0x0000aaaaaabd4104 <+20>: ret 0x0000aaaaaabd4108 <+24>: ldaxr w1, [x0] 0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1 => 0x0000aaaaaabd4110 <+32>: stlxr w2, w1, [x0] 0x0000aaaaaabd4114 <+36>: cbnz w2, 0xaaaaaabd4108 x0 0xaaaaaac87ce8 187649986428136 x1 0x1 1 x2 0x1 1 x3 0x0 0 x4 0xffffbec61e98 281473882398360 x5 0xffffbec61c90 281473882397840 x6 0xffffbec61c90 281473882397840 x7 0x1 1 x8 0x65 101 x9 0x0 0 x10 0x0 0 x11 0x0 0 x12 0xffffbec61d90 281473882398096 x13 0x0 0 x14 0x0 0 x15 0x2 2 x16 0xffffbf67ccf0 281473892994288 x17 0xffffbf274938 281473888766264 x18 0x23f 575 x19 0x0 0 x20 0xaaaaaac87ce8 187649986428136 x21 0x0 0 x22 0xfffffffff5bf 281474976708031 x23 0xaaaaaac87ce0 187649986428128 x24 0xaaaaaac29000 187649986039808 x25 0xfffffffff658 281474976708184 x26 0x1000 4096 x27 0xffffbf28c000 281473888862208 x28 0xffffbec61d90 281473882398096 x29 0xffffbec61420 281473882395680 x30 0xaaaaaabedff8 187649985798136 sp 0xffffbec61420 0xffffbec61420 pc 0xaaaaaabd4110 0xaaaaaabd4110 cpsr 0x0 [ EL=0 ] fpsr 0x0 0 fpcr 0x0 0 AND the ORR instruction is ALWAYS being executed against 0x1 (not 0x0, which is what I just changed by changing .value): (gdb) print value $14 = 0x0000aaaaaabd410c <+28>: orr w1, w1, #0x1 #0x1 is being used instead of contents of "value" local variable (volatile). I'll recompile QEMU flagging all those local "unsigned value" variables as being volatile and check if optimization changes. Or even try to disable optimizations.