Comment 18 for bug 1805256

Rafael David Tinoco (rafaeldtinoco) wrote : Re: qemu_futex_wait() lockups in ARM64: 2 possible issues

Quick update...

> value UINT_MAX (4294967295) seems WRONG for qemu_futex_wait():
>
> - EV_BUSY, being -1 and passed as the unsigned argument of
> qemu_futex_wait(void *, unsigned), is converted (two's complement)
> into UINT_MAX, which is not what is expected (unless I missed
> something).
>
> *** If that is the case, I am unsure whether you, Paolo, would prefer
> declaring (QemuEvent)->value as an integer or changing EV_BUSY to "2"
> here ***
>
> BUG: description:
> https://bugs.launchpad.net/qemu/+bug/1805256/comments/15

I realized this might be intentional, but I still tried:

    https://pastebin.ubuntu.com/p/6rkkY6fJdm/

looking for anything that could have misbehaved on arm64 (especially
casts and type conversions between the functions).

> QUESTION:
>
> - Should qemu_event_set() check the return code from
> qemu_futex_wake()->qemu_futex()->syscall() in order to know if ANY
> waiter was ever woken up? Maybe even loop until at least one is woken?

And I also tried:

- qemu_futex(f, FUTEX_WAKE, n, NULL, NULL, 0);
+ while(qemu_futex(pval, FUTEX_WAKE, val, NULL, NULL, 0) == 0)
+ continue;

It made little difference (although the issue took much longer to
reproduce):

"""
(gdb) run
Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2 ./disk01.ext4.qcow2 ./output.qcow2

[New Thread 0xffffbec5ad90 (LWP 72839)]
[New Thread 0xffffbe459d90 (LWP 72840)]
[New Thread 0xffffbdb57d90 (LWP 72841)]
[New Thread 0xffffacac9d90 (LWP 72859)]
[New Thread 0xffffa7ffed90 (LWP 72860)]
[New Thread 0xffffa77fdd90 (LWP 72861)]
[New Thread 0xffffa6ffcd90 (LWP 72862)]
[New Thread 0xffffa67fbd90 (LWP 72863)]
[New Thread 0xffffa5ffad90 (LWP 72864)]

[Thread 0xffffa5ffad90 (LWP 72864) exited]
[Thread 0xffffa6ffcd90 (LWP 72862) exited]
[Thread 0xffffa77fdd90 (LWP 72861) exited]
[Thread 0xffffbdb57d90 (LWP 72841) exited]
[Thread 0xffffa67fbd90 (LWP 72863) exited]
[Thread 0xffffacac9d90 (LWP 72859) exited]
[Thread 0xffffa7ffed90 (LWP 72860) exited]

<HUNG w/ 3 threads in the stack trace shown before>
"""

All remaining tasks are blocked in system calls, so no task is left to
call qemu_futex_wake() and unblock thread #2 (waiting in futex()), which
in turn would unblock thread #1 (blocked in poll() on a pipe shared with
thread #2).

Those 7 threads exit before the disk conversion completes (sometimes at
the beginning, sometimes at the end).

I'll try to find out why those tasks exited.

Any thoughts?

Tks