There was another suggestion [1], but with it applied the case still hangs (after 12 and 1 iteration(s), so not much later than usual). The threads looked slightly different this time:

  Id   Target Id         Frame
* 1    Thread 0x7f1eab8efb40 (LWP 13686) "libvirtd" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
  2    Thread 0x7f1eab434700 (LWP 13688) "libvirtd" futex_wait_cancelable (private=, expected=0, futex_word=0x557ce654a534) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
  [...]

I see only one thread directly in lowlevellock.S:

(gdb) bt
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007f1eaf378945 in __GI___pthread_mutex_lock (mutex=0x7f1e8c0016d0) at ../nptl/pthread_mutex_lock.c:80
#2  0x00007f1eaf4ef095 in virMutexLock (m=) at ../../../src/util/virthread.c:89
#3  0x00007f1eaf580fbc in virChrdevFDStreamCloseCb (st=st@entry=0x7f1e9c0128f0, opaque=opaque@entry=0x7f1e9c031090) at ../../../src/conf/virchrdev.c:252
#4  0x00007f1eaf48f180 in virFDStreamCloseInt (st=0x7f1e9c0128f0, streamAbort=) at ../../../src/util/virfdstream.c:742
#5  0x00007f1eaf6bbec9 in virStreamAbort (stream=0x7f1e9c0128f0) at ../../../src/libvirt-stream.c:1244
#6  0x0000557ce5bd83aa in daemonStreamHandleAbort (client=client@entry=0x557ce65cc650, stream=stream@entry=0x7f1e9c0315b0, msg=msg@entry=0x557ce65d1e20) at ../../../src/remote/remote_daemon_stream.c:636
#7  0x0000557ce5bd8ee3 in daemonStreamHandleWrite (stream=0x7f1e9c0315b0, client=0x557ce65cc650) at ../../../src/remote/remote_daemon_stream.c:749
[...]

On a retry this was the same again, so the suggested patch did change something, but not enough yet. I need to find out which lock that actually is and, if possible, who holds it at the moment.

The lock is the virMutexLock(&priv->devs->lock); in virChrdevFDStreamCloseCb:

(gdb) p priv->devs->lock
$1 = {lock = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = Yes, Shared = No, Protocol = Priority protect, Priority ceiling = 0}}
(gdb) p &priv->devs->lock
$2 = (virMutex *) 0x7f4554020b20

Interesting that it lists Status as "Not acquired".

I wanted to check which path that would be, but the value for the hash seems wrong:

(gdb) p priv->devs->hash
$6 = (virHashTablePtr) 0x25

The code would normally access that hash, and 0x25 is not a valid address. The code after the lock would have failed in

  virHashRemoveEntry(priv->devs->hash, priv->path);

which would dereference 0x25 in

  nextptr = table->table + virHashComputeKey(table, name);

So we are looking at a not fully cleaned up structure here. Most likely, if this were not a lock issue, it would be a crash instead. OTOH that might be due to our unlocking with the most recent patch [1], which allows the struct to partially go away.

I dropped the patch and debugged again to see whether it would be more useful to check in there for the actual lock and path. I was back at my two backtraces fighting for the lock. The lock now was in a "better" state:

(gdb) p priv->devs->lock
$9 = {lock = pthread_mutex_t = {Type = Normal, Status = Acquired, possibly with waiters, Owner ID = 23102, Robust = No, Shared = No, Protocol = None}}
(gdb) p priv->devs->hash
$10 = (virHashTablePtr) 0x7f2928000c00

It is a one-entry list:

(gdb) p priv->devs->hash->table.next
Cannot access memory at address 0x0
(gdb) p (virHashEntry)priv->devs->hash->table
$13 = {next = 0x7f2928000fe0, name = 0xa4b28ee3, payload = 0x3}

Letting the FDST unlock in between did not help (if anything it made things worse, by leaving a stale partial struct that would crash).
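To make the lock ordering I suspect from those two fighting backtraces easier to reason about, here is a minimal standalone sketch of the ABBA pattern. This is not libvirt code; the names (stream_lock, devs_lock, close_path, open_path) are invented, and it only models my current assumption that one thread holds the stream lock and waits for the devs lock while the other takes them in the opposite order:

/* Minimal ABBA deadlock sketch (illustrative only, not libvirt code).
 * close_path models the close-callback side: stream lock, then devs lock.
 * open_path models the opposite ordering: devs lock, then stream lock.
 * With both sleeps in place the two threads end up blocked on each
 * other, each inside __lll_lock_wait on the mutex the other owns. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t stream_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t devs_lock   = PTHREAD_MUTEX_INITIALIZER;

static void *close_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&stream_lock);
    sleep(1);                         /* widen the race window */
    pthread_mutex_lock(&devs_lock);   /* blocks once open_path holds it */
    puts("close_path got both locks");
    pthread_mutex_unlock(&devs_lock);
    pthread_mutex_unlock(&stream_lock);
    return NULL;
}

static void *open_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&devs_lock);
    sleep(1);
    pthread_mutex_lock(&stream_lock); /* blocks -> classic ABBA deadlock */
    puts("open_path got both locks");
    pthread_mutex_unlock(&stream_lock);
    pthread_mutex_unlock(&devs_lock);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, close_path, NULL);
    pthread_create(&b, NULL, open_path, NULL);
    pthread_join(a, NULL);            /* never returns once both block */
    pthread_join(b, NULL);
    return 0;
}

Built with gcc -pthread this hangs the same way: gdb's "info threads" shows both threads of the sketch stuck in __lll_lock_wait, which is the kind of picture seen in the libvirtd dump above.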
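And to illustrate why the patched variant ended up with that bogus hash value: the sketch below (again purely illustrative, the struct and function names are invented, and the 0x25 is only planted to mimic what gdb showed) demonstrates how releasing the lock in the middle of the close path gives a concurrent teardown a window to invalidate the fields the callback dereferences after re-locking:

/* Illustrative use-after-teardown sketch, not libvirt code.
 * close_cb drops the lock part-way (as the patched close path did),
 * teardown runs in that window and clobbers the field, and the
 * re-locked reader then sees garbage, much like hash == 0x25 above. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct devs {
    pthread_mutex_t lock;
    void *hash;                       /* stands in for priv->devs->hash */
};

static struct devs *shared;

static void *close_cb(void *arg)
{
    (void)arg;
    struct devs *d = shared;          /* pointer cached before the window */
    pthread_mutex_lock(&d->lock);
    pthread_mutex_unlock(&d->lock);   /* lock dropped to dodge the deadlock */
    sleep(1);                         /* teardown runs in this window */
    pthread_mutex_lock(&d->lock);
    printf("hash is now %p\n", d->hash); /* bogus value, would crash on use */
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

static void *teardown(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&shared->lock);
    free(shared->hash);
    shared->hash = (void *)0x25;      /* planted to mimic the state seen above */
    pthread_mutex_unlock(&shared->lock);
    return NULL;
}

int main(void)
{
    shared = calloc(1, sizeof(*shared));
    pthread_mutex_init(&shared->lock, NULL);
    shared->hash = malloc(16);

    pthread_t a, b;
    pthread_create(&a, NULL, close_cb, NULL);
    pthread_create(&b, NULL, teardown, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

So even if the extra unlock avoids the deadlock, the callback can come back to a structure that has already been torn down, which is consistent with the stale partial struct that made things worse rather than better.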
[1]: https://www.redhat.com/archives/libvir-list/2019-April/msg00207.html