Thanks James, but I'm now unsure where to go from here, since the issue was not reproducible in the many tries at different scales that James and I did.

@Sean/Lee
Since you wondered whether it might be due to the Ubuntu delta on top of 4.2, there are two things we could compare Ubuntu's qemu to:
1. qemu 4.2 as released by upstream
2. qemu 4.2 as built in CentOS (they have some delta as well)

I'm not sure I can provide you #2 easily, and #1 will always need a bit of delta to build and integrate correctly. All of that would still be doable, but in general: if I provided you with PPA builds, could you try those in your environment to !reliably! trigger the issue, allowing us to play bisect-ping-pong? The word "reliable" is important here, as we'd need to sort builds/patches by a reliable yes/no at each step.

@Sean/Lee - which version exactly is "the centos 8 build of the same qemu"? I might take a look at comparing the patches applied. We are at 4.2.1 already; the latest I found there was 4.2.0-29.el8.3.x86_64.rpm, which is already the advanced-virt version (otherwise 2.12 based).

@Sean/Lee
All of the following suggested approaches depend on whether you can test this reliably with different qemu PPA builds:
- qemu 4.2 (as upstream) vs the usual Ubuntu build -> find the offending patch in our delta
- test Ubuntu 20.10, which has qemu 5.0 and libvirt 6.6 -> if fixed there, find by which change
- qemu 4.2 Ubuntu vs qemu 4.2 as in CentOS (but built for Ubuntu) -> if the latter works better, then let us find by which (set of) patches

I was checking the delta in the CentOS 8 advanced-virt qemu 4.2, as that was reported to (maybe) work better. I was comparing which patches are applied there that are not in Ubuntu, and which of those might be related.
Besides several individual fixes, the biggest patch sets are feature backports for enhanced LUKS/backup/snapshot handling, multifd migration/cancel, block-mirror, HMAT changes, virtiofs, qemu-img zero write, ARM time handling, and some related build-time self tests. By their nature, some of these changes touch block handling via the block/job/hotplug code. But they might only do so by accident; none is clearly aimed at the issue that was reported. And even in the list below, most seem unrelated. So, as Sean assumed, maybe it just isn't exercised on CentOS enough to be seen there?

eca0f3524a4eb57d03a56b0cbcef5527a0981ce4 backup: don't acquire aio_context in backup_clean
58226634c4b02af7b10862f7fbd3610a344bfb7f backup: Improve error for bdrv_getlength() failure
958a04bd32af18d9a207bcc78046e56a202aebc2 backup: Make sure that source and target size match
7b8e4857426f2e2de2441749996c6161b550bada block: Add flags to bdrv(_co)_truncate()
92b92799dc8662b6f71809100a4aabc1ae408ebb block: Add flags to BlockDriver.bdrv_co_truncate()
087ab8e775f48766068e65de1bc99d03b40d1670 block: always fill entire LUKS header space with zeros
8c6242b6f383e43fd11d2c50f8bcdd2bba1100fc block-backend: Add flags to blk_truncate()
564806c529d4e0acad209b1e5b864a8886092f1f block-backend: Reorder flush/pdiscard function definitions
0abf2581717a19d9749d5c2ff8acd0ac203452c2 block/backup-top: Don't acquire context while dropping top
1de6b45fb5c1489b450df7d1a4c692bba9678ce6 block: bdrv_reopen() with backing file in different AioContext
e1d7f8bb1ec0c6911dcea81641ce6139dbded02d block.c: adding bdrv_co_delete_file
69032253c33ae1774233c63cedf36d32242a85fc block/curl: HTTP header field names are case insensitive
7788a319399f17476ff1dd43164c869e320820a2 block/curl: HTTP header fields allow whitespace around values
91005a495e228ebd7e5e173cd18f952450eef82d blockdev: Acquire AioContext on dirty bitmap functions
471ded690e19689018535e3f48480507ed073e22 blockdev: fix coding style issues in drive_backup_prepare
3ea67e08832775a28d0bd2795f01bc77e7ea1512 blockdev: honor bdrv_try_set_aio_context() context requirements
c6996cf9a6c759c29919642be9a73ac64b38301b blockdev: Promote several bitmap functions to non-static
377410f6fb4f6b0d26d4a028c20766fae05de17e blockdev: Return bs to the proper context on snapshot abort
bb4e58c6137e80129b955789dd4b66c1504f20dc blockdev: Split off basic bitmap operations for qemu-img
5b7bfe515ecbd584b40ff6e41d2fd8b37c7d5139 blockdev: unify qmp_blockdev_backup and blockdev-backup transaction paths
2288ccfac96281c316db942d10e3f921c1373064 blockdev: unify qmp_drive_backup and drive-backup transaction paths
7f16476fab14fc32388e0ebae793f64673848efa block: Fix blk->in_flight during blk_wait_while_drained()
30dd65f307b647eef8156c4a33bd007823ef85cb block: Fix cross-AioContext blockdev-snapshot
eeea1faa099f82328f5831cf252f8ce0a59a9287 block: Fix leak in bdrv_create_file_fallback()
fd17146cd93d1704cd96d7c2757b325fc7aac6fd block: Generic file creation fallback
fbb92b6798894d3bf62fe3578d99fa62c720b242 block: Increase BB.in_flight for coroutine and sync interfaces
17e1e2be5f9e84e0298e28e70675655b43e225ea block: Introduce 'bdrv_reopen_commit_post' step
9bffae14df879255329473a7bd578643af2d4c9c block: introducing 'bdrv_co_delete_file' interface
c7a0f2be8f95b220cdadbba9a9236eaf115951dc block: Make bdrv_get_cumulative_perm() public
ef893b5c84f3199d777e33966dc28839f71b1a5c block: Make it easier to learn which BDS support bitmaps
78c81a3f108870d325b0a39d88711366afe6f703 block/nbd: Fix hang in .bdrv_close()
b92902dfeaafbceaf744ab7473f2d070284f6172 block: pass BlockDriver reference to the .bdrv_co_create
65eb7c85a3e62529e2bad782e94d5a7b11dd5a92 block/qcow2: Move bitmap reopen into bdrv_reopen_commit_post
d29d3d1f80b3947fb26e7139645c83de66d146a9 block: Relax restrictions for blockdev-snapshot
5a5e7f8cd86b7ced0732b1b6e28c82baa65b09c9 block: trickle down the fallback image creation function use to the block drivers
955c7d6687fefcd903900a1e597fcbc896c661cd block: truncate: Don't make backing file data visible
1bba30da24e1124ceeb0693c81382a0d77e20ca5 crypto.c: cleanup created file when block_crypto_co_create_opts_luks fails
87ca3b8fa615b278b33cabf9ed22b3f44b5214ba file-posix: Drop hdev_co_create_opts()
2f0c6e7a650de133eccd94e9bb6cf7b2070f07f1 file-posix: Support BDRV_REQ_ZERO_WRITE for truncate
89b6fc45614bb45dcd58f1590415afe5c2791abd hmp: Allow using qdev ID for qemu-io command
80f0900905b555f00d644894c786b6d66ac2e00e iscsi: Drop iscsi_co_create_opts()
0501e1aa1d32a6e02dd06a79bba97fbe9d557cb5 hw/pci/pcie: Forbid hot-plug if it's disabled on the slot
0dabc0f6544f2c0310546f6d6cf3b68979580a9c hw/pci/pcie: Move hot plug capability check to pre_plug callback
6a1e073378353eb6ac0565e0dc649b3db76ed5dc hw/pci/pcie: Replace PCI_DEVICE() casts with existing variable
b660a84bbb0eb1a76b505648d31d5e82594fb75e job: take each job's lock individually in job_txn_apply
530a0963184e57e71a5b538e9161f115df533e96 pcie_root_port: Add hotplug disabling option
c6bdc312f30d5c7326aa2fdca3e0f98c15eb541a qapi: Add '@allow-write-only-overlay' feature for 'blockdev-snapshot'
5d72c68b49769c927e90b78af6d90f6a384b26ac qcow2: Expose bitmaps' size during measure
eb8a0cf3ba26611f3981f8f45ac6a868975a68cc qcow2: Forward ZERO_WRITE flag for full preallocation
f01643fb8b47e8a70c04bbf45e0f12a9e5bc54de qcow2: Support BDRV_REQ_ZERO_WRITE for truncate
a555b8092abc6f1bbe4b64c516679cbd68fcfbd8 qemu-file: Don't do IO after shutdown
08558e33257ec796594bd411261028a93414a70c replication: assert we own context before job_cancel_sync
49b44549ace7890fffdf027fd3695218ee7f1121 virtio-blk: On restart, process queued requests in the proper context
7aa1c247b466870b0704d3ccdc3755e5e7394dca virtio-blk: Refactor the code that processes queued requests
d0435bc513e23a4961b6af20164d1c6c219eb4ea virtio: don't enable notifications during polling

Conversely, being on 4.2.1 already gives Ubuntu some block changes that might have caused this...
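
To make the bisect-ping-pong idea concrete, here is a minimal sketch of how the search over an ordered patch series could go. It is only an illustration under assumptions: the names `first_bad_patch` and `shows_issue` are made up for this example, the patches are assumed to apply cleanly in order, a build with none of them is assumed good, a build with all of them is assumed bad, and a single patch is assumed to be responsible. In practice `shows_issue(n)` would stand for "build a PPA with the first n patches, have Sean/Lee test it, report a reliable yes/no".

```python
from typing import Callable, Sequence


def first_bad_patch(patches: Sequence[str],
                    shows_issue: Callable[[int], bool]) -> str:
    """Return the first patch whose inclusion makes the build show the issue.

    shows_issue(n) is a hypothetical callback: it must answer reliably for a
    build that has the first n patches applied. Assumed: n = 0 is good and
    n = len(patches) is bad, with exactly one point where the verdict flips.
    """
    if not patches:
        raise ValueError("need at least one patch to bisect")

    lo, hi = 0, len(patches)  # invariant: lo patches -> good, hi patches -> bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if shows_issue(mid):
            hi = mid          # issue already present with the first mid patches
        else:
            lo = mid          # still good, culprit is later in the series
    return patches[hi - 1]    # the patch that turned a good build into a bad one


if __name__ == "__main__":
    # Toy run with placeholder names; the sixth patch is the pretend culprit.
    series = [f"patch-{i}" for i in range(1, 9)]
    print(first_bad_patch(series, lambda n: n >= 6))
```

Each round halves the remaining range, so if the candidates were exactly the 55 entries listed above it would take at most six build-and-test rounds. For the "which patch fixes it" direction the same search works with the verdict inverted. All of this only holds if every single verdict is trustworthy, which is why the reliability question comes first.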
Waiting for Sean/Lee to comment on how testable/reliable that is on their end -> incomplete
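
Since every bisect step needs a reliable yes/no, it may help to agree up front on how many clean runs count as a "no". A rough sketch, assuming test runs are independent and a bad build shows the issue with some fixed per-run probability. That probability has not been measured here, so `p_fail` is a value you would have to estimate from your own environment, and `runs_needed` is just an illustrative helper name.

```python
import math


def runs_needed(p_fail: float, confidence: float = 0.95) -> int:
    """Number of clean runs before a build may be called good.

    If a bad build shows the issue with probability p_fail per run, the
    chance that it survives n runs unnoticed is (1 - p_fail) ** n. This
    returns the smallest n that pushes that chance to at most 1 - confidence.
    """
    if not 0.0 < p_fail <= 1.0:
        raise ValueError("p_fail must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    if p_fail == 1.0:
        return 1  # the issue always shows, a single run is enough
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    for p in (0.5, 0.1):
        print(p, runs_needed(p))
```

The rarer the issue triggers, the more runs each bisect step costs, so a setup that triggers it often matters more than the number of builds to go through.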