The speed differs a lot between retries:

First run (after boot):
5204 3.793765 <... ioctl resumed> , 0x7ffffa6687b0) = 0 <13.464192>
Second run (after aborting the previous qemu)
5251 179.361434 <... ioctl resumed> , 0x7ffed49fbc70) = 0 <189.030433>
Third run (after aborting the previous qemu)
5728 703.886935 <... ioctl resumed> , 0x7fff9f6a7120) = 0 <713.556876>
Maybe a previous unclean stop makes it take extra time?
Or there is still cleanup going on in the IOMMU which stalls the follow-on mappings from being established.
Fourth run (after proper guest shutdown + 10 min wait)
6171 11.167221 <... ioctl resumed> , 0x7ffe597f03e0) = 0 <20.838401>
Fifth run (kill and restart again)
6215 1182.244403 <... ioctl resumed> , 0x7fff20a49b10) = 0 <1191.917291>
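For reference, timings like the above can be collected by attaching strace to the qemu process, assuming that is how these traces were taken (-r prints relative timestamps, -T the time spent in each syscall):
$ strace -f -r -T -e trace=ioctl -p <qemu-pid>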
The above was through libvirt; now running qemu directly to make it easier to drop into debugging:
$ virsh nodedev-detach pci_0000_21_00_1 --driver vfio
$ qemu/x86_64-softmmu/qemu-system-x86_64 -name guest=test-vfio-slowness -m 131072 -smp 1 -no-user-config -drive file=/var/lib/uvtool/libvirt/images/eoan.qcow,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -device vfio-pci,host=21:00.1,id=hostdev0,bus=pci.0,addr=0x7 -display curses
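The long ioctl in the traces above is VFIO_IOMMU_MAP_DMA mapping the guest RAM (128G here, -m 131072). As a point of reference, a minimal standalone C sketch to time a single such mapping outside of qemu could look like this — the group number and the 1 GiB test size are made-up placeholders, not from the report, and it needs to run as root with a sufficient memlock limit:

/* Minimal sketch: time one VFIO_IOMMU_MAP_DMA call outside of qemu.
 * Placeholders (not from the report): IOMMU group 26, 1 GiB buffer.
 * Error handling omitted for brevity. */
#include <fcntl.h>
#include <linux/vfio.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <time.h>

int main(void)
{
	size_t sz = 1UL << 30;                    /* 1 GiB test region */
	int container = open("/dev/vfio/vfio", O_RDWR);
	int group = open("/dev/vfio/26", O_RDWR); /* placeholder group */
	struct vfio_iommu_type1_dma_map map = { .argsz = sizeof(map) };
	struct timespec t0, t1;

	/* Attach the group to the container, select the type1 IOMMU */
	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

	map.vaddr = (unsigned long)mmap(NULL, sz, PROT_READ | PROT_WRITE,
					MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	map.iova = 0;
	map.size = sz;
	map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;

	/* This is the call that pins and maps every page of the region,
	 * i.e. the one that stalls for the 128G guest in the traces above. */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	int rc = ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	printf("VFIO_IOMMU_MAP_DMA: rc=%d, %.3f s\n", rc,
	       (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
	return 0;
}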
10 sec iommu counters while in the hang:

    34330395280      cycles
             16      cmd_processed            (33.33%)
              6      cmd_processed_inv        (33.37%)
              0      ign_rd_wr_mmio_1ff8h     (33.41%)
             45      int_dte_hit              (33.44%)
              0      int_dte_mis              (33.44%)
            345      mem_dte_hit              (33.44%)
            907      mem_dte_mis              (33.44%)
            111      mem_iommu_tlb_pde_hit    (33.44%)
            102      mem_iommu_tlb_pde_mis    (33.40%)
           7927      mem_iommu_tlb_pte_hit    (33.37%)
           1761      mem_iommu_tlb_pte_mis    (33.33%)
              0      mem_pass_excl            (33.30%)
              0      mem_pass_pretrans        (33.29%)
              0      mem_pass_untrans         (33.29%)
              0      mem_target_abort         (33.29%)
           8492      mem_trans_total          (33.30%)
              0      page_tbl_read_gst        (33.30%)
            168      page_tbl_read_nst        (33.30%)
            129      page_tbl_read_tot        (33.30%)
              0      smi_blk                  (33.30%)
              0      smi_recv                 (33.31%)
              0      tlb_inv                  (33.31%)
              0      vapic_int_guest          (33.31%)
             24      vapic_int_non_guest      (33.31%)

   10.011867065 seconds time elapsed
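For reference (not part of the original report), counters like the above come from the AMD IOMMU PMU and can be collected system-wide with perf, e.g.:
$ perf stat -e 'amd_iommu_0/mem_trans_total/,amd_iommu_0/mem_iommu_tlb_pte_mis/' -a sleep 10
The PMU shows up as amd_iommu or amd_iommu_0 depending on the kernel version; /sys/bus/event_source/devices/ lists the exact name and the available events.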
Then again, the slowness doesn't necessarily have to show up in these counters.
Next step: check the kernel to see whether (on x86) there is any place to parallelize this.
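A standard way to find that place (not from the report) is to profile system-wide during the hang and see where the mapping ioctl spends its time, e.g. whether it sits in the page-pinning loop of drivers/vfio/vfio_iommu_type1.c:
$ perf record -a -g -- sleep 10
$ perf report
If most samples land in a single thread walking the pages of the 128G guest, that would point at a serial loop worth parallelizing.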