Comment 2 for bug 1668829

Rafael David Tinoco (rafaeldtinoco) wrote :

This is my initial code analysis:

Between 2.3 and 2.5 there are about 80 vhost changes (not counting merges or tests), ~30 of them for vhost-user.

The most important vhost-user ones are these:

48854f57 vhost-user: fix log size
dc3db6ad vhost-user: start/stop all rings
5421f318 vhost-user: print original request on error
2b8819c6 vhost-user: modify SET_LOG_BASE to pass mmap size and offset
f6f56291 vhost user: add support of live migration
9a78a5dd vhost-user: send log shm fd along with log_base
1be0ac21 vhost-user: add vhost_user_requires_shm_log()
7263a0ad vhost-user: add a new message to disable/enable a specific virt queue.
* b931bfbf vhost-user: add multiple queue support
fc57fd99 vhost: introduce vhost_backend_get_vq_index method
e2051e9e vhost-user: add VHOST_USER_GET_QUEUE_NUM message
dcb10c00 vhost-user: add protocol feature negotiation
7305483a vhost-user: use VHOST_USER_XXX macro for switch statement
d345ed2d Revert "vhost-user: add multi queue support"
830d70db vhost-user: add multi queue support
294ce717 vhost-user: Send VHOST_RESET_OWNER on vhost stop

And these for vhost:

12b8cbac3c8 vhost: don't send RESET_OWNER at stop
25a2a920ddd vhost: set the correct queue index in case of migration with multiqueue
* 15324404f68 vhost: alloc shareable log
2ce68e4cf5b vhost: add vhost_has_free_slot() interface
0cf33fb6b49 virtio-net: correctly drop truncated packets
fc57fd9900d vhost: introduce vhost_backend_get_vq_index method
06c4670ff6d Revert "virtio-net: enable virtio 1.0"
dfb8e184db7 virtio-pci: initial virtio 1.0 support
b1506132001 vhost_net: add version_1 feature
df91055db5c virtio-net: enable virtio 1.0
* 309750fad51 vhost: logs sharing
9718e4ae362 arm_gicv2m: set kvm_gsi_direct_mapping and kvm_msi_via_irqfd_allowed

The vhost-user change marked with "*" (b931bfbf) refactors the multiple queue support for vhost-user. I'm not entirely sure this change is related to this problem, since they're not using queues=XX in the "-netdev" command.

They have changed the number of virtio device queues (virtio) - http://pastebin.ubuntu.com/24087865/ - but not the number of queues for the virtio-net-pci device (vhost-user multi queues, for this example).

Possible causes of such behavior (based on QEMU changes):

- vhost-user multiple queue support was refactored
  they are not using "queues=XX" in the "-netdev" cmdline, but
  the refactoring could still have changed some logic (to check)

- tx queue callback scheduling (either timer or qemu aio bottom half)
  this would be the cause if there wasn't enough context switching
  (for qemu and vhost-user threads). that could happen due to lock
  contention or system overload (from some other change unrelated to
  virtio).

* by raising the tx queue size we make each flush last longer, and that
  is possibly giving a bigger throughput (stopping the queue overrun).
  this tells us that either the buffer is too small OR the flush is being
  called fewer times than it should.
* that is why I'm focusing on this part. something either reduced the
  buffer size or is causing a bottleneck in the buffer flush, which is
  typical of "burst" behavior, btw.

- There was also a change in vhost logging system:

* vhost-user, commit: 309750fad51

* For live migration they started to write the vhost log (309750fad51)
  into anonymous pages from malloc() and into anonymous pages from
  memfd_create() OR pages backed by a file (in specific cases).

* Not sure whether the log backend is used when there is no live
  migration occurring (which could cause lock contention, for example).
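
To illustrate the tx queue reasoning above (a bigger queue hides either a too-small buffer or a flush that runs too rarely), here is a toy model. It is not QEMU code: the function name, the parameters and the numbers are all made up for illustration, and it assumes packets arrive in fixed bursts and that a flush drains the whole queue at once.

```python
def simulate(queue_size, flush_every, burst=64, burst_every=8, ticks=1000):
    """Toy tx queue: bursts arrive periodically, the queue is only emptied on a flush tick.

    All parameters are hypothetical; returns the number of packets dropped
    because the queue was already full (the "overrun").
    """
    queued = 0
    dropped = 0
    for tick in range(ticks):
        # A burst of packets arrives every `burst_every` ticks.
        if tick % burst_every == 0:
            room = queue_size - queued
            accepted = min(burst, room)
            queued += accepted
            dropped += burst - accepted
        # The flush callback runs every `flush_every` ticks and drains everything.
        if tick % flush_every == 0:
            queued = 0
    return dropped


if __name__ == "__main__":
    for size, period in [(128, 8), (128, 32), (256, 32)]:
        print(f"queue={size} flush_every={period} dropped={simulate(size, period)}")
```

In this model a small queue with frequent flushes drops nothing, the same queue with delayed flushes overruns, and doubling the queue size makes the drops disappear without fixing the flush delay. That matches the observed behavior, but it does not tell which of the two (buffer size or flush scheduling) actually regressed.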