Comment 3 for bug 1066055

Stefan Hajnoczi (stefanha) wrote : Re: [Qemu-devel] [Bug 1066055] Re: Network performance regression with vde_switch

On Mon, Oct 15, 2012 at 09:46:06PM -0000, Edivaldo de Araujo Pereira wrote:
> Hi Stefan,
>
> Thank you very much for taking the time to help me, and excuse me for
> not seeing your answer earlier...
>
> I've run the procedure you pointed out to me, and the result is:
>
> 0d8d7690850eb0cf2b2b60933cf47669a6b6f18f is the first bad commit
> commit 0d8d7690850eb0cf2b2b60933cf47669a6b6f18f
> Author: Amit Shah <email address hidden>
> Date: Tue Sep 25 00:05:15 2012 +0530
>
> virtio: Introduce virtqueue_get_avail_bytes()
>
> The current virtqueue_avail_bytes() is oddly named, and checks if a
> particular number of bytes are available in a vq. A better API is to
> fetch the number of bytes available in the vq, and let the caller do
> what's interesting with the numbers.
>
> Introduce virtqueue_get_avail_bytes(), which returns the number of bytes
> for buffers marked for both, in as well as out. virtqueue_avail_bytes()
> is made a wrapper over this new function.
>
> Signed-off-by: Amit Shah <email address hidden>
> Signed-off-by: Michael S. Tsirkin <email address hidden>
>
> :040000 040000 1a58b06a228651cf844621d9ee2f49b525e36c93
> e09ea66ce7f6874921670b6aeab5bea921a5227d M hw
>
> I tried to revert that patch in the latest version, but it obviously
> didn't work; I'm trying to figure out the problem, but I don't know the
> source code very well, so I think it's going to take some time. For now,
> it's all I could do.

After git-bisect(1) completes it is good to sanity-check the result by
manually testing 0d8d7690850eb0cf2b2b60933cf47669a6b6f18f^ (the commit
just before the bad commit) and 0d8d7690850eb0cf2b2b60933cf47669a6b6f18f
(the bad commit).

This will verify that the commit indeed introduces the regression. I
suggest doing this just to be sure that you've found the bad commit.

Regarding this commit, I notice two things:

1. We will now loop over all vring descriptors because we calculate the
   total in/out length instead of returning early as soon as we see
   there is enough space. Maybe this makes a difference, although I'm a
   little surprised you see such a huge regression.

2. The comparison semantics have changed from:

     (in_total += vring_desc_len(desc_pa, i)) >= in_bytes

   to:

     (in_bytes && in_bytes < in_total)

   Notice that virtqueue_avail_bytes() now returns 0 when in_bytes ==
   in_total. Previously, it would return 1. Perhaps we are starving or
   delaying I/O due to this comparison change. You can easily change
   '<' to '<=' to see if it fixes the issue.
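
The two observations above can be illustrated with a minimal standalone
sketch. It is not the QEMU code: the descriptor table `desc_len`, the
`visited` counter and the function names `avail_bytes_old`,
`avail_bytes_new` and `avail_bytes_new_le` are hypothetical stand-ins,
and only the "in" side of the check is modelled. The sketch assumes the
old code returned as soon as the running total reached in_bytes, and
that the new code sums every descriptor before comparing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor lengths standing in for a vring (sum: 8192). */
static const unsigned int desc_len[] = { 1024, 1024, 2048, 4096 };
#define NUM_DESC (sizeof(desc_len) / sizeof(desc_len[0]))

/* Old semantics: stop at the first descriptor that brings the running
 * total up to in_bytes, using '>='. */
static int avail_bytes_old(unsigned int in_bytes, unsigned int *visited)
{
    unsigned int in_total = 0;
    size_t i;

    assert(visited != NULL);
    *visited = 0;
    for (i = 0; i < NUM_DESC; i++) {
        (*visited)++;
        if ((in_total += desc_len[i]) >= in_bytes) {
            return 1;
        }
    }
    return 0;
}

/* Sum every descriptor, counting how many were walked. */
static unsigned int sum_all(unsigned int *visited)
{
    unsigned int in_total = 0;
    size_t i;

    assert(visited != NULL);
    *visited = 0;
    for (i = 0; i < NUM_DESC; i++) {
        (*visited)++;
        in_total += desc_len[i];
    }
    return in_total;
}

/* New semantics: full walk, then a strict '<' comparison. */
static int avail_bytes_new(unsigned int in_bytes, unsigned int *visited)
{
    unsigned int in_total = sum_all(visited);

    return in_bytes && in_bytes < in_total;
}

/* Suggested experiment: the same full walk, but with '<='. */
static int avail_bytes_new_le(unsigned int in_bytes, unsigned int *visited)
{
    unsigned int in_total = sum_all(visited);

    return in_bytes && in_bytes <= in_total;
}
```

With this table, a request for 8192 bytes (exactly the total) succeeds
in avail_bytes_old() and avail_bytes_new_le() but fails in
avail_bytes_new(). A request for 1024 bytes succeeds in all three, but
the old version visits one descriptor where the new ones visit all
four.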

Stefan