Comment 26 for bug 1720126

Nish Aravamudan (nacc) wrote : Re: [Bug 1720126] Re: [ip link] Message truncated error for large number of passthrough VFs

On 19.10.2017 [09:35:19 -0000], Jan Gutter wrote:
> @nacc
>
> Thanks so much for the explanation. I also found
> https://wiki.ubuntu.com/ServerTeam/KnowledgeBase#Merge_Proposals_and_Reviewing
> that details a bit more of the internal processes. As relative outsiders
> to the Ubuntu process, I'd appreciate it very much if you could handle
> that part for Monique's patches. I can be on hand to answer technical
> questions if required.

And to be clear, the MP-based workflow for the Git trees is brand new
and experimental :)

I'm happy to integrate the updated debdiffs (I'll reply to those
comments directly).

> Regarding the buffer size choice, it's very arbitrary as Phil said. I'm
> pretty sure we came to the same conclusion independently (libvirt and
> libnl had very similar issues) and the workaround is obvious. 32k seems
> to work for 64 VFs (our test case), but breaks with 128 VFs. Not a lot
> of machines can handle 128 concurrent VFs. I typed 64k "just because".
> libvirt+libnl allow message peeking. However, iproute2 uses netlink
> directly. So, implementing a similar idea would require an entirely new
> receive codepath with all the fun of finding out where new exception
> paths occur: something to be done on tip and not suitable for backport
> without thorough vetting.
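
For anyone following along, the message peeking mentioned above can be
sketched roughly as follows. This is only an illustration in Python, not
what libnl, libvirt or iproute2 actually do: recv_whole_datagram is a
hypothetical helper, and it assumes Linux, where MSG_PEEK | MSG_TRUNC on
a datagram-style socket (netlink included) returns the full length of
the pending message without consuming it. The idea is to size the
receive buffer from that length instead of guessing 32k or 64k.

```python
import socket


def recv_whole_datagram(sock):
    """Hypothetical helper: read one pending datagram without truncating it.

    Assumes Linux semantics for MSG_TRUNC on datagram-style sockets.
    """
    probe = bytearray(1)
    # MSG_PEEK leaves the message queued; MSG_TRUNC makes the kernel report
    # the message's real length even though the probe buffer is tiny.
    needed = sock.recv_into(probe, 1, socket.MSG_PEEK | socket.MSG_TRUNC)

    # Allocate exactly what the kernel reported (at least one byte), then
    # do the real, consuming read.
    buf = bytearray(max(needed, 1))
    received = sock.recv_into(buf, len(buf))
    return bytes(buf[:received])
```

The cost is one extra syscall per message, which is the trade-off
against a fixed buffer that is either too small or wastefully large.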

Absolutely. My concern is that the upstream code is at 32k, as is
Artful. I'm hesitant to backport something different (64k) to X and T
without also ensuring Artful gets it (and BB when it opens), and
presumably also fixing it upstream.

So I see two routes forward:

1) File an upstream issue to request they bump to 64k, as you note 32k
is insufficient for 128 VFs. Link to that issue in this bug and we'll
fix AA, X and T with the suggested change (presuming upstream acks it).

2) Backport the upstream change as-is to X and T (AA already has the
necessary fix). This will be faster, of course, but it does mean the 128
VF case stays broken. Given that it is less likely to be hit in the
field, perhaps that is ok -- and in the meantime, upstream can work on a
proper fix, which we can backport once it is available (or at least
decide on at that point).

I prefer 2), because I do not like diverging from upstream (or at least
not without an upstream bug report). If you and Monique are ok with 2),
I can update the debdiffs before sponsoring them.

> I'm sure it'll save a lot of time once the kinks have been worked out of
> the automation; backports are quite the double-edged sword.

Definitely :)