[ip link] Message truncated error for large number of passthrough VFs

Bug #1720126 reported by Pieter Malan on 2017-09-28
24
This bug affects 2 people
Affects Status Importance Assigned to Milestone
iproute2 (CentOS)
Unknown
Unknown
iproute2 (Ubuntu)
Undecided
Unassigned
Trusty
Undecided
Unassigned
Xenial
High
Unassigned
Zesty
Undecided
Unassigned

Bug Description

[Impact]

When querying a Physical Function netdev with a large amount of VF's (more than 30), the resulting return message can overflow the 16K netlink message buffer.

This can be fixed by enabling message peeking on the socket and resizing the buffer on receive, or by simply enlarging the receive buffer.

Since there's an upper limit to the number of VF's per PF, it's relatively sane to just enlarge the receive buffer. Please see the attached patch.

[Test Case]

# Set up 60 VF's on an SR-IOV device
ip link show > /dev/null

Observe the following:
Message truncated
Message truncated
Message truncated

[Regression Potential]

1) Applications relying on the broken behaviour will need to be updated, but it would be a really dubious use case.
2) Increasing the rx buffer size increases the memory footprint (but realistically, this is tiny).
3) Extra processing time is now needed to parse the larger buffer, in the case that a call to "ip link" is on the critical time path of an application, (called multiple times in a tight loop, for example), it would affect load.

[Other Info]

Observed on Ubuntu kernel 4.4.0-93-generic on both 14.04 and 16.04

=====================================================================================================
Ubuntu16 system

stack@cluster04:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial

stack@cluster04:~$ uname -r
4.4.0-93-generic

stack@cluster04:~$ apt-cache policy iproute2
iproute2:
  Installed: 4.3.0-1ubuntu3.16.04.1
Version table:
*** 4.3.0-1ubuntu3.16.04.1 500
        500 http://us.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
=================================================================================================

Ubuntu14 system:
root@boomslang:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty

root@boomslang:~# uname -r
4.4.0-96-generic

root@boomslang:~# apt-cache policy iproute2
iproute2:
  Installed: 3.12.0-2ubuntu1
  Version table:
 *** 3.12.0-2ubuntu1 0
        500 http://za.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages

Related branches

Jan Gutter (jangutter) wrote :

A simple one-line patch seems to solve the issue for me:

Index: iproute2-4.3.0/lib/libnetlink.c
===================================================================
--- iproute2-4.3.0.orig/lib/libnetlink.c
+++ iproute2-4.3.0/lib/libnetlink.c
@@ -202,7 +202,7 @@ int rtnl_dump_filter_l(struct rtnl_handl
   .msg_iov = &iov,
   .msg_iovlen = 1,
  };
- char buf[16384];
+ char buf[65536];
  int dump_intr = 0;

  iov.iov_base = buf;

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in iproute2 (Ubuntu):
status: New → Confirmed
Joshua Powers (powersj) on 2017-10-02
Changed in iproute2 (Ubuntu):
importance: Undecided → High
Jan Gutter (jangutter) wrote :

I'm unfamiliar with where to submit the fix for this bug. Am I submitting the fix to the wrong forum?

Jan Gutter (jangutter) wrote :

This bug has already been fixed upstream in the following commit:

https://anonscm.debian.org/git/collab-maint/pkg-iproute.git/commit/?id=72b365e8e0fd5efe1d5c05d04c25950736635cfb

This commit happened between: tags debian/4.3.0-1 and debian/4.6.0-1

Jan Gutter (jangutter) wrote :

This fix has been pulled into CentOS 7.3 and later

The attachment "[PATCH] Fix "Message truncated" issue with many VF's" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
description: updated
description: updated
description: updated
Brian Murray (brian-murray) wrote :

Is this then fixed in Ubuntu 17.04 which has iproute2 version 4.9.0-1ubuntu1?

Jan Gutter (jangutter) wrote :

Yes, I've confirmed it's fixed there.

I build it from the .dsc, and didn't see "Message truncated".

Then, just to make sure, I reverted the patch (changing the buf size to 16384) and the bug was back.

So at least there's been some use in real world of the patch, I hope...

What's the path of least resistance here? Backporting to iproute2's stable branch, or applying the patch downstream at Ubuntu?

Nish Aravamudan (nacc) on 2017-10-12
Changed in iproute2 (Ubuntu):
status: Confirmed → Fix Released
Changed in iproute2 (Ubuntu Xenial):
status: New → Confirmed
Changed in iproute2 (Ubuntu Zesty):
status: New → Fix Released
Changed in iproute2 (Ubuntu Xenial):
importance: Undecided → High
Changed in iproute2 (Ubuntu):
importance: High → Undecided
Nish Aravamudan (nacc) wrote :

@jangutter: I'm not sure why you built from source? I assume you actually tested the version in 17.04 and it worked.

The proper solution is to backport the Debian change to xenial and trusty, most likely. A git repository that can be used with `git-ubuntu` (sudo snap install --classic git-ubuntu`) is available here: https://code.launchpad.net/~usd-import-team/ubuntu/+source/iproute2/+git/iproute2

Feel free to propose MPs against ubuntu/xenial-devel and ubuntu/trusty-devel, or I can do it if you'd prefer.

This is a debdiff for iproute2 applicable to 3.12.0-2ubuntu1. I built this in pbuilder
and it builds successfully, and I installed it, the patch works as intended.

Jan Gutter (jangutter) wrote :

@nacc I built from source to verify that the one-liner is directly responsible for fixing and breaking the issue (inherent paranoia). I did test with the binaries and they worked.

Apologies, I'm unfamiliar with the Ubuntu SRU process as you can probably see. What exactly is an "MP" and how would one go about to propose one?

I'm aware of the need of testing bugfixes like these, I'm not familiar with your release pipeline, however.

This is a debdiff for iproute2 applicable to 4.3.0-1ubuntu3. I built this in pbuilder
and it builds successfully, and I installed it, the patch works as intended.

On 17.10.2017 [08:56:17 -0000], Jan Gutter wrote:
> @nacc I built from source to verify that the one-liner is directly
> responsible for fixing and breaking the issue (inherent paranoia). I did
> test with the binaries and they worked.

Ah ok, yeah -- I guess that's reasonable, and is a good preemptive test,
but given that it needs to be backported to a different release, feels a
bit like busywork (the same time could have been spent building it for
xenial :)

> Apologies, I'm unfamiliar with the Ubuntu SRU process as you can
> probably see. What exactly is an "MP" and how would one go about to
> propose one?

MP = Merge Proposal. Roughly like GitHub's Pull Requests (PR), except
less formal with Git (bzr I think is somewhat more first-class in
Launchpad).

> I'm aware of the need of testing bugfixes like these, I'm not familiar
> with your release pipeline, however.

http://www.justgohome.co.uk/blog/2017/07/developing-ubuntu-using-git.html

may help a bit. Roughly:

$ sudo snap install --classic git-ubuntu
$ git ubuntu clone iproute2
$ cd iproute2
$ ... make and commit changes
$ git ubuntu submit

Now, the issue is that middle bit, where you have to know a bit about
source packaging. That is, simply cherry-picking the upstream/Debian fix
is not quite right, as you need to change it into a Quilt patch and then
insert a changelog entry.

We are currently developing a fix for that so you can just do a

$ git ubuntu remote add debian
$ git cherry-pick 72b365e83
$ git ubuntu build-source

And it should spit out a commit that has the quiltify'd and
changelogify'd result that you can use as a base or to submit.

Feel free to find me on IRC if you want some more pointers.

In the meanwhile, I'll try and look at Monique's debdiffs this week.

-Nish

Nish Aravamudan (nacc) wrote :

@mvandenberg,

Couple of nits in your debdiffs.

1) changelogs are targetting UNRELEASED, please update to xenial and trusty respectively.

2) Is there a reason your fix is different than the fix upstream/Debian? Does Debian need your version instead?

3) Please use appropriate DEP3 headers (dpkg-source --commit creates a template patch with them): http://dep.debian.net/deps/dep3/, especially Origin (not Origin/Author).

4) The actual author appears to be Phil Sutter, not Jan Gutter, although Jan authored the patch in this bug. Consider using dep3changelog once you have a quilt patch to generate the changelog in the standard format (with an attribution to Phil or Jan as you decide).

5) The trusty version number seems off, it should be 3.12.0-2ubuntu1.1. Please take a look at https://wiki.ubuntu.com/SecurityTeam/UpdatePreparation#Update_the_packaging.

Can you fix these up and reupload the debdiffs?

Thanks,
Nish

Jan Gutter (jangutter) wrote :

@nacc

Thanks so much for the explanation. I also found https://wiki.ubuntu.com/ServerTeam/KnowledgeBase#Merge_Proposals_and_Reviewing that details a bit more of the internal processes. As relative outsiders to the Ubuntu process, I'd appreciate it very much if you could handle that part for Monique's patches. I can be on hand to answer technical questions if required.

Regarding the buffer size choice, it's very arbitrary as Phil said. I'm pretty sure we came to the same conclusion independently (libvirt and libnl had very similar issues) and the workaround is obvious. 32k seems to work for 64 VF's (our test case), but breaks with 128 VF's. Not a lot of machines can handle 128 concurrent VF's. I typed 64k "just because". libvirt+libnl allow message peeking. However, iproute2 uses netlink directly. So, implementing a similar idea would require an entirely new receive codepath with all the fun of finding out where new exception paths occur: something to be done on tip and not suitable for backport without thorough vetting.

I'm sure it'll save a lot of time once the kinks have been worked out of the automation, backports are quite the double-edged sword.

Nish Aravamudan (nacc) wrote :

On 19.10.2017 [09:35:19 -0000], Jan Gutter wrote:
> @nacc
>
> Thanks so much for the explanation. I also found
> https://wiki.ubuntu.com/ServerTeam/KnowledgeBase#Merge_Proposals_and_Reviewing
> that details a bit more of the internal processes. As relative outsiders
> to the Ubuntu process, I'd appreciate it very much if you could handle
> that part for Monique's patches. I can be on hand to answer technical
> questions if required.

And to be clear, the MP based workflow for the Git trees is brand new
and experimental :)

I'm happy to integrate the updated debdiffs (I'll reply to those
comments directly).

> Regarding the buffer size choice, it's very arbitrary as Phil said. I'm
> pretty sure we came to the same conclusion independently (libvirt and
> libnl had very similar issues) and the workaround is obvious. 32k seems
> to work for 64 VF's (our test case), but breaks with 128 VF's. Not a lot
> of machines can handle 128 concurrent VF's. I typed 64k "just because".
> libvirt+libnl allow message peeking. However, iproute2 uses netlink
> directly. So, implementing a similar idea would require an entirely new
> receive codepath with all the fun of finding out where new exception
> paths occur: something to be done on tip and not suitable for backport
> without thorough vetting.

Absolutely. My concern is the upstream code is at 32k as is Artful. I'm
hesitant to backport something different (64k) to X and T without also
ensuring Artful gets it (and BB when it opens), and presumably also
fixing it upstream.

So I see two routes forward:

1) File an upstream issue to request they bump to 64k, as you note 32k
is insufficient for 128 VFs. Link to that issue in this bug and we'll
fix AA, X and T with the suggested change (presuming upstream acks it).

2) Backport the upstream change as-is to X and T (AA already has the
necessary fix). This will be faster, of course, but does mean the 128 VF
case is broken. Given that it is less likely to be hit in the field,
perhaps that is ok -- and in the meanwhile, upstream can work on a
proper fix which, when available, we can backport accordingly (or decide
at that point, in any case).

I prefer 2), because I do not like diverging from upstream (or at least
not without an upstream bug report). If you and Monique are ok with 2),
I can update the debdiffs before sponsoring them.

> I'm sure it'll save a lot of time once the kinks have been worked out of
> the automation, backports are quite the double-edged sword.

Definitely :)

Nish Aravamudan (nacc) wrote :

I have set up two MPs with the adjustments (it looks like Monique's latest debdiffs followed path 2) from my previous comment already) to the DEP3 headers that I think make the most sense. Please take a look at them and if you approve the changes I will upload them.

I note also there are a number of statically sized buffers in the iproute2 code -- is there any concern about other buffers overflowing?

Jan Gutter (jangutter) wrote :

I concur with option 2), unnecessary deviation will just cause confusion.

Regarding the other buffer sizes, the last time I looked they were mostly OK. The issue reared its head in this particular case because the netlink message that previously had a pretty constant per-netdev response size suddenly had the ability to balloon with "no warning". A number of workarounds exist (i.e. you have to explicitly ask for the VF info), but, in this case we actually want the VF info and iproute2 was just unprepared for the size of it.

I guess the core issue is that it's entirely possible for the kernel to add extra netlink attributes to any query response, iproute2 makes the assumption that the queries it's making is not necessarily going to explode with gigabytes of new annotations and 16k will easily fit any current real-world system. A pragmatic approach would probably be to handle the "Message Truncated" path with a dynamically sized buffer as an exceptional case.

Any fix in iproute2 that "properly" addresses this issue has to be carefully vetted. Who knows how many inherent races will get exposed if the ip command doubles in execution time.

Jan Gutter (jangutter) wrote :

I had a look at the two proposals and could not spot any obvious mistakes:

- the correct upstream git commit has been cherry-picked
- I don't have any objections to attribution or log messages

Thanks again for shepherding this one through!

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.