Adding/removing ports leaks memory in dpdk

Bug #1570466 reported by Christian Ehrhardt  on 2016-04-14
This bug affects 1 person
Affects: dpdk (Ubuntu) | Importance: High | Assigned to: Unassigned
Affects: Xenial | Importance: Undecided | Assigned to: Unassigned

Bug Description

I set up a nicely working OVS-DPDK with KVM Guests.
Then in a loop I
   - add up to 512 ports
   - test connectivity
   - remove up to 512 ports

The actual error resembles the issues discussed upstream about exceeding 1023 file descriptors: OVS-DPDK crashes once that happens.
Most of what appears in the log afterwards is just follow-on errors.

The real problem behind this bug is that there has to be some sort of leak, of file descriptors or of something else depending on further analysis.
Adding and then removing the same number of ports should not bring a later iteration any closer to a limit than the first iteration was.
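
One way to check the file descriptor hypothesis is to sample the number of open descriptors of the switch process after every add/remove iteration and look at the growth between samples. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names count_fds and growth are hypothetical and not part of OVS or DPDK. It only covers file descriptors, not memory taken from DPDK's own heap.

```python
import os


def count_fds(pid):
    """Count the open file descriptors of a process via /proc/<pid>/fd.

    Assumes Linux and sufficient permission to read that directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def growth(samples):
    """Return the difference between each pair of consecutive samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


if __name__ == "__main__":
    # Illustration on the current process; for the scenario above one would
    # pass the PID of ovs-vswitchd and sample once per iteration.
    print(count_fds(os.getpid()))
```

If nothing leaks, the values returned by growth should stay around zero once the first iteration has completed; a steady positive value per iteration points to a leak.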

Of 12 planned iterations, both of my 2 tries failed at iteration 6.
So it seems to be reliably reproducible, and I will start debugging it.
---
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
DistroRelease: Ubuntu 16.04
Package: openvswitch-dpdk
PackageArchitecture: amd64
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 4.4.0-18.34-generic 4.4.6
Tags: xenial uec-images
Uname: Linux 4.4.0-18-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: kvm libvirtd
_MarkForUpload: True
modified.conffile..etc.dpdk.dpdk.conf: [modified]
modified.conffile..etc.dpdk.interfaces: [modified]
mtime.conffile..etc.dpdk.dpdk.conf: 2016-04-14T13:58:58.618313
mtime.conffile..etc.dpdk.interfaces: 2016-04-14T13:58:58.766313

Changed in dpdk (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in openvswitch-dpdk (Ubuntu):
status: New → Triaged
importance: Undecided → High
assignee: nobody → ChristianEhrhardt (paelzer)

Further analysis points to a memory leak, or to allocation failures caused by fragmentation, rather than to a file descriptor leak.
Adding report...

tags: added: apport-collected uec-images xenial
description: updated

apport information


apport-collect didn't like me, attaching the retrace manually

Changed in dpdk (Ubuntu):
importance: Medium → High
summary: - Adding removing/ports leaks file descriptors
+ Adding removing/ports leaks memory
summary: - Adding removing/ports leaks memory
+ Adding removing/ports leaks memory in dpdk
no longer affects: openvswitch-dpdk (Ubuntu)
summary: - Adding removing/ports leaks memory in dpdk
+ Adding/removing ports leaks memory in dpdk

Discussion went on and we identified two different issues.
1. DPDK leaking on removal of used vhost_user ports
2. Openvswitch leaking on netdev_dpdk_vhost_destruct

I'll keep #1 handled in this bug and work with upstream on a fix to be backported for our DPDK.
#2 got a bug of its own => 1573091

FYI - #1 is waiting for this discussion to complete; once it has, I will backport the fix.
https://<email address hidden>/msg60552.html

Xenial is released, so we are back in SRU mode.
Therefore I add the matching SRU Template for the upload of 2.2.0ubuntu8 which is in the unapproved queue atm.

[Impact]

 * Without the fix, removing guests that were connected to vhost_user ports leaked memory. As a result, a long-running server which spawned/removed guests over time eventually crashed (openvswitch-dpdk)
 * fix is a backport from upstream to free the device on removal of the active port

[Test Case]

 * create a vhost_user socket in openvswitch-dpdk
 * attach a guest
 * detach the guest
 * loop the three steps above
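
The test case above can be sketched as a small driver. This is only an illustration under assumptions: the bridge and port names are made up, the port is removed at the end of each pass so that it can be created again, and attach_guest/detach_guest are hypothetical callbacks that the tester has to supply (for example wrappers around their guest management tooling). The ovs-vsctl calls use the dpdkvhostuser interface type.

```python
import subprocess


def add_port_cmd(bridge, port):
    """ovs-vsctl command that creates a vhost_user port and its socket."""
    return ["ovs-vsctl", "add-port", bridge, port,
            "--", "set", "Interface", port, "type=dpdkvhostuser"]


def del_port_cmd(bridge, port):
    """ovs-vsctl command that removes the port again."""
    return ["ovs-vsctl", "del-port", bridge, port]


def run_test_case(bridge, port, loops, attach_guest, detach_guest,
                  run=subprocess.check_call):
    """Create the port, attach and detach a guest, remove the port; repeat.

    attach_guest and detach_guest are caller-supplied (hypothetical) hooks.
    run executes one command; it defaults to really calling ovs-vsctl.
    """
    for _ in range(loops):
        run(add_port_cmd(bridge, port))
        attach_guest(port)
        detach_guest(port)
        run(del_port_cmd(bridge, port))
```

Passing a different run callable, such as one that only records or prints the command, allows a dry run without touching a real switch.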

[Regression Potential]

 * It only frees memory that would otherwise leak, and the patch is upstream as well in the meantime (it should be pulled from the vhost maintainer into main for 16.07 soon as well)
 * In testing I found an issue freeing the wrong port if it was uninitialized and improved the patch together with the maintainer
 * Passed ADT tests on i386/amd64/amd64-lowmem and our full CI (https://code.launchpad.net/~ubuntu-server/ubuntu/+source/dpdk-testing/+git/dpdk-testing)

Martin Pitt (pitti) wrote :

Christian, please upload the package to yakkety too, otherwise this cannot progress to xenial-updates.

Changed in dpdk (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed

Hello ChristianEhrhardt, or anyone else affected,

Accepted dpdk into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/dpdk/2.2.0-0ubuntu8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

FYI - Verified in Proposed.

Next I need to prep some Y tests to reasonably request an upload to Yakkety to allow migration as Martin indicated.

tags: added: verification-done-xenial
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dpdk - 2.2.0-0ubuntu9

---------------
dpdk (2.2.0-0ubuntu9) yakkety; urgency=medium

  * d/p/ubuntu-backport-[36-37] fix virtio issues (LP: #1570195):
    - don't let DPDK initialize virtio devices still in use by the kernel
    - this avoids conflicts between kernel and dpdk usage of those devices
    - an admin now has to unbind/bind devices as on physical hardware
    - this is in the dpdk 16.04 release and delta can then be dropped
    - d/dpdk-doc.README.Debian update for changes in virtio-pci handling
    - d/dpdk.interfaces update for changes in virtio-pci handling
  * d/p/ubuntu-backport-38... fix for memory leak (LP: #1570466):
    - call vhost_destroy_device on removing vhost user ports to fix memory leak
    - this likely is in the dpdk 16.07 release and delta can then be dropped
  * d/p/ubuntu-fix-vhost-user-socket-permission.patch fix (LP: #1546565):
    - when vhost_user sockets are created they are owner:group of the process
    - the DPDK api to create those has no way to specify owner:group
    - to fix that without breaking the API and potential workaround code in
      consumers of the library like openvswitch 2.6 for example. This patch
      adds an EAL commandline option to specify user:group created vhost_user
      sockets should have.

 -- Christian Ehrhardt <email address hidden> Wed, 27 Apr 2016 07:52:48 -0500

Changed in dpdk (Ubuntu):
status: Triaged → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dpdk - 2.2.0-0ubuntu8

---------------
dpdk (2.2.0-0ubuntu8) xenial; urgency=medium

  * d/p/ubuntu-backport-[36-37] fix virtio issues (LP: #1570195):
    - don't let DPDK initialize virtio devices still in use by the kernel
    - this avoids conflicts between kernel and dpdk usage of those devices
    - an admin now has to unbind/bind devices as on physical hardware
    - this is in the dpdk 16.04 release and delta can then be dropped
    - d/dpdk-doc.README.Debian update for changes in virtio-pci handling
    - d/dpdk.interfaces update for changes in virtio-pci handling
  * d/p/ubuntu-backport-38... fix for memory leak (LP: #1570466):
    - call vhost_destroy_device on removing vhost user ports to fix memory leak
    - this likely is in the dpdk 16.07 release and delta can then be dropped
  * d/p/ubuntu-fix-vhost-user-socket-permission.patch fix (LP: #1546565):
    - when vhost_user sockets are created they are owner:group of the process
    - the DPDK api to create those has no way to specify owner:group
    - to fix that without breaking the API and potential workaround code in
      consumers of the library like openvswitch 2.6 for example. This patch
      adds an EAL commandline option to specify user:group created vhost_user
      sockets should have.

 -- Christian Ehrhardt <email address hidden> Mon, 25 Apr 2016 11:42:40 +0200

Changed in dpdk (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for dpdk has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
