Adding up to MAX_VHOST vhost_user sockets breaks openvswitch-dpdk

Bug #1566874 reported by Christian Ehrhardt 
Affects                    Status        Importance
dpdk (Ubuntu)              Fix Released  High
openvswitch-dpdk (Ubuntu)  Invalid       High

Bug Description

I'm still debugging this, but wanted to report it in its current state in case someone else encounters the same issue.

I was doing limits testing and, among other things, tried to add all possible vhost_user ports to an openvswitch-dpdk bridge.
Doing so always ends in a vswitch crash.

It reports an out-of-memory condition (not in the OS, but inside the DPDK library), but I don't trust that diagnosis yet.
The reason is that increasing the memory fivefold doesn't change anything.
Changing the number of rxqs per PMD does change the port count at which it fails (the fewer rxqs, the higher the count: from 722 up to 831, but never 1024).
The relation is not linear, and even with the default of one rxq I can't get 1024 working ports.

This also kills the previously added ports.
From ovs-vsctl show:
        Port "vhost-user-1"
            Interface "vhost-user-1"
                type: dpdkvhostuser
                error: "could not open network device vhost-user-1 (Unknown error -1)"

The log also contains, for each of them:
ovs-vsctl: Error detected while setting up 'vhost-user-826'. See ovs-vswitchd log for details.

The log in /var/log/openvswitch/ovs-vswitchd.log is VERY verbose:
it seems to repeat existing entries for every port added,
producing an 823M logfile in just a few minutes.

I'll attach snippets of various logfiles and a stack trace.

Next steps:
Keep GDB attached while running, and look at dpdk_rte_mzalloc -> rte_zmalloc.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
Changed in dpdk (Ubuntu):
importance: Undecided → High
Changed in openvswitch-dpdk (Ubuntu):
importance: Undecided → High
Changed in dpdk (Ubuntu):
status: New → Triaged
Changed in openvswitch-dpdk (Ubuntu):
status: New → Triaged
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This seems to be the same issue as a discussion that has already been started.

I'm joining the discussion there and trying to verify the accepted outcome against our test.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dpdk - 2.2.0-0ubuntu7

dpdk (2.2.0-0ubuntu7) xenial; urgency=medium

  * Increase max_map_count after setting huge pages (LP: #1507921):
    - The default config of 65530 would cause issues as soon as about 64GB or
      more are used as 2M huge pages for dpdk.
    - Increase this value to base+2*#hugepages to avoid issues on huge systems.
  * d/p/ubuntu-backport-[28-32,34-35] backports for stability (LP: #1568838):
     - these will be in the 16.04 dpdk release, delta can then be dropped.
     - 7 fixes that do not change api/behaviour but fix serious issues.
        - 01 f82f705b lpm: fix allocation of an existing object
        - 02 f9bd3342 hash: fix multi-process support
        - 03 1aadacb5 hash: fix allocation of an existing object
        - 04 5d7bfb73 hash: fix race condition at creation
        - 05 fe671356 vfio: fix resource leak
        - 06 356445f9 port: fix ring writer buffer overflow
        - 07 52f7a5ae port: fix burst size mask type
  * d/p/ubuntu-backport-33-vhost-user-add-error-handling-for-fd-1023.patch
     - this will likely be in dpdk release 16.07 and delta can then be dropped.
     - fixes a crash on using fds >1023 (LP: #1566874)
  * d/p/ubuntu-fix-lpm-use-after-free-and-leak.patch fix lpm_free (LP: #1569375)
     - the old patches had an error freeing a pointer which had no meta data
     - that led to a crash on any lpm_free call
     - folded into the fix that already covers the lpm allocation and free
       weaknesses in general (which is also where this particular mistake was introduced)

 -- Christian Ehrhardt <email address hidden> Tue, 12 Apr 2016 16:13:47 +0200

Changed in dpdk (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Not an issue in OVS, marking invalid.

Changed in openvswitch-dpdk (Ubuntu):
status: Triaged → Invalid