[dpdk]OpenvSwitch pmd threads fail to start due to incorrect cpu pinning if host has more than 1 NUMA node

Bug #1584006 reported by Mikhail Chernik
This bug affects 1 person
Affects              Status         Importance  Assigned to        Milestone
Fuel for OpenStack   Fix Committed  High        Arthur Svechnikov
  Mitaka             Fix Released   High        Arthur Svechnikov
  Newton             Fix Committed  High        Arthur Svechnikov

Bug Description

Environment: MOS 9.0 ISO 370, hardware lab

Steps to reproduce:
1. Create an environment and add a host with more than one NUMA node as a compute node
2. Configure hugepages and DPDK CPU pinning (at least 4096 2 MB hugepages, at least one CPU pinned for DPDK)
3. Enable DPDK on one interface, move the Private network to this interface, and deploy the cluster
4. Check OVS process CPU utilization and threads on the compute node, e.g. with "top -n 1 -bH -p `pgrep ovs-vswitchd`" (see the sketch below)
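
A minimal sketch of inspecting the result in step 4 (the taskset loop for checking per-thread pinning is an addition, not part of the original report):

  top -n 1 -bH -p "$(pgrep ovs-vswitchd)" | grep pmd

  # optionally confirm each pmd thread is pinned to a single core
  for tid in $(ps -T -p "$(pgrep ovs-vswitchd)" -o tid=,comm= | awk '/pmd/ {print $1}'); do
      taskset -cp "$tid"
  done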

Expected result:
There are OVS threads named pmdXX, each fully utilizing one CPU core

Actual result:
No pmd threads; an error message appears in /var/log/openvswitch/ovs-vswitchd.log:
2016-05-20T02:51:21.404Z|00021|dpif_netdev|ERR|Cannot create pmd threads due to out of unpinned cores on numa node

Additional information:

PMD threads start successfully if the NIC and all cores in pmd-cpu-mask are on the same NUMA node

Additionally, the format of pmd-cpu-mask causes a warning in /var/log/openvswitch/ovs-vswitchd.log:
2016-05-20T02:51:20.779Z|00018|ovs_numa|WARN|Invalid cpu mask: x

Diagnostic snapshot: http://mos-scale-share.mirantis.com/fuel-snapshot-2016-05-20_08-51-41.tar.xz
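
The NUMA locality noted above (PMD threads only start when the NIC and the pmd-cpu-mask cores share a NUMA node) can be checked directly; a minimal sketch, where 0000:05:00.0 stands in for the DPDK NIC's PCI address:

  # NUMA node of the DPDK NIC (looked up by PCI address, since a NIC bound to DPDK has no netdev)
  cat /sys/bus/pci/devices/0000:05:00.0/numa_node

  # which cores belong to which NUMA node
  lscpu | grep "NUMA node"

  # cores currently requested for PMD threads
  ovs-vsctl get Open_vSwitch . other_config:pmd-cpu-mask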

Dmitry Klenov (dklenov)
tags: added: area-python
description: updated
summary: [dpdk]OpenvSwitch pmd threads fail to start due to incorrect cpu pinning
- if hoast has more than 1 NUMA node
+ if host has more than 1 NUMA node
Revision history for this message
Mikhail Chernik (mchernik) wrote :

To sum up:
There are 2 CPU masks for OVS+DPDK in astute.yaml:
1) ovs_core_mask, which goes into /etc/default/openvswitch-switch ( -c 0xXXX )
2) ovs_pmd_core_mask, which goes into the OVS database ( ovs-vsctl get Open_vSwitch . other_config:pmd-cpu-mask ). It must be specified without a leading 0x

For successful operation, both parameters must select CPU cores from the NUMA node to which the NIC is attached.
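
A minimal sketch of where each value ends up (the core numbers are hypothetical; the exact DPDK_OPTS line is an assumption based on the Ubuntu openvswitch-switch-dpdk packaging of that release and may differ):

  # 1) ovs_core_mask -> /etc/default/openvswitch-switch, hex with the 0x prefix,
  #    e.g. core 1 only:
  #    DPDK_OPTS='--dpdk -c 0x2 -n 4 --socket-mem 1024,0'

  # 2) ovs_pmd_core_mask -> OVS database, hex digits without the leading 0x,
  #    e.g. cores 2 and 3 (binary 1100 = c):
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=c
  ovs-vsctl get Open_vSwitch . other_config:pmd-cpu-mask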

Revision history for this message
Atsuko Ito (yottatsa) wrote :

It's basically correct. Here is a full summary for further understanding and for the docs team.

Cores for PMD processes SHOULD be from the NUMA/cluster* where the NICs are located, with a minimum of 1 PMD per NUMA/cluster. Additional cores MAY be scheduled from other NUMA/clusters* where NICs or instances are located (see performance notes**).

The core for the OVS core process should be from one of the NUMA/clusters* where a PMD is scheduled.
Memory SHOULD be allocated on the NUMA node where a PMD is scheduled.

* "Cluster" is used when the box is in cluster-on-die mode; cluster == socket.

** For performance, more than 1 PMD per NUMA node can be scheduled. For VM-to-wire traffic it should be the NUMA/cluster with the NICs; for VM-to-VM traffic inside the box it should be the NUMA node where the instances are running. Rule of thumb: 1 PMD can process about 3 Mpps of traffic. n-dpdk-rxqs should be adjusted to the number of PMDs per NIC.

E.g., to utilize a 10GigE interface bidirectionally we need 12 Mpps in each direction, so we need 8 PMDs (8 PMDs * 3 Mpps = 24 Mpps = 12 Mpps * 2 (in/out)): ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=8.
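
A sketch of building the matching pmd-cpu-mask for that 8-PMD example, assuming (hypothetically) that the NIC sits on NUMA node 1 and that node owns cores 8-15:

  # bits 8..15 set -> ff00 (remember: pmd-cpu-mask takes hex digits without a 0x prefix)
  MASK=0
  for core in $(seq 8 15); do MASK=$((MASK | (1 << core))); done
  printf 'pmd-cpu-mask=%x\n' "$MASK"    # prints ff00
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask="$(printf '%x' "$MASK")"
  ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=8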

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/322212

Changed in fuel:
status: Confirmed → In Progress
Changed in fuel:
assignee: Arthur Svechnikov (asvechnikov) → Fedor Zhadaev (fzhadaev)
Changed in fuel:
assignee: Fedor Zhadaev (fzhadaev) → Arthur Svechnikov (asvechnikov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/322212
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=76e270ef966dd7735eac3e87f94bb0a39e49388c
Submitter: Jenkins
Branch: master

commit 76e270ef966dd7735eac3e87f94bb0a39e49388c
Author: Artur Svechnikov <email address hidden>
Date: Fri May 27 17:43:24 2016 +0300

    Change CPU distribution

    CPU distribution mechanism should be changed due
    to incorrect requirements for Nova and DPDK CPU allocation

    Changes:
     * Change CPU distribution
     * Add function for recognizing DPDK NICs for node
     * Remove requirement of enabled hugepages for
       DPDK NICs (it's checked before deployment)
     * Change HugePages distribution. Now it takes into
       account Nova CPU placement

    Requirements Before:
     DPDK's CPUs should be located on the same NUMAs as
     Nova CPUs

    Requirements Now:
     1. DPDK component CPU pinning has two parts:
         * OVS pmd core CPUs - These CPUs must be placed on the
           NUMAs where the DPDK NIC is located. Since a DPDK NIC can
           handle about 12 Mpps and 1 CPU can handle about
           3 Mpps, there is no need to place more than
           4 CPUs per NIC. Let's call all remaining CPUs
           additional CPUs.
         * OVS Core CPUs - 1 CPU is enough and that CPU should
           be taken from any NUMA where at least 1 OVS pmd core
           CPU is located

     2. To improve Nova and DPDK performance, all additional CPUs
        should be distributed along with Nova's CPUs as
        OVS pmd core CPUs.

    Change-Id: Ib2adf39c36b2e1536bb02b07fd8b5af50e3744b2
    Closes-Bug: #1584006
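
To illustrate the new requirements, a sketch for a hypothetical two-NUMA host (NUMA0: cores 0-7, NUMA1: cores 8-15, DPDK NIC on NUMA1; the specific cores are placeholders, not what Fuel necessarily picks):

  # at most 4 OVS pmd core CPUs, all on the NIC's NUMA node, e.g. cores 12-15 -> f000
  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=f000

  # 1 OVS core CPU from a NUMA node that already hosts a pmd CPU, e.g. core 11,
  # which would appear as "-c 0x800" in /etc/default/openvswitch-switch

  # any remaining ("additional") DPDK CPUs are spread as extra pmd cores
  # alongside Nova's CPUs, per point 2 above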

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/326392

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/326392
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=260b0b8f99bdfeb784be1a0b7374cd284d3b68e9
Submitter: Jenkins
Branch: stable/mitaka

commit 260b0b8f99bdfeb784be1a0b7374cd284d3b68e9
Author: Artur Svechnikov <email address hidden>
Date: Fri May 27 17:43:24 2016 +0300

    Change CPU distribution

    CPU distribution mechanism should be changed due
    to incorrect requirements for Nova and DPDK CPU allocation

    Changes:
     * Change CPU distribution
     * Add function for recognizing DPDK NICs for node
     * Remove requirement of enabled hugepages for
       DPDK NICs (it's checked before deployment)
     * Change HugePages distribution. Now it takes into
       account Nova CPU placement

    Requirements Before:
     DPDK's CPUs should be located on the same NUMAs as
     Nova CPUs

    Requirements Now:
     1. DPDK component CPU pinning has two parts:
         * OVS pmd core CPUs - These CPUs must be placed on the
           NUMAs where the DPDK NIC is located. Since a DPDK NIC can
           handle about 12 Mpps and 1 CPU can handle about
           3 Mpps, there is no need to place more than
           4 CPUs per NIC. Let's call all remaining CPUs
           additional CPUs.
         * OVS Core CPUs - 1 CPU is enough and that CPU should
           be taken from any NUMA where at least 1 OVS pmd core
           CPU is located

     2. To improve Nova and DPDK performance, all additional CPUs
        should be distributed along with Nova's CPUs as
        OVS pmd core CPUs.

    Change-Id: Ib2adf39c36b2e1536bb02b07fd8b5af50e3744b2
    Closes-Bug: #1584006
    (cherry picked from commit 76e270ef966dd7735eac3e87f94bb0a39e49388c)

Revision history for this message
Sergii (sgudz) wrote :

Verified on MOS 9.0 ISO 459. Fixed.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-qa (master)

Change abandoned by Vladimir Khlyunev (<email address hidden>) on branch: master
Review: https://review.openstack.org/320932
Reason: 8 months ago; also, this test scenario was included in the multiqueue tests
