DPDK errors after stopping and starting a compute node with DPDK

Bug #1652937 reported by Kristina Berezovskaia
This bug affects 2 people
Affects                 Status          Importance   Assigned to
Fuel for OpenStack      Fix Committed   High         Mikhail Zhnichkov
  Mitaka                Fix Released    High         Mikhail Zhnichkov
  Newton                Fix Committed   High         Mikhail Zhnichkov
  Ocata                 Fix Committed   High         Mikhail Zhnichkov

Bug Description

After a compute node is stopped and started again, the VMs on that node lose connectivity, and errors appear in the OVS logs.
The same error is reported here: https://bugs.launchpad.net/networking-ovs-dpdk/+bug/1501449

Steps to reproduce:
1) DPDK VxLAN environment
2) Create one VM on the first compute node (flavor with huge pages)
3) Create one VM on the other compute node (flavor with huge pages)
4) Stop the first compute node
5) Wait for some time
6) Start the compute node again
7) Try to ping the VM on this compute node
Expected result: ping succeeds
Actual result: ping fails
8) Boot a new VM on the same compute node
9) Try to ping the new VM on this compute node
Expected result: ping succeeds
Actual result: ping fails

The OVS agent log on the compute node contains this error:
ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Tunneling can't be enabled with invalid local_ip '10.109.4.5'. IP couldn't be found on this host's interfaces

After all VMs were deleted, the huge page counters show that no huge pages are allocated for DPDK on the restarted node:
node-4.test.domain.local: (healthy node)
    1048576: {'free': 0, 'total': 0},
    2048: {'free': 18000, 'total': 19024}
node-3.test.domain.local: (stopped and started node)
    1048576: {'free': 0, 'total': 0},
    2048: {'free': 19024, 'total': 19024}
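
The numbers above are consistent with DPDK holding 1024 of the 2M pages on the healthy node (19024 total minus 18000 free) and none on the restarted one. A minimal sketch for reading the same counters directly from sysfs, assuming the standard /sys/kernel/mm/hugepages layout; the function name hugepage_usage is hypothetical:

```shell
#!/bin/sh
# Sketch: print huge page usage per page size from sysfs, including how many
# pages are in use (total - free). The sysfs root is a parameter so the
# function can be pointed at a copy of the tree; it defaults to the
# standard kernel location.
hugepage_usage() {
    root=${1:-/sys/kernel/mm/hugepages}
    for dir in "$root"/hugepages-*kB; do
        [ -d "$dir" ] || continue          # glob did not match anything
        size=${dir##*/hugepages-}          # e.g. "2048kB"
        size=${size%kB}                    # e.g. "2048"
        total=$(cat "$dir/nr_hugepages")
        free=$(cat "$dir/free_hugepages")
        printf '%s: free=%s total=%s used=%s\n' \
            "$size" "$free" "$total" $((total - free))
    done
}
```

Run as `hugepage_usage` on each compute node: with the counters reported above, the 2048 line would show used=1024 on node-4 and used=0 on node-3.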

Description of the environment:
SNAPSHOT_ID='#688
3 controllers, 2 compute+Ceph nodes (18000 2M huge pages plus 1024 for DPDK)

Tags: area-library
Kristina Berezovskaia (kkuznetsova) wrote :
description: updated
Mikhail Chernik (mchernik) wrote :

It seems that some CPUs must be pinned for DPDK; otherwise OVS+DPDK does not start after a successful deployment.

Nailgun does not validate this case, so OVS+DPDK without pinned CPUs is currently accepted as a valid configuration.

Dmitry Klenov (dklenov)
tags: added: area-python
Changed in mos:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
status: New → Confirmed
Mikhail Chernik (mchernik) wrote :

Just to clarify comment #2:
the root cause of the issue is that the OVS options pmd-cpu-mask and dpdk-lcore-mask were not set.
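
For reference, both options are hexadecimal CPU bitmasks (bit N set means CPU N) stored in other_config of the Open_vSwitch table. A sketch for building them, assuming an OVS release that reads both keys from the database (older DPDK-enabled releases may take the lcore mask as an ovs-vswitchd command-line option instead). The CPU numbers and the function name cpus_to_mask are hypothetical, and the ovs-vsctl commands are only printed, not executed:

```shell
#!/bin/sh
# Sketch: turn a list of CPU ids into a hex bitmask (bit N set = CPU N),
# the format used by pmd-cpu-mask and dpdk-lcore-mask.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
}

# Hypothetical CPU choice: PMD threads on CPUs 2 and 3, DPDK lcore on CPU 1.
# The commands are only echoed here so nothing is changed on the host.
echo ovs-vsctl set Open_vSwitch . "other_config:pmd-cpu-mask=$(cpus_to_mask 2 3)"
echo ovs-vsctl set Open_vSwitch . "other_config:dpdk-lcore-mask=$(cpus_to_mask 1)"
```

With these inputs the printed masks are 0xc for the PMD threads and 0x2 for the lcore.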

Kevin Benton (kevinbenton) wrote :

The OVS agent log makes it look like the issue here is that the compute node was configured with an incorrect 'local_ip' value to use for its tunnel termination.

@Mikhail, how did you determine this is related to pinning?

Sergey Matov (smatov) wrote :

@Kevin, with OVS+DPDK the tunneling interface must be configured in the system. For VxLAN with OVS+DPDK in MOS, the local_ip used for tunneling is configured on br-mesh, an internal OVS interface. However, the process never reaches this stage because the OVS daemon does not come up after the reboot. While OVS is down, br-mesh cannot be configured, which is why the Neutron agent reports the error.
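
A sketch of the precondition described above: check that the tunnel local_ip is actually assigned to br-mesh before the Neutron agent starts. It assumes the iproute2 one-line output format of `ip -o -4 addr show dev br-mesh`, where the fourth field is "address/prefix"; has_local_ip is a hypothetical helper, not part of MOS:

```shell
#!/bin/sh
# Sketch: succeed if the IPv4 address given as $1 appears in the output of
# `ip -o -4 addr show dev <iface>`, read from stdin. In that format the
# fourth field is "address/prefix", e.g. "10.109.4.5/24".
has_local_ip() {
    awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) found = 1 }
                    END { exit (found ? 0 : 1) }'
}

# Example use (not run here), with the address from the agent error:
#   ip -o -4 addr show dev br-mesh | has_local_ip 10.109.4.5 \
#       || echo "local_ip 10.109.4.5 is not on br-mesh yet" >&2
```

On the restarted node in this report, the check would fail because br-mesh could not be configured while OVS was down.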

Kevin Benton (kevinbenton) wrote :

Ah, so the pinning not being set was preventing it from booting up with the IP configured?

Sergey Matov (smatov) wrote :

Correct. The openvswitch start script was waiting for configuration, but an empty set of parameters was used.

Changed in mos:
assignee: Fuel Sustaining (fuel-sustaining-team) → Vladimir Eremin (yottatsa)
no longer affects: mos
Atsuko Ito (yottatsa) wrote :

It looks like the root cause of this problem is that OVS cannot be started unless huge pages are initialized and mounted and the interface modules are loaded.

In 9.0, the boot ordering was ensured by the first line of this template: https://review.openstack.org/#/c/408024/5/deployment/puppet/l23network/templates/openvswitch_default_Debian.erb
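
The ordering requirement can also be expressed as a pre-start guard. A minimal sketch covering only the hugetlbfs part, assuming /proc/mounts is readable; hugetlbfs_mounted is a hypothetical helper (not part of fuel-library), and the module check is left out because the modules in question are not named here:

```shell
#!/bin/sh
# Sketch: succeed if a hugetlbfs filesystem is mounted, by scanning a mounts
# table (default /proc/mounts; the path is a parameter so it can be tested).
# Field 3 of each line is the filesystem type.
hugetlbfs_mounted() {
    mounts=${1:-/proc/mounts}
    awk '$3 == "hugetlbfs" { found = 1 } END { exit (found ? 0 : 1) }' "$mounts"
}

# Hypothetical guard before starting ovs-vswitchd (not run here):
#   hugetlbfs_mounted || { echo "hugetlbfs not mounted; start dpdk first" >&2; exit 1; }
```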

Atsuko Ito (yottatsa) wrote :

Confirmed in the lab: this one-line patch, applied on production, fixes the problem:

    root@node-13:~# diff -u /etc/default/openvswitch-switch{.bak,}
    --- /etc/default/openvswitch-switch.bak 2017-01-10 18:02:20.292308367 +0000
    +++ /etc/default/openvswitch-switch 2017-01-10 17:50:29.543437162 +0000
    @@ -1,3 +1,4 @@
    +/etc/init.d/dpdk start
     # This is a POSIX shell fragment -*- sh -*-

fuel-library should be fixed by partially reverting this change: https://review.openstack.org/#/c/408024/5/deployment/puppet/l23network/templates/openvswitch_default_Debian.erb

tags: added: area-library
removed: area-python
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/418821

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/418931

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/418936

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/418821
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=f8e27d9323cf94f306406c707abb449974606461
Submitter: Jenkins
Branch: master

commit f8e27d9323cf94f306406c707abb449974606461
Author: Mikhail <email address hidden>
Date: Wed Jan 11 14:17:20 2017 +0400

    Start dpdk before ovs

    Change-Id: Ie69e95bcdf811bb584cb5c1d1018296c82ad47f3
    Closes-Bug: #1652937

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/418936
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=c52b4b09b725b60397f76f9b485a0f4bc9304f5c
Submitter: Jenkins
Branch: stable/mitaka

commit c52b4b09b725b60397f76f9b485a0f4bc9304f5c
Author: Mikhail <email address hidden>
Date: Wed Jan 11 14:17:20 2017 +0400

    Start dpdk before ovs

    Change-Id: Ie69e95bcdf811bb584cb5c1d1018296c82ad47f3
    Closes-Bug: #1652937

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/newton)

Reviewed: https://review.openstack.org/418931
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=17052f4fae15b62cab7264857a05ec4e78732f19
Submitter: Jenkins
Branch: stable/newton

commit 17052f4fae15b62cab7264857a05ec4e78732f19
Author: Mikhail <email address hidden>
Date: Wed Jan 11 14:17:20 2017 +0400

    Start dpdk before ovs

    Change-Id: Ie69e95bcdf811bb584cb5c1d1018296c82ad47f3
    Closes-Bug: #1652937

tags: added: on-verification
Sergey Novikov (snovikov) wrote :

Verified on 9.2 snapshot #804

tags: removed: on-verification
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-library 11.0.0.0rc1

This issue was fixed in the openstack/fuel-library 11.0.0.0rc1 release candidate.
