nova-compute will try to re-plug the vif even if it exists for vhostuser port.

Bug #1670628 reported by wangyalei
This bug affects 2 people
Affects   Status         Importance  Assigned to      Milestone
os-vif    Fix Committed  High        Sahid Orentino
Pike      Fix Committed  High        Matt Riedemann
Queens    Fix Committed  High        Sahid Orentino

Bug Description

Description
===========
On the Mitaka release, deploy Neutron with OVS-DPDK.
If the OVS agent is stopped and nova-compute is then restarted, the VMs on that host lose network connectivity.

Steps to reproduce
==================
Deploy Mitaka with Neutron and OVS-DPDK enabled, then choose a compute node that hosts a VM with working network connectivity.
Run the following on that host:
1. # systemctl stop neutron-openvswitch-agent.service
2. # systemctl restart openstack-nova-compute.service

Then ping $VM_IN_THIS_HOST.

Expected result
===============
ping $VM_IN_THIS_HOST succeeds.

Actual result
=============
ping $VM_IN_THIS_HOST failed.

Environment
===========
CentOS 7
OVS 2.5.1
DPDK 2.2.0
openstack-nova-compute-13.1.1-1

Reason:
After some digging, I found that nova-compute tries to plug the VIF every time it starts.
For vhostuser ports in particular, nova-compute does not check whether the port already exists, as it does for legacy OVS ports. Instead it re-plugs the port with ovs-vsctl arguments like "--if-exists del-port vhuxxxx".
(See https://github.com/openstack/nova/blob/stable/mitaka/nova/virt/libvirt/vif.py#L679-L683)
After the OVS vhostuser port is recreated, it no longer carries the correct VLAN tag, which is set by the OVS agent.
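
The sequence described above can be illustrated with a toy model. This is only a sketch, not nova or OVS code: ToyBridge, its methods, the port name "vhu-example" and the tag value 5 are all hypothetical stand-ins. It contrasts a delete-then-add re-plug with a non-destructive "--may-exist add-port" style re-plug, assuming nothing re-applies the tag while the OVS agent is stopped.

```python
class ToyBridge:
    """Toy in-memory stand-in for an OVS bridge (hypothetical, not OVSDB)."""

    def __init__(self):
        self.ports = {}  # port name -> {"tag": VLAN id or None}

    def add_port(self, name, may_exist=False):
        if name in self.ports:
            if may_exist:
                return  # mimics "--may-exist add-port": existing port untouched
            raise ValueError(f"port {name} already exists")
        self.ports[name] = {"tag": None}  # a new port starts without a tag

    def del_port(self, name, if_exists=False):
        if name not in self.ports:
            if if_exists:
                return  # mimics "--if-exists del-port": missing port is fine
            raise KeyError(name)
        del self.ports[name]

    def set_tag(self, name, tag):
        self.ports[name]["tag"] = tag


def replug_delete_then_add(bridge, name):
    # The pattern from the report: drop the port, then recreate it.
    bridge.del_port(name, if_exists=True)
    bridge.add_port(name)


def replug_may_exist(bridge, name):
    # Non-destructive alternative: only create the port if it is missing.
    bridge.add_port(name, may_exist=True)


def demo():
    results = {}
    for label, replug in (("delete_then_add", replug_delete_then_add),
                          ("may_exist", replug_may_exist)):
        bridge = ToyBridge()
        bridge.add_port("vhu-example")
        bridge.set_tag("vhu-example", 5)  # what the OVS agent would have done
        replug(bridge, "vhu-example")     # nova-compute restart, agent stopped
        results[label] = bridge.ports["vhu-example"]["tag"]
    return results
```

In this model the delete-then-add path leaves the port untagged until something sets the tag again, which matches the observation that connectivity only returns once the agent is restarted.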

In the test environment, restarting the OVS agent makes the agent set the proper VLAN ID on the port again, and network connectivity resumes.
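
One way to confirm this diagnosis would be to read the port's VLAN tag before and after each restart. The sketch below assumes ovs-vsctl is on PATH and that "ovs-vsctl get Port <name> tag" prints "[]" for an unset tag and a bare integer otherwise; read_port_tag and parse_tag are hypothetical helper names, and the port name is a placeholder.

```python
import subprocess


def parse_tag(output):
    """Turn the text printed for the Port 'tag' column into an int or None."""
    text = output.strip()
    if text in ("", "[]"):
        return None  # assumption: an unset optional column prints as []
    return int(text)


def read_port_tag(port_name):
    """Ask ovs-vsctl for the VLAN tag of one port (requires a running OVS)."""
    result = subprocess.run(
        ["ovs-vsctl", "get", "Port", port_name, "tag"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_tag(result.stdout)


if __name__ == "__main__":
    # "vhuxxxx" is a placeholder; substitute the real vhostuser port name.
    print(read_port_tag("vhuxxxx"))
```

If the analysis above holds, the value would be None after nova-compute restarts with the agent stopped, and an integer again after the agent is restarted.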

I'm not sure whether this is a bug or a configuration issue. Am I missing something?
There is also an fp_plug type for vhostuser ports. How can we specify it?

wangyalei (yalei)
Changed in nova:
assignee: nobody → wangyalei (yalei)
summary: - nova-compute will try to re-plug the vif even if it exists
+ nova-compute will try to re-plug the vif even if it exists for vhostuser
+ port.
wangyalei (yalei)
description: updated
wangyalei (yalei)
description: updated
Changed in nova:
status: New → In Progress
Sean Dague (sdague) wrote :

Automatically discovered version mitaka in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.mitaka
Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing the status back to the previous state and unassigning. If there are active reviews related to this bug, please include links in comments.

Changed in nova:
status: In Progress → New
assignee: wangyalei (yalei) → nobody
Sean Dague (sdague) wrote :

What real-world scenario would you expect to expose a situation where the neutron environment is turned off and nova-compute is restarted? This seems pretty synthetic, and the fact that it recovers once the neutron agent restarts suggests that most of the environment is working as expected.

Changed in nova:
status: New → Incomplete
status: Incomplete → Opinion
Sahid Orentino (sahid-ferdjaoui) wrote :

This issue is valid for Mitaka and Newton, which both use the OVS dpdkvhostuser port type. Deleting the port makes the instance lose connectivity.

After Newton we deprecated dpdkvhostuser in favor of dpdkvhostuserclient, so I'm not sure whether we want to fix this upstream.

Stephen Finucane (stephenfinucane) wrote :

Not sure if this is the same issue but you can reproduce this by simply restarting nova-compute. With a DevStack-based deployment, run:

  $ sudo service <email address hidden> restart

This will result in the following ovs-related logs in journalctl:

  $ journalctl -a /usr/sbin/ovs-vswitchd /usr/bin/ovs-vsctl -a --follow
  Mar 05 14:01:12 localhost.localdomain ovs-vsctl[11616]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -- --may-exist add-br br-int -- set Bridge br-int datapath_type=netdev
  Mar 05 14:01:12 localhost.localdomain ovs-vsctl[11617]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --timeout=120 -- --if-exists del-port vhu81f803a2-ac -- add-port br-int vhu81f803a2-ac -- set Interface vhu81f803a2-ac external-ids:iface-id=81f803a2-ac81-4ee8-ba72-93f1e45a546c external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:a8:08:98 external-ids:vm-uuid=67bb2da9-5e5d-48c5-9e98-84fb26bfb362 type=dpdkvhostuserclient options:vhost-server-path=/var/run/openvswitch/vhu81f803a2-ac

IIRC, with dpdkvhostuser interfaces (as opposed to dpdkvhostuserclient), the OVS instance acts as the server. As a result, removing and re-adding the instance results in a loss of connectivity. This doesn't happen with dpdkvhostuserclient because QEMU is the server. As a result, this seems like something we should not be doing.
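
To spot this pattern in logs like the ones above, a small parser can flag any ovs-vsctl invocation that deletes a port and then re-adds it in the same call. This is a rough sketch: split_vsctl_commands and replugged_ports are hypothetical helper names, and it assumes the logged command line contains no quoted arguments with embedded " -- " separators.

```python
import shlex


def split_vsctl_commands(invocation):
    """Split an ovs-vsctl command line into its '--'-separated sub-commands."""
    commands, current = [], []
    for token in shlex.split(invocation):
        if token == "--":
            if current:
                commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


def replugged_ports(invocation):
    """Return ports that are deleted and then re-added in one invocation."""
    deleted = set()
    replugged = []
    for command in split_vsctl_commands(invocation):
        # Drop per-command options such as --if-exists or --may-exist.
        words = [w for w in command if not w.startswith("--")]
        if not words:
            continue
        if words[0] == "del-port":
            deleted.add(words[-1])  # del-port [bridge] port
        elif words[0] == "add-port" and len(words) >= 3:
            port = words[2]         # add-port bridge port [settings...]
            if port in deleted:
                replugged.append(port)
    return replugged


LOGGED = (
    "ovs-vsctl --timeout=120 -- --if-exists del-port vhu81f803a2-ac "
    "-- add-port br-int vhu81f803a2-ac "
    "-- set Interface vhu81f803a2-ac type=dpdkvhostuserclient"
)
print(replugged_ports(LOGGED))
```

The LOGGED string is a shortened copy of the second journal entry above, with most of the "set Interface" arguments dropped for brevity.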

*However*, there is another issue: we do things this way because of bug #1270973 (and related bug #1268762). This behavior was introduced in https://github.com/openstack/nova/commit/33cc64fb817. I'm not entirely sure that the issue it describes is a valid one, as it seems like a bug in OVS (otherwise what's the point of the '--may-exist' flag?). However, I need to examine this to figure out what's happening.

Changed in nova:
status: Opinion → Confirmed
importance: Undecided → High
Stephen Finucane (stephenfinucane) wrote :

As noted in the review [0], the bug that this was working around was fixed in OVS 2.6.0 [1], meaning that commit 33cc64fb817 mentioned above is not only causing issues with some patches but should also no longer be necessary. IMO, we should likely revert it in its entirety, which is pretty much what [0] is doing.

[0] https://review.openstack.org/#/c/546588/1
[1] https://github.com/openvswitch/ovs/commit/e21c6643a02c6b446d2fbdfde366ea303b4c2730

melanie witt (melwitt) wrote :

The os-vif change https://review.openstack.org/#/c/546588 merged on 2018-03-06.

So, if I understand correctly, we need an os-vif library release, and that release set as the minimum version in nova's requirements.txt, before we can close this bug for nova.

Matt Riedemann (mriedem)
no longer affects: nova
Changed in os-vif:
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → sahid (sahid-ferdjaoui)
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-vif (stable/queens)

Reviewed: https://review.openstack.org/550079
Committed: https://git.openstack.org/cgit/openstack/os-vif/commit/?id=3be52b9e650db643827b8b97b28799407f30926f
Submitter: Zuul
Branch: stable/queens

commit 3be52b9e650db643827b8b97b28799407f30926f
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Wed Feb 21 14:13:10 2018 +0100

    ovs: do not delete port if already exists

    Change-Id: I0ab28bc38be1f72635afa97c2c4651cd1c2ab336
    Closes-Bug: #1670628
    (cherry picked from commit dadc65c0fae857020f75c8180f159f72cd66c7bd)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-vif 1.10.0

This issue was fixed in the openstack/os-vif 1.10.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-vif 1.9.1

This issue was fixed in the openstack/os-vif 1.9.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to os-vif (stable/pike)

Reviewed: https://review.openstack.org/550080
Committed: https://git.openstack.org/cgit/openstack/os-vif/commit/?id=89b0bea3f32260fc1032729e7a75c3250aed8494
Submitter: Zuul
Branch: stable/pike

commit 89b0bea3f32260fc1032729e7a75c3250aed8494
Author: Sahid Orentino Ferdjaoui <email address hidden>
Date: Wed Feb 21 14:13:10 2018 +0100

    ovs: do not delete port if already exists

    Change-Id: I0ab28bc38be1f72635afa97c2c4651cd1c2ab336
    Closes-Bug: #1670628
    (cherry picked from commit dadc65c0fae857020f75c8180f159f72cd66c7bd)
    (cherry picked from commit 3be52b9e650db643827b8b97b28799407f30926f)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-vif 1.7.1

This issue was fixed in the openstack/os-vif 1.7.1 release.
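
Given the three releases listed above, a deployment can be checked roughly as follows. This is a sketch under assumptions: has_fix and parse_version are hypothetical helpers, it assumes 1.7.1 and 1.9.1 are point releases of the stable/pike and stable/queens series respectively, that no other series received a backport, and that the version string is purely numeric (no pre-release suffixes).

```python
# First fixed release per backported series (major, minor) -> fixed version.
BACKPORTS = {
    (1, 7): (1, 7, 1),
    (1, 9): (1, 9, 1),
}
# First release on the main line that contains the fix.
FIRST_FIXED_MAINLINE = (1, 10, 0)


def parse_version(text):
    """Parse 'X.Y.Z' into a 3-tuple of ints, padding missing parts with 0."""
    parts = [int(part) for part in text.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def has_fix(version_text):
    """Return True if this os-vif version should contain the fix."""
    version = parse_version(version_text)
    if version >= FIRST_FIXED_MAINLINE:
        return True
    backport = BACKPORTS.get(version[:2])
    return backport is not None and version >= backport


if __name__ == "__main__":
    for candidate in ("1.7.0", "1.7.1", "1.8.0", "1.9.1", "1.10.0"):
        print(candidate, has_fix(candidate))
```

The installed version could be obtained with importlib.metadata.version("os_vif") on Python 3.8 or later, assuming the package is installed under that distribution name.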
