Tunnel ports are not cleaned up when several OVS agents are restarted

Bug #1877296 reported by Ann Taraday
Affects: neutron
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

If we update the local_ip config value on several openvswitch agents and restart them, we may end up with tunnel ports whose old local_ip (remote_ip) settings are not cleaned up.

Before

        Port "vxlan-c0a80006"
            Interface "vxlan-c0a80006"
                type: vxlan
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="192.168.0.7", out_key=flow, remote_ip="192.168.0.6"}
        Port "vxlan-c0a89b04"
            Interface "vxlan-c0a89b04"
                type: vxlan
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="192.168.0.7", out_key=flow, remote_ip="192.168.155.4"}

After restart

        Port "vxlan-c0a80006"
            Interface "vxlan-c0a80006"
                type: vxlan
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="192.168.0.7", out_key=flow, remote_ip="192.168.0.6"}
        Port "vxlan-c0a89b04"
            Interface "vxlan-c0a89b04"
                type: vxlan
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="192.168.155.10", out_key=flow, remote_ip="192.168.155.4"}
        Port "vxlan-c0a89b05"
            Interface "vxlan-c0a89b05"
                type: vxlan
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, local_ip="192.168.155.10", out_key=flow, remote_ip="192.168.155.5"}
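
In output like the above, the stale port stands out because its local_ip option no longer matches the local_ip the agent is now configured with. A minimal check along those lines, as a sketch only (the bridge name br-tun and the config path are assumptions for a typical ML2/OVS setup; adjust for your deployment):

    #!/usr/bin/env python3
    """List tunnel ports whose local_ip option no longer matches the
    configured local_ip.  Bridge name and config path are assumptions."""
    import configparser
    import subprocess

    BRIDGE = "br-tun"                                             # assumption
    CONF_PATH = "/etc/neutron/plugins/ml2/openvswitch_agent.ini"  # assumption

    cfg = configparser.ConfigParser()
    cfg.read(CONF_PATH)
    configured_ip = cfg.get("ovs", "local_ip")

    ports = subprocess.check_output(
        ["ovs-vsctl", "list-ports", BRIDGE], text=True).split()

    for port in ports:
        if not port.startswith(("vxlan-", "gre-")):
            continue
        local_ip = subprocess.check_output(
            ["ovs-vsctl", "get", "Interface", port, "options:local_ip"],
            text=True).strip().strip('"')
        if local_ip != configured_ip:
            print(f"stale tunnel port: {port} "
                  f"(local_ip={local_ip}, expected {configured_ip})")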

If the agents are restarted one by one, this issue does not appear.

Debugging shows that during tunnel_sync (https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py#L2240) the RPC call (https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/type_tunnel.py#L538) gets interrupted (http://paste.openstack.org/show/792876/) and some tunnels are not processed properly.
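
To make the failure mode concrete, here is a toy model of the general pattern only -- not the actual agent code, which is linked above: ports are only brought in line with the endpoint list the sync RPC returns, so if that call is interrupted, whatever ports were programmed before the restart are left untouched until a later sync succeeds.

    # Toy model only -- the real logic is in ovs_neutron_agent.tunnel_sync()
    # and type_tunnel.tunnel_sync() linked above.  It just shows why state
    # reconciled against an RPC result goes stale when the RPC is interrupted.

    class ToyAgent:
        def __init__(self, local_ip, existing_ports):
            self.local_ip = local_ip
            # remote_ip -> OVS port name currently programmed
            self.tunnel_ports = dict(existing_ports)

        def tunnel_sync(self, fetch_endpoints):
            """Reconcile tunnel ports against the server's endpoint list."""
            try:
                endpoints = fetch_endpoints()    # the RPC; may be interrupted
            except Exception as exc:
                print(f"sync interrupted: {exc!r} -- no ports touched")
                return
            wanted = {ip for ip in endpoints if ip != self.local_ip}
            for remote_ip in list(self.tunnel_ports):
                if remote_ip not in wanted:
                    self.tunnel_ports.pop(remote_ip)   # drop stale port
            for remote_ip in wanted:
                self.tunnel_ports.setdefault(remote_ip, f"vxlan-{remote_ip}")

    agent = ToyAgent("192.168.155.10", {"192.168.0.6": "vxlan-c0a80006"})

    def interrupted_rpc():
        raise TimeoutError("messaging timeout during simultaneous restart")

    agent.tunnel_sync(interrupted_rpc)
    print(agent.tunnel_ports)  # {'192.168.0.6': 'vxlan-c0a80006'} -- stale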

Originally found on the Queens release, but also reproduced with master code on a multinode devstack.

Tags: ovs
Changed in neutron:
importance: Undecided → Medium
Revision history for this message
Miguel Lavalle (minsel) wrote :

In the failed case, what sequence did you follow to update the config files and restart the agents?

Changed in neutron:
status: New → Triaged
status: Triaged → New
Revision history for this message
Ann Taraday (akamyshnikova) wrote :

In the failed case on devstack: change the config files on all nodes, then restart the openvswitch agents almost at once, with minimal delay.
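
For illustration, a sketch of such a near-simultaneous restart (the host names and the systemd unit are assumptions; on devstack the unit is typically devstack@q-agt, on packaged installs neutron-openvswitch-agent):

    #!/usr/bin/env python3
    """Restart the OVS agent on several nodes almost at once over ssh.
    Host names and the systemd unit name are assumptions."""
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    NODES = ["node-1", "node-2", "node-3"]   # assumption
    UNIT = "devstack@q-agt"                  # or neutron-openvswitch-agent

    def restart(node):
        return subprocess.run(
            ["ssh", node, "sudo", "systemctl", "restart", UNIT],
            capture_output=True, text=True)

    # Fire the restarts in parallel so the agents come back up together,
    # which is the timing that triggers the stale-port behaviour.
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        for node, result in zip(NODES, pool.map(restart, NODES)):
            print(node, "rc =", result.returncode)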

Ivan Kolodyazhny (e0ne)
Changed in neutron:
status: New → Confirmed