openvswitch flows not always restored after openvswitch-switch upgrade with l2population enabled

Bug #1722946 reported by James Page
This bug affects 1 person
Affects               Status    Importance  Assigned to      Milestone
Ubuntu Cloud Archive  Triaged   Medium      Unassigned
neutron               New       Undecided   Unassigned
neutron (Ubuntu)      Triaged   Medium      Sahid Orentino

Bug Description

In previous releases, an upgrade of the openvswitch packages which results in a restart of the userspace daemons is detected by the neutron-openvswitch-agent, which then tidies and restores the required state, resulting in a minimal outage to instances on the hypervisor being upgraded.

Whilst testing updates to ovs 2.8.1, I noted that instance connectivity dropped but was never restored; restarting the neutron-openvswitch-agent resolved the issue.
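For reference, the restart that triggers this can be approximated with something like the following (service name assumed for this Ubuntu release), outside of a package upgrade:

  sudo systemctl restart openvswitch-switch

The neutron-openvswitch-agent log from the failed case follows: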

2017-10-11 22:12:43.481 28598 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: send error: Connection refused
2017-10-11 22:12:43.482 28598 WARNING ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: connection dropped (Connection refused)
2017-10-11 22:12:45.332 28598 WARNING neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-d616236d-5a5f-4f55-a721-83ff5079e5f0 - - - - -] OVS is restarted. OVSNeutronAgent will reset bridges and recover ports.
2017-10-11 22:12:45.647 28598 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-d616236d-5a5f-4f55-a721-83ff5079e5f0 - - - - -] Mapping physical network physnet1 to bridge br-data
2017-10-11 22:12:45.700 28598 INFO neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_bridge [req-d616236d-5a5f-4f55-a721-83ff5079e5f0 - - - - -] Bridge br-data has datapath-ID 00003e24776a3b44
2017-10-11 22:12:45.886 28598 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-d616236d-5a5f-4f55-a721-83ff5079e5f0 - - - - -] Port b08484e1-5e7d-45e9-86d8-7078ded39d9f updated. Details: {'profile': {}, 'network_qos_policy_id': None, 'qos_policy_id': None, 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id': 'f1b6715f-9d79-457f-8272-6b54d0f82332', 'segmentation_id': 5, 'fixed_ips': [{'subnet_id': 'b069aeff-8879-4007-9799-bd350c3d1e5c', 'ip_address': '192.168.21.3'}], 'device_owner': u'compute:nova', 'physical_network': None, 'mac_address': 'fa:16:3e:69:89:92', 'device': u'b08484e1-5e7d-45e9-86d8-7078ded39d9f', 'port_security_enabled': True, 'port_id': 'b08484e1-5e7d-45e9-86d8-7078ded39d9f', 'network_type': u'gre', 'security_groups': ['84afead6-fcc4-45f7-94f1-017f263ca080']}
2017-10-11 22:12:45.887 28598 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-d616236d-5a5f-4f55-a721-83ff5079e5f0 - - - - -] Assigning 1 as local vlan for net-id=f1b6715f-9d79-457f-8272-6b54d0f82332
2017-10-11 22:12:45.892 28598 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-d616236d-5a5f-4f55-a721-83ff5079e5f0 - - - - -] Port 8aedca8c-fd85-42f0-97fa-eacb43a3364b updated. Details: {'profile': {}, 'network_qos_policy_id': None, 'qos_policy_id': None, 'allowed_address_pairs': [], 'admin_state_up': True, 'network_id': 'f1b6715f-9d79-457f-8272-6b54d0f82332', 'segmentation_id': 5, 'fixed_ips': [{'subnet_id': 'b069aeff-8879-4007-9799-bd350c3d1e5c', 'ip_address': '192.168.21.10'}], 'device_owner': u'compute:nova', 'physical_network': None, 'mac_address': 'fa:16:3e:50:b7:1a', 'device': u'8aedca8c-fd85-42f0-97fa-eacb43a3364b', 'port_security_enabled': True, 'port_id': '8aedca8c-fd85-42f0-97fa-eacb43a3364b', 'network_type': u'gre', 'security_groups': ['84afead6-fcc4-45f7-94f1-017f263ca080']}
2017-10-11 22:12:45.893 28598 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-d616236d-5a5f-4f55-a721-83ff5079e5f0 - - - - -] Assigning 1 as local vlan for net-id=f1b6715f-9d79-457f-8272-6b54d0f82332
2017-10-11 22:12:45.902 28598 INFO neutron.agent.securitygroups_rpc [req-d616236d-5a5f-4f55-a721-83ff5079e5f0 - - - - -] Preparing filters for devices set([u'b08484e1-5e7d-45e9-86d8-7078ded39d9f', u'8aedca8c-fd85-42f0-97fa-eacb43a3364b'])
2017-10-11 22:12:47.813 28598 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-d616236d-5a5f-4f55-a721-83ff5079e5f0 - - - - -] Configuration for devices up [u'b08484e1-5e7d-45e9-86d8-7078ded39d9f', u'8aedca8c-fd85-42f0-97fa-eacb43a3364b'] and devices down [] completed.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: neutron-openvswitch-agent 2:11.0.1-0ubuntu1~cloud0 [origin: Canonical]
ProcVersionSignature: Ubuntu 4.4.0-96.119-generic 4.4.83
Uname: Linux 4.4.0-96-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
CrashDB:
 {
                "impl": "launchpad",
                "project": "cloud-archive",
                "bug_pattern_url": "http://people.canonical.com/~ubuntu-archive/bugpatterns/bugpatterns.xml",
             }
Date: Wed Oct 11 22:17:32 2017
Ec2AMI: ami-0000022c
Ec2AMIManifest: FIXME
Ec2AvailabilityZone: nova
Ec2InstanceType: m1.medium
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
PackageArchitecture: all
ProcEnviron:
 TERM=screen-256color-bce
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: neutron
UpgradeStatus: No upgrade log present (probably fresh install)
mtime.conffile..etc.neutron.plugins.ml2.openvswitch_agent.ini: 2017-10-11T19:36:21.675410

Revision history for this message
James Page (james-page) wrote :
description: updated
tags: added: ovs
Revision history for this message
Jakub Libosvar (libosvar) wrote :

It would be helpful to paste the output of ovs-ofctl dump-flows for br-int and br-tun, to see whether the flows were implemented correctly or if connectivity is dropped for some other reason.
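For example (assuming the standard bridge names used by the OVS agent):

  sudo ovs-ofctl dump-flows br-int
  sudo ovs-ofctl dump-flows br-tun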

Changed in neutron:
status: New → Incomplete
Revision history for this message
LiweiWang (wlw9001) wrote :

You can try adding 'drop_flows_on_start=True' to your l2 agent config file. Then when you restart the l2 agent, it will try to drop all flows and rebuild them.
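If I recall correctly, this option lives in the [agent] section of the OVS agent config, something like (file path and section assumed):

  # /etc/neutron/plugins/ml2/openvswitch_agent.ini
  [agent]
  drop_flows_on_start = True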

Revision history for this message
James Page (james-page) wrote :

flow dumps before and after restart.

Changed in neutron:
status: Incomplete → New
Revision history for this message
James Page (james-page) wrote :

br-tun looks incomplete from a flows perspective after the restart of the openvswitch daemons.

Revision history for this message
James Page (james-page) wrote :

@wlw9001 the issue here is not a restart of the openvswitch-agent, but rather of openvswitch itself. After a restart of the neutron-openvswitch-agent, flows are correctly restored - but that does not help with restarts of openvswitch itself.
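In other words, roughly (service names assumed for this deployment):

  sudo systemctl restart openvswitch-switch           # flows sometimes left broken
  sudo systemctl restart neutron-openvswitch-agent    # flows restored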

Revision history for this message
James Page (james-page) wrote :

Setting Neutron task back to new as flow data has been provided.

Revision history for this message
James Page (james-page) wrote :

This appears to be a bit tricky to reproduce; subsequent restarts of the openvswitch daemons do result in the br-tun flows being set up again by the n-ovs-agent.

It just seems to be the first hit that causes the problem...

Revision history for this message
James Page (james-page) wrote :

Worth noting that the deployment uses the l2-population driver; I'm disabling it and trying to reproduce without it.

Revision history for this message
James Page (james-page) wrote :

Unable to reproduce with l2pop disabled, so I'm guessing that the issue lies somewhere in the tunnel update code driven by l2pop rather than in the general tunnel_sync code used when it's disabled.
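For reference, l2pop is toggled via the mechanism driver list on the server side and the agent flag, roughly as follows (file paths and sections assumed):

  # /etc/neutron/plugins/ml2/ml2_conf.ini (neutron-server)
  [ml2]
  mechanism_drivers = openvswitch,l2population

  # /etc/neutron/plugins/ml2/openvswitch_agent.ini (agent)
  [agent]
  l2_population = True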

Changed in neutron (Ubuntu):
importance: Undecided → Medium
Revision history for this message
James Page (james-page) wrote :

When l2pop is enabled, the n-ovs-agent never makes the calls to setup_tunnel_port to reconfigure the tunnel flows, AFAICT; this doesn't happen on every restart, so something must be getting wedged or raced.
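As a quick sanity check, one can confirm whether the GRE tunnel ports were re-created on br-tun after the restart, e.g. (tunnel port naming assumed):

  sudo ovs-vsctl list-ports br-tun | grep gre-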

James Page (james-page)
Changed in neutron (Ubuntu):
status: New → Triaged
summary: - openvswitch flows not fully restored after openvswitch-switch upgrade
+ openvswitch flows not always restored after openvswitch-switch upgrade
+ with l2population enabled
Revision history for this message
James Page (james-page) wrote :

Grepping the neutron-server logs - a good restart has:

2017-10-24 12:47:27.017 20832 DEBUG neutron.plugins.ml2.drivers.l2pop.rpc [req-ee78be7a-5150-4884-9650-8533eda08e3c - - - - -] Notify l2population agent juju-8ac9b5-pike-gnocchi-testing-17 at q-agent-notifier the message add_fdb_entries with {u'2dfaf721-4af8-44a5-ac25-37e882838b40': {'ports': {u'10.5.0.48': [('00:00:00:00:00:00', '0.0.0.0'), PortInfo(mac_address=u'fa:16:3e:27:ec:71', ip_address=u'192.168.21.11')], u'10.5.0.29': [('00:00:00:00:00:00', '0.0.0.0'), PortInfo(mac_address=u'fa:16:3e:5c:44:6e', ip_address=u'192.168.21.9')], u'10.5.0.28': [('00:00:00:00:00:00', '0.0.0.0'), PortInfo(mac_address=u'fa:16:3e:46:4a:3a', ip_address=u'192.168.21.8')]}, 'network_type': u'gre', 'segment_id': 5}} _notification_host /usr/lib/python2.7/dist-packages/neutron/plugins/ml2/drivers/l2pop/rpc.py:57

whereas once an agent gets into the state where the flows are not reconfigured, I don't see that type of message again.
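(Roughly: grep add_fdb_entries /var/log/neutron/neutron-server.log -- log path assumed.)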

James Page (james-page)
Changed in cloud-archive:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
James Page (james-page) wrote :

This might be related to bug 1823295 - I'm thinking that maybe the 3x restart of the underlying ovs daemons on package upgrade just causes confusion and dismay.

Changed in neutron (Ubuntu):
assignee: nobody → Sahid Orentino (sahid-ferdjaoui)