Neutron ovs agent fails on rpc_loop iteration:1

Bug #1881424 reported by Radosław Piliszek on 2020-05-30
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
kolla-ansible
Critical
Unassigned
Victoria
Critical
Unassigned
neutron
High
Terry Wilson

Bug Description

Hi Neutrinos!

This is from Kolla-Ansible CI, it started happening in Victoria on May 28.
It affects all distros deb-ubu-centos and makes the jobs fail (ovs agent is dead).
It does *not* affect OVN though.

br-ex exists before iteration:0 and acts fine in iteration:0 but not in iteration:1.

The only quasi-relevant change in neutron seems to be https://review.opendev.org/721554 but still it should only affect DVR which we are not running. Odd.

Full logs with config and debug are attached.

The relevant logs look like this (bottom of the file):
2020-05-30 15:27:59.003 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Agent rpc_loop - iteration:1 started
2020-05-30 15:27:59.006 7 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] ofctl request version=0x4,msg_type=0x12,msg_len=0x38,xid=0x15f04540,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=23,type=1) result [OFPFlowStatsReply(body=[OFPFlowStats(byte_count=0,cookie=371283074374527098,duration_nsec=718000000,duration_sec=5,flags=0,hard_timeout=0,idle_timeout=0,instructions=[],length=56,match=OFPMatch(oxm_fields={}),packet_count=0,priority=0,table_id=23)],flags=0,type=1)] _send_msg /var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py:113
2020-05-30 15:27:59.008 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Physical bridge br-ex was just re-created.
2020-05-30 15:27:59.009 7 DEBUG ovsdbapp.backend.ovs_idl [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Created index name autocreate_indices /var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py:99
2020-05-30 15:27:59.009 7 DEBUG ovsdbapp.backend.ovs_idl [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Created index name autocreate_indices /var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py:99
2020-05-30 15:27:59.009 7 DEBUG ovsdbapp.backend.ovs_idl [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Created index name autocreate_indices /var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py:99
2020-05-30 15:27:59.010 7 DEBUG ovsdbapp.backend.ovs_idl [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Created index target autocreate_indices /var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py:99
2020-05-30 15:27:59.011 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Mapping physical network physnet1 to bridge br-ex
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Error executing command: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Bridge with name=br-ex
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command Traceback (most recent call last):
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command File "/var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 38, in execute
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command self.run_idl(None)
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command File "/var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/command.py", line 214, in run_idl
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command record = self.api.lookup(self.table, self.record)
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command File "/var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 171, in lookup
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command return self._lookup(table, record)
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command File "/var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 218, in _lookup
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command row = idlutils.row_by_value(self, rl.table, rl.column, record)
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command File "/var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 95, in row_by_value
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command raise RowNotFound(table=table, col=column, match=match)
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Bridge with name=br-ex
2020-05-30 15:27:59.012 7 ERROR ovsdbapp.backend.ovs_idl.command
2020-05-30 15:27:59.015 7 DEBUG neutron.agent.common.async_process [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Halting async process [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]. stop /var/lib/kolla/venv/lib/python3.6/site-packages/neutron/agent/common/async_process.py:135
2020-05-30 15:27:59.015 7 DEBUG neutron.agent.linux.utils [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Running command: ['kill', '-9', '75'] create_process /var/lib/kolla/venv/lib/python3.6/site-packages/neutron/agent/linux/utils.py:87
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Agent main thread died of an exception: TypeError: int() can't convert non-string with explicit base
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp Traceback (most recent call last):
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py", line 40, in agent_main_wrapper
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp ovs_agent.main(bridge_classes)
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2780, in main
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp agent.daemon_loop()
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2674, in daemon_loop
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp self.rpc_loop(polling_manager=pm)
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2545, in rpc_loop
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp added_bridges)
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1416, in _reconfigure_physical_bridges
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp sync = self._do_reconfigure_physical_bridges(bridges)
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1439, in _do_reconfigure_physical_bridges
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp self.setup_physical_bridges(bridge_mappings)
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1495, in setup_physical_bridges
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp self._check_bridge_datapath_id(br, datapath_ids_set)
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1457, in _check_bridge_datapath_id
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp dpid = int(bridge.get_datapath_id(), 16)
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp TypeError: int() can't convert non-string with explicit base
2020-05-30 15:27:59.036 7 ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp
2020-05-30 15:27:59.048 7 DEBUG neutron.agent.common.async_process [-] Output received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: None _read_stdout /var/lib/kolla/venv/lib/python3.6/site-packages/neutron/agent/common/async_process.py:262
2020-05-30 15:27:59.048 7 ERROR neutron.agent.common.async_process [-] Error received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: None
2020-05-30 15:27:59.052 7 CRITICAL neutron [-] Unhandled error: TypeError: int() can't convert non-string with explicit base
2020-05-30 15:27:59.052 7 ERROR neutron Traceback (most recent call last):
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/bin/neutron-openvswitch-agent", line 8, in <module>
2020-05-30 15:27:59.052 7 ERROR neutron sys.exit(main())
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main
2020-05-30 15:27:59.052 7 ERROR neutron agent_main.main()
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 37, in main
2020-05-30 15:27:59.052 7 ERROR neutron of_main.main()
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main
2020-05-30 15:27:59.052 7 ERROR neutron 'neutron.plugins.ml2.drivers.openvswitch.agent.'
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/os_ken/base/app_manager.py", line 375, in run_apps
2020-05-30 15:27:59.052 7 ERROR neutron hub.joinall(services)
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/os_ken/lib/hub.py", line 102, in joinall
2020-05-30 15:27:59.052 7 ERROR neutron t.wait()
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/greenthread.py", line 181, in wait
2020-05-30 15:27:59.052 7 ERROR neutron return self._exit_event.wait()
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/event.py", line 132, in wait
2020-05-30 15:27:59.052 7 ERROR neutron current.throw(*self._exc)
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/eventlet/greenthread.py", line 221, in main
2020-05-30 15:27:59.052 7 ERROR neutron result = function(*args, **kwargs)
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/os_ken/lib/hub.py", line 64, in _launch
2020-05-30 15:27:59.052 7 ERROR neutron raise e
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/os_ken/lib/hub.py", line 59, in _launch
2020-05-30 15:27:59.052 7 ERROR neutron return func(*args, **kwargs)
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py", line 43, in agent_main_wrapper
2020-05-30 15:27:59.052 7 ERROR neutron LOG.exception("Agent main thread died of an exception")
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-05-30 15:27:59.052 7 ERROR neutron self.force_reraise()
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-05-30 15:27:59.052 7 ERROR neutron six.reraise(self.type_, self.value, self.tb)
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/six.py", line 703, in reraise
2020-05-30 15:27:59.052 7 ERROR neutron raise value
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_oskenapp.py", line 40, in agent_main_wrapper
2020-05-30 15:27:59.052 7 ERROR neutron ovs_agent.main(bridge_classes)
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2780, in main
2020-05-30 15:27:59.052 7 ERROR neutron agent.daemon_loop()
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2674, in daemon_loop
2020-05-30 15:27:59.052 7 ERROR neutron self.rpc_loop(polling_manager=pm)
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 2545, in rpc_loop
2020-05-30 15:27:59.052 7 ERROR neutron added_bridges)
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1416, in _reconfigure_physical_bridges
2020-05-30 15:27:59.052 7 ERROR neutron sync = self._do_reconfigure_physical_bridges(bridges)
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1439, in _do_reconfigure_physical_bridges
2020-05-30 15:27:59.052 7 ERROR neutron self.setup_physical_bridges(bridge_mappings)
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1495, in setup_physical_bridges
2020-05-30 15:27:59.052 7 ERROR neutron self._check_bridge_datapath_id(br, datapath_ids_set)
2020-05-30 15:27:59.052 7 ERROR neutron File "/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py", line 1457, in _check_bridge_datapath_id
2020-05-30 15:27:59.052 7 ERROR neutron dpid = int(bridge.get_datapath_id(), 16)
2020-05-30 15:27:59.052 7 ERROR neutron TypeError: int() can't convert non-string with explicit base
2020-05-30 15:27:59.052 7 ERROR neutron

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :
Changed in kolla-ansible:
status: New → Triaged
importance: Undecided → Critical
Changed in neutron:
status: New → Incomplete
status: Incomplete → New
Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

(sorry for status changes - keyboard cleaning!)

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

testing hypothesis in https://review.opendev.org/731985

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

not helping

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

another hypothesis - https://pypi.org/project/ovsdbapp updated just before the breakage

Changed in neutron:
importance: Undecided → High
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Radoslaw:

Reviewing the log you sent [1], can you assure that br-ex is created? Seems that ovsdbapp doesn't find br-ex bridge:
  ERROR ovsdbapp.backend.ovs_idl.command ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find
    Bridge with name=br-ex

What annoys me the most is a previous message:
  Physical bridge br-ex was just re-created.

That means the bridge is there (I think so...). Let me take a look at the ovsdbapp code and if there is any change there.

Thank you for your bug report.

[1]http://paste.openstack.org/show/794175/

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Hello Rodolfo, and thanks for handling. The bridge exists indeed.
in the full log:
2020-05-30 15:27:53.303 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] Bridge br-ex datapath-id = 0x000056072b0a2049

I found this commit suspicious: https://review.opendev.org/678076

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Reverting ovsdbapp to 1.1.0 (the previous on master; the current is 1.3.0) helped.

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Reverting just the suspicious patch [1] helped as well.
The revert is in [2].
The testing in [3].

[1] https://review.opendev.org/678076
[2] https://review.opendev.org/731990
[3] https://review.opendev.org/731985

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

The following question remains:
why neither ovsdbapp nor neutron CIs caught this issue early?
It looks pretty breaking.

Revision history for this message
Terry Wilson (otherwiseguy) wrote :

The logs here look a little odd. It looks like the indexes are being created after the ovsdb connection has already been started, which would explain why the bridge is not found in the index. Investigating why this is happening in your case, but none of the other cases we've tested and seen.

Revision history for this message
Terry Wilson (otherwiseguy) wrote :

(with here meaning http://paste.openstack.org/show/794175/
Specifically:
2020-05-30 15:27:59.003 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Agent rpc_loop - iteration:1 started
2020-05-30 15:27:59.006 7 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] ofctl request version=0x4,msg_type=0x12,msg_len=0x38,xid=0x15f04540,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=23,type=1) result [OFPFlowStatsReply(body=[OFPFlowStats(byte_count=0,cookie=371283074374527098,duration_nsec=718000000,duration_sec=5,flags=0,hard_timeout=0,idle_timeout=0,instructions=[],length=56,match=OFPMatch(oxm_fields={}),packet_count=0,priority=0,table_id=23)],flags=0,type=1)] _send_msg /var/lib/kolla/venv/lib/python3.6/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py:113
2020-05-30 15:27:59.008 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Physical bridge br-ex was just re-created.
2020-05-30 15:27:59.009 7 DEBUG ovsdbapp.backend.ovs_idl [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Created index name autocreate_indices /var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py:99
2020-05-30 15:27:59.009 7 DEBUG ovsdbapp.backend.ovs_idl [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Created index name autocreate_indices /var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py:99
2020-05-30 15:27:59.009 7 DEBUG ovsdbapp.backend.ovs_idl [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Created index name autocreate_indices /var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py:99
2020-05-30 15:27:59.010 7 DEBUG ovsdbapp.backend.ovs_idl [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Created index target autocreate_indices /var/lib/kolla/venv/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/__init__.py:99
2020-05-30 15:27:59.011 7 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-680d0d82-c47f-4b45-86cc-53520f537f29 - - - - -] Mapping physical network physnet1 to bridge br-ex

Revision history for this message
Terry Wilson (otherwiseguy) wrote :

If I had to guess we ended up with a name collision that maybe ends up mattering? The indexing patch adds an ovsdb_connection propert, and uses self._ovsdb_connection to be the underlying variable. the OvsIdl class in neutron/agent/ovsdb/native/connection happens to store a string in self._ovsdb_connection (for absolutely no reason, it is never used outside of __init__). I'm going to post a patch to neutron that just removes the self\. from that and can you see if it helps you?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/732077

Revision history for this message
Terry Wilson (otherwiseguy) wrote :

Gah, too many Idl classes. that one shouldn't matter. :-/ Still looking.

Revision history for this message
Terry Wilson (otherwiseguy) wrote :

Ok, new thought.

BaseOVS() is called a lot in the ovs agent. BaseOVS calls neutron.agent.ovsdb.impl_idl.api_factory a lot. While the connection and Idl classes are globals, the It returns a new NeturonOvsdbIdl() class each time, and each time it does that, it calls ovsdbapp.backend.ovs_idl.__init__ which calls autocreate_indices getting a new index, even though the idl is already running.

Could fix in neutron by maybe just also making the return value of api_factory a singleton. Can also add to ovsdbapp to ensure autocreate_indices is only called once, which I'll do right now.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Terry Wilson (<email address hidden>) on branch: master
Review: https://review.opendev.org/732077
Reason: This was incorrect. OvsIdl is the idl.Idl class, not an ovsdbapp class inheriting from ovs_idl.Backend

Revision history for this message
Terry Wilson (otherwiseguy) wrote :

Looks like the existing ovsdbapp code is correctly keeping indexes from being recreated even if we try to call autocreate_indices multiple times when running tests that do the same kind of lookup failing in your logs, like:

    def test_multiple_BaseOVS_calls(self):
        br = self.useFixture(net_helpers.OVSBridgeFixture()).bridge.br_name
        self.assertEqual(br, self.ovs.db_get_val('Bridge', br, 'name'))
        new_base = ovs_lib.BaseOVS()
        self.assertEqual(br, self.ovs.db_get_val('Bridge', br, 'name'))

It successfully finds the bridge both times w/ no code changes. But clearly in your logs I can see indices being created twice, so I'll go ahead and post the patch to forbid creating indices if the connection is already running just to see if it helps.

Revision history for this message
Terry Wilson (otherwiseguy) wrote :

Please try https://review.opendev.org/#/c/732081/ and see if it helps.

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Thanks, Terry. Testing again in https://review.opendev.org/731985

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

I confirm it helping. Logs attached.

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :
Changed in neutron:
assignee: nobody → Terry Wilson (otherwiseguy)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/732523

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/732523
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=178227c8b6636433fcdab4dfd342468335b76aa6
Submitter: Zuul
Branch: master

commit 178227c8b6636433fcdab4dfd342468335b76aa6
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Jun 2 11:10:21 2020 +0000

    Make NeutronOvsdbIdl singleton

    "NeutronOvsdbIdl" should be a singleton class; only one instance
    should be shared everywhere is called. Currently the connection
    and the "OvsIdlMonitor" instances are global; those instances are
    used to declare the "NeutronOvsdbIdl" instance.

    As commented in the related bug, this singleton will ensure that
    the OvsIdl indexes are created just once.

    Change-Id: I639cf673a983b7b1be810495d8a8c2d89919a9b6
    Related-Bug: #1881424

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ovsdbapp (master)

Change abandoned by Terry Wilson (<email address hidden>) on branch: master
Review: https://review.opendev.org/731990
Reason: We've got a patch working around it in Neutron merged and a patch that fixes the issue in ovsdbapp as well.

tags: added: neutron-proactive-backport-potential
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

@Terry: can we mark this bug as fixed in ovsdbapp and remove Neutron from the affected projects? Or is there anything more to do there?

tags: removed: neutron-proactive-backport-potential
Revision history for this message
Terry Wilson (otherwiseguy) wrote :

@Slawek: I started looking at it again yesterday, and the change that Rodolfo did fixed it the kolla folks. There is a patch here: https://review.opendev.org/#/c/732081/ for ovsdbapp that they said also fixed the issue. Though I was never able to recreate the issue myself and the existing code really shouldn't allow individual indexes to be created twice anyway.

Since we have two different things that make the problem go away, we can probably go ahead and close it. The mystery still bugs me a bit.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ovsdbapp (master)

Reviewed: https://review.opendev.org/732081
Committed: https://git.openstack.org/cgit/openstack/ovsdbapp/commit/?id=0320e78c2c1d04118cd10182c2e0c476950ba776
Submitter: Zuul
Branch: master

commit 0320e78c2c1d04118cd10182c2e0c476950ba776
Author: Terry Wilson <email address hidden>
Date: Sun May 31 21:19:06 2020 -0500

    Only create indices if connection is not running

    Also adds some better debug logs when creating indexes

    Related-Bug: #1881424
    Change-Id: I094dd475bb6956a3c029075ad85dbc15cf59a16c

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/755767

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/755768

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/755767
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c874c44cdc0dc32ae1bdcedfce4a2a9319d8e1eb
Submitter: Zuul
Branch: stable/ussuri

commit c874c44cdc0dc32ae1bdcedfce4a2a9319d8e1eb
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Jun 2 11:10:21 2020 +0000

    Make NeutronOvsdbIdl singleton

    "NeutronOvsdbIdl" should be a singleton class; only one instance
    should be shared everywhere is called. Currently the connection
    and the "OvsIdlMonitor" instances are global; those instances are
    used to declare the "NeutronOvsdbIdl" instance.

    As commented in the related bug, this singleton will ensure that
    the OvsIdl indexes are created just once.

    Related-Bug: #1881424

    Conflicts:
        neutron/common/utils.py

    Change-Id: I639cf673a983b7b1be810495d8a8c2d89919a9b6
    (cherry picked from commit 178227c8b6636433fcdab4dfd342468335b76aa6)

tags: added: in-stable-ussuri
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/755768
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ae74b483c3b9533d4f92d5b0b1210f35c896d2d2
Submitter: Zuul
Branch: stable/train

commit ae74b483c3b9533d4f92d5b0b1210f35c896d2d2
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Tue Jun 2 11:10:21 2020 +0000

    Make NeutronOvsdbIdl singleton

    "NeutronOvsdbIdl" should be a singleton class; only one instance
    should be shared everywhere is called. Currently the connection
    and the "OvsIdlMonitor" instances are global; those instances are
    used to declare the "NeutronOvsdbIdl" instance.

    As commented in the related bug, this singleton will ensure that
    the OvsIdl indexes are created just once.

    Related-Bug: #1881424

    Conflicts:
        neutron/common/utils.py
        neutron/agent/ovsdb/impl_idl.py

    Change-Id: I639cf673a983b7b1be810495d8a8c2d89919a9b6
    (cherry picked from commit 178227c8b6636433fcdab4dfd342468335b76aa6)
    (cherry picked from commit c874c44cdc0dc32ae1bdcedfce4a2a9319d8e1eb)

tags: added: in-stable-train
Revision history for this message
Terry Wilson (otherwiseguy) wrote :

Looks like this was fixed by Rodolfo's singleton patch

no longer affects: ovsdbapp
Changed in neutron:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers