Neutron times out handling a request to create a network

Bug #2083570 reported by Ihar Hrachyshka
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

This happened in gate: https://369ed7cdb441e6ba648b-4c5da0a4d8de7235fc811d3d8ea842f8.ssl.cf5.rackcdn.com/930608/3/gate/neutron-ovn-tempest-ipv6-only-ovs-release/16831fa/job-output.txt

In devstack log, we see this:

2024-10-02 19:07:29.822345 | controller | ++ lib/tempest:configure_tempest:384 : openstack --os-cloud devstack-admin --os-region RegionOne network create --share shared
2024-10-02 19:12:30.075768 | controller | Error while executing command: HttpException: 500, : 500 Internal Server Error: Internal Server Error: The server encountered an internal error or: misconfiguration and was unable to complete: your request.: Please contact the server administrator at: webmaster@localhost to inform them of the time this error occurred,: and the actions you performed just before this error.: More information about this error may be available: in the server error log.: Apache/2.4.52 (Ubuntu) Server at 2001:4802:7805:104:be76:4eff:fe20:23a0 Port 80

The req-id is: req-77c1f0b0-0b12-4b4b-9caa-f1d749596242

In neutron-api log, we see this as the last message for the req-id:

Oct 02 19:07:30.194086 np0038691690 <email address hidden>[59580]: DEBUG neutron_lib.callbacks.manager [None req-77c1f0b0-0b12-4b4b-9caa-f1d749596242 admin admin] Publish callbacks ['neutron.services.auto_allocate.db._ensure_external_network_default_value_callback-8743187362991'] for network (13143b5f-8c77-4b54-be12-2807c0430033), precommit_create {{(pid=59580) _notify_loop /opt/stack/data/venv/lib/python3.10/site-packages/neutron_lib/callbacks/manager.py:184}}

Nothing after it related to the req-id.

There are also these errors for hash ring / db locks after the last req-id tagged message:

Oct 02 19:07:59.684841 np0038691690 <email address hidden>[59580]: DEBUG futurist.periodics [-] Submitting periodic callback 'neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.maintenance.HashRingHealthCheckPeriodics.touch_hash_ring_nodes' {{(pid=59580) _process_scheduled /opt/stack/data/venv/lib/python3.10/site-packages/futurist/periodics.py:638}}
Oct 02 19:08:09.697938 np0038691690 <email address hidden>[59580]: DEBUG dbcounter [-] [59580] Writing DB stats neutron:UPDATE=1 {{(pid=59580) stat_writer /opt/stack/data/venv/lib/python3.10/site-packages/dbcounter.py:115}}
Oct 02 19:08:20.609536 np0038691690 <email address hidden>[59580]: DEBUG neutron_lib.db.api [None req-fbfa6ef6-a02f-4aa1-bbc0-cfa1aab9022e None None] Retry wrapper got retriable exception: (pymysql.err.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction')
Oct 02 19:08:20.609536 np0038691690 <email address hidden>[59580]: [SQL: UPDATE ovn_hash_ring SET updated_at=%(updated_at)s WHERE ovn_hash_ring.node_uuid = %(node_uuid_1)s]
Oct 02 19:08:20.609536 np0038691690 <email address hidden>[59580]: [parameters: {'updated_at': datetime.datetime(2024, 10, 2, 19, 7, 30, 112762), 'node_uuid_1': '90677d5a-7f61-461d-8b8c-b9fad0a372e0'}]
Oct 02 19:08:20.609536 np0038691690 <email address hidden>[59580]: (Background on this error at: https://sqlalche.me/e/20/e3q8) {{(pid=59580) wrapped /opt/stack/data/venv/lib/python3.10/site-packages/neutron_lib/db/api.py:185}}
Oct 02 19:08:20.610666 np0038691690 <email address hidden>[59580]: DEBUG oslo_db.api [None req-fbfa6ef6-a02f-4aa1-bbc0-cfa1aab9022e None None] Performing DB retry for function neutron.db.ovn_hash_ring_db._touch {{(pid=59580) wrapper /opt/stack/data/venv/lib/python3.10/site-packages/oslo_db/api.py:155}}
Oct 02 19:08:31.116859 np0038691690 <email address hidden>[59580]: DEBUG dbcounter [-] [59580] Writing DB stats neutron:UPDATE=1 {{(pid=59580) stat_writer /opt/stack/data/venv/lib/python3.10/site-packages/dbcounter.py:115}}
Oct 02 19:08:50.611310 np0038691690 <email address hidden>[59580]: DEBUG neutron_lib.db.api [-] Retry wrapper got retriable exception: (pymysql.err.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction')
Oct 02 19:08:50.611310 np0038691690 <email address hidden>[59580]: [SQL: UPDATE ovn_hash_ring SET updated_at=%(updated_at)s WHERE ovn_hash_ring.hostname = %(hostname_1)s AND ovn_hash_ring.group_name = %(group_name_1)s]
Oct 02 19:08:50.611310 np0038691690 <email address hidden>[59580]: [parameters: {'updated_at': datetime.datetime(2024, 10, 2, 19, 7, 59, 685059), 'hostname_1': 'np0038691690', 'group_name_1': 'mechanism_driver'}]
Oct 02 19:08:50.611310 np0038691690 <email address hidden>[59580]: (Background on this error at: https://sqlalche.me/e/20/e3q8) {{(pid=59580) wrapped /opt/stack/data/venv/lib/python3.10/site-packages/neutron_lib/db/api.py:185}}
Oct 02 19:08:50.612564 np0038691690 <email address hidden>[59580]: DEBUG oslo_db.api [-] Performing DB retry for function neutron.db.ovn_hash_ring_db._touch {{(pid=59580) wrapper /opt/stack/data/venv/lib/python3.10/site-packages/oslo_db/api.py:155}}
Oct 02 19:09:01.122051 np0038691690 <email address hidden>[59580]: DEBUG dbcounter [-] [59580] Writing DB stats neutron:UPDATE=1 {{(pid=59580) stat_writer /opt/stack/data/venv/lib/python3.10/site-packages/dbcounter.py:115}}
Oct 02 19:09:11.618081 np0038691690 <email address hidden>[59580]: DEBUG neutron_lib.db.api [None req-fbfa6ef6-a02f-4aa1-bbc0-cfa1aab9022e None None] Retry wrapper got retriable exception: (pymysql.err.OperationalError) (1205, 'Lock wait timeout exceeded; try restarting transaction')
Oct 02 19:09:11.618081 np0038691690 <email address hidden>[59580]: [SQL: UPDATE ovn_hash_ring SET updated_at=%(updated_at)s WHERE ovn_hash_ring.node_uuid = %(node_uuid_1)s]
Oct 02 19:09:11.618081 np0038691690 <email address hidden>[59580]: [parameters: {'updated_at': datetime.datetime(2024, 10, 2, 19, 8, 21, 110834), 'node_uuid_1': '90677d5a-7f61-461d-8b8c-b9fad0a372e0'}]
Oct 02 19:09:11.618081 np0038691690 <email address hidden>[59580]: (Background on this error at: https://sqlalche.me/e/20/e3q8) {{(pid=59580) wrapped /opt/stack/data/venv/lib/python3.10/site-packages/neutron_lib/db/api.py:185}}
Oct 02 19:09:11.619845 np0038691690 <email address hidden>[59580]: DEBUG oslo_db.api [None req-fbfa6ef6-a02f-4aa1-bbc0-cfa1aab9022e None None] Performing DB retry for function neutron.db.ovn_hash_ring_db._touch {{(pid=59580) wrapper /opt/stack/data/venv/lib/python3.10/site-packages/oslo_db/api.py:155}}
Oct 02 19:09:21.740920 np0038691690 <email address hidden>[59580]: DEBUG dbcounter [-] [59580] Writing DB stats neutron:UPDATE=1 {{(pid=59580) stat_writer /opt/stack/data/venv/lib/python3.10/site-packages/dbcounter.py:115}}
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers [None req-1ed43a5e-ac35-4256-850c-3d3d414dd16e None None] Mechanism driver 'ovn' failed in update_port_postcommit: ovsdbapp.exceptions.TimeoutException: Commands [CheckRevisionNumberCommand(_result=None, name=1ce51282-34b7-4e53-8b1c-811d585a1a7d, resource={'id': '1ce51282-34b7-4e53-8b1c-811d585a1a7d', 'name': '', 'network_id': 'a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'tenant_id': '', 'mac_address': 'fa:16:3e:6e:77:38', 'admin_state_up': True, 'status': 'ACTIVE', 'device_id': '02ca23c9-f068-4884-b5f1-89209b252b98', 'device_owner': 'network:router_gateway', 'standard_attr_id': 23, 'fixed_ips': [{'subnet_id': '8a949128-8d38-454f-b7ed-931aafdeccdf', 'ip_address': '172.24.5.37'}, {'subnet_id': 'b02a7718-b0a3-4b32-afcb-384ab2c14812', 'ip_address': '2001:db8::7e'}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': [], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {}, 'binding:host_id': 'np0038691690', 'binding:vif_type': 'ovs', 'binding:vif_details': {'port_filter': True, 'connectivity': 'l2', 'bridge_name': 'br-int', 'datapath_type': 'system'}, 'port_security_enabled': False, 'tags': [], 'created_at': '2024-10-02T19:06:13Z', 'updated_at': '2024-10-02T19:06:21Z', 'revision_number': 6, 'project_id': '', 'network': {'id': 'a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'name': 'public', 'tenant_id': 'ae9f99f870e44c829d566f4038cd9fcd', 'admin_state_up': True, 'mtu': 1430, 'status': 'ACTIVE', 'subnets': ['8a949128-8d38-454f-b7ed-931aafdeccdf', 'b02a7718-b0a3-4b32-afcb-384ab2c14812'], 'standard_attr_id': 18, 'shared': False, 'availability_zone_hints': [], 'availability_zones': [], 'ipv4_address_scope': None, 'ipv6_address_scope': None, 'router:external': True, 'vlan_transparent': None, 'description': '', 'port_security_enabled': True, 'is_default': True, 'tags': [], 'created_at': '2024-10-02T19:06:07Z', 'updated_at': '2024-10-02T19:06:21Z', 
'revision_number': 3, 'project_id': 'ae9f99f870e44c829d566f4038cd9fcd', 'provider:network_type': 'flat', 'provider:physical_network': 'public', 'provider:segmentation_id': None}}, resource_type=ports, if_exists=True), SetLSwitchPortCommand(_result=None, lport=1ce51282-34b7-4e53-8b1c-811d585a1a7d, external_ids_update=None, columns={'external_ids': {'neutron:port_name': '', 'neutron:device_id': '02ca23c9-f068-4884-b5f1-89209b252b98', 'neutron:project_id': '', 'neutron:cidrs': '172.24.5.37/24 2001:db8::7e/64', 'neutron:device_owner': 'network:router_gateway', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:network_name': 'neutron-a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'neutron:security_group_ids': '', 'neutron:revision_number': '6', 'neutron:vnic_type': 'normal', 'neutron:port_capabilities': '', 'neutron:mtu': '', 'neutron:host_id': 'np0038691690'}, 'parent_name': [], 'tag': [], 'options': {'requested-chassis': 'np0038691690', 'mcast_flood_reports': 'true', 'exclude-lb-vips-from-garp': 'true', 'nat-addresses': 'router', 'router-port': 'lrp-1ce51282-34b7-4e53-8b1c-811d585a1a7d'}, 'enabled': True, 'port_security': [], 'type': 'router'}, if_exists=False), PgDelPortCommand(_result=None, port_group=neutron_pg_drop, lsp=['1ce51282-34b7-4e53-8b1c-811d585a1a7d'], if_exists=False)] exceeded timeout 180 seconds, cause: Result queue is empty
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 54, in commit
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers result = self.results.get(timeout=self.timeout)
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/queue.py", line 320, in get
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers return waiter.wait()
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/queue.py", line 139, in wait
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers return get_hub().switch()
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/hub.py", line 310, in switch
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers return self.greenlet.switch()
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers _queue.Empty
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers The above exception was the direct cause of the following exception:
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/managers.py", line 497, in _call_on_drivers
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context)
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 886, in update_port_postcommit
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers self._ovn_update_port(context.plugin_context, port, original_port,
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 766, in _ovn_update_port
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers self._ovn_client.update_port(plugin_context, port,
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py", line 663, in update_port
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers with self._nb_idl.transaction(check_error=True,
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers next(self.gen)
Oct 02 19:09:25.055794 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py", line 274, in transaction
Oct 02 19:09:25.058874 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers with super(OvsdbNbOvnIdl, self).transaction(*args, **kwargs) as t:
Oct 02 19:09:25.058874 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
Oct 02 19:09:25.058874 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers next(self.gen)
Oct 02 19:09:25.058874 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/ovsdbapp/api.py", line 114, in transaction
Oct 02 19:09:25.058874 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers with self.create_transaction(
Oct 02 19:09:25.058874 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/ovsdbapp/api.py", line 71, in __exit__
Oct 02 19:09:25.058874 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers self.result = self.commit()
Oct 02 19:09:25.058874 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers File "/opt/stack/data/venv/lib/python3.10/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 56, in commit
Oct 02 19:09:25.058874 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers raise exceptions.TimeoutException(
Oct 02 19:09:25.058874 np0038691690 <email address hidden>[59580]: ERROR neutron.plugins.ml2.managers ovsdbapp.exceptions.TimeoutException: Commands [CheckRevisionNumberCommand(_result=None, name=1ce51282-34b7-4e53-8b1c-811d585a1a7d, resource={'id': '1ce51282-34b7-4e53-8b1c-811d585a1a7d', 'name': '', 'network_id': 'a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'tenant_id': '', 'mac_address': 'fa:16:3e:6e:77:38', 'admin_state_up': True, 'status': 'ACTIVE', 'device_id': '02ca23c9-f068-4884-b5f1-89209b252b98', 'device_owner': 'network:router_gateway', 'standard_attr_id': 23, 'fixed_ips': [{'subnet_id': '8a949128-8d38-454f-b7ed-931aafdeccdf', 'ip_address': '172.24.5.37'}, {'subnet_id': 'b02a7718-b0a3-4b32-afcb-384ab2c14812', 'ip_address': '2001:db8::7e'}], 'allowed_address_pairs': [], 'extra_dhcp_opts': [], 'security_groups': [], 'description': '', 'binding:vnic_type': 'normal', 'binding:profile': {}, 'binding:host_id': 'np0038691690', 'binding:vif_type': 'ovs', 'binding:vif_details': {'port_filter': True, 'connectivity': 'l2', 'bridge_name': 'br-int', 'datapath_type': 'system'}, 'port_security_enabled': False, 'tags': [], 'created_at': '2024-10-02T19:06:13Z', 'updated_at': '2024-10-02T19:06:21Z', 'revision_number': 6, 'project_id': '', 'network': {'id': 'a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'name': 'public', 'tenant_id': 'ae9f99f870e44c829d566f4038cd9fcd', 'admin_state_up': True, 'mtu': 1430, 'status': 'ACTIVE', 'subnets': ['8a949128-8d38-454f-b7ed-931aafdeccdf', 'b02a7718-b0a3-4b32-afcb-384ab2c14812'], 'standard_attr_id': 18, 'shared': False, 'availability_zone_hints': [], 'availability_zones': [], 'ipv4_address_scope': None, 'ipv6_address_scope': None, 'router:external': True, 'vlan_transparent': None, 'description': '', 'port_security_enabled': True, 'is_default': True, 'tags': [], 'created_at': '2024-10-02T19:06:07Z', 'updated_at': '2024-10-02T19:06:21Z', 'revision_number': 3, 'project_id': 'ae9f99f870e44c829d566f4038cd9fcd', 'provider:network_type': 'flat', 
'provider:physical_network': 'public', 'provider:segmentation_id': None}}, resource_type=ports, if_exists=True), SetLSwitchPortCommand(_result=None, lport=1ce51282-34b7-4e53-8b1c-811d585a1a7d, external_ids_update=None, columns={'external_ids': {'neutron:port_name': '', 'neutron:device_id': '02ca23c9-f068-4884-b5f1-89209b252b98', 'neutron:project_id': '', 'neutron:cidrs': '172.24.5.37/24 2001:db8::7e/64', 'neutron:device_owner': 'network:router_gateway', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:network_name': 'neutron-a4a2cb55-6e5c-436e-b2b0-f240149b240f', 'neutron:security_group_ids': '', 'neutron:revision_number': '6', 'neutron:vnic_type': 'normal', 'neutron:port_capabilities': '', 'neutron:mtu': '', 'neutron:host_id': 'np0038691690'}, 'parent_name': [], 'tag': [], 'options': {'requested-chassis': 'np0038691690', 'mcast_flood_reports': 'true', 'exclude-lb-vips-from-garp': 'true', 'nat-addresses': 'router', 'router-port': 'lrp-1ce51282-34b7-4e53-8b1c-811d585a1a7d'}, 'enabled': True, 'port_security': [], 'type': 'router'}, if_exists=False), PgDelPortCommand(_result=None, port_group=neutron_pg_drop, lsp=['1ce51282-34b7-4e53-8b1c-811d585a1a7d'], if_exists=False)] exceeded timeout 180 seconds, cause: Result queue is empty

I don't know whether the hash ring errors are related to the original request being lost or timing out. I don't see any errors in the ovsdb-server log for NB.
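The triage steps described above (find the last message tagged with the request ID, then inspect what follows it) can be sketched as a small log filter. This is only a minimal sketch: the function name `summarize_request` is hypothetical, and it assumes each log entry contains the request ID or the MySQL error text verbatim.

```python
from typing import List, Optional, Tuple

# Error text as it appears in the pymysql exception above.
LOCK_ERROR = "Lock wait timeout exceeded"


def summarize_request(
    lines: List[str], req_id: str
) -> Tuple[Optional[str], List[str]]:
    """Return the last line tagged with req_id and any later lock errors."""
    last_index = None
    for index, line in enumerate(lines):
        if req_id in line:
            last_index = index
    if last_index is None:
        # The request ID never shows up in this log.
        return None, []
    later_errors = [
        line for line in lines[last_index + 1:] if LOCK_ERROR in line
    ]
    return lines[last_index], later_errors
```

Feeding it the lines of screen-neutron-api.txt and the req-id above would return the `precommit_create` message and the hash ring lock timeouts that follow it.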

tags: added: db gate-failure ovn
Changed in neutron:
importance: Undecided → Medium
yatin (yatinkarel) wrote :
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/931842

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

I've executed this patch [1] several times without any issue. The patch ran both eventlet and WSGI jobs, all of them based on "neutron-ovn-tempest-ipv6-only-base".

[1]https://review.opendev.org/c/openstack/neutron/+/931842

yatin (yatinkarel) wrote :
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Using the testing patch [1] I've seen some occurrences of this issue. From the Neutron API logs I see the following message:
"""
Oct 16 08:53:21.446622 np0038811792 <email address hidden>[59526]: WARNING neutron.db.ovn_revision_numbers_db [None req-41c0f350-6ddc-4a25-a413-e8885f2a6be7 admin admin] No revision row found for c35cf187-15fb-4a8f-8a8f-892376b94aaa (type: router_ports) when bumping the revision number. Creating one.
"""

That happens when we try to update the host information in the LSP [2]. I'm going to test removing [3] and [4].

[1]https://review.opendev.org/c/openstack/neutron/+/931842
[2]https://review.opendev.org/c/openstack/neutron/+/882705
[3]https://github.com/openstack/neutron/blob/6d05519e93540227d228defd2495d81e6e405ff9/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1185
[4]https://github.com/openstack/neutron/blob/6d05519e93540227d228defd2495d81e6e405ff9/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1231

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

The log message in comment #3 is incorrect and I can't edit that comment. This is the correct log message:
"""
Oct 16 08:54:18.444685 np0038811792 <email address hidden>[59526]: WARNING neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovn_client [None req-b6b56194-4506-4a78-a9eb-338512bea93e None None] No hosting information found for port cf6a1d7f-491a-4b0c-bc0d-a4431da6c855: RuntimeError: No hosting information found for port cf6a1d7f-491a-4b0c-bc0d-a4431da6c855
"""

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/932601

yatin (yatinkarel) wrote :
Ihar Hrachyshka (ihar-hrachyshka) wrote :
yatin (yatinkarel) wrote :
yatin (yatinkarel) wrote :

Another recent one: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_921/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-with-neutron-lib-master/9218d5e/job-output.txt

It seems this could be related to running two uwsgi workers that conflict with each other:
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_921/periodic/opendev.org/openstack/neutron/master/neutron-ovn-tempest-with-neutron-lib-master/9218d5e/controller/logs/screen-neutron-api.txt

Nov 22 02:46:29.688564 np0039134856 <email address hidden>[59422]: INFO neutron.db.ovn_hash_ring_db [None req-6be9f2c7-4f05-4ff6-ba25-36b88d691cca None None] Nodes from host "np0039134856" and group "mechanism_driver" removed from the Hash Ring
Nov 22 02:46:29.696054 np0039134856 <email address hidden>[59422]: INFO neutron.db.ovn_hash_ring_db [None req-6be9f2c7-4f05-4ff6-ba25-36b88d691cca None None] Node 78ff4e06-deff-432f-b200-a9bad2640e37 from host "np0039134856" and group "mechanism_driver" added to the Hash Ring
Nov 22 02:46:29.697232 np0039134856 <email address hidden>[59422]: DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.maintenance [None req-6be9f2c7-4f05-4ff6-ba25-36b88d691cca None None] Periodic task found: HashRingHealthCheckPeriodics.touch_hash_ring_nodes {{(pid=59422) add_periodics /opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/maintenance.py:93}}
Nov 22 02:46:29.698209 np0039134856 <email address hidden>[59422]: INFO neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver [None req-6be9f2c7-4f05-4ff6-ba25-36b88d691cca None None] Hash Ring probing thread has started

Nov 22 02:46:32.938746 np0039134856 <email address hidden>[59423]: INFO neutron.db.ovn_hash_ring_db [None req-83eaa856-e44d-421b-8408-e937b01e3db0 None None] Nodes from host "np0039134856" and group "mechanism_driver" removed from the Hash Ring
Nov 22 02:46:32.946471 np0039134856 <email address hidden>[59423]: INFO neutron.db.ovn_hash_ring_db [None req-83eaa856-e44d-421b-8408-e937b01e3db0 None None] Node 292bb55a-facc-4786-b8a0-a555f68472f4 from host "np0039134856" and group "mechanism_driver" added to the Hash Ring
Nov 22 02:46:32.947968 np0039134856 <email address hidden>[59423]: DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.maintenance [None req-83eaa856-e44d-421b-8408-e937b01e3db0 None None] Periodic task found: HashRingHealthCheckPeriodics.touch_hash_ring_nodes {{(pid=59423) add_periodics /opt/stack/neutron/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/maintenance.py:93}}
Nov 22 02:46:32.949379 np0039134856 <email address hidden>[59423]: INFO neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver [None req-83eaa856-e44d-421b-8408-e937b01e3db0 None None] Hash Ring probing thread has started

will try setting it to 1 to see how that goes.
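The suspected conflict can be illustrated with a toy model of the startup sequence visible in the logs above (remove all nodes for the host and group, then add a new node). This is not Neutron code: `InMemoryHashRing` and `worker_start` are hypothetical names, the table is a plain dict, and the sketch only shows how a second worker's cleanup can remove the first worker's node. It does not reproduce the lock wait timeout itself.

```python
import uuid
from typing import Dict, Tuple


class InMemoryHashRing:
    """Toy stand-in for the ovn_hash_ring table (not Neutron code)."""

    def __init__(self) -> None:
        # node_uuid -> (hostname, group_name)
        self.rows: Dict[str, Tuple[str, str]] = {}

    def remove_nodes_from_host(self, hostname: str, group: str) -> None:
        """Drop every node registered for this host and group."""
        self.rows = {
            node: key for node, key in self.rows.items()
            if key != (hostname, group)
        }

    def add_node(self, hostname: str, group: str) -> str:
        """Register a new node and return its UUID."""
        node_uuid = str(uuid.uuid4())
        self.rows[node_uuid] = (hostname, group)
        return node_uuid

    def touch(self, node_uuid: str) -> int:
        """Return how many rows an UPDATE filtered by node_uuid would match."""
        return 1 if node_uuid in self.rows else 0


def worker_start(ring: InMemoryHashRing, hostname: str,
                 group: str = "mechanism_driver") -> str:
    """Mimic the logged sequence: clean the host's nodes, then add one."""
    ring.remove_nodes_from_host(hostname, group)
    return ring.add_node(hostname, group)
```

Calling `worker_start` twice for the same host, as the two workers (pids 59422 and 59423) do three seconds apart, leaves only the second node in the ring; a later `touch` on the first worker's node matches no rows.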

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/936147

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/936429

Changed in neutron:
status: New → In Progress
yatin (yatinkarel) wrote :

<< will try setting it to 1 to see how that goes.
Just to keep this updated: with 1 worker, I have not hit a single failure in around 300 runs of https://review.opendev.org/c/openstack/neutron/+/936147

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/936813

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/936829

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/936838

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/936428
Committed: https://opendev.org/openstack/neutron/commit/865097c6896a87fa5b6d508058db9ff1e6c423d1
Submitter: "Zuul (22348)"
Branch: master

commit 865097c6896a87fa5b6d508058db9ff1e6c423d1
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Fri Nov 29 09:30:23 2024 +0000

    [OVN] Improve initial hash ring setup

    When using WSGI module, the multiprocess event is not shared between
    the WSGI workers. This new implementation retrieves the WSGI start time
    to define the OVN hash ring register creation time. That will be used
    to filter out the stale registers.

    Depends-On: https://review.opendev.org/c/openstack/devstack/+/936669

    Closes-Bug: #2083570
    Change-Id: Id9f851f33c2cb3d2c2759a3c66adf2599a3122fe
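The idea in this commit message, using a shared start time as the register creation time so that stale registers can be filtered out, might look roughly like the sketch below. It assumes "stale" means "created before the current WSGI start time"; the `Register` dataclass and `drop_stale_registers` are hypothetical names, not the actual Neutron implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Register:
    """Hypothetical, simplified view of one hash ring register."""
    node_uuid: str
    hostname: str
    group_name: str
    created_at: datetime


def drop_stale_registers(registers: List[Register], hostname: str,
                         group_name: str,
                         start_time: datetime) -> List[Register]:
    """Keep everything except this host's registers older than start_time."""
    return [
        reg for reg in registers
        if not (
            reg.hostname == hostname
            and reg.group_name == group_name
            # Assumption: workers of the same start share start_time as
            # created_at, so they never look stale to each other.
            and reg.created_at < start_time
        )
    ]
```

Because every worker of the same service start would use the same timestamp, a worker that starts a few seconds later no longer removes its siblings' registers; only registers left over from a previous start are dropped.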

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/936813
Committed: https://opendev.org/openstack/neutron/commit/dce01d7550642ec478eaeb19deded4c974134ea9
Submitter: "Zuul (22348)"
Branch: master

commit dce01d7550642ec478eaeb19deded4c974134ea9
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Dec 2 09:23:00 2024 +0000

    Fast exit when initially creating tunnel allocations

    If the database tunnel allocations match the current configuration, no
    other operation is done. During the initialization, if several workers
    try to execute this method, only the first one will update the
    allocations. The next workers will check that the database is correctly
    updated and will exit this method.

    Closes-Bug: #2089940
    Related-Bug: #2083570
    Change-Id: I208ba38bff9191cabcc1325fec516d0b0179c97c
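The fast-exit behaviour described in this commit message can be sketched as follows. This is a minimal illustration under stated assumptions: the allocations are modelled as an in-memory set of tunnel IDs, and `configured_ids` and `sync_allocations` are hypothetical names rather than the methods touched by the patch.

```python
from typing import Iterable, Set, Tuple


def configured_ids(ranges: Iterable[Tuple[int, int]]) -> Set[int]:
    """Expand inclusive (min, max) ranges into the set of tunnel IDs."""
    ids: Set[int] = set()
    for low, high in ranges:
        ids.update(range(low, high + 1))
    return ids


def sync_allocations(db_ids: Set[int],
                     ranges: Iterable[Tuple[int, int]]) -> bool:
    """Make db_ids match the configuration; return True if it changed."""
    wanted = configured_ids(ranges)
    if db_ids == wanted:
        # Fast exit: another worker already synchronized the allocations.
        return False
    db_ids.intersection_update(wanted)  # drop IDs no longer configured
    db_ids.update(wanted)               # add newly configured IDs
    return True
```

The first worker to call `sync_allocations` does the update and gets `True`; any worker calling it afterwards with the same configuration returns `False` without writing anything.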

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/openstack/neutron/+/936838
Committed: https://opendev.org/openstack/neutron/commit/0e304fabc13ecf22c71ccbe6ee041d26c27b980f
Submitter: "Zuul (22348)"
Branch: master

commit 0e304fabc13ecf22c71ccbe6ee041d26c27b980f
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Dec 2 14:31:04 2024 +0000

    [OVN] Create a OVN hash ring maintenance thread per worker

    The class ``HashRingHealthCheckPeriodics`` now handles the OVN hash ring
    update status of a single register. With this change, each API worker
    will spawn its own maintenance worker to update its own OVN hash ring
    register.

    The ``HashRingManager`` will also store the OVN hash ring ``updated_at``
    value, in order to avoid unnecessary updates. The
    ``OvnIdlDistributedLock`` will retrieve this ``updated_at`` value
    instead of having a local timer. If the register is updated, the
    ``OvnIdlDistributedLock`` notify method won't refresh it.

    The method ``touch_node`` no longer is decorated with
    ``retry_if_session_inactive``. If the node update fails, other calls to
    update the OVN hash ring register will be responsible for refreshing it.

    The goals of this patch are:
    * To make each worker accountable for its own OVN hash ring register.
    * To avoid multiple register updates.
    * To update the OVN hash registers when needed, only if they are
      outdated.

    Related-Bug: #2083570
    Change-Id: Ia15f48c28fe6431eac4778fd0c6a88c035a4f712
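The "update only if outdated" behaviour described in this commit message could be sketched like this. It is an assumption-laden illustration: `WorkerRegister`, `touch_if_outdated`, and the `max_age` threshold are hypothetical, and the `write` callable stands in for whatever performs the database UPDATE.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


class WorkerRegister:
    """Hypothetical per-worker view of its own hash ring register."""

    def __init__(self, node_uuid: str, max_age: timedelta) -> None:
        self.node_uuid = node_uuid
        self.max_age = max_age
        # Cached timestamp of the last successful update, if any.
        self.updated_at: Optional[datetime] = None

    def touch_if_outdated(
        self, now: datetime, write: Callable[[str, datetime], None]
    ) -> bool:
        """Write a new updated_at only when the cached value is too old."""
        if (self.updated_at is not None
                and now - self.updated_at < self.max_age):
            return False  # still fresh, skip the database write
        # If write() raises, updated_at is not advanced, so a later call
        # tries again; no retry happens here.
        write(self.node_uuid, now)
        self.updated_at = now
        return True
```

With one such object per API worker, each worker only ever writes to its own register, and repeated callers within `max_age` cost no database round trip.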

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/940123

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/940123
Committed: https://opendev.org/openstack/neutron/commit/b0c02c28929dedb4ed8a460994e71e72282058c5
Submitter: "Zuul (22348)"
Branch: master

commit b0c02c28929dedb4ed8a460994e71e72282058c5
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Fri Jan 24 11:52:11 2025 +0000

    [eventlet-removal][OVN] Require wsgi start-time in the config

    Currently there is one single method to run the Neutron API that is
    using uWSGI. This method requires a specific hash ring manager
    initialization, different to the eventlet server based.

    Because the second has been removed and is no longer available, as
    long as Neutron continues the efforts of removing eventlet from the
    source code, the second way to initialize the hash ring manager
    is removed.

    Related-Bug: #2083570
    Change-Id: I59b25093df9cf5aa77767492a5b9008bfa11cc07

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/936429
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "yatin <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/936829
Reason: No longer needed, couple of issues already handled

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "yatin <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/936147
Reason: for wsgi issues

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 26.0.0.0rc1

This issue was fixed in the openstack/neutron 26.0.0.0rc1 Epoxy release candidate.
