Since 8 July 2024 we have been seeing very frequent failures of the neutron-tempest-plugin-designate-scenario job. Different tests fail in those jobs, but they all share the same error. It is always something like:
Traceback (most recent call last):
File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/neutron_tempest_plugin/scenario/test_dns_integration.py", line 241, in test_fip_admin_delete
self._verify_dns_records(addr, name)
File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/neutron_tempest_plugin/scenario/test_dns_integration.py", line 174, in _verify_dns_records
self._verify_designate_recordset(address, found, rec_type)
File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/neutron_tempest_plugin/scenario/test_dns_integration.py", line 137, in _verify_designate_recordset
dns_waiters.wait_for_recordset_status(
File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/designate_tempest_plugin/common/waiters.py", line 210, in wait_for_recordset_status
raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (DNSIntegrationAdminTests:test_fip_admin_delete) Recordset d7918b56-a39b-429b-a33b-16377580f03b failed to reach status=ACTIVE within the required time (360 s). Current status: PENDING
Examples of failure: https://059105a74883c41850c8-6b9500683e6a67d31c1bad572acf67ba.ssl.cf2.rackcdn.com/917429/8/check/neutron-tempest-plugin-designate-scenario/8f356a7/testr_results.html
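For context, the waiter that times out here does nothing more than poll the recordset until it reaches the expected status. A minimal sketch of that polling loop (this is not the actual designate_tempest_plugin code; the client.show_recordset call, its return value, and the parameter names are assumptions for illustration):

import time

from tempest.lib import exceptions as lib_exc


def wait_for_recordset_status(client, zone_id, recordset_id,
                              expected_status='ACTIVE',
                              timeout=360, interval=1):
    """Poll a Designate recordset until it reaches the expected status."""
    start = time.time()
    while time.time() - start < timeout:
        # Assumed client interface returning (response, recordset body).
        _resp, recordset = client.show_recordset(zone_id, recordset_id)
        if recordset['status'] == expected_status:
            return recordset
        time.sleep(interval)
    raise lib_exc.TimeoutException(
        'Recordset %s failed to reach status=%s within the required time '
        '(%s s). Current status: %s'
        % (recordset_id, expected_status, timeout, recordset['status']))

With the recordset stuck in PENDING on the Designate side, a loop like this simply exhausts the 360 s budget and raises the TimeoutException shown in the traceback above.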
In the designate-producer logs (I am not even sure that is the right component to check) I can see a lot of errors like:
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall Traceback (most recent call last):
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_log/pipe_mutex.py", line 67, in acquire
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall os.read(self.rfd, 1)
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall BlockingIOError: [Errno 11] Resource temporarily unavailable
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall During handling of the above exception, another exception occurred:
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall Traceback (most recent call last):
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_service/loopingcall.py", line 71, in func
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall LOG.error('%(kind)s %(func_name)r failed',
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1858, in error
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.log(ERROR, msg, *args, **kwargs)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1879, in log
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.logger.log(level, msg, *args, **kwargs)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1547, in log
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self._log(level, msg, args, **kwargs)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.handle(record)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.callHandlers(record)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall hdlr.handle(record)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 966, in handle
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.acquire()
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 917, in acquire
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.lock.acquire()
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_log/pipe_mutex.py", line 80, in acquire
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall eventlet.hubs.trampoline(self.rfd, read=True)
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/__init__.py", line 153, in trampoline
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall listener = hub.add(hub.READ, fileno, current.switch, current.throw, mark_as_closed)
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/epolls.py", line 22, in add
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall listener = hub.BaseHub.add(self, evtype, fileno, cb, tb, mac)
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/hub.py", line 174, in add
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall raise RuntimeError(
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall RuntimeError: Second simultaneous read on fileno 4 detected. Unless you really know what you're doing, make sure that only one greenthread can read any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_prevent_multiple_readers(False) - MY THREAD=<built-in method switch of GreenThread object at 0x7ff39b694e00>; THAT THREAD=FdListener('read', 4, <built-in method switch of GreenThread object at 0x7ff39c5f91c0>, <built-in method throw of GreenThread object at 0x7ff39c5f91c0>)
Maybe these errors are somehow related to the test failures described above.
This seems to be related to a recent change in oslo.log: https://review.opendev.org/c/openstack/oslo.log/+/918426
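That change apparently introduced the pipe-based mutex visible in the traceback: acquire() first tries a non-blocking os.read() on the pipe (which raises BlockingIOError when no token is available) and then parks the greenthread with eventlet.hubs.trampoline() until the read end becomes readable. If a second greenthread reaches that trampoline() on the same file descriptor while another one is already waiting there, eventlet refuses and raises exactly the RuntimeError shown above. A standalone sketch of that eventlet behaviour (this is not oslo.log code; the pipe and the helper here are purely illustrative):

import os

import eventlet
from eventlet import hubs

# A pipe standing in for the one the log mutex uses; illustrative only.
rfd, wfd = os.pipe()
os.set_blocking(rfd, False)


def wait_for_byte():
    # Same pattern as in the traceback: non-blocking read first, then park
    # in the hub until the read end of the pipe becomes readable.
    try:
        os.read(rfd, 1)
    except BlockingIOError:
        hubs.trampoline(rfd, read=True)


t1 = eventlet.spawn(wait_for_byte)
eventlet.sleep(0)   # let the first greenthread park on rfd
t2 = eventlet.spawn(wait_for_byte)
try:
    t2.wait()
except RuntimeError as exc:
    # "Second simultaneous read on fileno ... detected. ..."
    print(exc)
os.write(wfd, b'\0')  # release the first waiter so the script exits cleanly
t1.wait()

This matches the frames above: LOG.error() ends up in pipe_mutex.acquire(), which trampolines on a pipe fd that another greenthread is already waiting on, and eventlet aborts with the "Second simultaneous read" RuntimeError.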