Frequent timeouts while waiting for recordset to become ACTIVE

Bug #2072627 reported by Slawek Kaplonski
This bug affects 1 person
Affects: Designate
Status: Fix Released
Importance: Critical
Assigned to: Michael Johnson
Milestone: —

Bug Description

Since 8 July 2024 we have been seeing very frequent failures of the neutron-tempest-plugin-designate-scenario job. Different tests fail in those jobs, but they all fail with the same kind of error. It always looks something like this:

Traceback (most recent call last):
  File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/neutron_tempest_plugin/scenario/test_dns_integration.py", line 241, in test_fip_admin_delete
    self._verify_dns_records(addr, name)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/neutron_tempest_plugin/scenario/test_dns_integration.py", line 174, in _verify_dns_records
    self._verify_designate_recordset(address, found, rec_type)
  File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/neutron_tempest_plugin/scenario/test_dns_integration.py", line 137, in _verify_designate_recordset
    dns_waiters.wait_for_recordset_status(
  File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/designate_tempest_plugin/common/waiters.py", line 210, in wait_for_recordset_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (DNSIntegrationAdminTests:test_fip_admin_delete) Recordset d7918b56-a39b-429b-a33b-16377580f03b failed to reach status=ACTIVE within the required time (360 s). Current status: PENDING

Examples of failure: https://059105a74883c41850c8-6b9500683e6a67d31c1bad572acf67ba.ssl.cf2.rackcdn.com/917429/8/check/neutron-tempest-plugin-designate-scenario/8f356a7/testr_results.html
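
For reference, the waiter that raises this timeout is essentially a polling loop: it repeatedly fetches the recordset and gives up once the build timeout (360 s here) is exceeded, which is why a recordset stuck in PENDING surfaces as a TimeoutException. A rough sketch of that pattern (illustrative only; the signature and the client call below are placeholders, not the real designate_tempest_plugin code):

# Illustrative polling loop only -- the real waiter lives in
# designate_tempest_plugin/common/waiters.py and differs in detail.
import time

from tempest.lib import exceptions as lib_exc


def wait_for_recordset_status(client, zone_id, recordset_id,
                              target_status='ACTIVE', timeout=360):
    start = time.time()
    while time.time() - start < timeout:
        recordset = client.show_recordset(zone_id, recordset_id)  # placeholder call
        if recordset['status'] == target_status:
            return recordset
        time.sleep(2)
    raise lib_exc.TimeoutException(
        'Recordset %s failed to reach status=%s within the required time '
        '(%s s). Current status: %s'
        % (recordset_id, target_status, timeout, recordset['status']))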

In the designate-producer logs (I am not even sure that is the right component to check) I can see a lot of errors like:

Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall Traceback (most recent call last):
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_log/pipe_mutex.py", line 67, in acquire
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall os.read(self.rfd, 1)
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall BlockingIOError: [Errno 11] Resource temporarily unavailable
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall During handling of the above exception, another exception occurred:
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall Traceback (most recent call last):
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_service/loopingcall.py", line 150, in _run_loop
Jul 09 20:26:37.666691 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall result = func(*self.args, **self.kw)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_service/loopingcall.py", line 71, in func
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall LOG.error('%(kind)s %(func_name)r failed',
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1858, in error
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.log(ERROR, msg, *args, **kwargs)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1879, in log
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.logger.log(level, msg, *args, **kwargs)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1547, in log
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self._log(level, msg, args, **kwargs)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1624, in _log
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.handle(record)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1634, in handle
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.callHandlers(record)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 1696, in callHandlers
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall hdlr.handle(record)
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 966, in handle
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.acquire()
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/usr/lib/python3.10/logging/__init__.py", line 917, in acquire
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall self.lock.acquire()
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/oslo_log/pipe_mutex.py", line 80, in acquire
Jul 09 20:26:37.668339 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall eventlet.hubs.trampoline(self.rfd, read=True)
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/__init__.py", line 153, in trampoline
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall listener = hub.add(hub.READ, fileno, current.switch, current.throw, mark_as_closed)
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/epolls.py", line 22, in add
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall listener = hub.BaseHub.add(self, evtype, fileno, cb, tb, mac)
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/hub.py", line 174, in add
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall raise RuntimeError(
Jul 09 20:26:37.670467 np0037939496 designate-producer[107481]: ERROR oslo.service.loopingcall RuntimeError: Second simultaneous read on fileno 4 detected. Unless you really know what you're doing, make sure that only one greenthread can read any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_prevent_multiple_readers(False) - MY THREAD=<built-in method switch of GreenThread object at 0x7ff39b694e00>; THAT THREAD=FdListener('read', 4, <built-in method switch of GreenThread object at 0x7ff39c5f91c0>, <built-in method throw of GreenThread object at 0x7ff39c5f91c0>)

Maybe this is somehow related to that issue.
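
The RuntimeError at the bottom of that traceback is eventlet's guard against two greenthreads waiting to read from the same file descriptor at the same time. A minimal, standalone sketch (assuming eventlet is installed; the pipe and helper below are only for illustration and are not oslo.log code) that triggers the same error:

# Standalone reproduction of "Second simultaneous read on fileno N".
# Two greenthreads register read interest on the same pipe fd; the hub
# rejects the second registration unless
# eventlet.debug.hub_prevent_multiple_readers(False) has been called
# (the escape hatch mentioned in the error message above).
import os

import eventlet
from eventlet import hubs

rfd, wfd = os.pipe()


def wait_for_byte():
    hubs.trampoline(rfd, read=True)  # park this greenthread until rfd is readable
    return os.read(rfd, 1)


first = eventlet.spawn(wait_for_byte)
eventlet.sleep(0)   # let the first greenthread register its read listener
second = eventlet.spawn(wait_for_byte)
second.wait()       # raises RuntimeError: Second simultaneous read on fileno ...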

Tags: breaks-gate
Dr. Jens Harbott (j-harbott) wrote:

This seems to be related to a recent change in oslo.log https://review.opendev.org/c/openstack/oslo.log/+/918426
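
The pipe_mutex.py frames in the producer traceback hint at how this bites: the log handler lock is a pipe-based mutex, where acquiring means reading a token byte from a pipe and waiting means trampolining on the pipe's read end, so a second greenthread that tries to log while another is already waiting runs into eventlet's second-reader check. A simplified paraphrase of that pattern (for illustration only, not the actual oslo.log source):

# Simplified paraphrase of the acquire pattern visible in the
# oslo_log/pipe_mutex.py frames above; the real implementation differs
# in detail (owner tracking, recursion, release handling, etc.).
import os

from eventlet import hubs


class PipeMutexSketch:
    """Mutex whose token is a single byte sitting in a pipe."""

    def __init__(self):
        self.rfd, self.wfd = os.pipe()
        os.set_blocking(self.rfd, False)  # empty pipe -> read raises BlockingIOError
        os.write(self.wfd, b'-')          # one token in the pipe == unlocked

    def acquire(self):
        while True:
            try:
                os.read(self.rfd, 1)      # got the token: lock acquired
                return
            except BlockingIOError:
                # Token held elsewhere: wait until the pipe becomes readable.
                # If another greenthread is already parked on self.rfd, this
                # is where eventlet raises the "Second simultaneous read"
                # error seen in the designate-producer log.
                hubs.trampoline(self.rfd, read=True)

    def release(self):
        os.write(self.wfd, b'-')          # put the token back: lock released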

Changed in designate:
status: New → Triaged
importance: Undecided → Critical
Michael Johnson (johnsom) wrote:

I have proposed a revert of the oslo.log change that is causing this issue here: https://review.opendev.org/c/openstack/oslo.log/+/923961

Michael Johnson (johnsom) wrote:

The revert of the oslo logging patch has resolved this issue.

Changed in designate:
status: Triaged → Fix Released
Changed in designate:
assignee: nobody → Michael Johnson (johnsom)