nova-compute process stuck without taking any action

Bug #1971663 reported by lijie.xie
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========
nova-compute process gets stuck after running for a while

Steps to reproduce
==================
Sorry, I haven't been able to pin down exact reproduction steps, but this problem has occurred at least three times in our ARM environment and never on x86.

Expected result
===============
nova-compute runs normally.

Actual result
=============
2022-04-28 15:07:15.150 5967 DEBUG oslo_concurrency.lockutils [-] Lock "ca12badc-42a2-43f2-8a28-cb2afa3a377c" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2022-04-28 15:07:15.151 5967 DEBUG oslo_concurrency.lockutils [-] Lock "53988d8e-75bc-40de-b9af-ce7509f94918" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2022-04-28 15:07:15.151 5967 DEBUG oslo_concurrency.lockutils [-] Lock "bc2b5ad1-a7a4-45fc-8d95-e986476c3129" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2022-04-28 15:07:15.151 5967 DEBUG oslo_concurrency.lockutils [-] Lock "f2068deb-c12f-4c31-998c-cf48af4440ff" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2022-04-28 15:07:15.152 5967 DEBUG oslo_concurrency.lockutils [-] Lock "48577b65-28e2-4858-abed-db00f90c0dc6" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2022-04-28 15:07:15.152 5967 DEBUG oslo_concurrency.lockutils [-] Lock "5e6a68fc-b711-4115-bc77-179a224ed5c6" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2022-04-28 15:07:15.152 5967 DEBUG oslo_concurrency.lockutils [-] Lock "4f47003a-c056-4aa6-9be2-e5bec0cb4a07" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2022-04-28 15:07:15.153 5967 DEBUG oslo_concurrency.lockutils [-] Lock "a1416265-68cb-4e06-a1b9-80c0bd138a09" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2022-04-28 15:07:15.153 5967 DEBUG oslo_concurrency.lockutils [-] Lock "96408c82-5951-46e6-b105-bcb21118ed99" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
2022-04-28 17:11:03.055 5967 INFO oslo.messaging._drivers.impl_rabbit [-] A recoverable connection/channel error occurred, trying to reconnect: [Errno 104] Connection reset by peer
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 476, in fire_timers
    timer()
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/timer.py", line 59, in __call__
    cb(*args, **kw)
  File "/usr/local/lib/python3.6/site-packages/eventlet/semaphore.py", line 152, in _do_acquire
    waiter.switch()
greenlet.error: cannot switch to a different thread
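For context (this snippet is not from the bug itself): greenlet raises this error when a greenlet bound to one OS thread is switched to from a different OS thread. A minimal standalone sketch that reproduces the same exception class:

import threading
import greenlet

def run():
    # Pause and hand control back to the main greenlet.
    greenlet.getcurrent().parent.switch()

# Created and started in the main thread, so it is bound to that thread.
g = greenlet.greenlet(run)
g.switch()

def switch_from_other_thread():
    try:
        g.switch()  # resuming it from a different OS thread is rejected
    except greenlet.error as exc:
        print(exc)  # prints: cannot switch to a different thread

t = threading.Thread(target=switch_from_other_thread)
t.start()
t.join()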

Environment
===========
Python 3.6.8
eventlet 0.30.2
oslo.concurrency 4.4.0
oslo.messaging 12.7.1

nova-compute Stack list
======================

Stack for thread 281471352828416
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 365, in run
    self.wait(sleep_time)
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/poll.py", line 77, in wait
    time.sleep(seconds)

Stack for thread 281473394471424
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 365, in run
    self.wait(sleep_time)
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/poll.py", line 77, in wait
    time.sleep(seconds)

Stack for thread 281471310557696
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 365, in run
    self.wait(sleep_time)
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/poll.py", line 77, in wait
    time.sleep(seconds)

Stack for thread 281471838974464
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 365, in run
    self.wait(sleep_time)
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/poll.py", line 77, in wait
    time.sleep(seconds)

Stack for thread 281471847428608
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/hub.py", line 365, in run
    self.wait(sleep_time)
  File "/usr/local/lib/python3.6/site-packages/eventlet/hubs/poll.py", line 77, in wait
    time.sleep(seconds)

Stack for thread 281471855882752
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281471864336896
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281471872791040
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281471881245184
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281471889699328
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281472375845376
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281472384299520
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281472392753664
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281472401207808
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281472409661952
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281472418116096
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281472426570240
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281472979825152
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281472988279296
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281472996733440
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281473005187584
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281473013641728
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281473022095872
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281473030550016
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281473111552512
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/eventlet/tpool.py", line 75, in tworker
    msg = _reqq.get()
  File "/usr/lib64/python3.6/queue.py", line 164, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.6/threading.py", line 295, in wait
    waiter.acquire()

Stack for thread 281473120006656
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/site-packages/nova/virt/libvirt/host.py", line 205, in _native_thread
    libvirt.virEventRunDefaultImpl()
  File "/usr/local/lib64/python3.6/site-packages/libvirt.py", line 442, in virEventRunDefaultImpl
    ret = libvirtmod.virEventRunDefaultImpl()

Stack for thread 281473614563696
  File "/usr/local/lib/python3.6/site-packages/eventlet/green/thread.py", line 42, in __thread_body
    func(*args, **kwargs)
  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.6/site-packages/eventlet/green/thread.py", line 63, in wrap_bootstrap_inner
    bootstrap_inner()
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "<string>", line 167, in run
  File "/usr/lib64/python3.6/code.py", line 233, in interact
    more = self.push(line)
  File "/usr/lib64/python3.6/code.py", line 259, in push
    more = self.runsource(source, self.filename)
  File "/usr/lib64/python3.6/code.py", line 75, in runsource
    self.runcode(code)
  File "/usr/lib64/python3.6/code.py", line 91, in runcode
    exec(code, self.locals)
  File "<console>", line 3, in <module>

========================================================
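The per-thread stacks above appear to have been captured from an interactive console running inside the process (note the <console> frame in the last stack). As a rough sketch, assuming access to a Python shell inside nova-compute (for example via an eventlet backdoor), a dump in this format can be produced with sys._current_frames():

import sys
import traceback

# Print the current stack of every native thread in the process.
for thread_id, frame in sys._current_frames().items():
    print("Stack for thread %d" % thread_id)
    print("".join(traceback.format_stack(frame)))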

I spent a long time trying to analyze why the process gets stuck, but I still haven't figured it out, so I'm looking for help from the community. Thanks.

lijie.xie (none2021) wrote :

More configuration information:
=========================================

log_config_append is enabled in nova.conf as follows:

nova.conf:
[DEFAULT]
log_config_append = /etc/nova/logging.conf

[nova@node-2 /]$ cat /etc/nova/logging.conf
[formatter_context]
class = oslo_log.formatters.ContextFormatter
datefmt = %Y-%m-%d %H:%M:%S
[formatter_default]
datefmt = %Y-%m-%d %H:%M:%S
format = %(message)s
[formatters]
keys = context,default
[handler_null]
args = ()
class = logging.NullHandler
formatter = default
[handler_stderr]
args = (sys.stderr,)
class = StreamHandler
formatter = context
[handler_stdout]
args = (sys.stdout,)
class = StreamHandler
formatter = context
[handlers]
keys = stdout,stderr,null
[logger_amqp]
handlers = stderr
level = WARNING
qualname = amqp
[logger_amqplib]
handlers = stderr
level = WARNING
qualname = amqplib
[logger_boto]
handlers = stderr
level = WARNING
qualname = boto
[logger_eventletwsgi]
handlers = stderr
level = WARNING
qualname = eventlet.wsgi.server
[logger_nova]
handlers = stdout
level = INFO
qualname = nova
[logger_os.brick]
handlers = stdout
level = INFO
qualname = os.brick
[logger_placement]
handlers = stdout
level = INFO
qualname = placement
[logger_root]
handlers = null
level = WARNING
[logger_sqlalchemy]
handlers = stderr
level = WARNING
qualname = sqlalchemy
[loggers]
keys = root,nova,os.brick,placement

Steps to reproduce
==================

Blocking is triggered when the following two conditions are met at the same time:
1. rabbitmq is unavailable
2. _sync_power_states is being executed
The result is that the nova-compute main thread no longer refreshes resources and never reconnects to rabbitmq.

If log_config_append is disabled in nova.conf and debug=True is set, then when conditions 1 and 2 are met the nova-compute main thread refreshes resources normally, but it still does not reconnect to rabbitmq.

Uggla (rene-ribaud) wrote :

Can you please add the nova and client versions?

Maybe I am missing something, but nova cannot work correctly if you have issues with rabbitmq.
So I don't understand: what is causing the rabbitmq unavailability?

Changed in nova:
status: New → Incomplete
lijie.xie (none2021) wrote :

nova Wallaby
python-novaclient 17.4.0

Yes. I found that if I remove "log_config_append=/etc/nova/logging.conf" and add "debug=true" in nova.conf, the compute process does not get stuck when rabbitmq is unavailable. The unavailability of rabbitmq causes the nova-compute process to lose its TCP connection to the MQ and never reconnect, which is bad.
So even after rabbitmq comes back up, nova-compute still cannot return to normal.
In particular, the heartbeat is no longer reported and RPC-related actions cannot be processed.

I constructed the same scenario in a Newton environment: it keeps trying to reconnect while rabbitmq is unavailable (the attempts fail), and as soon as rabbitmq comes back, nova-compute returns to normal.

What is causing the rabbitmq unavailability?
I keep restarting the rabbitmq service to simulate the problem in this environment.
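A minimal sketch of that kind of restart loop (it assumes a systemd-managed rabbitmq-server unit; adjust for the actual deployment):

import subprocess
import time

# Repeatedly restart rabbitmq to simulate intermittent unavailability
# while nova-compute's periodic tasks (e.g. _sync_power_states) are running.
while True:
    subprocess.run(["systemctl", "restart", "rabbitmq-server"], check=False)
    time.sleep(60)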

In the Newton release, heartbeat connection detection is done in a coroutine. Starting from the S release, the community introduced the heartbeat_in_pthread configuration option in the oslo.messaging library; it is officially enabled in the Wallaby release, where its default value is True. With this option enabled, heartbeat detection runs in a native thread.
https://review.opendev.org/c/openstack/oslo-specs/+/661314/4/specs/train/amqp-rabbitmq-heartbeat-isolated-pthread.rst#60

For now, I have set heartbeat_in_pthread to False, so heartbeat detection is still performed by a coroutine, and it works normally.
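For reference, that workaround corresponds to a nova.conf snippet along these lines (a sketch; heartbeat_in_pthread is an oslo.messaging option, normally set in the [oslo_messaging_rabbit] section):

[oslo_messaging_rabbit]
# Keep the RabbitMQ heartbeat in a greenthread instead of a native thread.
heartbeat_in_pthread = false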

sean mooney (sean-k-mooney) wrote :

This is probably related to https://bugs.launchpad.net/oslo.messaging/+bug/1949964 as well.

If you have a node in this state, can you check whether it has a lot of open file descriptors?
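A quick way to check, as a sketch (using the nova-compute PID seen in the log prefix above):

import os

pid = 5967  # PID of the stuck nova-compute process, taken from the log lines
print("open fds:", len(os.listdir("/proc/%d/fd" % pid)))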

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired