Nova compute dies if it cannot authenticate to RabbitMQ

Bug #1752736 reported by Mohammed Naser on 2018-03-01
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Medium
Assigned to: Ye Huang

Bug Description

At the moment, nova-compute exits on startup if it fails to authenticate to the messaging cluster, and it does not retry. The vhost may simply not be ready yet, so this failure should be handled here:

https://github.com/openstack/nova/blob/stable/pike/nova/conductor/api.py#L61-L78

Sam Hague (shague-v) wrote :

2017-10-27 20:28:57.855 29039 CRITICAL nova [req-deca1acd-7a73-4bac-a1b6-95ae5466e8a0 - -] Unhandled error: NotAllowed: Connection.open: (530) NOT_ALLOWED - access to vhost 'nova_cell1' refused for user 'stackrabbit'
2017-10-27 20:28:57.855 29039 ERROR nova Traceback (most recent call last):
2017-10-27 20:28:57.855 29039 ERROR nova File "/usr/bin/nova-compute", line 10, in <module>
2017-10-27 20:28:57.855 29039 ERROR nova sys.exit(main())
2017-10-27 20:28:57.855 29039 ERROR nova File "/opt/stack/nova/nova/cmd/compute.py", line 57, in main
2017-10-27 20:28:57.855 29039 ERROR nova topic=compute_rpcapi.RPC_TOPIC)
2017-10-27 20:28:57.855 29039 ERROR nova File "/opt/stack/nova/nova/service.py", line 240, in create
2017-10-27 20:28:57.855 29039 ERROR nova periodic_interval_max=periodic_interval_max)
2017-10-27 20:28:57.855 29039 ERROR nova File "/opt/stack/nova/nova/service.py", line 126, in __init__
2017-10-27 20:28:57.855 29039 ERROR nova conductor_api.wait_until_ready(context.get_admin_context())
2017-10-27 20:28:57.855 29039 ERROR nova File "/opt/stack/nova/nova/conductor/api.py", line 67, in wait_until_ready
2017-10-27 20:28:57.855 29039 ERROR nova timeout=timeout)
2017-10-27 20:28:57.855 29039 ERROR nova File "/opt/stack/nova/nova/baserpc.py", line 58, in ping
2017-10-27 20:28:57.855 29039 ERROR nova return cctxt.call(context, 'ping', arg=arg_p)
2017-10-27 20:28:57.855 29039 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 169, in call
2017-10-27 20:28:57.855 29039 ERROR nova retry=self.retry)
2017-10-27 20:28:57.855 29039 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_messaging/transport.py", line 123, in _send
2017-10-27 20:28:57.855 29039 ERROR nova timeout=timeout, retry=retry)
2017-10-27 20:28:57.855 29039 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 578, in send
2017-10-27 20:28:57.855 29039 ERROR nova retry=retry)
2017-10-27 20:28:57.855 29039 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 522, in _send
2017-10-27 20:28:57.855 29039 ERROR nova msg.update({'_reply_q': self._get_reply_q()})
2017-10-27 20:28:57.855 29039 ERROR nova File "/usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 503, in _get_reply_q
2017-10-2

Matt Riedemann (mriedem) on 2018-03-08
tags: added: compute
tags: removed: compute
melanie witt (melwitt) wrote :

This would have to be handled a bit differently than the existing timeout code that was linked in the bug report.

I think we would want to catch NotAllowed specifically and sleep(timeout) before the next attempt, or similar.
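
A minimal sketch of that idea, not the actual nova change. It assumes a hypothetical `wait_until_ready` helper that takes the ping as a callable, and it uses a local `NotAllowed` class as a stand-in for the exception shown in the traceback so the example runs without nova or a broker. The `max_attempts`, `delay`, and `sleep` parameters are illustrative names, not existing nova options.

```python
import time


class NotAllowed(Exception):
    """Local stand-in for the broker's 530 NOT_ALLOWED error (assumption)."""


def wait_until_ready(ping, max_attempts=10, delay=1.0, sleep=time.sleep):
    """Call ping() until it succeeds, retrying only on NotAllowed.

    Returns the number of attempts used. Re-raises NotAllowed once
    max_attempts is exhausted; any other exception propagates at once.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            ping()
            return attempt
        except NotAllowed:
            if attempt == max_attempts:
                # Give up: the vhost never became accessible.
                raise
            # The vhost may not exist yet; wait before the next attempt.
            sleep(delay)
```

Catching only `NotAllowed` keeps other startup failures fatal, and bounding the attempts means a genuinely misconfigured user or vhost still surfaces as an error instead of looping forever.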

tags: added: conductor
Changed in nova:
importance: Undecided → Medium
status: New → Triaged
tags: added: low-hanging-fruit
Ye Huang (littlemiaowu) on 2018-08-05
Changed in nova:
assignee: nobody → Ye Huang (littlemiaowu)
status: Triaged → In Progress