In a stable/queens devstack environment with multiple PowerVM compute nodes, every time I see this in the <email address hidden> logs:
Jul 11 15:48:57 myhostname nova-conductor[3796]: DEBUG nova.conductor.manager [None req-af22375c-f920-4747-bd2f-0de80ee69465 admin admin] Rescheduling: True {{(pid=4108) build_instances /opt/stack/nova/nova/conductor/manager.py:571}}
it is shortly thereafter followed by:
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server [None req-af22375c-f920-4747-bd2f-0de80ee69465 admin admin] Exception during message handling: CantStartEngineError: No sql_connection parameter is established
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/manager.py", line 652, in build_instances
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server host.service_host))
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/availability_zones.py", line 95, in get_host_availability_zone
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server key='availability_zone')
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server result = fn(cls, context, *args, **kwargs)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/objects/aggregate.py", line 541, in get_by_host
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server _get_by_host_from_db(context, host, key=key)]
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 987, in wrapper
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server with self._transaction_scope(context):
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server return self.gen.next()
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1037, in _transaction_scope
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server context=context) as resource:
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server return self.gen.next()
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 640, in _session
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server bind=self.connection, mode=self.mode)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 404, in _create_session
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server self._start()
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 491, in _start
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server engine_args, maker_args)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 513, in _setup_for_connection
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server "No sql_connection parameter is established")
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server CantStartEngineError: No sql_connection parameter is established
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server
The nova_cell1.conf does have [database]connection set:
[database]
connection = mysql+pymysql://root:mysql@127.0.0.1/nova_cell1?charset=utf8
This may be related to https://bugs.launchpad.net/nova/+bug/1736946, though that is supposedly fixed in stable/queens and the trace is different, hence the new defect.
It's not the [database]/connection that matters; the build requests live in the API DB, which isn't configured in the cell conductor in a default devstack.
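For context, here is a sketch of the difference (the [api_database] host/credentials below are illustrative, not taken from this bug report; devstack's API database is conventionally named nova_api):

```ini
[database]
# Cell DB - this IS present in the default nova_cell1.conf
connection = mysql+pymysql://root:mysql@127.0.0.1/nova_cell1?charset=utf8

[api_database]
# API DB - NOT set in the default devstack cell conductor conf; without it,
# any code path that touches the API DB (e.g. build requests) hits
# CantStartEngineError: No sql_connection parameter is established
connection = mysql+pymysql://root:mysql@127.0.0.1/nova_api?charset=utf8
```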
If you have a recreate, can you log the filter_properties when this fails? The logic is based on the num_attempts count from the retries:
        except Exception as exc:
            num_attempts = filter_properties.get(
                'retry', {}).get('num_attempts', 1)
            updates = {'vm_state': vm_states.ERROR, 'task_state': None}
            for instance in instances:
                self._set_vm_state_and_notify(
                    context, instance.uuid, 'build_instances', updates,
                    exc, request_spec)
                # If num_attempts > 1, we're in a reschedule and probably
                # either hit NoValidHost or MaxRetriesExceeded. Either way,
                # the build request should already be gone and we probably
                # can't reach the API DB from the cell conductor.
                if num_attempts <= 1:
                    try:
                        # If the BuildRequest stays around then instance
                        # show/lists will pull from it rather than the errored
                        # instance.
                        self._destroy_build_request(context, instance)
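To illustrate why logging filter_properties matters, here is a minimal sketch (the helper name and dict shapes are assumed for illustration) of how num_attempts falls out of filter_properties: a first boot has no 'retry' key and defaults to 1, so the API-DB call is attempted; a reschedule carries retry['num_attempts'] >= 2 and should skip it.

```python
# Hypothetical helper mirroring the num_attempts extraction above.
def get_num_attempts(filter_properties):
    # No 'retry' key (first boot) defaults to 1.
    return filter_properties.get('retry', {}).get('num_attempts', 1)

first_boot = {}  # retries not populated yet
reschedule = {'retry': {'num_attempts': 2, 'hosts': [['host1', 'node1']]}}

print(get_num_attempts(first_boot))   # → 1, so _destroy_build_request runs
print(get_num_attempts(reschedule))   # → 2, so the API-DB call is skipped
```

If the traceback in this bug fires on a reschedule, that would suggest filter_properties arrived without the expected retry info, making num_attempts default to 1.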