Down state host rejoins cluster failed

Bug #1897716 reported by wangzhh
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Medium
Assigned to: wangzhh

Bug Description

Description
===========
When a host in the down state rejoins the cluster, the nova-compute service fails to start. Here is the log:

ERROR oslo_service.service Traceback (most recent call last):
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/oslo_service/service.py", line 807, in run_service
ERROR oslo_service.service service.start()
ERROR oslo_service.service File "/home/stack/nova/nova/service.py", line 159, in start
ERROR oslo_service.service self.manager.init_host()
ERROR oslo_service.service File "/home/stack/nova/nova/compute/manager.py", line 1439, in init_host
ERROR oslo_service.service context, nodes_by_uuid)
ERROR oslo_service.service File "/home/stack/nova/nova/compute/manager.py", line 746, in _destroy_evacuated_instances
ERROR oslo_service.service bdi, destroy_disks)
ERROR oslo_service.service File "/home/stack/nova/nova/virt/libvirt/driver.py", line 1342, in destroy
ERROR oslo_service.service destroy_disks)
ERROR oslo_service.service File "/home/stack/nova/nova/virt/libvirt/driver.py", line 1414, in cleanup
ERROR oslo_service.service cleanup_instance_disks=cleanup_instance_disks)
ERROR oslo_service.service File "/home/stack/nova/nova/virt/libvirt/driver.py", line 1493, in _cleanup
ERROR oslo_service.service instance.save()
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 210, in wrapper
ERROR oslo_service.service ctxt, self, fn.__name__, args, kwargs)
ERROR oslo_service.service File "/home/stack/nova/nova/conductor/rpcapi.py", line 248, in object_action
ERROR oslo_service.service objmethod=objmethod, args=args, kwargs=kwargs)
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/oslo_messaging/rpc/client.py", line 179, in call
ERROR oslo_service.service transport_options=self.transport_options)
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/oslo_messaging/transport.py", line 128, in _send
ERROR oslo_service.service transport_options=transport_options)
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 654, in send
ERROR oslo_service.service transport_options=transport_options)
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 644, in _send
ERROR oslo_service.service raise result
ERROR oslo_service.service nova.exception_Remote.InstanceNotFound_Remote: Instance dd7d0109-34c5-4800-b8a1-0e28b208f75e could not be found.
ERROR oslo_service.service Traceback (most recent call last):
ERROR oslo_service.service
ERROR oslo_service.service File "/home/stack/nova/nova/conductor/manager.py", line 139, in _object_dispatch
ERROR oslo_service.service return getattr(target, method)(*args, **kwargs)
ERROR oslo_service.service
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
ERROR oslo_service.service return fn(self, *args, **kwargs)
ERROR oslo_service.service
ERROR oslo_service.service File "/home/stack/nova/nova/objects/instance.py", line 838, in save
ERROR oslo_service.service columns_to_join=_expected_cols(expected_attrs))
ERROR oslo_service.service
ERROR oslo_service.service File "/home/stack/nova/nova/db/api.py", line 685, in instance_update_and_get_original
ERROR oslo_service.service expected=expected)
ERROR oslo_service.service
ERROR oslo_service.service File "/home/stack/nova/nova/db/sqlalchemy/api.py", line 179, in wrapper
ERROR oslo_service.service return f(*args, **kwargs)
ERROR oslo_service.service
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/oslo_db/api.py", line 154, in wrapper
ERROR oslo_service.service ectxt.value = e.inner_exc
ERROR oslo_service.service
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
ERROR oslo_service.service self.force_reraise()
ERROR oslo_service.service
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
ERROR oslo_service.service six.reraise(self.type_, self.value, self.tb)
ERROR oslo_service.service
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/six.py", line 703, in reraise
ERROR oslo_service.service raise value
ERROR oslo_service.service
ERROR oslo_service.service File "/usr/local/lib/python3.6/site-packages/oslo_db/api.py", line 142, in wrapper
ERROR oslo_service.service return f(*args, **kwargs)
ERROR oslo_service.service
ERROR oslo_service.service File "/home/stack/nova/nova/db/sqlalchemy/api.py", line 222, in wrapped
ERROR oslo_service.service return f(context, *args, **kwargs)
ERROR oslo_service.service
ERROR oslo_service.service File "/home/stack/nova/nova/db/sqlalchemy/api.py", line 2106, in instance_update_and_get_original
ERROR oslo_service.service columns_to_join=columns_to_join)
ERROR oslo_service.service
ERROR oslo_service.service File "/home/stack/nova/nova/db/sqlalchemy/api.py", line 1234, in _instance_get_by_uuid
ERROR oslo_service.service raise exception.InstanceNotFound(instance_id=uuid)
ERROR oslo_service.service
ERROR oslo_service.service nova.exception.InstanceNotFound: Instance dd7d0109-34c5-4800-b8a1-0e28b208f75e could not be found.
ERROR oslo_service.service
ERROR oslo_service.service
DEBUG oslo_concurrency.lockutils [None req-cdd7334f-ef15-45cd-b732-d86e59fe40f6 None None] Acquired lock "singleton_lock" {{(pid=122256) lock /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:266}}
DEBUG oslo_concurrency.lockutils [None req-cdd7334f-ef15-45cd-b732-d86e59fe40f6 None None] Releasing lock "singleton_lock" {{(pid=122256) lock /usr/local/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:282}}
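
The traceback shows the failure path: init_host() calls _destroy_evacuated_instances(), which asks the libvirt driver to destroy the guests that were evacuated away; the driver's _cleanup() then calls instance.save(), which raises InstanceNotFound because the instance was already deleted after the evacuation. Below is a minimal sketch of a defensive guard around that path, using the names visible in the traceback. It is an illustration of the failure mode only, not the patch actually proposed in review 757053, and the method signature is simplified:

    from oslo_log import log as logging
    from nova import exception

    LOG = logging.getLogger(__name__)

    # Simplified shape of ComputeManager._destroy_evacuated_instances()
    # (nova/compute/manager.py); parameter names are illustrative.
    def _destroy_evacuated_instances(self, context, evacuated_instances):
        for instance in evacuated_instances:
            try:
                # destroy() ends up in the libvirt driver's _cleanup(), which
                # calls instance.save() -- the call that raises in the log above.
                self.driver.destroy(context, instance, network_info=None,
                                    block_device_info=None, destroy_disks=True)
            except exception.InstanceNotFound:
                # The instance record is already gone (deleted after the
                # evacuation), so there is no state left to persist; skip it
                # instead of letting the exception abort init_host().
                LOG.info('Evacuated instance %s was already deleted; '
                         'skipping cleanup.', instance.uuid)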

Steps to reproduce
==================
* Force down a compute node (or start from a host that is already down).
* nova evacuate {instance_id} # the instance must have been running on the down host
* nova delete {instance_id}
* nova service-force-down {host} nova-compute --unset
* Restart the nova-compute service on the recovered host (see the sketch below).
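
For completeness, the same steps can be driven from Python with python-novaclient. This is a hypothetical sketch: treat the auth setup, the microversion, the host name 'compute-1', and the force_down() signature as assumptions that may need adjusting for your release:

    from keystoneauth1 import loading, session
    from novaclient import client

    # Standard session-based auth; all credential values are placeholders.
    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller/identity',
        username='admin', password='secret', project_name='admin',
        user_domain_name='Default', project_domain_name='Default')
    nova = client.Client('2.11', session=session.Session(auth=auth))

    instance_id = 'dd7d0109-34c5-4800-b8a1-0e28b208f75e'  # instance on the down host

    nova.servers.evacuate(instance_id)                     # evacuate off the down host
    nova.servers.delete(instance_id)                       # delete the evacuated instance
    nova.services.force_down('compute-1', 'nova-compute',  # clear the forced-down flag,
                             force_down=False)             # i.e. the --unset step
    # Finally, restart nova-compute on the recovered host (out of band), e.g.:
    #   systemctl restart devstack@n-cpu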

Expected result
===============
Host rejoins the cluster.

Actual result
=============
The nova-compute service fails to start.

Environment
===========
I first triggered this bug on the Queens release, and it still exists in my devstack (master) environment.

wangzhh (wangzhh)
Changed in nova:
assignee: nobody → wangzhh (wangzhh)
summary: - Down state host rejoin cluster failed
+ Down state host rejoins cluster failed
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

I was able to reproduce the problem on a multinode devstack started from master. Marking this bug as Confirmed.

As you reported that the same problem can be observed in Queens, I am not marking it as an RC2 potential for Victoria, since it is not a Victoria regression.

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
tags: added: compute evacuate
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

As far as I can see, if I restart the failed compute a second time, it starts up properly. So a double restart can serve as a workaround.

Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

@WangZhh: You have assigned the bug to yourself. Will you provide a bugfix?

Revision history for this message
wangzhh (wangzhh) wrote :

@Balazs Gibizer Yep, I'll provide one today or tomorrow. Sorry for misreading your reply.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/757053

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
Balazs Gibizer (balazs-gibizer) wrote :

Thanks Wangzhh. I will check your fix soon.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.0.0.0rc1

This issue was fixed in the openstack/nova 23.0.0.0rc1 release candidate.
