Comment 4 for bug 1781648

Matt Riedemann (mriedem) wrote :

Potentially related failure:

http://logs.openstack.org/34/585034/20/check/nova-tox-functional/2f1e5d1/job-output.txt.gz#_2018-08-28_22_17_16_476578

This one fails with NoValidHost, though. I think gibi is onto something here: in this failure the compute service starts up, but the compute node record isn't in the DB yet when the scheduler starts filtering, so we get NoValidHost:

2018-08-28 22:17:16.508812 | ubuntu-xenial | 2018-08-28 22:17:10,004 INFO [nova.service] Starting compute node (version 18.0.0)
2018-08-28 22:17:16.509229 | ubuntu-xenial | 2018-08-28 22:17:10,046 WARNING [nova.compute.manager] No compute node record found for host cell2. If this is the first time this service is starting on this host, then you can ignore this warning.
2018-08-28 22:17:16.509585 | ubuntu-xenial | 2018-08-28 22:17:10,047 WARNING [nova.compute.monitors] Excluding nova.compute.monitors.cpu monitor virt_driver. Not in the list of enabled monitors (CONF.compute_monitors).
2018-08-28 22:17:16.509862 | ubuntu-xenial | 2018-08-28 22:17:10,053 WARNING [nova.compute.resource_tracker] No compute node record for cell2:cell2
2018-08-28 22:17:16.510072 | ubuntu-xenial | 2018-08-28 22:17:10,060 INFO [nova.filters] Filter RetryFilter returned 0 hosts
2018-08-28 22:17:16.510476 | ubuntu-xenial | 2018-08-28 22:17:10,060 INFO [nova.filters] Filtering removed all hosts for the request with instance ID '275c6524-e5d8-4082-b8ff-61ba9f1f58a2'. Filter results: ['RetryFilter: (start: 0, end: 0)']
2018-08-28 22:17:16.510792 | ubuntu-xenial | 2018-08-28 22:17:10,061 INFO [nova.compute.resource_tracker] Compute node record created for cell2:cell2 with uuid: fd8fbe55-302c-4bfc-a9d9-fa636813b377
2018-08-28 22:17:16.510977 | ubuntu-xenial | 2018-08-28 22:17:10,069 ERROR [nova.conductor.manager] Failed to schedule instances
2018-08-28 22:17:16.511074 | ubuntu-xenial | Traceback (most recent call last):
2018-08-28 22:17:16.511248 | ubuntu-xenial | File "nova/conductor/manager.py", line 1206, in schedule_and_build_instances
2018-08-28 22:17:16.511361 | ubuntu-xenial | instance_uuids, return_alternates=True)
2018-08-28 22:17:16.511517 | ubuntu-xenial | File "nova/conductor/manager.py", line 723, in _schedule_instances
2018-08-28 22:17:16.511624 | ubuntu-xenial | return_alternates=return_alternates)
2018-08-28 22:17:16.511755 | ubuntu-xenial | File "nova/scheduler/utils.py", line 907, in wrapped
2018-08-28 22:17:16.511848 | ubuntu-xenial | return func(*args, **kwargs)
2018-08-28 22:17:16.512017 | ubuntu-xenial | File "nova/scheduler/client/__init__.py", line 53, in select_destinations
2018-08-28 22:17:16.512148 | ubuntu-xenial | instance_uuids, return_objects, return_alternates)
2018-08-28 22:17:16.516439 | ubuntu-xenial | File "nova/scheduler/client/__init__.py", line 37, in __run_method
2018-08-28 22:17:16.516634 | ubuntu-xenial | return getattr(self.instance, __name)(*args, **kwargs)
2018-08-28 22:17:16.517085 | ubuntu-xenial | File "nova/scheduler/client/query.py", line 42, in select_destinations
2018-08-28 22:17:16.517225 | ubuntu-xenial | instance_uuids, return_objects, return_alternates)
2018-08-28 22:17:16.517385 | ubuntu-xenial | File "nova/scheduler/rpcapi.py", line 158, in select_destinations
2018-08-28 22:17:16.517533 | ubuntu-xenial | return cctxt.call(ctxt, 'select_destinations', **msg_args)
2018-08-28 22:17:16.517832 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_messaging/rpc/client.py", line 179, in call
2018-08-28 22:17:16.517907 | ubuntu-xenial | retry=self.retry)
2018-08-28 22:17:16.518206 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_messaging/transport.py", line 133, in _send
2018-08-28 22:17:16.518270 | ubuntu-xenial | retry=retry)
2018-08-28 22:17:16.518584 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_fake.py", line 222, in send
2018-08-28 22:17:16.518743 | ubuntu-xenial | return self._send(target, ctxt, message, wait_for_reply, timeout)
2018-08-28 22:17:16.519058 | ubuntu-xenial | File "/home/zuul/src/git.openstack.org/openstack/nova/.tox/py27/local/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_fake.py", line 209, in _send
2018-08-28 22:17:16.519125 | ubuntu-xenial | raise failure
2018-08-28 22:17:16.519293 | ubuntu-xenial | NoValidHost: No valid host was found. There are not enough hosts available.

Maybe this is due to these tests using the CachingScheduler, which I think gets the initial host states (compute nodes) on startup and then caches them. The test is likely starting the computes after the scheduler, so that's probably where the race lies.
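
To make the suspected ordering problem concrete, here is a minimal toy model of it. This is not nova code: ToyCachingScheduler, FakeComputeNodeDB and run_scenario are hypothetical names made up for illustration, and the assumption that the scheduler snapshots the compute nodes once at startup is the guess from this comment, not something confirmed against the real CachingScheduler.

```python
class NoValidHost(Exception):
    """Stand-in for the scheduler error seen in the log (toy version)."""


class FakeComputeNodeDB(object):
    """Hypothetical in-memory stand-in for the compute_nodes table."""

    def __init__(self):
        self._nodes = []

    def create(self, host):
        self._nodes.append(host)

    def get_all(self):
        # Return a copy so a cached snapshot does not see later inserts.
        return list(self._nodes)


class ToyCachingScheduler(object):
    """Assumption: host states are fetched once and then served from cache."""

    def __init__(self, db):
        self._db = db
        self._cached_hosts = None

    def refresh_cache(self):
        self._cached_hosts = self._db.get_all()

    def select_destination(self):
        if self._cached_hosts is None:
            self.refresh_cache()
        if not self._cached_hosts:
            # Nothing to filter: the filters would start and end with 0 hosts.
            raise NoValidHost("No valid host was found.")
        return self._cached_hosts[0]


def run_scenario(scheduler_first):
    """Return the selected host, or None if scheduling raised NoValidHost."""
    db = FakeComputeNodeDB()
    scheduler = ToyCachingScheduler(db)
    if scheduler_first:
        scheduler.refresh_cache()   # scheduler snapshots an empty DB
        db.create("cell2")          # compute node record shows up too late
    else:
        db.create("cell2")          # compute node record exists first
        scheduler.refresh_cache()
    try:
        return scheduler.select_destination()
    except NoValidHost:
        return None


if __name__ == "__main__":
    print("scheduler first:", run_scenario(scheduler_first=True))
    print("compute first:  ", run_scenario(scheduler_first=False))
```

If the real scheduler behaves like this toy, the test would need to either start the computes before the scheduler or force a cache refresh after the compute node records exist.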