Lots of gate failures with "not enough hosts available"

Bug #1441745 reported by David Kranz
This bug affects 1 person
Affects                    Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)   Invalid  Undecided   Unassigned
tempest                    Invalid  Undecided   Unassigned

Bug Description

Thousands of matches in the last two days:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiTm8gdmFsaWQgaG9zdCB3YXMgZm91bmQuIFRoZXJlIGFyZSBub3QgZW5vdWdoIGhvc3RzIGF2YWlsYWJsZVwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiIxNzI4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDI4NTA4MTY3MTcwfQ==
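The logstash permalink above keeps its query as base64-encoded JSON in the URL fragment. The Python sketch below decodes it. The field names (`search`, `timeframe`, `stamp`) are only what this particular link happens to contain, not a documented logstash/Kibana API. Reading `timeframe` as seconds and `stamp` as epoch milliseconds is an assumption, though it fits the report: 172800 seconds is exactly the "last two days".

```python
import base64
import json
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Permalink copied verbatim from the bug description.
PERMALINK = (
    "http://logstash.openstack.org/#"
    "eyJzZWFyY2giOiJtZXNzYWdlOlwiTm8gdmFsaWQgaG9zdCB3YXMgZm91bmQuIFRoZXJlIGFy"
    "ZSBub3QgZW5vdWdoIGhvc3RzIGF2YWlsYWJsZVwiIiwiZmllbGRzIjpbXSwib2Zmc2V0Ijow"
    "LCJ0aW1lZnJhbWUiOiIxNzI4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNl"
    "cl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDI4NTA4MTY3MTcwfQ=="
)


def decode_permalink_query(url: str) -> dict:
    """Return the JSON object stored base64-encoded in the URL fragment."""
    fragment = urlsplit(url).fragment
    return json.loads(base64.b64decode(fragment).decode("utf-8"))


query = decode_permalink_query(PERMALINK)
# Assumption: timeframe is a window in seconds, stamp is epoch milliseconds.
window_hours = int(query["timeframe"]) // 3600
saved_on = datetime.fromtimestamp(query["stamp"] / 1000, tz=timezone.utc).date()

print(query["search"])  # the log message being counted
print(window_hours, "hours of logs, query saved on", saved_on)
```

Decoding the link this way shows that the "thousands of matches" are a plain count of a single message string over a 48-hour window. That count includes every failure that ends in NoValidHost, whatever the underlying cause.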

The following is from this log file:

http://logs.openstack.org/42/163842/8/check/check-tempest-dsvm-neutron-full/1f66320/logs/screen-n-cond.txt.gz

For the few failures I looked at, there is this error in the n-cond (nova-conductor) log:

2015-04-08 07:20:15.207 WARNING nova.scheduler.utils [req-a21c9875-efe1-407d-b08b-2b05b35b4642 AggregatesAdminTestJSON-325246720 AggregatesAdminTestJSON-279542170] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 142, in inner
    return func(*args, **kwargs)

  File "/opt/stack/new/nova/nova/scheduler/manager.py", line 86, in select_destinations
    filter_properties)

  File "/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 80, in select_destinations
    raise exception.NoValidHost(reason=reason)

NoValidHost: No valid host was found. There are not enough hosts available.

--------------------------

That makes it sound like the problem is that the deployed devstack does not have enough capacity. But right before that I see:

2015-04-08 07:20:15.014 ERROR nova.conductor.manager [req-a21c9875-efe1-407d-b08b-2b05b35b4642 AggregatesAdminTestJSON-325246720 AggregatesAdminTestJSON-279542170] Instance update attempted for 'availability_zone' on 745aafcf-686d-4cf0-91c7-701e282f6d06
2015-04-08 07:20:15.149 ERROR nova.scheduler.utils [req-a21c9875-efe1-407d-b08b-2b05b35b4642 AggregatesAdminTestJSON-325246720 AggregatesAdminTestJSON-279542170] [instance: 745aafcf-686d-4cf0-91c7-701e282f6d06] Error from last host: devstack-trusty-rax-dfw-1769605.slave.openstack.org (node devstack-trusty-rax-dfw-1769605.slave.openstack.org): [u'Traceback (most recent call last):\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 2193, in _do_build_and_run_instance\n filter_properties)\n', u' File "/opt/stack/new/nova/nova/compute/manager.py", line 2336, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u'RescheduledException: Build of instance 745aafcf-686d-4cf0-91c7-701e282f6d06 was re-scheduled: u\'u"unexpected update keyword \\\'availability_zone\\\'"\\nTraceback (most recent call last):\\n\\n File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 142, in inner\\n return func(*args, **kwargs)\\n\\n File "/opt/stack/new/nova/nova/conductor/manager.py", line 125, in instance_update\\n raise KeyError("unexpected update keyword \\\'%s\\\'" % key)\\n\\nKeyError: u"unexpected update keyword \\\'availability_zone\\\'"\\n\'\n']
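The sketch below models how a per-host build error can end up reported as a capacity problem. It is a simplified model, not nova's code. `ALLOWED_UPDATE_KEYS`, `instance_update`, `build_on_host` and `schedule_and_build` are hypothetical stand-ins. The model assumes the scheduler skips hosts it has already tried and ignores the retry cap a real scheduler would also apply. On a single-node devstack, the only host fails the build and the retry excludes it. The scheduler then reports "not enough hosts" even though the root cause was the KeyError.

```python
from __future__ import annotations

# Hypothetical whitelist standing in for the keys a legacy update call accepts.
ALLOWED_UPDATE_KEYS = {"vm_state", "task_state", "host", "node"}


class NoValidHost(Exception):
    pass


class RescheduledException(Exception):
    pass


def instance_update(instance: dict, **updates) -> None:
    """Reject any key outside the whitelist before applying the update."""
    for key in updates:
        if key not in ALLOWED_UPDATE_KEYS:
            raise KeyError(f"unexpected update keyword '{key}'")
    instance.update(updates)


def build_on_host(instance: dict, host: str) -> None:
    """Turn a failure during the build into a request to reschedule."""
    try:
        # Passing availability_zone models the change under review.
        instance_update(instance, host=host, availability_zone="nova")
    except KeyError as exc:
        raise RescheduledException(
            f"Build of instance {instance['uuid']} was re-scheduled: {exc}"
        ) from exc


def schedule_and_build(instance: dict, hosts: list[str]) -> str:
    """Try each untried host in turn; give up once none are left."""
    tried: set[str] = set()
    last_error = None
    while True:
        candidates = [h for h in hosts if h not in tried]
        if not candidates:
            # The message speaks of capacity; the real cause is chained.
            raise NoValidHost(
                "No valid host was found. There are not enough hosts available."
            ) from last_error
        host = candidates[0]
        tried.add(host)
        try:
            build_on_host(instance, host)
            return host
        except RescheduledException as exc:
            last_error = exc


instance = {"uuid": "745aafcf-686d-4cf0-91c7-701e282f6d06"}
try:
    schedule_and_build(instance, ["devstack-node"])
except NoValidHost as err:
    print(err)            # the capacity-sounding message
    print(err.__cause__)  # the rescheduled build that actually failed
```

This is why the n-cond traceback alone is misleading. The "Error from last host" line logged just before it carries the real failure.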

Matt Riedemann (mriedem) wrote:

The KeyError is only happening on one change in the check queue:

http://goo.gl/FpqCku

https://review.openstack.org/#/c/163842/

So that patch is busted; it's not a gate bug.
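Whether a failure signature is confined to one change or spread across many can be checked from the log URLs of the logstash hits. The path of the n-cond log URL quoted above appears to follow the layout `/<last two digits of change>/<change>/<patchset>/<pipeline>/<job>/<run id>/...` (42, 163842, 8, check, ...). That layout is inferred from this single URL, and the helper names below are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple
from urllib.parse import urlsplit


class GateRun(NamedTuple):
    change: str
    patchset: str
    pipeline: str
    job: str
    run_id: str


def parse_log_url(url: str) -> GateRun:
    """Split a log URL path into change, patchset, pipeline, job and run id."""
    parts = urlsplit(url).path.strip("/").split("/")
    if len(parts) < 6:
        raise ValueError(f"unexpected log URL layout: {url}")
    prefix, change, patchset, pipeline, job, run_id = parts[:6]
    # Sanity check on the assumed layout: the first segment mirrors the change.
    if not change.endswith(prefix):
        raise ValueError(f"unexpected log URL layout: {url}")
    return GateRun(change, patchset, pipeline, job, run_id)


def failures_by_change(urls: list[str]) -> Counter:
    """Count matching log files per Gerrit change number."""
    return Counter(parse_log_url(u).change for u in urls)


run = parse_log_url(
    "http://logs.openstack.org/42/163842/8/check/"
    "check-tempest-dsvm-neutron-full/1f66320/logs/screen-n-cond.txt.gz"
)
print(run)
```

A counter with a single key suggests a failure introduced by one change. Many keys suggest a signature that hits unrelated patches.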

Changed in nova:
status: New → Invalid
David Kranz (david-kranz) wrote:

While it is true that only one patch is responsible for this particular sub-case of the bug title, logstash shows many other patches that could not possibly cause this problem but still experience it. So this seems to be a problem that can randomly affect any patch. Though it may be difficult to track down, it seems to me there is a bug here. The other possibility is that tempest is trying to create too many VMs; I'm not sure how many tiny VMs our devstack is expected to support.

Dan Smith (danms) wrote:

David,

The KeyError that was causing this trace was added in that patch. We don't use the legacy instance_update call from any other place, and definitely not with availability_zone in the updates.

If you find other things, please open bugs for them, but this one was definitely contained to that single patch.

Changed in tempest:
status: New → Invalid