'No valid host' during overcloud build when using Ironic

Bug #1308079 reported by Thom Leggett
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: Michael Kerrin
Milestone: (none listed)

Bug Description

This could be an Ironic bug. It seems that RAM is not being reported properly. From nova-scheduler.log:

2014-04-15 13:38:58.626 4698 DEBUG nova.filters [req-e8a71cc6-2987-4f80-861b-239708ef6079 None] Filter AvailabilityZoneFilter returned 2 host(s) get_filtered_objects /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/filters.py:88
2014-04-15 13:38:58.626 4698 DEBUG nova.scheduler.filters.ram_filter [req-e8a71cc6-2987-4f80-861b-239708ef6079 None] (undercloud, d168e056-27d8-4223-a566-6b68bb49e9b0) ram:0 disk:0 io_ops:1 instances:1 does not have 4096 MB usable ram, it only has 0.0 MB usable ram. host_passes /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/scheduler/filters/ram_filter.py:60
2014-04-15 13:38:58.626 4698 DEBUG nova.scheduler.filters.ram_filter [req-e8a71cc6-2987-4f80-861b-239708ef6079 None] (undercloud, 10688214-ddc3-4933-923a-863c980c5b1e) ram:0 disk:0 io_ops:1 instances:1 does not have 4096 MB usable ram, it only has 0.0 MB usable ram. host_passes /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/scheduler/filters/ram_filter.py:60
2014-04-15 13:38:58.627 4698 INFO nova.filters [req-e8a71cc6-2987-4f80-861b-239708ef6079 None] Filter RamFilter returned 0 hosts
2014-04-15 13:38:58.627 4698 WARNING nova.scheduler.driver [req-e8a71cc6-2987-4f80-861b-239708ef6079 None] [instance: 19e55a4e-a2d9-4dcb-a348-111683c24d03] Setting instance to ERROR state.
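
For readers unfamiliar with the RAM check, the log lines above are consistent with a filter that compares the requested RAM against what the host reports as usable. The following is a simplified sketch of that kind of check, not nova's actual code; the function names, parameters, and the default allocation ratio of 1.0 are assumptions made for illustration.

```python
def usable_ram_mb(total_mb, free_mb, allocation_ratio=1.0):
    """Return the RAM (in MB) still available for scheduling on a host.

    All names and the default ratio are illustrative assumptions.
    """
    limit_mb = total_mb * allocation_ratio  # oversubscription ceiling
    used_mb = total_mb - free_mb            # RAM already consumed
    return limit_mb - used_mb


def host_passes(total_mb, free_mb, requested_mb, allocation_ratio=1.0):
    """Return True if the host can fit an instance needing requested_mb."""
    return usable_ram_mb(total_mb, free_mb, allocation_ratio) >= requested_mb


# A host reporting ram:0, as in the log, can never satisfy a 4096 MB request.
print(host_passes(total_mb=0, free_mb=0, requested_mb=4096))
```

Under these assumptions, a node that reports zero total RAM is rejected regardless of the allocation ratio, which is why every host is filtered out and the scheduler ends with "No valid host".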

Also seeing this in ironic-api.log for the affected node:

NodeLocked: Node bf61976d-d18f-4be6-b590-1ef32237a800 is locked by host undercloud-undercloud-nauh6oabmc3q.novalocal, please retry after the current operation is completed.
". Detail:
Traceback (most recent call last):

  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/wsmeext/pecan.py", line 77, in callfunction
    result = f(self, *args, **kwargs)

  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/api/controllers/v1/node.py", line 227, in provision
    pecan.request.context, node_uuid, topic)

  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/conductor/rpcapi.py", line 200, in do_node_tear_down
    topic=topic or self.topic)

  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/openstack/common/rpc/proxy.py", line 125, in call
    result = rpc.call(context, real_topic, msg, timeout)

  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/openstack/common/rpc/__init__.py", line 112, in call
    return _get_impl().call(CONF, context, topic, msg, timeout)

  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/openstack/common/rpc/impl_kombu.py", line 815, in call
    rpc_amqp.get_connection_pool(conf, Connection))

  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/openstack/common/rpc/amqp.py", line 575, in call
    rv = list(rv)

  File "/opt/stack/venvs/ironic/local/lib/python2.7/site-packages/ironic/openstack/common/rpc/amqp.py", line 540, in __iter__
    raise result
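
The NodeLocked message asks the caller to retry once the current operation completes. A minimal sketch of such a retry loop is shown below; the `NodeLocked` class here is a hypothetical stand-in for the error in the log, and `call_with_retry` and its parameters are illustrative names rather than part of Ironic or its client.

```python
import time


class NodeLocked(Exception):
    """Hypothetical stand-in for the NodeLocked error shown in the log."""


def call_with_retry(operation, attempts=5, delay=1.0, sleep=time.sleep):
    """Call operation(), retrying while it raises NodeLocked.

    Re-raises NodeLocked if the node is still locked on the final attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except NodeLocked:
            if attempt == attempts:
                raise
            sleep(delay)  # give the lock holder time to finish
```

Retrying only papers over a transient lock. If the lock is never released, as can happen when the conductor itself has failed, the final attempt still raises.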

Possibly linked to Ironic bug 1308045, as I see this in ironic-conductor.log:

RuntimeError: Second simultaneous read on fileno 9 detected. Unless you really know what you're doing, make sure that only one greenthread can read any particular socket. Consider using a pools.Pool. If you do know what you're doing and want to disable this error, call eventlet.debug.hub_prevent_multiple_readers(False)

Tags: ironic
Thom Leggett (tteggel) added tag: ironic
Thom Leggett (tteggel) updated the description (twice)
Ben Nemec (bnemec) wrote :

I'm pretty sure I saw the same thing on nova-baremetal so I don't think this is Ironic-specific. On a second runthrough of devtest I didn't see the same error, so this seems to be a race of some sort.

Changed in tripleo:
status: New → Triaged
importance: Undecided → Medium
Michael Kerrin (michael-kerrin-w) wrote :

I am trying to track this down as well. It seems to be a race in eventlet when you read from a socket, and then close the socket. Once I get a bit more information I will link the bug here.

Note that this isn't specific to Ironic. I usually see it triggered by nova downloading an image through glanceclient, or by nova processutils calling some qemu-* command. This might be because I am not yet using Ironic.

jan grant (jan-grant) wrote :

Ditto to Ben's claim: there's a similar rare behaviour with nova-baremetal. When I went looking, it wasn't clear whether it was in glanceclient (which looked like the obvious culprit, but I couldn't back that up at the time) or in eventlet itself. So, _something_ is amiss; I think there may be a shared root cause. (The eventlet second simultaneous read problem?)

Michael Kerrin (michael-kerrin-w) wrote :

I have filed a bug against eventlet here: https://github.com/eventlet/eventlet/issues/94, with my analysis of why this is an eventlet problem.

Changed in tripleo:
assignee: nobody → Michael Kerrin (michael-kerrin-w)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-image-elements (master)

Reviewed: https://review.openstack.org/109543
Committed: https://git.openstack.org/cgit/openstack/tripleo-image-elements/commit/?id=0011e19c15cee09c21a1ec5f772a480a677201b9
Submitter: Jenkins
Branch: master

commit 0011e19c15cee09c21a1ec5f772a480a677201b9
Author: Michael Kerrin <email address hidden>
Date: Wed Jun 25 11:32:47 2014 -0400

    Pull in patched eventlet to fix Second Simultaneous errors

    We have patched eventlet to resolve the issues around
    Second simultanous read errors. Pulling in the eventlet
    pull request here.

    This is a temporary workaround until the changes get merged
    into eventlet and released at which point this will be dropped.

    Closes-bug: #1308079
    Closed-bug: #1229475
    Closed-bug: #1308045
    Change-Id: I0495f8c443429d9bdbbd97d46fe9a8e180b867ee

Changed in tripleo:
status: In Progress → Fix Committed
Jay Dobies (jdob)
Changed in tripleo:
status: Fix Committed → Fix Released