Timed out waiting for Nova hypervisor-stats count >= 1 due to Nova Unable to establish connection to http://127.0.0.1:35357/v2.0/tokens

Bug #1537076 reported by Dmitry Tantsur
32
This bug affects 7 people
Affects Status Importance Assigned to Milestone
Ironic
Fix Released
Critical
Dmitry Tantsur
OpenStack Compute (nova)
Invalid
Undecided
Unassigned

Bug Description

e.g. seems like http://logs.openstack.org/61/246161/12/check/gate-tempest-dsvm-ironic-pxe_ipa-ipxe/169b905/logs/screen-n-cpu.txt.gz?#_2016-01-22_09_01_00_240 causes http://logs.openstack.org/61/246161/12/check/gate-tempest-dsvm-ironic-pxe_ipa-ipxe/169b905/logs/devstacklog.txt.gz#_2016-01-22_09_08_19_910

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/poll.py", line 115, in wait
    listener.cb(fileno)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/oslo_service/service.py", line 671, in run_service
    service.start()
  File "/opt/stack/new/nova/nova/service.py", line 198, in start
    self.manager.pre_start_hook()
  File "/opt/stack/new/nova/nova/compute/manager.py", line 1340, in pre_start_hook
    self.update_available_resource(nova.context.get_admin_context())
  File "/opt/stack/new/nova/nova/compute/manager.py", line 6290, in update_available_resource
    nodenames = set(self.driver.get_available_nodes())
  File "/opt/stack/new/nova/nova/virt/ironic/driver.py", line 554, in get_available_nodes
    self._refresh_cache()
  File "/opt/stack/new/nova/nova/virt/ironic/driver.py", line 537, in _refresh_cache
    for node in self._get_node_list(detail=True, limit=0):
  File "/opt/stack/new/nova/nova/virt/ironic/driver.py", line 476, in _get_node_list
    node_list = self.ironicclient.call("node.list", **kwargs)
  File "/opt/stack/new/nova/nova/virt/ironic/client_wrapper.py", line 136, in call
    client = self._get_client(retry_on_conflict=retry_on_conflict)
  File "/opt/stack/new/nova/nova/virt/ironic/client_wrapper.py", line 86, in _get_client
    cli = ironic.client.get_client(CONF.ironic.api_version, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ironicclient/client.py", line 86, in get_client
    _ksclient = _get_ksclient(**ks_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ironicclient/client.py", line 35, in _get_ksclient
    insecure=kwargs.get('insecure'))
  File "/usr/local/lib/python2.7/dist-packages/keystoneclient/v2_0/client.py", line 166, in __init__
    self.authenticate()
  File "/usr/local/lib/python2.7/dist-packages/keystoneclient/utils.py", line 337, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keystoneclient/httpclient.py", line 589, in authenticate
    resp = self.get_raw_token_from_identity_service(**kwargs)
  File "/usr/local/lib/python2.7/dist-packages/keystoneclient/v2_0/client.py", line 210, in get_raw_token_from_identity_service
    _("Authorization Failed: %s") % e)
AuthorizationFailure: Authorization Failed: Unable to establish connection to http://127.0.0.1:35357/v2.0/tokens

Tags: compute ironic
Dmitry Tantsur (divius)
description: updated
Revision history for this message
aeva black (tenbrae) wrote :

I've proposed search query to elastic-recheck here: https://review.openstack.org/#/c/271496/

Revision history for this message
Vladyslav Drok (vdrok) wrote :

This failure is seen only in n-cpu log, in ir-api I see the following:

2016-01-27 08:34:21.572 30788 DEBUG keystoneauth.identity.v2 [-] Making authentication request to http://127.0.0.1:35357/v2.0/tokens get_auth_ref /usr/local/lib/python2.7/dist-packages/keystoneauth1/identity/v2.py:62
2016-01-27 08:34:21.641 30788 DEBUG keystoneauth.session [-] REQ: curl -g -i --cacert "/opt/stack/data/ca-bundle.pem" -X GET http://127.0.0.1:35357/v3/auth/tokens -H "X-Subject-Token: {SHA1}6f4792750ecc5c46afececfa19849e2df8c71113" -H "User-Agent: python-keystoneclient" -H "Accept: application/json" -H "X-Auth-Token: {SHA1}f279ba41bea3ff4737634c791e70d6dbc1ba4758" _http_log_request /usr/local/lib/python2.7/dist-packages/keystoneauth1/session.py:225

So the request actually goes to v3, despite v2 is logged.

Dmitry Tantsur (divius)
Changed in ironic:
importance: High → Critical
Revision history for this message
Dmitry Tantsur (divius) wrote :

New finding: it looks like keystone does not even start when the failure happens. We should probably NOT fail Nova on start up if we can't connect to it...

tags: added: ironic
Revision history for this message
Dmitry Tantsur (divius) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/273059

Changed in nova:
assignee: nobody → Dmitry Tantsur (divius)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/273129

Changed in ironic:
assignee: nobody → Dmitry Tantsur (divius)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/273129
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=fde62528afd5c88737999a3deabeba6c2d389e88
Submitter: Jenkins
Branch: master

commit fde62528afd5c88737999a3deabeba6c2d389e88
Author: Dmitry Tantsur <email address hidden>
Date: Wed Jan 27 17:10:58 2016 +0100

    [devstack] Restart nova compute before checking hypervisor stats

    Nova compute may fail if it was unable to fetch nodes list from Ironic
    during startup. This change makes sure we restart it.

    Change-Id: Ibddd34faa0eeb9b42fe0e8d73f14907b1df11734
    Closes-Bug: #1537076

Changed in ironic:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Dmitry Tantsur (<email address hidden>) on branch: master
Review: https://review.openstack.org/273059

Revision history for this message
Dmitry Tantsur (divius) wrote :

We've worked around it in ironic, but anyway it would be good to properly fix it in the nova driver.

Changed in nova:
assignee: Dmitry Tantsur (divius) → nobody
status: In Progress → New
tags: added: compute
Revision history for this message
Matt Riedemann (mriedem) wrote :

I've proposed a stable/liberty devstack backport here: https://review.openstack.org/#/c/277920/

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Sean Dague (sdague) wrote :

It seems really weird about why keystone just stops working. That is the thing that really needs to be sorted out.

Changed in nova:
status: New → Invalid
Revision history for this message
Sean Dague (sdague) wrote :

I don't really think that Nova not being able to talk to Ironic when Keystone is there is a Nova bug. Why keystone failed is the real issue to figure out

Revision history for this message
Dmitry Tantsur (divius) wrote :

Sean, keystone was restarting during devstack run, and it happened at the same time as nova compute was starting. At least that's how I understand it.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ironic (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/280642

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic (master)

Change abandoned by Pavlo Shchelokovskyy (<email address hidden>) on branch: master
Review: https://review.openstack.org/280642
Reason: squashed into https://review.openstack.org/#/c/280601/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ironic (master)

Reviewed: https://review.openstack.org/280601
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=001513f4234ab681d33c9d1f3cd97a005b338df9
Submitter: Jenkins
Branch: master

commit 001513f4234ab681d33c9d1f3cd97a005b338df9
Author: Dmitry Tantsur <email address hidden>
Date: Tue Feb 16 11:53:16 2016 +0100

    [DevStack] fix restart of nova compute

    We introduced nova compute restart to handle problems of it failure
    to start when keystone is not available. However, now we get broken
    again because stop_service fails when nova compute is not running.
    This change makes it ignore failure to stop nova compute.

    Also, if n-cpu failed to start the first time, the n-cpu.failure file
    that
    was created will fail the service_check function in the end
    of the deployment, although n-cpu might be in fact up, running and
    healthy.

    Rename this file to `n-cpu.failure.before-restart-by-ironic` right after
    stopping n-cpu so that it does not fail service_check later, but is
    still
    available for reference if needed.

    Change-Id: Idfe7d68d6ebeb786fe9e6da6a001a0ee2e10a76b
    Closes-Bug: #1546058
    Related-Bug: #1537076

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/ironic 5.0.0

This issue was fixed in the openstack/ironic 5.0.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ironic (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/318479

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/320040

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ironic (master)

Reviewed: https://review.openstack.org/318479
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=19703205aefb11a62d492de77a18c5c836e15a0e
Submitter: Jenkins
Branch: master

commit 19703205aefb11a62d492de77a18c5c836e15a0e
Author: Vasyl Saienko <email address hidden>
Date: Thu May 19 10:33:00 2016 +0300

    Restart n-cpu after Ironic install

    Ironic install initiates apache restart, it may cause nova-compute
    failed to start.
    This patch restart nova-compute after Ironic install for new phase.
    And ensures that keystone is responding before starting n-cpu.

    Related-Bug: 1537076

    Change-Id: Id229617ec5cf2820c1033fd2c34fc160a75c46d6

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sam Betts (<email address hidden>) on branch: master
Review: https://review.openstack.org/320040
Reason: This plan is superseded by https://review.openstack.org/#/c/323851

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/python-ironicclient 1.4.0

This issue was fixed in the openstack/python-ironicclient 1.4.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.