InternalServerError: Internal Server Error (HTTP 500) in n-cpu logs on startup with Ironic driver

Bug #1706772 reported by Matt Riedemann
20
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Ironic
Invalid
High
Unassigned
OpenStack Compute (nova)
Fix Released
High
Ed Leafe
Ocata
Fix Committed
Medium
Matt Riedemann

Bug Description

The init_host stacktraces on startup, killing the thread:

http://logs.openstack.org/49/484949/14/check/gate-tempest-dsvm-ironic-ipa-wholedisk-bios-agent_ipmitool-tinyipa-ubuntu-xenial-nv/69b18d7/logs/screen-n-cpu.txt.gz?level=TRACE

Jul 24 22:26:26.423871 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service [None req-414af81a-44e7-4a67-9cbd-74e549deb41b None None] Error starting thread.: InternalServerError: Internal Server Error (HTTP 500)
Jul 24 22:26:26.423993 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service Traceback (most recent call last):
Jul 24 22:26:26.424084 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/oslo_service/service.py", line 721, in run_service
Jul 24 22:26:26.424167 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service service.start()
Jul 24 22:26:26.424250 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/service.py", line 143, in start
Jul 24 22:26:26.424330 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service self.manager.init_host()
Jul 24 22:26:26.424417 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/compute/manager.py", line 1100, in init_host
Jul 24 22:26:26.424498 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service self.driver.init_host(host=self.host)
Jul 24 22:26:26.424638 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/virt/ironic/driver.py", line 449, in init_host
Jul 24 22:26:26.424717 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service self._refresh_cache()
Jul 24 22:26:26.424798 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/virt/ironic/driver.py", line 608, in _refresh_cache
Jul 24 22:26:26.425096 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service for node in self._get_node_list(detail=True, limit=0):
Jul 24 22:26:26.425187 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/virt/ironic/driver.py", line 527, in _get_node_list
Jul 24 22:26:26.425278 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service node_list = self.ironicclient.call("node.list", **kwargs)
Jul 24 22:26:26.425359 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/opt/stack/new/nova/nova/virt/ironic/client_wrapper.py", line 146, in call
Jul 24 22:26:26.425440 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service return self._multi_getattr(client, method)(*args, **kwargs)
Jul 24 22:26:26.425536 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/ironicclient/v1/node.py", line 143, in list
Jul 24 22:26:26.425614 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service limit=limit)
Jul 24 22:26:26.425708 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/ironicclient/common/base.py", line 149, in _list_pagination
Jul 24 22:26:26.425788 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service resp, body = self.api.json_request('GET', url)
Jul 24 22:26:26.425866 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/ironicclient/common/http.py", line 558, in json_request
Jul 24 22:26:26.425945 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service resp = self._http_request(url, method, **kwargs)
Jul 24 22:26:26.426027 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/ironicclient/common/http.py", line 188, in wrapper
Jul 24 22:26:26.426114 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service return func(self, url, method, **kwargs)
Jul 24 22:26:26.426198 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service File "/usr/local/lib/python2.7/dist-packages/ironicclient/common/http.py", line 540, in _http_request
Jul 24 22:26:26.426277 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service error_json.get('debuginfo'), method, url)
Jul 24 22:26:26.426355 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service InternalServerError: Internal Server Error (HTTP 500)
Jul 24 22:26:26.426437 ubuntu-xenial-osic-cloud1-disk-10052658 nova-compute[13755]: ERROR oslo_service.service

I guess the Ironic API service isn't yet up at that point?

Tags: compute ironic
Revision history for this message
Matt Riedemann (mriedem) wrote :

Actually this wasn't caused by the migrate instance flavors change mentioned above, it was happening before that as well:

http://logs.openstack.org/80/461480/5/check/gate-tempest-dsvm-ironic-ipa-wholedisk-bios-agent_ipmitool-tinyipa-ubuntu-xenial-nv/834477a/logs/screen-n-cpu.txt.gz?level=TRACE#_Jul_19_03_34_01_687299

So something has thrown off the timing of when nova-compute starts up and when the ironic-api is available.

This wasn't an issue in Ocata. From Ocata CI logs, the init_host in the driver happens around here:

http://logs.openstack.org/63/485263/2/check/gate-tempest-dsvm-ironic-ipa-wholedisk-bios-agent_ipmitool-tinyipa-ubuntu-xenial-nv/de2c924/logs/screen-n-cpu.txt.gz#_2017-07-21_12_07_20_885

And then get_available_nodes is here:

http://logs.openstack.org/63/485263/2/check/gate-tempest-dsvm-ironic-ipa-wholedisk-bios-agent_ipmitool-tinyipa-ubuntu-xenial-nv/de2c924/logs/screen-n-cpu.txt.gz#_2017-07-21_12_07_21_167

So that's not blowing up like init_host is blowing up in Pike.

description: updated
Revision history for this message
Matt Riedemann (mriedem) wrote :
Download full text (8.6 KiB)

Looks like the failure is in ironic-api on startup:

http://logs.openstack.org/80/461480/5/check/gate-tempest-dsvm-ironic-ipa-wholedisk-bios-agent_ipmitool-tinyipa-ubuntu-xenial-nv/834477a/logs/screen-ir-api.txt.gz#_Jul_19_03_34_01_680460

Jul 19 03:34:01.678194 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: WARNING keystoneauth.identity.generic.base [-] Failed to discover available identity versions when contacting http://10.0.1.4/identity. Attempting to parse version from URL.: ConnectFailure: Unable to establish connection to http://10.0.1.4/identity: HTTPConnectionPool(host='10.0.1.4', port=80): Max retries exceeded with url: /identity (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f06c5d86050>: Failed to establish a new connection: [Errno 111] ECONNREFUSED',))
Jul 19 03:34:01.680460 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: INFO eventlet.wsgi.server [-] Traceback (most recent call last):
Jul 19 03:34:01.680615 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: File "/usr/local/lib/python2.7/dist-packages/eventlet/wsgi.py", line 490, in handle_one_response
Jul 19 03:34:01.680747 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: result = self.application(self.environ, start_response)
Jul 19 03:34:01.680907 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: File "/opt/stack/new/ironic/ironic/api/app.py", line 114, in __call__
Jul 19 03:34:01.681047 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: return self.v1(environ, start_response)
Jul 19 03:34:01.681215 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: File "/usr/local/lib/python2.7/dist-packages/webob/dec.py", line 131, in __call__
Jul 19 03:34:01.681359 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: resp = self.call_func(req, *args, **self.kwargs)
Jul 19 03:34:01.681488 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: File "/usr/local/lib/python2.7/dist-packages/webob/dec.py", line 196, in call_func
Jul 19 03:34:01.681622 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: return self.func(req, *args, **kwargs)
Jul 19 03:34:01.681747 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: File "/usr/local/lib/python2.7/dist-packages/oslo_middleware/base.py", line 131, in __call__
Jul 19 03:34:01.681871 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: response = req.get_response(self.application)
Jul 19 03:34:01.682003 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: File "/usr/local/lib/python2.7/dist-packages/webob/request.py", line 1316, in send
Jul 19 03:34:01.682142 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: application, catch_exc_info=False)
Jul 19 03:34:01.682279 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: File "/usr/local/lib/python2.7/dist-packages/webob/request.py", line 1280, in call_application
Jul 19 03:34:01.682408 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: app_iter = application(self.environ, start_response)
Jul 19 03:34:01.682541 ubuntu-xenial-citycloud-la1-9927584 ironic-api[16734]: File "/opt/stack/new/ironic/ironic/api/middleware/auth_token.py", line 59, in __call__
Jul 19 03:34:01.682...

Read more...

Changed in ironic:
status: New → Confirmed
Changed in nova:
status: Confirmed → Invalid
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Vladyslav Drok (vdrok) wrote :

The reason is someone restarting apache at that moment -- http://logs.openstack.org/49/484949/14/check/gate-tempest-dsvm-ironic-ipa-wholedisk-bios-agent_ipmitool-tinyipa-ubuntu-xenial-nv/69b18d7/logs/apache/error.txt.gz. We need to come up with an idea how to work around this (again)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/487925

Changed in nova:
assignee: nobody → Ed Leafe (ed-leafe)
status: Invalid → In Progress
Revision history for this message
Ed Leafe (ed-leafe) wrote :

Note that this fix only smooths the behavior in the ironic driver so that it doesn't crash when this happens.

Dmitry Tantsur (divius)
Changed in ironic:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/491835

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/487925
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c2d33c3271370358d48553233b41bf9119d834fb
Submitter: Jenkins
Branch: master

commit c2d33c3271370358d48553233b41bf9119d834fb
Author: EdLeafe <email address hidden>
Date: Wed Jul 26 23:07:38 2017 +0000

    Handle ironicclient failures in Ironic driver

    The ironic driver uses ironicclient to get the list of nodes from the
    Ironic service. If that service is not running, or something otherwise
    prevents the ironicclient's call from succeeding, and unhandled
    exception will occur. This patch adds a generic exception handler for
    this call, logging an error and continuing.

    Co-Authored-By: Lucas Alvares Gomes <email address hidden>

    Closes-Bug: #1706772

    Change-Id: I4ebebcf221dfe29e2aa125f5956aec10108f8fbe

Changed in nova:
status: In Progress → Fix Released
Matt Riedemann (mriedem)
tags: removed: pike-rc-potential
Revision history for this message
Vladyslav Drok (vdrok) wrote :

We're also hitting this issue on our grenade job on ocata http://logs.openstack.org/04/480904/10/check/gate-grenade-dsvm-ironic-ubuntu-xenial/a13a964/logs/new/screen-n-cpu.txt.gz. In this case, placement start causes apache restart, and compute is started at the same time. Will propose a patch to grenade and see if it'll break anything.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 16.0.0.0rc1

This issue was fixed in the openstack/nova 16.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/491835
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c46f037a0f57710553b5ff8f02bc08ebedf3c5fb
Submitter: Jenkins
Branch: stable/ocata

commit c46f037a0f57710553b5ff8f02bc08ebedf3c5fb
Author: EdLeafe <email address hidden>
Date: Wed Jul 26 23:07:38 2017 +0000

    Handle ironicclient failures in Ironic driver

    The ironic driver uses ironicclient to get the list of nodes from the
    Ironic service. If that service is not running, or something otherwise
    prevents the ironicclient's call from succeeding, and unhandled
    exception will occur. This patch adds a generic exception handler for
    this call, logging an error and continuing.

    Co-Authored-By: Lucas Alvares Gomes <email address hidden>

    Closes-Bug: #1706772

    Conflicts:
          nova/tests/unit/virt/ironic/test_driver.py

    NOTE(mriedem): The conflicts in the tests can be ignored since they
    were just around some boot from volume and flavor migration tests
    in Pike which don't exist in Ocata. Also, the _LE translation marker
    has to be added back in for LOG.error in Ocata due to hacking checks.

    Change-Id: I4ebebcf221dfe29e2aa125f5956aec10108f8fbe
    (cherry picked from commit c2d33c3271370358d48553233b41bf9119d834fb)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 15.0.7

This issue was fixed in the openstack/nova 15.0.7 release.

Dmitry Tantsur (divius)
Changed in ironic:
status: Confirmed → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.