nova-compute with IronicDriver failed to start if ironic-api service not started

Bug #1430616 reported by Zhenzan Zhou
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Zhenzan Zhou

Bug Description

When running devstack's stack.sh, nova-compute started at 2015-03-11 10:24:01.323, but ironic-api did not start until 2015-03-11 10:27:04.685. In between, nova-compute exited at 2015-03-11 10:26:00.083 with a NovaException:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/poll.py", line 115, in wait
    listener.cb(fileno)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/opt/stack/nova/nova/openstack/common/service.py", line 491, in run_service
    service.start()
  File "/opt/stack/nova/nova/service.py", line 164, in start
    self.manager.init_host()
  File "/opt/stack/nova/nova/compute/manager.py", line 1201, in init_host
    self._destroy_evacuated_instances(context)
  File "/opt/stack/nova/nova/compute/manager.py", line 737, in _destroy_evacuated_instances
    local_instances = self._get_instances_on_driver(context, filters)
  File "/opt/stack/nova/nova/compute/manager.py", line 700, in _get_instances_on_driver
    driver_uuids = self.driver.list_instance_uuids()
  File "/opt/stack/nova/nova/virt/ironic/driver.py", line 422, in list_instance_uuids
    limit=0)
  File "/opt/stack/nova/nova/virt/ironic/client_wrapper.py", line 142, in call
    raise exception.NovaException(msg)
NovaException: Error contacting Ironic server for 'node.list'. Attempt 60 of 60

Exiting like this doesn't make sense for a long-running service.
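
The "Attempt 60 of 60" message suggests a bounded retry loop that gives up and raises once the last attempt fails. The following is a minimal sketch of that pattern, not nova's actual client_wrapper code: the function name, the exception class, and the default values (60 attempts, 2 seconds apart, chosen to be consistent with the roughly two-minute gap between start and exit above) are all assumptions for illustration.

```python
import time


class BackendUnavailable(Exception):
    """Hypothetical stand-in for the NovaException seen in the traceback."""


def call_with_retries(func, method_name, max_retries=60, retry_interval=2,
                      sleep=time.sleep):
    """Call func(), retrying on ConnectionError up to max_retries times.

    Raises BackendUnavailable when the final attempt also fails.
    The defaults are assumed values, not read from any nova configuration.
    """
    for attempt in range(1, max_retries + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == max_retries:
                msg = ("Error contacting Ironic server for '%s'. "
                       "Attempt %d of %d"
                       % (method_name, attempt, max_retries))
                raise BackendUnavailable(msg)
            # Wait before the next attempt.
            sleep(retry_interval)
```

If the exception raised on the last attempt propagates out of the service's startup path, the whole process exits, which matches the behavior reported here.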

Tags: ironic
Eric Xie (mark-xiett)
Changed in nova:
assignee: nobody → Eric Xie(OpenCOS) (mark-xiett)
assignee: Eric Xie(OpenCOS) (mark-xiett) → nobody
Eric Xie (mark-xiett) wrote :

I am using the Juno release, 2014.2.
I checked the status of the nova-compute service after stopping the ironic-api service and got this:
# systemctl status openstack-nova-compute.service
openstack-nova-compute.service - OpenStack Nova Compute Server
   Loaded: loaded (/usr/lib/systemd/system/openstack-nova-compute.service; enabled)
   Active: active (running) since Wed 2015-03-11 12:01:22 CST; 1h 41min ago
 Main PID: 28778 (nova-compute)
   CGroup: /system.slice/openstack-nova-compute.service
           └─28778 /usr/bin/python /usr/bin/nova-compute

Mar 11 12:01:22 controller systemd[1]: Started OpenStack Nova Compute Server.

But the nova-compute log file shows:
2015-03-11 13:40:56.502 28778 TRACE nova.openstack.common.periodic_task NovaException: Error contacting Ironic server for 'node.list'. Attempt 60 of 60

After I started the ironic-api service, the exception went away.

IMHO this flow is not handled properly. When the ironic-api service is stopped, the nova-compute service should exit, but it kept running. An end user can still launch an instance because the 'nova-compute' service is up.

Changed in nova:
status: New → Incomplete
assignee: nobody → Eric Xie(OpenCOS) (mark-xiett)
Zhenzan Zhou (zhenzan-zhou) wrote :

I tried stopping ironic-api while nova-compute was running and reproduced the same behavior you pasted. The bug I reported only happens during the startup phase.
Personally, I don't think shutting down nova-compute when the ironic-api service stops is the right behavior. You can always request an instance even if nova-compute is stopped, and you will get an error like:
"message": "No valid host was found. There are not enough hosts available."
I also tried stopping only ironic-api while keeping nova-compute running, and the instance got the same error:
"message": "No valid host was found. There are not enough hosts available."
So it's better to keep nova-compute running; once ironic-api is back, it can recover immediately.

BTW, what information do you need in order to change the status back from Incomplete? Thanks.

Eric Xie (mark-xiett) wrote :

I tried the following steps:
1) stop ironic-api
2) stop nova-compute
3) start nova-compute
and found that nova-compute was still running.
Is anything wrong?

Zhenzan Zhou (zhenzan-zhou) wrote :

Does it keep running for a long time? Do you see it report the exception periodically after step 3? If so, we have probably hit a regression. Please try the latest Kilo code. Thanks.

Eric Xie (mark-xiett) wrote :

OK. I will try it as soon as I can :)

Sean Dague (sdague) wrote :

I do believe that nova-compute should not die in these situations.
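
The behavior argued for in this thread (stay up when the backend is unreachable, recover once it returns) can be sketched as follows. This is only an illustration of the general pattern under stated assumptions, not the change that was later merged: the class ComputeService, its methods, and the list_instances callable are hypothetical and heavily simplified.

```python
import logging

LOG = logging.getLogger(__name__)


class ComputeService:
    """Hypothetical, heavily simplified stand-in for a compute service."""

    def __init__(self, list_instances):
        # list_instances: callable that queries the backend (for example
        # ironic-api) and raises ConnectionError when it is unreachable.
        self._list_instances = list_instances
        self.known_instances = None  # None means "not synced yet"

    def _sync(self):
        """Try to refresh state from the backend; never raise on outage."""
        try:
            self.known_instances = list(self._list_instances())
            return True
        except ConnectionError as exc:
            LOG.warning("Backend unavailable, will retry later: %s", exc)
            return False

    def start(self):
        # An unreachable backend must not abort startup.
        self._sync()

    def periodic_task(self):
        # Called on a timer; succeeds as soon as the backend is back.
        return self._sync()
```

With this shape, start() returns normally even if the backend is down, and the next periodic_task() call after the backend comes back refreshes known_instances.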

tags: added: ironic
Changed in nova:
status: Incomplete → Confirmed
importance: Undecided → Medium
Eric Xie (mark-xiett)
Changed in nova:
assignee: Eric Xie (mark-xiett) → nobody
Changed in nova:
assignee: nobody → Zhenzan Zhou (zhenzan-zhou)
Zhenzan Zhou (zhenzan-zhou) wrote :

If I kill libvirtd and then start nova-compute, it gets an exception:

HypervisorUnavailable: Connection to the hypervisor is broken on host

and quits immediately. If I kill libvirtd while nova-compute is running, it changes the compute service status to disabled and does not exit on HypervisorUnavailable exceptions:

2015-04-14 14:57:38.699 DEBUG nova.virt.libvirt.driver [req-5f28fc21-9769-4cd9-8d96-0ee6fd15fd8c None None] Updating compute service status to disabled from (pid=1344) _set_host_enabled /opt/stack/nova/nova/virt/libvirt/driver.py:3021
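
The runtime behavior described for the libvirt case (mark the service disabled when the connection is lost, instead of exiting) can be sketched as a small status flag driven by connection events. This is a hypothetical illustration only; HostStatus, on_connection_event, and can_schedule are assumed names and do not reproduce the libvirt driver's _set_host_enabled code.

```python
class HostStatus:
    """Hypothetical record of whether a compute host accepts new work."""

    def __init__(self):
        self.enabled = True
        self.disabled_reason = None

    def on_connection_event(self, connected, reason=None):
        """Flip the enabled flag when the hypervisor connection changes."""
        if connected:
            self.enabled = True
            self.disabled_reason = None
        else:
            self.enabled = False
            self.disabled_reason = reason or "connection to hypervisor lost"


def can_schedule(status):
    """A scheduler would skip hosts whose service is disabled."""
    return status.enabled
```

The process keeps running throughout; only the flag changes, so the host becomes schedulable again as soon as a "connected" event arrives.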

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/173681

Changed in nova:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/173681
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cce06a1e9855d9eed3f7c653200853f23466d791
Submitter: Jenkins
Branch: master

commit cce06a1e9855d9eed3f7c653200853f23466d791
Author: Zhenzan Zhou <email address hidden>
Date: Wed Apr 15 13:27:51 2015 +0800

    Bypass ironic server not available issue

    The ironic driver needs enhancement for exception handling.
    This patch is a workaround to make devstack with ironic enabled
    success. A more elegant patch should be made later in ironic
    driver for exception handling.

    Change-Id: Ibace25ad905a8278ecea4b02c69c59737a490d3a
    Closes-Bug: #1430616

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-2 → 12.0.0