instance uuid not cleared in an ironic node

Bug #1596922 reported by Yushiro FURUKAWA on 2016-06-28
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Low
Assigned to: Hironori Shiina

Bug Description

Description
===========
In a nova environment with ironic, the "instance UUID" remains set on the
ironic node if nova boot fails. As a result, the ironic node cannot be
deleted through the API and cannot be used to deploy a baremetal instance.

Steps to reproduce
==================
1. Create an ironic node.
2. Create an ironic port.
3. Run nova boot for an ironic instance with the following illegal instance name:

   $ nova boot --flavor my-baremetal-agent --image $MY_IMAGE_UUID \
               --key-name default test3.141592 --nic net-id=$NET_UUID

Expected result
===============
Nova reports an error and the instance state changes to "ERROR". The tenant
user then deletes the nova instance ($ nova delete <instance_uuid>). After
that, the "instance UUID" of the ironic node should be cleared.

Actual result
=============
After $ nova delete <instance_uuid>, the "instance UUID" still remains in the
ironic node database, so the node cannot be deleted:

$ ironic node-delete 0ea7c2e3-e5be-4052-85ac-a7b1adf0f30b
Failed to delete node 0ea7c2e3-e5be-4052-85ac-a7b1adf0f30b: Node 0ea7c2e3-e5be-4052-85ac-a7b1adf0f30b is associated with instance 84911c1f-976b-4428-a509-8ab6cf04182a. (HTTP 409)

Environment
===========
* Devstack all-in-one
* Nova's source code is as of May 18, 2016:

commit fe8a119e8d80de35d7f99e0c1d9a9e5095840146
Merge: b56d861 6f2a46f
Author: Jenkins <email address hidden>
Date: Wed May 18 23:33:00 2016 +0000

    Merge "Remove unused base_options param from _get_image_defined_bdms"

2. Which hypervisor did you use?
   ironic

3. Which networking type did you use?
   neutron + ML2 + openvswitch driver + linuxbridge driver

Logs
====
2016-06-28 21:03:04.670 ERROR nova.scheduler.utils [req-7db0e0e5-5d18-4a5a-9293-52dd2e0f4351 admin admin] [instance: 84911c1f-976b-4428-a509-8ab6cf04182a] Error from last host: furukawa-dev-ironic (node 0ea7c2e3-e5be-4052-85ac-a7b1adf0f30b): [u'Traceback (most recent call last):\n', u' File "/opt/stack/nova/nova/compute/manager.py", line 1749, in _do_build_and_run_instance\n filter_properties)\n', u' File "/opt/stack/nova/nova/compute/manager.py", line 1939, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=six.text_type(e))\n', u"RescheduledException: Build of instance 84911c1f-976b-4428-a509-8ab6cf04182a was re-scheduled: Invalid input for dns_name. Reason: 'test3.141592' not a valid PQDN or FQDN. Reason: TLD '141592' must not be all numeric.\n"]
2016-06-28 21:03:04.689 DEBUG oslo_messaging._drivers.amqpdriver [req-7db0e0e5-5d18-4a5a-9293-52dd2e0f4351 admin admin] sending reply msg_id: dea4917aad334ffda701e6cd23cf6a4c reply queue: reply_a163275e9821450cb6494257a3b9629f time elapsed: 0.0230762520805s from (pid=5506) _send_reply /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:74
2016-06-28 21:03:04.688 DEBUG oslo_messaging._drivers.amqpdriver [req-7db0e0e5-5d18-4a5a-9293-52dd2e0f4351 admin admin] CALL msg_id: 51218c1cd9ad403798f2d830d9d0abb3 exchange 'nova' topic 'scheduler' from (pid=5507) _send /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:450
2016-06-28 21:03:04.741 DEBUG oslo_messaging._drivers.amqpdriver [-] received reply msg_id: 51218c1cd9ad403798f2d830d9d0abb3 from (pid=5507) __call__ /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:298
2016-06-28 21:03:04.743 WARNING nova.scheduler.utils [req-7db0e0e5-5d18-4a5a-9293-52dd2e0f4351 admin admin] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.
Traceback (most recent call last):

  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 200, in inner
    return func(*args, **kwargs)

  File "/opt/stack/nova/nova/scheduler/manager.py", line 104, in select_destinations
    dests = self.driver.select_destinations(ctxt, spec_obj)

  File "/opt/stack/nova/nova/scheduler/filter_scheduler.py", line 74, in select_destinations
    raise exception.NoValidHost(reason=reason)

NoValidHost: No valid host was found. There are not enough hosts available.
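
The first log line rejects the instance name because its last dot-separated label is all numeric. A minimal sketch of that one rule follows. The function name has_numeric_tld is hypothetical and this is not the validator that produced the log message; it only illustrates why 'test3.141592' fails while a name such as 'test3' would not trip this particular check.

```python
def has_numeric_tld(name: str) -> bool:
    """Return True if the name has several labels and the last one is all digits.

    Hypothetical helper: it models only the "TLD must not be all numeric"
    rule quoted in the log, not full PQDN/FQDN validation.
    """
    labels = name.rstrip(".").split(".")
    return len(labels) > 1 and labels[-1].isdigit()


if __name__ == "__main__":
    for candidate in ("test3.141592", "test3", "test3.example"):
        verdict = "rejected" if has_numeric_tld(candidate) else "passes this check"
        print(f"{candidate}: {verdict}")
```

Because the name is parsed as a dotted DNS name, "141592" is treated as a top-level label, which is what triggers the failure that leaves the node associated.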

Sean Dague (sdague) wrote :

It looks like the issue is solely that Ironic's cleanup should get called and do this here - https://github.com/openstack/nova/blob/4e62960722caaefd02f6fdc753176a7c117f6a18/nova/virt/ironic/driver.py#L838

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Changed in nova:
assignee: nobody → Hironori Shiina (shiina-hironori)

Fix proposed to branch: master
Review: https://review.openstack.org/341253

Changed in nova:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/341253
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0e24e9e2ec254364ffe029226b9ae5956002df54
Submitter: Jenkins
Branch: master

commit 0e24e9e2ec254364ffe029226b9ae5956002df54
Author: Hironori Shiina <email address hidden>
Date: Sun Jul 10 15:32:58 2016 +0900

    ironic: Cleanup instance information when spawn fails

    Instance information such as an instance_uuid set to an ironic node by
    _add_driver_fields() is not cleared when spawning is aborted by an
    exception raised before ironic starts deployment. Then, ironic node
    stays AVAILABLE state with instance_uuid set. This information is not
    cleared even if the instance is deleted. The ironic node cannot be
    removed nor deployed again because instance_uuid remains.

    This patch adds a method to remove the information. This method is
    called if ironic doesn't need unprovisioning when an instance is
    destroyed.

    Change-Id: Idf5191aa1c990552ca2340856d5d5b6ac03f7539
    Closes-Bug: 1596922
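
The behaviour described in the commit message above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code from the patch: FakeNode, NEEDS_UNPROVISION, associate, remove_instance_info, destroy and delete_node are all hypothetical names, and the set of provision states is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed set of states in which ironic itself must tear the node down.
# The real driver's list is not shown in this report.
NEEDS_UNPROVISION = {"active", "deploying", "deploy failed"}


@dataclass
class FakeNode:
    """In-memory stand-in for an ironic node record."""
    uuid: str
    provision_state: str = "available"
    instance_uuid: Optional[str] = None
    instance_info: dict = field(default_factory=dict)


def associate(node: FakeNode, instance_uuid: str) -> None:
    """Model the association made before deployment starts."""
    node.instance_uuid = instance_uuid
    node.instance_info = {"root_gb": 10}


def remove_instance_info(node: FakeNode) -> None:
    """Model the added cleanup: drop the instance association."""
    node.instance_uuid = None
    node.instance_info = {}


def destroy(node: FakeNode) -> None:
    """Model instance deletion with the fix applied."""
    if node.provision_state in NEEDS_UNPROVISION:
        # Stand-in for unprovisioning, which also clears the association.
        node.provision_state = "available"
        remove_instance_info(node)
    else:
        # Spawn failed before deployment started: nothing to unprovision,
        # but the association still has to be removed.
        remove_instance_info(node)


def delete_node(node: FakeNode) -> None:
    """Model the HTTP 409 from the report: associated nodes cannot be deleted."""
    if node.instance_uuid is not None:
        raise RuntimeError(
            f"Node {node.uuid} is associated with instance {node.instance_uuid}"
        )


if __name__ == "__main__":
    node = FakeNode(uuid="0ea7c2e3-e5be-4052-85ac-a7b1adf0f30b")
    associate(node, "84911c1f-976b-4428-a509-8ab6cf04182a")
    destroy(node)       # spawn was aborted, node never left "available"
    delete_node(node)   # no longer raises
```

Before the fix, the else branch effectively did nothing, so delete_node in this model would have raised for a node that never left the available state.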

Changed in nova:
status: In Progress → Fix Released

This issue was fixed in the openstack/nova 14.0.0.0b3 development milestone.

Reviewed: https://review.openstack.org/364369
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=20ba099205d8c90334d830f53ff9bec254415265
Submitter: Jenkins
Branch: stable/mitaka

commit 20ba099205d8c90334d830f53ff9bec254415265
Author: Hironori Shiina <email address hidden>
Date: Sun Jul 10 15:32:58 2016 +0900

    ironic: Cleanup instance information when spawn fails

    Instance information such as an instance_uuid set to an ironic node by
    _add_driver_fields() is not cleared when spawning is aborted by an
    exception raised before ironic starts deployment. Then, ironic node
    stays AVAILABLE state with instance_uuid set. This information is not
    cleared even if the instance is deleted. The ironic node cannot be
    removed nor deployed again because instance_uuid remains.

    This patch adds a method to remove the information. This method is
    called if ironic doesn't need unprovisioning when an instance is
    destroyed.

    Change-Id: Idf5191aa1c990552ca2340856d5d5b6ac03f7539
    Closes-Bug: 1596922
    (cherry picked from commit 0e24e9e2ec254364ffe029226b9ae5956002df54)

tags: added: in-stable-mitaka

This issue was fixed in the openstack/nova 13.1.2 release.
