Node deployment failure due to "Error finding the disk or partition device to deploy the image onto"

Bug #1670916 reported by Sai Sindhur Malleni
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ironic | Invalid | Wishlist | Unassigned | -
ironic-python-agent | Fix Released | Medium | Dmitry Tantsur | -

Bug Description

The introspection data reports the following for the disk that should become the root disk, including its WWN:
{
    "size": 500107862016,
    "rotational": true,
    "vendor": "ATA",
    "name": "/dev/sdak",
    "wwn_vendor_extension": null,
    "wwn_with_extension": "0x5000c50091799672",
    "model": "ST9500620NS",
    "wwn": "0x5000c50091799672",
    "serial": "9XF44QWA"
},

The root device hint is set using ironic node-update $UUID add properties/root_device="{\"wwn\": \"$WWN\"}"
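Hand-escaping JSON inside double quotes is easy to get wrong. The following is a minimal Python sketch, assuming the same `ironic node-update <uuid> add properties/root_device=<JSON>` form used in this report. It builds the value with json.dumps and turns it into a single shell word with shlex.quote. `root_device_update_cmd` is a hypothetical helper, not part of python-ironicclient.

```python
import json
import shlex


def root_device_update_cmd(node_uuid: str, wwn: str) -> str:
    """Return an `ironic node-update` command that sets a WWN root device hint.

    Hypothetical helper: json.dumps emits valid JSON (no hand-escaped
    quotes) and shlex.quote makes the whole JSON value one shell word.
    """
    hint = json.dumps({"wwn": wwn})
    return "ironic node-update {} add properties/root_device={}".format(
        shlex.quote(node_uuid), shlex.quote(hint))


print(root_device_update_cmd("f5acc2b9-888a-4437-9fb7-5f98994f2e96",
                             "0x5000c50091799672"))
# prints:
# ironic node-update f5acc2b9-888a-4437-9fb7-5f98994f2e96 add properties/root_device='{"wwn": "0x5000c50091799672"}'
```

The single quotes added by shlex.quote keep the JSON's double quotes intact, so no backslash escaping is needed.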

ironic node-show confirms that the hint is set:
[stack@c02-h10-r620 6048r]$ ironic node-show f5acc2b9-888a-4437-9fb7-5f98994f2e96
+------------------------+--------------------------------------------------------------------------+
| Property | Value |
+------------------------+--------------------------------------------------------------------------+
| chassis_uuid | |
| clean_step | {} |
| console_enabled | False |
| created_at | 2017-03-06T21:47:04+00:00 |
| driver | pxe_ipmitool |
| driver_info | {u'deploy_kernel': u'a3789c1a-3f18-40c4-b84c-97bcdbd18129', |
| | u'ipmi_address': |
| | u'mgmt-c06-h05-6048r.rdu.openstack.engineering.redhat.com', |
| | u'deploy_ramdisk': u'7915d35a-f90a-4fa4-9b9f-42ca0c5272fd', |
| | u'ipmi_password': u'******', u'ipmi_username': u'quads'} |
| driver_internal_info | {u'agent_cached_clean_steps_refreshed': u'2017-03-07 18:27:17.915430', |
| | u'agent_cached_clean_steps': {u'deploy': [{u'priority': 99, |
| | u'interface': u'deploy', u'reboot_requested': False, u'abortable': True, |
| | u'step': u'erase_devices_metadata'}, {u'priority': 10, u'interface': |
| | u'deploy', u'reboot_requested': False, u'abortable': True, u'step': |
| | u'erase_devices'}]}, u'clean_steps': None, u'hardware_manager_version': |
| | {u'generic_hardware_manager': u'1.1'}, u'is_whole_disk_image': False, |
| | u'agent_continue_if_ata_erase_failed': False, |
| | u'agent_erase_devices_iterations': 1, u'agent_erase_devices_zeroize': |
| | True, u'agent_url': u'http://192.0.2.129:9999', u'agent_last_heartbeat': |
| | 1488911239} |
| extra | {u'hardware_swift_object': u'extra_hardware- |
| | f5acc2b9-888a-4437-9fb7-5f98994f2e96'} |
| inspection_finished_at | None |
| inspection_started_at | None |
| instance_info | {} |
| instance_uuid | None |
| last_error | None |
| maintenance | False |
| maintenance_reason | None |
| name | None |
| network_interface | |
| power_state | power off |
| properties | {u'cpu_arch': u'x86_64', u'root_device': {u'wwn': |
| | u'0x5000c50091799672'}, u'cpus': u'56', u'capabilities': u'cpu_vt:true,c |
| | pu_hugepages:true,boot_option:local,cpu_txt:true,cpu_aes:true,cpu_hugepa |
| | ges_1g:true', u'memory_mb': u'262144', u'local_gb': u'464'} |
| provision_state | available |
| provision_updated_at | 2017-03-07T18:27:36+00:00 |
| raid_config | |
| reservation | None |
| resource_class | |
| target_power_state | None |
| target_provision_state | None |
| target_raid_config | |
| updated_at | 2017-03-07T18:27:36+00:00 |
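| uuid | f5acc2b9-888a-4437-9fb7-5f98994f2e96 |
+------------------------+--------------------------------------------------------------------------+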

However, the overcloud deployment fails with a Nova scheduling error, and the ironic-conductor logs show the following:
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor [req-30212281-8757-4802-aa86-76812ee7b4f7 - - - - -] Asynchronous exception for node f5acc2b9-888a-4437-9fb7-5f98994f2e96: Node failed to get image for deploy. Exception: Failed to deploy instance: Failed to start the iSCSI target to deploy the node f5acc2b9-888a-4437-9fb7-5f98994f2e96. Error: {u'message': u"Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment using these hints {u'wwn': u'0x5000c50091799672'}", u'code': 404, u'type': u'DeviceNotFound', u'details': u"No suitable device was found for deployment using these hints {u'wwn': u'0x5000c50091799672'}"}
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor Traceback (most recent call last):
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/agent_base_vendor.py", line 482, in heartbeat
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor self.continue_deploy(task)
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 61, in wrapped
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor result = f(*args, **kwargs)
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/conductor/task_manager.py", line 138, in wrapper
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor return f(*args, **kwargs)
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 381, in continue_deploy
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor uuid_dict_returned = do_agent_iscsi_deploy(task, self._client)
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic_lib/metrics.py", line 61, in wrapped
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor result = f(*args, **kwargs)
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor File "/usr/lib/python2.7/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 308, in do_agent_iscsi_deploy
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor raise exception.InstanceDeployFailure(reason=msg)
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor InstanceDeployFailure: Failed to deploy instance: Failed to start the iSCSI target to deploy the node f5acc2b9-888a-4437-9fb7-5f98994f2e96. Error: {u'message': u"Error finding the disk or partition device to deploy the image onto: No suitable device was found for deployment using these hints {u'wwn': u'0x5000c50091799672'}", u'code': 404, u'type': u'DeviceNotFound', u'details': u"No suitable device was found for deployment using these hints {u'wwn': u'0x5000c50091799672'}"}
2017-03-07 18:23:40.723 32676 ERROR ironic.drivers.modules.agent_base_vendor

The error above claims that no device matches the very WWN that introspection returned.
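A simplified sketch of hint matching shows why this points at timing rather than at a wrong hint. The hint does match the introspection record, so a 404 is consistent with the disk not yet being in the agent's device list at lookup time. This sketch assumes plain case-insensitive equality on each hinted field; real root device hint matching is richer (for example, it supports operators). `matches_hints`, `find_root_devices` and the `/dev/sda` entry are hypothetical.

```python
from typing import Any, Dict, List

Disk = Dict[str, Any]


def matches_hints(disk: Disk, hints: Dict[str, Any]) -> bool:
    """True if every hinted field is present on the disk and equal.

    Simplified: case-insensitive string comparison only.
    """
    for key, expected in hints.items():
        actual = disk.get(key)
        if actual is None:
            return False
        if str(actual).strip().lower() != str(expected).strip().lower():
            return False
    return True


def find_root_devices(disks: List[Disk], hints: Dict[str, Any]) -> List[Disk]:
    """Return all disks matching the hints, in enumeration order."""
    return [d for d in disks if matches_hints(d, hints)]


hints = {"wwn": "0x5000c50091799672"}

# /dev/sdak is the record from the introspection data above; /dev/sda is a
# hypothetical disk that the kernel happens to enumerate first.
introspected = [
    {"name": "/dev/sda", "wwn": "0x5000c500deadbeef"},
    {"name": "/dev/sdak", "wwn": "0x5000c50091799672", "serial": "9XF44QWA"},
]

print([d["name"] for d in find_root_devices(introspected, hints)])  # ['/dev/sdak']

# If the agent looks before /dev/sdak has appeared, nothing matches, which
# surfaces as DeviceNotFound even though the hint itself is correct.
print(find_root_devices(introspected[:1], hints))  # []
```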

Changed in ironic-python-agent:
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-python-agent (master)

Fix proposed to branch: master
Review: https://review.openstack.org/443649

Changed in ironic-python-agent:
status: New → In Progress
Lucas Alvares Gomes (lucasagomes) wrote :

Here are some more logs: https://gist.github.com/smalleni/81884e80f499fd5417bb2a45c71ff96b

It's important to note that these machines have 30+ disks, so waiting for whichever disk appears first is not ideal.

Michael Turek (mjturek) wrote :

Reading the comments in the IPA patch, it seems a small change to ironic is planned: specifically, adding a config option for the amount of time to wait before invoking '_wait_for_disks()'.

Moving to Triaged, as the fix is detailed in the IPA patch, and to Wishlist, as it's not required for the IPA fix (at least initially).

Changed in ironic:
status: New → Triaged
importance: Undecided → Wishlist
Changed in ironic-python-agent:
importance: Undecided → Medium
Michael Turek (mjturek) wrote :

Also setting the IPA bug to Medium, as it seems to be a good improvement to root_hints behavior.

Changed in ironic-python-agent:
assignee: Lucas Alvares Gomes (lucasagomes) → Dmitry Tantsur (divius)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ironic-python-agent (master)

Change abandoned by Lucas Alvares Gomes (<email address hidden>) on branch: master
Review: https://review.openstack.org/443649

OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic-python-agent (master)

Reviewed: https://review.openstack.org/443649
Committed: https://git.openstack.org/cgit/openstack/ironic-python-agent/commit/?id=3189c16a5e95ade468fa8bc37302eb9979f5a8c9
Submitter: Zuul
Branch: master

commit 3189c16a5e95ade468fa8bc37302eb9979f5a8c9
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Mar 9 14:02:06 2017 +0000

    Fix waiting for target disk to appear

    This patch changes the behavior of the _wait_for_disks() method to wait for
    a specific disk if any device hints are specified. There are cases where
    the deployment might fail or succeed randomly depending on the order and
    timing in which the disks show up.

    If no root device hints are specified, the method will just wait for any
    suitable disk to show up, like before.

    The _wait_for_disks call was made into a proper hardware manager method.
    It is now also called each time the cached node is updated, not only
    on startup. This is to ensure that we wait for the device matching the
    root device hints (which are part of the node).

    The loop was corrected to avoid redundant sleeps and warnings.

    Finally, this patch adds more logging around detecting the root device.

    Co-Authored-By: Dmitry Tantsur <email address hidden>
    Change-Id: I10ca70d6a390ed802505c0d10d440dfb52beb56c
    Closes-Bug: #1670916
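The waiting behavior described in the commit message can be sketched roughly as follows. This is a simplified illustration, not the code from change I10ca70d6: `wait_for_root_disk`, `list_disks`, `attempts`, `delay`, the injectable `sleep` and `make_fake_lister` are hypothetical names, and hint matching is reduced to exact equality. With hints it waits for a matching disk; without hints it takes the first disk, as before; and it only warns and sleeps when another attempt will follow.

```python
import logging
import time
from typing import Any, Callable, Dict, List, Optional

LOG = logging.getLogger(__name__)
Disk = Dict[str, Any]


class DeviceNotFound(Exception):
    """Raised when no suitable disk shows up within the allowed attempts."""


def wait_for_root_disk(list_disks: Callable[[], List[Disk]],
                       hints: Optional[Dict[str, Any]] = None,
                       attempts: int = 10,
                       delay: float = 3.0,
                       sleep: Callable[[float], None] = time.sleep) -> Disk:
    """Poll list_disks() until a suitable disk appears and return it.

    With hints, only a disk whose hinted fields all equal the hint values
    counts (simplified exact matching). Without hints, any disk will do.
    """
    for attempt in range(1, attempts + 1):
        disks = list_disks()
        if hints:
            candidates = [d for d in disks
                          if all(d.get(k) == v for k, v in hints.items())]
        else:
            candidates = disks
        if candidates:
            return candidates[0]
        if attempt < attempts:
            # Warn and sleep only when another attempt will follow.
            LOG.warning("No suitable disk yet (attempt %d of %d); "
                        "retrying in %s seconds", attempt, attempts, delay)
            sleep(delay)
    raise DeviceNotFound("No suitable device was found for deployment "
                         "using these hints %s" % (hints,))


def make_fake_lister(snapshots: List[List[Disk]]) -> Callable[[], List[Disk]]:
    """Stand-in for device discovery: replay snapshots, then repeat the last."""
    calls = 0

    def list_disks() -> List[Disk]:
        nonlocal calls
        index = min(calls, len(snapshots) - 1)
        calls += 1
        return snapshots[index]

    return list_disks


# Hypothetical enumeration order: /dev/sda shows up before the hinted /dev/sdak.
snapshots = [
    [],
    [{"name": "/dev/sda", "wwn": "0x5000c500deadbeef"}],
    [{"name": "/dev/sda", "wwn": "0x5000c500deadbeef"},
     {"name": "/dev/sdak", "wwn": "0x5000c50091799672"}],
]

hinted = wait_for_root_disk(make_fake_lister(snapshots),
                            {"wwn": "0x5000c50091799672"}, delay=0)
print(hinted["name"])    # /dev/sdak
unhinted = wait_for_root_disk(make_fake_lister(snapshots), delay=0)
print(unhinted["name"])  # /dev/sda
```

The unhinted call returning /dev/sda illustrates the concern raised about machines with 30+ disks: without hints, the first disk to appear wins. Injecting `sleep` is only a convenience that lets the loop be exercised without real delays.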

Changed in ironic-python-agent:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic-python-agent (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/512643

OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic-python-agent (stable/pike)

Reviewed: https://review.openstack.org/512643
Committed: https://git.openstack.org/cgit/openstack/ironic-python-agent/commit/?id=bd8c6c7420c5a913ff407da569c1b8dfbc0a488a
Submitter: Zuul
Branch: stable/pike

commit bd8c6c7420c5a913ff407da569c1b8dfbc0a488a
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Mar 9 14:02:06 2017 +0000

    Fix waiting for target disk to appear

    (cherry picked from commit 3189c16a5e95ade468fa8bc37302eb9979f5a8c9)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ironic-python-agent 3.0.0

This issue was fixed in the openstack/ironic-python-agent 3.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ironic-python-agent 2.2.2

This issue was fixed in the openstack/ironic-python-agent 2.2.2 release.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ironic-python-agent (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/516693

OpenStack Infra (hudson-openstack) wrote : Related fix merged to ironic-python-agent (master)

Reviewed: https://review.openstack.org/516693
Committed: https://git.openstack.org/cgit/openstack/ironic-python-agent/commit/?id=6da0268ebe9e69b891409236a88b8721ce2236eb
Submitter: Zuul
Branch: master

commit 6da0268ebe9e69b891409236a88b8721ce2236eb
Author: Ruby Loo <email address hidden>
Date: Tue Oct 31 10:14:25 2017 -0400

    Fix off-by-one error in warning

    This fixes an off-by-one error in a warning message.

    This is a follow-up to 3189c16a5e95ade468fa8bc37302eb9979f5a8c9.

    Change-Id: I89b56974c1b919f4c03498873d3ce9860d5644c5
    Related-Bug: #1670916

Dmitry Tantsur (divius)
Changed in ironic:
status: Triaged → Invalid