Ironic

Third Party CI Systems that use baremetal target nodes are failing

Bug #1683902 reported by Michael Turek on 2017-04-18

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Ironic	Invalid	Undecided	Unassigned

Bug Description

This bug is being opened after a conversation that started here:

http://lists.openstack.org/pipermail/openstack-dev/2017-April/115487.html

Both PowerKVM CI's ironic job and Dell's HW PXE-IPMItool job are failing from a timeout in '/opt/stack/new/ironic/devstack/lib/ironic:wait_for_nova_resources'.

The following is output from calls to ironic run while 'wait_for_nova_resources' is looping:

$ source devstack/accrc/admin/admin

$ ironic node-show node-0
+------------------------+-----------------------------------------------+
| Property | Value |
+------------------------+-----------------------------------------------+
| boot_interface | |
| chassis_uuid | 77b66e65-e4c0-4bc1-a4ed-77c6373c57b0 |
| clean_step | {} |
| console_enabled | False |
| console_interface | |
| created_at | 2017-04-14T15:18:20+00:00 |
| deploy_interface | |
| driver | agent_ipmitool |
| driver_info | {u'deploy_kernel': |
| | u'cd57c951-f9d9-48bc-a2c1-eb4fd2048bbb', |
| | u'ipmi_address': u'*******', |
| | u'deploy_ramdisk': |
| | u'12a65420-1a2b-45f6-b486-bcbd03f7c764', |
| | u'ipmi_password': u'******', |
| | u'ipmi_username': u'******'} |
| driver_internal_info | {} |
| extra | {} |
| inspect_interface | |
| inspection_finished_at | None |
| inspection_started_at | None | | instance_info | {} |
| instance_uuid | None |
| last_error | None |
| maintenance | False |
| maintenance_reason | None |
| management_interface | |
| name | node-0 |
| network_interface | |
| power_interface | |
| power_state | None |
| properties | {u'memory_mb': 51000, u'cpu_arch': u'ppc64el',|
| | u'local_gb': 500, u'cpus': 1} |
| provision_state | available |
| provision_updated_at | None |
| raid_config | |
| raid_interface | |
| reservation | None |
| resource_class | |
| target_power_state | None |
| target_provision_state | None |
| target_raid_config | |
| updated_at | None |
| uuid | 7d03ef35-bd9b-40ec-bf8e-fecb5c1200e5 |
| vendor_interface | |
+------------------------------------------------------------------------+

$ openstack hypervisor stats show
+----------------------+-------+
| Field | Value |
+----------------------+-------+
| count | 1 |
| current_workload | 0 |
| disk_available_least | 0 |
| free_disk_gb | 0 |
| free_ram_mb | 0 |
| local_gb | 0 |
| local_gb_used | 0 |
| memory_mb | 0 |
| memory_mb_used | 0 |
| running_vms | 0 |
| vcpus | 0 |
| vcpus_used | 0 |
+----------------------+-------+

In short, the properties from the node are not propagating to the hypervisor stats. This means that the 'wait_for_nova_resources' will loop indefinitely. We have also confirmed that the stats do not make it to the database.

Vlad pointed out that these rabbitmq errors are suspect:
https://dal05.objectstorage.softlayer.net/v1/AUTH_3d8e6ecb-f597-448c-8ec2-164e9f710dd6/pkvmci/ironic/25/454625/10/check-ironic/tempest-dsvm-ironic-agent_ipmitool/0520958/screen-ir-api.txt.gz

See original description

Tags:

Michael Turek (mjturek) on 2017-04-18

description:

updated