Errors when starting introspection are silently ignored

Bug #1651704 reported by Alfredo Moralejo on 2016-12-21
This bug affects 1 person
Affects                    Status        Importance  Assigned to      Milestone
Ironic Inspector           Invalid       Undecided   Unassigned
OpenStack Compute (nova)   Invalid       Undecided   Unassigned
tripleo                    Fix Released  High        Dougal Matthews  ocata-3
ironic-inspector (Ubuntu)  Invalid       Undecided   Unassigned

Bug Description

Running TripleO using tripleo-quickstart with the minimal profile (step_introspect: true) on the master branch, the overcloud deploy fails with the following error:

    ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"

Looking at nova-scheduler.log, the following errors are found:

    https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-delorean-minimal-806/undercloud/var/log/nova/nova-scheduler.log.gz

    2016-12-21 06:45:56.822 17759 DEBUG nova.scheduler.host_manager [req-f889dbc0-1096-4f92-80fc-3c5bdcb1ad29 4f103e0230074c2488b7359bc079d323 f21dbfb3b2c840059ec2a0bba03b7385 - - -] Update host state from compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='',created_at=2016-12-21T06:38:28Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=0,free_disk_gb=0,free_ram_mb=0,host='undercloud',host_ip=192.168.23.46,hypervisor_hostname='c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec',hypervisor_type='ironic',hypervisor_version=1,id=2,local_gb=0,local_gb_used=0,memory_mb=0,memory_mb_used=0,metrics='[]',numa_topology=None,pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.0,running_vms=0,service_id=None,stats={boot_option='local',cpu_aes='true',cpu_arch='x86_64',cpu_hugepages='true',cpu_hugepages_1g='true',cpu_vt='true',profile='control'},supported_hv_specs=[HVSpec],updated_at=2016-12-21T06:45:38Z,uuid=ac2742da-39fb-4ca4-9f78-8e04f703c7a6,vcpus=0,vcpus_used=0) _locked_update /usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py:168

    2016-12-21 06:47:48.893 17759 DEBUG nova.scheduler.filters.ram_filter [req-2aece1c8-6d3e-457b-92d7-a3177680c82e 4f103e0230074c2488b7359bc079d323 f21dbfb3b2c840059ec2a0bba03b7385 - - -] (undercloud, c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec) ram: 0MB disk: 0MB io_ops: 0 instances: 0 does not have 8192 MB usable ram before overcommit, it only has 0 MB. host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/ram_filter.py:45

    2016-12-21 06:47:48.894 17759 INFO nova.filters [req-2aece1c8-6d3e-457b-92d7-a3177680c82e 4f103e0230074c2488b7359bc079d323 f21dbfb3b2c840059ec2a0bba03b7385 - - -] Filter RamFilter returned 0 hosts

My guess is that node introspection is failing to get proper node information.

Full logs can be found in https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-delorean-minimal-806/undercloud/

We have hit this issue twice in the last few runs.

Tags: ci
Yolanda Robla (yolanda.robla) wrote:

I'm hitting this as well in my local runs. My compute node was fine, but the controller failed with that error.

Thanks for the report. Yeah, I can see that one node got deployed correctly [0], but there's no trace of a second node being deployed in the Ironic logs, which indicates that it failed before that point (while Nova was scheduling), so "No valid host" makes sense.

One possible cause for this: when the nodes are updated in Ironic (after inspection, for example) it takes a while for the values to be propagated to the Nova resource tracker. This happens because Nova runs a periodic task to update it and, as you may know, periodic tasks in OpenStack can be slow (they run one after the other in a greenthread). In the Ironic DevStack plugin we have a workaround for this: https://github.com/openstack/ironic/blob/master/devstack/lib/ironic#L1151-L1167

Perhaps we need something similar in OOO-quickstart?
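That DevStack workaround essentially polls Nova until the Ironic resources show up. A minimal sketch of the same idea (the helper name and `get_stats` callable are hypothetical; in practice the stats would be parsed from something like `openstack hypervisor stats show -f json`):

```python
import time

def wait_for_nova_resources(get_stats, expected_memory_mb, timeout=120, interval=10):
    """Poll Nova's aggregate hypervisor stats until the resource-tracker
    periodic task has propagated the Ironic node properties.

    get_stats is a callable returning a dict with a 'memory_mb' key,
    e.g. parsed from `openstack hypervisor stats show -f json`.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_stats().get("memory_mb", 0) >= expected_memory_mb:
            return True  # resources are now visible to the scheduler
        time.sleep(interval)
    return False  # timed out: scheduling would still fail with "No valid host"
```

Deploying only after this returns True would avoid racing the periodic task.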

[0]

2016-12-21 06:50:39.867 29447 INFO ironic.conductor.task_manager [req-f77a2cef-c3e7-4863-b313-e8fee9d2a3c2 - - - - -] Node f06c26d1-8008-47a9-a687-d5ef05083e41 moved to provision state "active" from state "deploying"; target provision state is "None"
2016-12-21 06:50:39.869 29447 INFO ironic.drivers.modules.agent_base_vendor [req-f77a2cef-c3e7-4863-b313-e8fee9d2a3c2 - - - - -] Deployment to node f06c26d1-8008-47a9-a687-d5ef05083e41 done

I'm also marking ironic-inspector as affected here because in the logs I see (thanks to bfournier for pointing it out):

2016-12-21 06:39:16.254 24006 ERROR ironic_inspector.node_cache [-] [node: f06c26d1-8008-47a9-a687-d5ef05083e41 state None] Database integrity error (pymysql.err.IntegrityError) (1062, u"Duplicate entry 'bmc_address-127.0.0.1' for key 'PRIMARY'") [SQL: u'INSERT INTO attributes (name, value, uuid) VALUES (%(name)s, %(value)s, %(uuid)s)'] [parameters: {'name': 'bmc_address', 'value': '127.0.0.1', 'uuid': u'f06c26d1-8008-47a9-a687-d5ef05083e41'}] during adding attributes
2016-12-21 06:39:16.255 24006 ERROR ironic_inspector.utils [-] [node: f06c26d1-8008-47a9-a687-d5ef05083e41 state None] Some or all of bmc_address's ['127.0.0.1'] are already on introspection

Since we changed to VirtualBMC, all the BMC addresses point to 127.0.0.1; only the ports differ. So maybe inspector is disregarding the 2nd node, thinking it's a duplicate?
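A toy illustration of the suspected collision (this is not inspector's actual schema, just the keying problem): if the lookup attributes are effectively keyed on (name, value) alone, two VirtualBMC nodes sharing 127.0.0.1 collide, while folding the port into the value keeps them distinct:

```python
def register_attributes(db, node_uuid, attrs):
    """db maps (name, value) -> node_uuid, mimicking a PRIMARY KEY on
    (name, value). Raises ValueError on a duplicate entry, like the
    IntegrityError in the inspector logs above."""
    for name, value in attrs:
        key = (name, value)
        if key in db and db[key] != node_uuid:
            raise ValueError("Duplicate entry '%s-%s'" % (name, value))
        db[key] = node_uuid

db = {}
register_attributes(db, "node-1", [("bmc_address", "127.0.0.1")])
try:
    # Second VirtualBMC node, same bmc_address: collides.
    register_attributes(db, "node-2", [("bmc_address", "127.0.0.1")])
    collided = False
except ValueError:
    collided = True
# Folding the port into the stored value keeps the nodes distinct.
register_attributes(db, "node-2", [("bmc_address", "127.0.0.1:6231")])
```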

Alfredo Moralejo (amoralej) wrote:

I had reported the db integrity error in a separate LP bug: https://bugs.launchpad.net/tripleo/+bug/1651719

Alfredo Moralejo (amoralej) wrote:

So, if we focus on the node where introspection ran properly (c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec), we can see the following:

1. Introspection finishes successfully with the right information, according to the inspector logs:

2016-12-21 06:41:24.030 24006 INFO ironic_inspector.plugins.standard [-] [node: c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec state processing MAC 00:fc:e0:2e:a1:ec] Discovered data: CPUs: 2 x86_64, memory 8192 MiB, disk 49 GiB
2016-12-21 06:41:26.408 24006 INFO ironic_inspector.process [-] [node: c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec state None MAC 00:fc:e0:2e:a1:ec] Introspection finished successfully

2. Nova's scheduler discards it via the RamFilter:

2016-12-21 06:45:57.537 17759 DEBUG nova.filters [req-a2319864-8629-4b7f-8b94-d41911ad02d0 4f103e0230074c2488b7359bc079d323 f21dbfb3b2c840059ec2a0bba03b7385 - - -] Filter AvailabilityZoneFilter returned 1 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104
2016-12-21 06:45:57.538 17759 DEBUG nova.scheduler.filters.ram_filter [req-a2319864-8629-4b7f-8b94-d41911ad02d0 4f103e0230074c2488b7359bc079d323 f21dbfb3b2c840059ec2a0bba03b7385 - - -] (undercloud, c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec) ram: 0MB disk: 0MB io_ops: 0 instances: 0 does not have 8192 MB usable ram before overcommit, it only has 0 MB. host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/ram_filter.py:45

That's about 4 minutes and 30 seconds between the end of introspection and the nova scheduler error. I'd say that should be enough time; what do you think?

Alfredo Moralejo (amoralej) wrote:

Checking nova-compute.log, it seems it's not a matter of waiting; see:

2016-12-21 06:38:28.987 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5e8 - - - - -] Final resource view: name=c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-12-21 06:39:29.787 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5e8 - - - - -] Final resource view: name=c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-12-21 06:40:29.820 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5e8 - - - - -] Final resource view: name=c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-12-21 06:41:31.332 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5e8 - - - - -] Final resource view: name=c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-12-21 06:42:32.838 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5e8 - - - - -] Final resource view: name=c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-12-21 06:43:34.820 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5e8 - - - - -] Final resource view: name=c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-12-21 06:44:36.839 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5e8 - - - - -] Final resource view: name=c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-12-21 06:45:38.820 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5e8 - - - - -] Final resource view: name=c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-12-21 06:46:39.972 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5e8 - - - - -] Final resource view: name=c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-12-21 06:47:40.938 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5e8 - - - - -] Final resource view: name=c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-12-21 06:48:43.261 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5e8 - - - - -] Final resource view: name=c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec phys_ram=0MB used_ram=0MB phys_disk=0GB used_disk=0GB total_vcpus=0 used_vcpus=0 pci_stats=[]
2016-12-21 06:49:56.760 23446 INFO nova.compute.resource_tracker [req-4ec06e18-d31c-4254-8bc1-9cad7fb8b5...


Alfredo Moralejo (amoralej) wrote:

I've reproduced this locally running RDO from master. I have an Ironic installation (a TripleO undercloud), and I've registered a node and introspected it:

This is how the node looks from Ironic's perspective:

$ openstack baremetal show e14be55d-8dd9-4b08-aa5d-efab2b5a5c01
This command is deprecated. Instead, use 'openstack baremetal node show'.
+----------------------+-----------------------------------------------------------------------------+
| Field                | Value                                                                       |
+----------------------+-----------------------------------------------------------------------------+
| chassis_uuid         | None                                                                        |
| clean_step           | {}                                                                          |
| console_enabled      | False                                                                       |
| created_at           | 2016-12-23T17:08:16+00:00                                                   |
| driver               | pxe_ssh                                                                     |
| driver_info          | {u'ssh_username': u'stack', u'deploy_kernel': u'd77546aa-5936-417a-9dec-    |
|                      | 43f4f98a057d', u'deploy_ramdisk': u'bc508426-aee4-4d7d-97e5-dcbd6761a896',  |
|                      | u'ssh_key_contents': u'******', u'ssh_virt_type': u'virsh',                 |
|                      | u'ssh_address': u'192.168.23.1'}                                            |
| driver_internal_info | {}                                                                          |
| extra                | {u'hardware_swift_object': u'extra_hardware-e14be55d-8dd9-4b08-aa5d-        |
|                      | efab2b5a5c01'} ...

Dmitry Tantsur (divius) wrote:

This seems to be a problem:

 | provision_state | manageable

I wonder if something is wrong with the workflows. Which command did you use for introspection?

Changed in nova:
status: New → Invalid
Changed in ironic-inspector (Ubuntu):
status: New → Invalid
Changed in tripleo:
status: New → Incomplete
Changed in ironic:
status: New → Incomplete
affects: ironic → ironic-inspector
Alfredo Moralejo (amoralej) wrote:

I'm following the tripleo-quickstart workflow:

1. Create VMs on the virthost: one for the undercloud, one for the controller and one for the compute node
2. Deploy the undercloud
3. Run the following commands:

openstack baremetal import --json instackenv.json
openstack baremetal configure boot
openstack baremetal introspection bulk start

At that point I executed the commands in comment #7.

Alfredo Moralejo (amoralej) wrote:

After some additional testing, I'd say this issue may be related to error handling when bulk introspection runs and one of the introspections fails.

In this case one of them is failing because of https://bugs.launchpad.net/tripleo/+bug/1651719, but "openstack baremetal introspection bulk start" returns 0, and sometimes the server that was properly introspected is still in the "manageable" state when the command finishes.
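In scripts, one way to guard against this is to check every node's introspection status explicitly instead of trusting the bulk command's exit code. A sketch (the helper is hypothetical; the status dicts are shaped like the inspector API's per-node result, which exposes "finished" and "error" fields, as shown by `openstack baremetal introspection status <uuid>`):

```python
def introspection_ok(statuses):
    """Return (ok, failures) for a mapping of node UUID -> status dict.

    A node passes only if introspection finished and reported no error;
    any failure means the run as a whole must be treated as failed.
    """
    failures = {}
    for uuid, status in statuses.items():
        if not status.get("finished") or status.get("error"):
            failures[uuid] = status.get("error") or "not finished"
    return (not failures, failures)

# Example mirroring this bug: one node fine, one hit the duplicate
# bmc_address error, so the overall run is a failure.
ok, failures = introspection_ok({
    "c6f8f4ba-...": {"finished": True, "error": None},
    "f06c26d1-...": {"finished": True,
                     "error": "bmc_address already on introspection"},
})
```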

Dmitry Tantsur (divius) on 2017-01-10
Changed in tripleo:
status: Incomplete → Triaged
summary: - Overcloud deployment fails when node introspection is enabled
+ Errors when starting introspection are silently ignored
Changed in tripleo:
importance: Undecided → High
assignee: nobody → Dmitry Tantsur (divius)
milestone: none → ocata-3
Dougal Matthews (d0ugal) on 2017-01-10
Changed in tripleo:
assignee: Dmitry Tantsur (divius) → Dougal Matthews (d0ugal)
Changed in tripleo:
status: Triaged → In Progress

Related fix proposed to branch: master
Review: https://review.openstack.org/418964

Related fix proposed to branch: master
Review: https://review.openstack.org/418966

Related fix proposed to branch: master
Review: https://review.openstack.org/418968

Reviewed: https://review.openstack.org/418423
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=c7b01eba55e5d133ccc19451cf4727170a5dbdd0
Submitter: Jenkins
Branch: master

commit c7b01eba55e5d133ccc19451cf4727170a5dbdd0
Author: Dougal Matthews <email address hidden>
Date: Tue Jan 10 14:35:36 2017 +0000

    Fail the baremetal workflows when sending a "FAILED" message

    When Mistral workflows execute a second workflow (a sub-workflow
    execution), the parent workflow can't easily determine if sub-workflow
    failed. This is because the failure is communicated via a Zaqar message
    only and when a workflow ends with a successful Zaqar message it appears
    have been successful. This problem surfaces because parent workflows
    should have an "on-error" attribute but it is never called, as the
    workflow doesn't error.

    This change marks the workflow as failed if the message has the status
    "FAILED". Now when a sub-workflow fails, the task that called it should
    have the on-error triggered. Previously it would always go to
    on-success.

    Closes-Bug: #1651704
    Change-Id: I60444ec692351c44753649b59b7c1d7c4b61fa8e

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/418964
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=ab946c4a0e6b0bf5bc0b31e93af455af5b1887dc
Submitter: Jenkins
Branch: master

commit ab946c4a0e6b0bf5bc0b31e93af455af5b1887dc
Author: Dougal Matthews <email address hidden>
Date: Wed Jan 11 14:16:50 2017 +0000

    Fail the plan_management workflows when sending a "FAILED" message

    When Mistral workflows execute a second workflow (a sub-workflow
    execution), the parent workflow can't easily determine if sub-workflow
    failed. This is because the failure is communicated via a Zaqar message
    only and when a workflow ends with a successful Zaqar message it appears
    have been successful. This problem surfaces because parent workflows
    should have an "on-error" attribute but it is never called, as the
    workflow doesn't error.

    This change marks the workflow as failed if the message has the status
    "FAILED". Now when a sub-workflow fails, the task that called it should
    have the on-error triggered. Previously it would always go to
    on-success.

    Change-Id: Ie0f1d4e57505e9346ae3f0b25d755f55d73a255a
    Related-Bug: #1651704

Reviewed: https://review.openstack.org/418968
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=6051a7446bdf415e18ddc5f25697759ca488be17
Submitter: Jenkins
Branch: master

commit 6051a7446bdf415e18ddc5f25697759ca488be17
Author: Dougal Matthews <email address hidden>
Date: Wed Jan 11 14:23:25 2017 +0000

    Fail the validation workflows when sending a "FAILED" message

    When Mistral workflows execute a second workflow (a sub-workflow
    execution), the parent workflow can't easily determine if sub-workflow
    failed. This is because the failure is communicated via a Zaqar message
    only and when a workflow ends with a successful Zaqar message it appears
    have been successful. This problem surfaces because parent workflows
    should have an "on-error" attribute but it is never called, as the
    workflow doesn't error.

    This change marks the workflow as failed if the message has the status
    "FAILED". Now when a sub-workflow fails, the task that called it should
    have the on-error triggered. Previously it would always go to
    on-success.

    Change-Id: I2170afac89e8261c1e2289da8e9220694fadbad0
    Related-Bug: #1651704

Reviewed: https://review.openstack.org/418961
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=12f734562c5925c9260a3f038d570a7910c51cfb
Submitter: Jenkins
Branch: master

commit 12f734562c5925c9260a3f038d570a7910c51cfb
Author: Dougal Matthews <email address hidden>
Date: Wed Jan 11 14:12:44 2017 +0000

    Fail the deployment workflows when sending a "FAILED" message

    When Mistral workflows execute a second workflow (a sub-workflow
    execution), the parent workflow can't easily determine if sub-workflow
    failed. This is because the failure is communicated via a Zaqar message
    only and when a workflow ends with a successful Zaqar message it appears
    have been successful. This problem surfaces because parent workflows
    should have an "on-error" attribute but it is never called, as the
    workflow doesn't error.

    This change marks the workflow as failed if the message has the status
    "FAILED". Now when a sub-workflow fails, the task that called it should
    have the on-error triggered. Previously it would always go to
    on-success.

    Change-Id: Ic1781bcf3922da61d42970cf2b42c951f83f7a74
    Related-Bug: #1651704

Reviewed: https://review.openstack.org/418966
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=c2fb9b025d727fdbfae27d1f25f58efae1cb7679
Submitter: Jenkins
Branch: master

commit c2fb9b025d727fdbfae27d1f25f58efae1cb7679
Author: Dougal Matthews <email address hidden>
Date: Wed Jan 11 14:21:06 2017 +0000

    Fail the scale workflows when sending a "FAILED" message

    When Mistral workflows execute a second workflow (a sub-workflow
    execution), the parent workflow can't easily determine if sub-workflow
    failed. This is because the failure is communicated via a Zaqar message
    only and when a workflow ends with a successful Zaqar message it appears
    have been successful. This problem surfaces because parent workflows
    should have an "on-error" attribute but it is never called, as the
    workflow doesn't error.

    This change marks the workflow as failed if the message has the status
    "FAILED". Now when a sub-workflow fails, the task that called it should
    have the on-error triggered. Previously it would always go to
    on-success.

    Change-Id: I73c5965627e45b8151f67aaf8025369bf6c870bb
    Related-Bug: #1651704

Dmitry Tantsur (divius) on 2017-01-23
Changed in ironic-inspector:
status: Incomplete → Invalid

This issue was fixed in the openstack/tripleo-common 5.8.0 release.

Reviewed: https://review.openstack.org/425047
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=5435d9d8235349c986de6fef07d347431ca243ba
Submitter: Jenkins
Branch: master

commit 5435d9d8235349c986de6fef07d347431ca243ba
Author: Dougal Matthews <email address hidden>
Date: Wed Jan 25 08:06:27 2017 +0000

    Use the builtin Mistral engine command 'fail' when failing workflows

    In changes 12f7345, c2fb9b0, 6051a74, c7b01eb and ab946c4 Mistral
    Workflows were marked as failed if they sent a Zaqar message with the
    status "FAILED". This is correct, but it turns out there is an easier
    way to do it.

    Mistral has a engine command designed for this purpose, using it removes
    some of the duplication we have incurred.

    We use on-success rather than on-complete to surface the best error on
    the workflow. If the Zaqar queue post fails, then the workflow will fail
    with this error. If, on-complete we then manually mark the workflow as
    failed it will ovrwrite the Zaqar error with a generic one.

    Closes-Bug: #1664918
    Related-Bug: #1651704
    Change-Id: I9ef9cfef1b8740a535e005769ec0c3ad67ecb103
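For illustration, the engine-command approach this commit describes might look like the following in a workflow definition (a hypothetical sketch with made-up workflow and queue names, not the actual tripleo-common source):

```yaml
# Illustrative Mistral v2 sketch: after posting the Zaqar message, the
# builtin 'fail' engine command marks the workflow as failed when the
# payload status is FAILED, so a parent workflow's on-error branch fires.
version: '2.0'

example.send_message:
  input:
    - status
    - message
  tasks:
    send_message:
      action: zaqar.queue_post
      input:
        queue_name: tripleo
        messages:
          body:
            status: <% $.status %>
            message: <% $.message %>
      on-success:
        # Guarded engine command: fail only when the reported status
        # is FAILED; otherwise the workflow completes successfully.
        - fail: <% $.status = "FAILED" %>
```

Using on-success (rather than on-complete) preserves the more specific error if the Zaqar post itself fails, as the commit message explains.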
