Resource tracker should report virt driver stats

Bug #1348288 reported by Nicholas Randon
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Paul Murray

Bug Description

sha1 Nova at: 106fb458c7ac3cc17bb42d1b83ec3f4fa8284e71
sha1 ironic at: 036c79e38f994121022a69a0bc76917e0048fd63

The ironic driver passes stats to nova's resource tracker in get_available_resource(). Sometimes these appear to reach the database without modification; at other times they seem to be replaced entirely by other stats generated by the resource tracker. The correct behaviour would be to combine the two.

As an example, the following query on the compute_nodes table in nova's database shows the contents for a tripleo system (all nodes are ironic):

mysql> select hypervisor_hostname, stats from compute_nodes;
+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| hypervisor_hostname | stats |
+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 4e014e26-2f90-4a91-a6f0-c1978df88369 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| fadb50bf-26ec-420c-a13f-f182e38569d6 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| ffe5a5bf-7151-468c-b9bb-980477e5f736 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| 752966ea-17f8-4d6d-87a4-03c91cb65354 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| f2f0ecb1-6234-4975-808f-a17534c9ae6c | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| 9adf4551-24f0-43a7-9267-a20cfa309137 | {"cpu_arch": "amd64", "ironic_driver": "ironic.nova.virt.ironic.driver.IronicDriver"} |
| 1bd13fc5-4938-4781-9680-ad1e0ccec77c | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| 88a39f5d-6174-47c9-9817-13d08bf2e079 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| ec6b5dc6-de38-4e23-a967-b87c10da37e3 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| ac52fd79-e0b9-4749-b794-590d5c181b4a | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| a1b81342-ed57-4310-8d5b-a2aa48718f1f | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| 0588e463-748a-4248-9110-6e18988cfa4e | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| 8f73d8dc-5d8c-47b0-a866-b829edc3667f | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| bac38b1d-f7f9-4770-9195-ff204a0c05c3 | {"cpu_arch": "amd64", "ironic_driver": "ironic.nova.virt.ironic.driver.IronicDriver"} |
| 62cc33f7-701b-47f6-8f50-3f7c1ca0f0a3 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| af7f79bf-b2c1-405b-9bc7-5370b93b08cf | {"cpu_arch": "amd64", "ironic_driver": "ironic.nova.virt.ironic.driver.IronicDriver"} |
| 4615c72a-9ea0-433e-8c52-308163112f89 | {"cpu_arch": "amd64", "ironic_driver": "ironic.nova.virt.ironic.driver.IronicDriver"} |
| 680e6aa7-9a84-41de-94ba-b761d48b4087 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| 2c2d5b87-1be2-4e47-aabe-6822c569446c | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| 6f653502-d8ed-4763-b418-3ccfcc430c24 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| 006aff97-d3e6-49c8-93f0-4f4c5af1231d | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| addf8ff8-52fe-49da-a4b2-5688554e9161 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| b4b7f7ad-4adc-4dc9-9afb-a9966e2be141 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| e2cb81ca-314f-4436-80fd-e154ca3e9ccc | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| 7a7266a9-d72e-49be-b51a-4053ed251b41 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| dc5c63b6-d576-46c9-aa16-0537450cdbd8 | {"cpu_arch": "amd64", "ironic_driver": "ironic.nova.virt.ironic.driver.IronicDriver"} |
| 18794409-10d0-4946-9356-66cd5ab8472e | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| 8135a1be-c8d8-4cea-a381-8ab8be8b15c7 | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
| ed281d00-c16a-474d-8adb-ef525a9045fa | {"num_task_None": 1, "io_workload": 0, "num_instances": 1, "num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, "num_proj_505908300744403496b2e64b06606529": 1} |
+--------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
29 rows in set (0.00 sec)

The nodes with ironic stats have not had any instances created on them; the ones with resource tracker stats but no ironic stats have instances on them. This is shown by the following:

ironic node-list
+--------------------------------------+--------------------------------------+-------------+-----------------+-------------+
| uuid | instance_uuid | power_state | provision_state | maintenance |
+--------------------------------------+--------------------------------------+-------------+-----------------+-------------+
| dc5c63b6-d576-46c9-aa16-0537450cdbd8 | None | power off | None | False |
| fadb50bf-26ec-420c-a13f-f182e38569d6 | a703ae98-2398-445b-91bf-48f368c5b82a | power on | active | False |
| 9adf4551-24f0-43a7-9267-a20cfa309137 | None | power off | None | False |
| f2f0ecb1-6234-4975-808f-a17534c9ae6c | 308b1375-f8b8-42e5-bf20-143303975135 | power on | active | False |
| 1bd13fc5-4938-4781-9680-ad1e0ccec77c | 4e8c16c4-d0f6-43b8-8c8f-d110d89ac16f | power on | active | False |
| a1b81342-ed57-4310-8d5b-a2aa48718f1f | 01cbe85a-1b12-40ca-ac99-5e7b062b1b50 | power on | active | False |
| 18794409-10d0-4946-9356-66cd5ab8472e | 4d764e7e-a123-4390-9c73-0c05a52f5f23 | power on | active | False |
| ec6b5dc6-de38-4e23-a967-b87c10da37e3 | 87636b3b-b7b0-4e79-bd5f-5189eb5b1134 | power on | active | False |
| ac52fd79-e0b9-4749-b794-590d5c181b4a | 03c06c2f-b606-4ddd-a205-573ea036b5b8 | power on | active | False |
| 4e014e26-2f90-4a91-a6f0-c1978df88369 | 0189aa0e-5d05-4f6d-8a77-d5869b5c79f2 | power on | active | False |
| ed281d00-c16a-474d-8adb-ef525a9045fa | 11e03336-fb12-4448-a8de-b38f82d8b282 | power on | active | False |
| bac38b1d-f7f9-4770-9195-ff204a0c05c3 | None | power off | None | False |
| 62cc33f7-701b-47f6-8f50-3f7c1ca0f0a3 | ab08556b-ce7f-46f3-bb10-91771426d977 | power on | active | False |
| 8135a1be-c8d8-4cea-a381-8ab8be8b15c7 | 4051cc97-bb28-4dbf-b018-d9a61e73a269 | power on | active | False |
| 6f653502-d8ed-4763-b418-3ccfcc430c24 | 855e24b6-f9ae-441b-8520-42d4df9f8703 | power on | active | False |
| 2c2d5b87-1be2-4e47-aabe-6822c569446c | 141e6055-3e2c-4376-aa8b-39ad5c63e8bc | power on | active | False |
| 8f73d8dc-5d8c-47b0-a866-b829edc3667f | a62cab3f-56f7-47f2-813b-23a3255cad15 | power on | active | False |
| 0588e463-748a-4248-9110-6e18988cfa4e | 8fe18a1b-d753-4558-a9e1-b24f552f8e12 | power on | active | False |
| e2cb81ca-314f-4436-80fd-e154ca3e9ccc | 839a77e1-2a9e-4db4-94d9-d68903fe028c | power on | active | False |
| ffe5a5bf-7151-468c-b9bb-980477e5f736 | 205b6dbf-5f75-4d2c-a3b0-1be45d93d493 | power on | active | False |
| 752966ea-17f8-4d6d-87a4-03c91cb65354 | 09d2ad8c-f813-4b1a-9501-530462240657 | power on | active | False |
| 006aff97-d3e6-49c8-93f0-4f4c5af1231d | aa6c8024-ba80-4dc8-810c-c6fae57218c7 | power on | active | False |
| 7a7266a9-d72e-49be-b51a-4053ed251b41 | a76d7b2f-cf8c-4673-96e4-dcd9f2ea3bb5 | power on | active | False |
| 88a39f5d-6174-47c9-9817-13d08bf2e079 | 9a2904d2-d364-4084-a40a-3fbc65d90059 | power on | active | False |
| addf8ff8-52fe-49da-a4b2-5688554e9161 | a86316c0-bdf0-4d6e-81ba-f44da63c906c | power on | active | False |
| b4b7f7ad-4adc-4dc9-9afb-a9966e2be141 | 03b0d70d-42c4-426e-adab-09df927f30bb | power on | active | False |
| 680e6aa7-9a84-41de-94ba-b761d48b4087 | 808dbf2e-8f40-4d6d-9d7b-2c74e3194a6d | power on | active | False |
| 4615c72a-9ea0-433e-8c52-308163112f89 | None | power off | None | False |
| af7f79bf-b2c1-405b-9bc7-5370b93b08cf | None | power off | None | False |
+--------------------------------------+--------------------------------------+-------------+-----------------+-------------+
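The pattern in the two listings can be checked mechanically. The following is only a sketch: it copies two rows from the output above, and the `DRIVER_KEYS` set and `classify` helper are hypothetical names introduced here, not part of nova or ironic.

```python
import json

# Keys that appear only in the ironic-supplied stats in the listing above.
DRIVER_KEYS = {"cpu_arch", "ironic_driver"}

# node uuid -> (instance_uuid from "ironic node-list", stats from compute_nodes)
rows = {
    "dc5c63b6-d576-46c9-aa16-0537450cdbd8": (
        None,
        '{"cpu_arch": "amd64", '
        '"ironic_driver": "ironic.nova.virt.ironic.driver.IronicDriver"}',
    ),
    "fadb50bf-26ec-420c-a13f-f182e38569d6": (
        "a703ae98-2398-445b-91bf-48f368c5b82a",
        '{"num_task_None": 1, "io_workload": 0, "num_instances": 1, '
        '"num_vm_active": 1, "num_vcpus_used": 24, "num_os_type_None": 1, '
        '"num_proj_505908300744403496b2e64b06606529": 1}',
    ),
}


def classify(stats_json):
    """Say whose stats a compute_nodes row holds."""
    keys = set(json.loads(stats_json))
    has_driver = bool(keys & DRIVER_KEYS)
    has_tracker = bool(keys - DRIVER_KEYS)
    if has_driver and has_tracker:
        return "combined"
    if has_driver:
        return "driver-only"
    if has_tracker:
        return "tracker-only"
    return "empty"


# node uuid -> (has an instance?, kind of stats stored)
summary = {
    uuid: (instance_uuid is not None, classify(stats))
    for uuid, (instance_uuid, stats) in rows.items()
}
```

With the expected behaviour, every row would classify as "combined" once the node has an instance; in the output above none does.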

Nicholas Randon (nicholas-randon) wrote :
Paul Murray (pmurray) wrote :

The stats are over-written every time the resource tracker's own copy of stats is updated by the line:

        resources['stats'] = jsonutils.dumps(self.stats)

This occurs in the following methods in nova.compute.resource_tracker:

ResourceTracker._update_from_instance()
ResourceTracker._update_from_migration()
ResourceTracker._drop_resize_claim()

The driver stats are picked up in ResourceTracker.update_available_resource() during the periodic task, which calls the above methods if there are instances or migrations on the compute node.

So the code seems to report the driver stats during the periodic task if there are no instances or migrations on the node. If there are any instances or migrations on the host, the driver view of stats will be over-written by the resource tracker view of stats.
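A toy model of that behaviour, for illustration only. `ToyResourceTracker` is a hypothetical stand-in and not nova's class; it uses the standard library `json` in place of `jsonutils`, and it assumes the driver hands over its stats as a JSON string.

```python
import json


class ToyResourceTracker:
    """Hypothetical stand-in for illustration; not nova's ResourceTracker."""

    def __init__(self):
        self.stats = {}         # the tracker's own view of stats
        self.compute_node = {}  # stands in for the compute_nodes row

    def update_available_resource(self, driver_resources, instances):
        # Periodic task: start from what the driver reported, including
        # its 'stats' entry (assumed here to be a JSON string).
        resources = dict(driver_resources)
        self.stats = {}
        for instance in instances:
            self._update_from_instance(resources, instance)
        self.compute_node = resources

    def _update_from_instance(self, resources, instance):
        self.stats["num_instances"] = self.stats.get("num_instances", 0) + 1
        # Same shape as the line quoted above: the driver's stats are
        # replaced, not merged.
        resources["stats"] = json.dumps(self.stats)


driver_resources = {"stats": json.dumps({"cpu_arch": "amd64"})}
tracker = ToyResourceTracker()

# No instances: the driver's stats pass through untouched.
tracker.update_available_resource(driver_resources, instances=[])
idle_stats = json.loads(tracker.compute_node["stats"])

# One instance: only the tracker's stats survive.
tracker.update_available_resource(driver_resources, instances=["instance-1"])
busy_stats = json.loads(tracker.compute_node["stats"])
```

Here `idle_stats` holds only the driver's `cpu_arch` entry and `busy_stats` holds only `num_instances`, which mirrors the two kinds of row in the compute_nodes output.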

Paul Murray (pmurray) wrote :

This bug was replaced with https://bugs.launchpad.net/nova/+bug/1347795 when the "Add extensible resources to resource tracker" change was merged, see: https://review.openstack.org/#/c/71557/

That patch moved the over-write so that it always happened.

That change was reverted, so this bug has been re-introduced, see: https://review.openstack.org/#/c/109033

Tripleo makes use of the stats field provided by ironic and has been experiencing scheduling problems.

Paul Murray (pmurray)
Changed in nova:
assignee: nobody → Paul Murray (pmurray)
Paul Murray (pmurray) wrote :

The simplest fix is to keep a copy of the stats reported by the driver as self.driver_stats to complement the resource tracker's own view in self.stats. Then generate the union of the two whenever one or the other is updated.
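A minimal sketch of that idea, not the patch itself. The class and method names other than `stats` and `driver_stats` are hypothetical, and letting the tracker's value win on a key collision is an assumption of this sketch.

```python
import json


class ToyMergingTracker:
    """Hypothetical sketch of the proposed fix; not the merged nova code."""

    def __init__(self):
        self.stats = {}         # resource tracker's own view
        self.driver_stats = {}  # copy of what the driver last reported

    def set_driver_stats(self, driver_stats):
        # Called when the driver reports; keep a private copy.
        self.driver_stats = dict(driver_stats)
        return self.report()

    def update_stats(self, **changes):
        # Called when the tracker's own view changes.
        self.stats.update(changes)
        return self.report()

    def report(self):
        # Union of both views. On a key collision the tracker's value
        # wins here; that precedence is an assumption of this sketch.
        merged = dict(self.driver_stats)
        merged.update(self.stats)
        return json.dumps(merged, sort_keys=True)


tracker = ToyMergingTracker()
tracker.set_driver_stats({"cpu_arch": "amd64"})
reported = json.loads(tracker.update_stats(num_instances=1, num_vcpus_used=24))
```

Because `report()` is regenerated after either kind of update, neither view can silently replace the other.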

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/109489

Changed in nova:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/109489
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c363dae6a2b878db6801b502cced1fcc6aad2d0c
Submitter: Jenkins
Branch: master

commit c363dae6a2b878db6801b502cced1fcc6aad2d0c
Author: Paul Murray <email address hidden>
Date: Fri Jul 25 06:03:35 2014 +0100

    Fix Resource tracker should report virt driver stats

    If the virt driver provides any data for resource stats it is
    lost whenever the resource tracker updates its own view of stats.
    Moreover, if the resource tracker has not instances to track it
    only reports the driver's view, which might be nothing.

    This fix adds the driver's view of stats to the resource tracker
    stats to make sure they are correctly handled.

    Change-Id: Icb19148660bca542a8120ecab064551d67ac28af
    Closes-bug: #1348288

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-3 → 2014.2