Image and flavor metadata for the libvirt watchdog are handled erroneously

Bug #1582693 reported by Ferenc Horváth
Affects                        Status    Importance  Assigned to
Glance                         Triaged   Undecided   Unassigned
OpenStack Compute (nova)       Invalid   Undecided   Richil Bhalerao
OpenStack Dashboard (Horizon)  Invalid   Undecided   Unassigned

Bug Description

When I use Horizon to add the Watchdog Action (hw_watchdog_action) metadata to any flavor and then try to use that flavor to create an instance, the boot process fails. However, if I add the same metadata to an image, everything works flawlessly.

I used devstack to try to find out some details about this issue. (I was able to reproduce this issue on stable/mitaka and on master as well.) I found the following:

USE CASE #1 :: flavor + underscore

$ nova flavor-show m1.nano
+----------------------------+---------------------------------+
| Property | Value |
+----------------------------+---------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 0 |
| extra_specs | {"hw_watchdog_action": "reset"} |
| id | 42 |
| name | m1.nano |
| os-flavor-access:is_public | True |
| ram | 64 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+---------------------------------+
$ nova boot --flavor m1.nano --image cirros-0.3.4-x86_64-uec-watchdog --poll vm0

Result: Fault
Message: No valid host was found. There are not enough hosts available.
Code: 500
Details:
File "/opt/stack/nova/nova/conductor/manager.py", line 392, in build_instances
    context, request_spec, filter_properties)
File "/opt/stack/nova/nova/conductor/manager.py", line 436, in _schedule_instances
    hosts = self.scheduler_client.select_destinations(context, spec_obj)
File "/opt/stack/nova/nova/scheduler/utils.py", line 372, in wrapped
    return func(*args, **kwargs)
File "/opt/stack/nova/nova/scheduler/client/__init__.py", line 51, in select_destinations
    return self.queryclient.select_destinations(context, spec_obj)
File "/opt/stack/nova/nova/scheduler/client/__init__.py", line 37, in __run_method
    return getattr(self.instance, __name)(*args, **kwargs)
File "/opt/stack/nova/nova/scheduler/client/query.py", line 32, in select_destinations
    return self.scheduler_rpcapi.select_destinations(context, spec_obj)
File "/opt/stack/nova/nova/scheduler/rpcapi.py", line 121, in select_destinations
    return cctxt.call(ctxt, 'select_destinations', **msg_args)
File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
    retry=self.retry)
File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
    timeout=timeout, retry=retry)
File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 470, in send
    retry=retry)
File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 461, in _send
    raise result

n-sch.log shows that nova.scheduler.filters.compute_capabilities_filter removes the only host available during filtering.

USE CASE #2 :: flavor + colon

$ nova flavor-show m1.nano
+----------------------------+---------------------------------+
| Property | Value |
+----------------------------+---------------------------------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 0 |
| extra_specs | {"hw:watchdog_action": "reset"} |
| id | 42 |
| name | m1.nano |
| os-flavor-access:is_public | True |
| ram | 64 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+---------------------------------+
$ nova boot --flavor m1.nano --image cirros-0.3.4-x86_64-uec-watchdog --poll vm1
$ virsh dumpxml instance-00000131 | grep "<watchdog" -A 3
    <watchdog model='i6300esb' action='reset'>
      <alias name='watchdog0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </watchdog>

Result: The instance boots perfectly and the /dev/watchdog device is present.

USE CASE #3 :: image + underscore

$ nova flavor-show m1.nano
+----------------------------+---------+
| Property | Value |
+----------------------------+---------+
| OS-FLV-DISABLED:disabled | False |
| OS-FLV-EXT-DATA:ephemeral | 0 |
| disk | 0 |
| extra_specs | {} |
| id | 42 |
| name | m1.nano |
| os-flavor-access:is_public | True |
| ram | 64 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 1 |
+----------------------------+---------+
$ nova image-show cirros-0.3.4-x86_64-uec-watchdog
+-----------------------------+--------------------------------------+
| Property | Value |
+-----------------------------+--------------------------------------+
| OS-EXT-IMG-SIZE:size | 13375488 |
| created | 2016-05-17T08:49:21Z |
| id | 863c2d04-cdd3-42c2-be78-c831c48929b3 |
| metadata hw_watchdog_action | reset |
| minDisk | 0 |
| minRam | 0 |
| name | cirros-0.3.4-x86_64-uec-watchdog |
| progress | 100 |
| status | ACTIVE |
| updated | 2016-05-17T09:10:59Z |
+-----------------------------+--------------------------------------+
$ nova boot --flavor m1.nano --image cirros-0.3.4-x86_64-uec-watchdog --poll vm2
$ virsh dumpxml instance-00000132 | grep "<watchdog" -A 3
    <watchdog model='i6300esb' action='reset'>
      <alias name='watchdog0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </watchdog>

Result: The instance boots perfectly and the /dev/watchdog device is present.

USE CASE #4 :: image + colon

$ nova image-show cirros-0.3.4-x86_64-uec-watchdog
+----------------------+--------------------------------------+
| Property | Value |
+----------------------+--------------------------------------+
| OS-EXT-IMG-SIZE:size | 13375488 |
| created | 2016-05-17T08:49:21Z |
| id | 863c2d04-cdd3-42c2-be78-c831c48929b3 |
| metadata hw | watchdog_action: reset |
| minDisk | 0 |
| minRam | 0 |
| name | cirros-0.3.4-x86_64-uec-watchdog |
| progress | 100 |
| status | ACTIVE |
| updated | 2016-05-17T09:16:42Z |
+----------------------+--------------------------------------+
$ nova boot --flavor m1.nano --image cirros-0.3.4-x86_64-uec-watchdog --poll vm2
$ virsh dumpxml instance-00000133 | grep "<watchdog" -A 3

Result: Seemingly there are no errors during the boot process, but the watchdog device is not present.

Tags: scheduler
Revision history for this message
Ferenc Horváth (hferenc) wrote :
tags: added: scheduler
Sean Dague (sdague) wrote :

Please provide the scheduler logs at debug level

Changed in nova:
status: New → Incomplete
Alex Szarka (xavvior) wrote :

I attach the scheduler log at debug level.

Alex Szarka (xavvior)
Changed in nova:
status: Incomplete → Confirmed
Changed in nova:
assignee: nobody → Richil Bhalerao (richil-bhalerao)
Changed in nova:
status: Confirmed → In Progress
Richil Bhalerao (richil-bhalerao) wrote :

This issue is occurring because the scheduler logic (while determining whether a host satisfies the extra specs) assumes that extra specs will be in the format "capability:something_something". However, the format for the watchdog extra spec in Horizon is "hw_watchdog_action", not "hw:watchdog_action". Should nova support extra specs in a format that uses '_' instead of ':'?
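The key parsing described above can be sketched roughly like this (a simplified illustration of the scoped-key split, not the actual filter code; `split_scoped_key` is an invented name):

```python
def split_scoped_key(key):
    """Split an extra-spec key into (scope, name), as the scheduler
    filters conceptually do.  A key with no ':' has no scope, so the
    whole key is treated as a bare capability name -- one that no
    compute host reports, which is why every host is filtered out."""
    if ':' in key:
        scope, name = key.split(':', 1)
        return scope, name
    return None, key

# 'hw:watchdog_action' is recognized as scoped to 'hw':
print(split_scoped_key('hw:watchdog_action'))   # ('hw', 'watchdog_action')
# 'hw_watchdog_action' is not:
print(split_scoped_key('hw_watchdog_action'))   # (None, 'hw_watchdog_action')
```

This matches the n-sch.log observation in use case #1: the underscore key never resolves to a known capability scope, so ComputeCapabilitiesFilter rejects the only host.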

Richil Bhalerao (richil-bhalerao) wrote :

It looks like the 'hw_watchdog_action' format for flavor extra specs is deprecated in the current release and 'hw:watchdog_action' is recommended instead. Is it safe to say that the fix for this bug really belongs in Horizon and not in Nova?

Ferenc Horváth (hferenc) wrote :

I think there was a time when some modules were changed to use the newer ':' format for extra specs, so I guess nova shouldn't support extra specs with the old '_' format.

However, if that is all, then why aren't there any errors in use case #3, i.e. when the '_' format is used in the image metadata?

Ferenc Horváth (hferenc) wrote :

At first, I'd have said that this is an easy fix in Horizon, but I was (and still am) concerned because of use case #3.

Richil Bhalerao (richil-bhalerao) wrote :

@hferenc I think the '_' format is only deprecated for flavor metadata, not image metadata, which is why use case #3 passes.

Ferenc Horváth (hferenc) wrote :

Where is image metadata handled? In Nova as well, or in Glance?

Either way, I think this mixed-format situation is very unfortunate, so I'd say it should be consolidated, i.e. all the related projects should use the new ':' format.

Richil Bhalerao (richil-bhalerao) wrote :

@hferenc Within Nova, image metadata supports the '_' format but flavor metadata does not. This is based on the deprecation warning I see in the *.po files and also on what I see in the tests here:

https://github.com/openstack/nova/blob/master/nova/tests/unit/virt/libvirt/test_driver.py#L4123
https://github.com/openstack/nova/blob/master/nova/tests/unit/virt/libvirt/test_driver.py#L4311

Ferenc Horváth (hferenc) wrote :

I still think that this is not right and metadata support should be unified, thus all projects/modules should use the ':' format.

Matt Riedemann (mriedem) wrote :

My understanding has always been the formats to use are:

1. image metadata: <capability>_<thing>

2. flavor extra specs: <capability>:<thing>

That's how it was done recently for the hyper-v uefi secure boot support:

https://review.openstack.org/#/c/209581/49/releasenotes/notes/hyperv-uefi-secure-boot-a2a617ac2c313afd.yaml

And that's how I've always seen it, for as long as I've been thinking about it at least, but I agree it's definitely confusing. If openstack is going to require a certain format, it should enforce it, i.e. nova shouldn't accept adding an extra spec to a flavor without a '<capability>:' prefix. I think it's a bit harder for glance to define a specific format beyond requiring at least a single underscore, e.g. image meta of foo_bar means foo is the capability, and foo_bar_baz means foo is still the capability and 'bar_baz' is the spec, but image meta like 'foo', '_foo' or 'foo_' is an error.
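The image-metadata rule sketched above could be checked with something like this (hypothetical validation code, not anything nova currently does; `valid_image_meta_key` and the regex are invented for illustration):

```python
import re

# A key needs a non-empty capability, at least one underscore
# separator, and a non-empty spec name.
_KEY_RE = re.compile(r'^[A-Za-z0-9]+_[A-Za-z0-9_]*[A-Za-z0-9]$')

def valid_image_meta_key(key):
    return bool(_KEY_RE.match(key))

print(valid_image_meta_key('hw_watchdog_action'))  # True  ('hw' + spec)
print(valid_image_meta_key('foo_bar_baz'))         # True  ('foo' + 'bar_baz')
print(valid_image_meta_key('foo'))                 # False (no separator)
print(valid_image_meta_key('_foo'))                # False (empty capability)
print(valid_image_meta_key('foo_'))                # False (empty spec)
```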

Matt Riedemann (mriedem) wrote :

FYI, the docs also use the <capability>: format for flavor extra specs:

http://docs.openstack.org/admin-guide/compute-flavors.html#extra-specs

So that's what I'd expect users of the compute API to provide, including Horizon.

Matt Riedemann (mriedem) wrote :

Also, as far as I can tell, AggregateInstanceExtraSpecsFilter and ComputeCapabilitiesFilter both require the extra spec to have a <cap>: prefix.

However, the AggregateImagePropertiesIsolation scheduler filter allows configuration of the image metadata namespace and separator, which means we can't really validate image metadata keys since they are totally configurable in the scheduler.
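A rough illustration of why that configurability blocks global validation (a deliberately simplified stand-in for the filter's key matching; in the real filter the namespace and separator come from nova configuration options):

```python
def matches_namespace(key, namespace='hw', separator='_'):
    """Simplified sketch: whether an image metadata key falls under a
    deployment-configured namespace.  Because both the namespace and
    the separator are configuration, no single key format can be
    enforced across all deployments."""
    return key.startswith(namespace + separator)

print(matches_namespace('hw_watchdog_action'))  # True with the defaults
# A deployment could just as well configure a completely different shape:
print(matches_namespace('acme.watchdog', namespace='acme', separator='.'))  # True
```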

Matt Riedemann (mriedem) wrote :

This may actually be a problem with the glance metadata catalog, which it looks like Horizon uses to build the flavor extra specs dialog. For example, hw:boot_menu and hw:cpu_policy have the expected namespace: prefix, but hw_watchdog_action does not. There are other, more exotic things too, like capabilities:cpu_info:features and CIM_PASD_InstructionSet; the former is probably OK, the latter is probably not.

Changed in nova:
status: In Progress → Invalid
Matt Riedemann (mriedem) wrote :

You can see the glance metadef prefixes defined here:

https://github.com/openstack/glance/blob/master/etc/metadefs/compute-libvirt.json#L7

    "resource_type_associations": [
        {
            "name": "OS::Glance::Image",
            "prefix": "hw_"
        },
        {
            "name": "OS::Nova::Flavor",
            "prefix": "hw:"
        }
    ],

So that's why for an image the prefix is hw_ and we get hw_boot_menu, but for a flavor extra spec we get hw:boot_menu. But if you look at the watchdog action, it's not the same:

https://github.com/openstack/glance/blob/master/etc/metadefs/compute-watchdog.json#L7

There is no prefix defined in resource_type_associations so the prefix is built into the property name:

    "properties": {
        "hw_watchdog_action": {
            "title": "Watchdog Action",
            "description": "For the libvirt driver, you can enable and set the behavior of a virtual hardware watchdog device for each flavor. Watchdog devices keep an eye on the guest server, and carry out the configured action, if the server hangs. The watchdog uses the i6300esb device (emulating a PCI Intel 6300ESB). If hw_watchdog_action is not specified, the watchdog is disabled. Watchdog behavior set using a specific image's properties will override behavior set using flavors.",
            "type": "string",
            "enum": [
                "disabled",
                "reset",
                "poweroff",
                "pause",
                "none"
            ]
        }
    }

Which is why we don't get hw:watchdog_action.
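In other words, the key a user ends up with is just the metadef property name with the resource-type prefix (if any) prepended; a toy illustration (`full_key` is an invented helper, not Glance code):

```python
def full_key(property_name, prefix=''):
    # The displayed metadata key is the resource-type prefix (which may
    # be empty) followed by the metadef property name.
    return prefix + property_name

# compute-libvirt.json defines per-resource-type prefixes:
print(full_key('boot_menu', prefix='hw_'))  # hw_boot_menu  (OS::Glance::Image)
print(full_key('boot_menu', prefix='hw:'))  # hw:boot_menu  (OS::Nova::Flavor)

# compute-watchdog.json defines no prefix, so flavors get the same
# underscore key as images:
print(full_key('hw_watchdog_action'))       # hw_watchdog_action
```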

Changed in horizon:
status: New → Invalid
Changed in glance:
status: New → Triaged
Matt Riedemann (mriedem) wrote :

FYI, https://review.openstack.org/#/c/386145/ drops the support for hw_watchdog_action as a flavor extra spec in nova.
