Pre-defined VirtCPUTopology metadata should have the same name as flavor extra specs in nova

Bug #1568191 reported by Ying Zuo
20
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Glance
In Progress
Medium
Ying Zuo
OpenStack Compute (nova)
Invalid
Medium
melanie witt

Bug Description

If I understand correctly, the concept of flavor extra specs has been replaced by metadata.

On nova/virt/hardware.py, it's checking for some specific flavor extra specs in _get_cpu_topology_constraints().

https://github.com/openstack/nova/blob/f396826314b9f37eb57151f0dd8a8e3b7d8a8a5c/nova/virt/hardware.py

These specific flavor extra specs are included in the pre-defined metadata json so the user can load them with command "glance-manage db_load_metadefs". However, the names are not exactly the same.

https://github.com/openstack/glance/blob/1c242032fbb26fed3a82691abb030583b4f8940b/etc/metadefs/compute-vcputopology.json

Tags: compute numa
Ying Zuo (yingzuo)
Changed in glance:
assignee: nobody → Ying Zuo (yingzuo)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to glance (master)

Fix proposed to branch: master
Review: https://review.openstack.org/303658

Changed in glance:
status: New → In Progress
Revision history for this message
Erno Kuvaja (jokke) wrote :

Seems that Nova is looking for wrong keys from image metadef when putting this information together.

tags: added: compute
Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

 > If I understand correctly, the concept of flavor extra specs has been replaced by metadata.

No, flavor extra specs are complimented by metadata. For things like CPU policies, image metadata will take precedence over flavor extra specs, but both are valid.

Revision history for this message
Ying Zuo (yingzuo) wrote :

Stephen Finucane,
Thanks for the clarification. I got that impression since Horizon doesn't have a way to set extra specs anymore so I guess that's just a Horizon change. Do you think the pre-defined metadata names defined in Glance should be consistent with the extra spec names defined on nova/virt/hardware.py?

Revision history for this message
melanie witt (melwitt) wrote :

FYI Nova has an API for setting flavor extra specs [1].

Based on the link provided from glance [2] on the key names, it does appear that nova is looking for the wrong image metadata key names for: hw_cpu_maxsockets, hw_cpu_maxcores, hw_cpu_maxthreads

[1] http://developer.openstack.org/api-ref-compute-v2.1.html#flavor-extra-specs-v2.1
[2] https://github.com/openstack/glance/blob/1c242032fbb26fed3a82691abb030583b4f8940b/etc/metadefs/compute-vcputopology.json

Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
melanie witt (melwitt)
tags: added: numa
melanie witt (melwitt)
Changed in nova:
assignee: nobody → melanie witt (melwitt)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/367627

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
melanie witt (melwitt) wrote :

I put my nova patch on hold because from what nikhil explained in IRC, glance provides the metadata defs sourced from other projects, such as nova, and doesn't rely on the keys itself. So it sounds like this is more appropriate to fix on the glance side.

Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

Yes, let's fix this in glance please. The fully underscore separated one is what's used in the nova code [1] and in the original spec [2]. It also has more Google hits :) [3][4] This is a bug in glance and the fix should probably get backported.

[1] https://github.com/gooddata/openstack-nova/blob/6ccf12c/nova/virt/hardware.py#L298-L304
[2] https://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-vcpu-topology.html
[3] https://www.google.ie/search?q="hw_cpu_maxthreads"
[4] https://www.google.ie/search?q="hw_cpu_max_threads"

Revision history for this message
Ying Zuo (yingzuo) wrote :

My patch in Glance is still on. https://review.openstack.org/#/c/303658/.
Flavio Percoco and Erno Kuvaja, can you please take another look?

Changed in glance:
importance: Undecided → Medium
Revision history for this message
Nikhil Komawar (nikhil-komawar) wrote :

Yep, we need to fix this in Glance. It has an upgrade impact and we will have to consider operators' choices. But considering the keys (as per the change) have been in the nova code since nov 2014, it should be rather safe to fix and backport.

https://github.com/openstack/nova/blob/96ab8ab058a2b39d9c9c0f5f5af7fa5b6e470b94/nova/virt/hardware.py#L293-L299 (see date on the commit).

Revision history for this message
melanie witt (melwitt) wrote :

Marking as Invalid for Nova as Nikhil has commented on fixing it in Glance.

Changed in nova:
status: In Progress → Invalid
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by melanie witt (<email address hidden>) on branch: master
Review: https://review.openstack.org/367627
Reason: As discussed on the launchpad bug, this is more appropriate to fix in Glance.

Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

I have a few questions around this issue.

The Compute os-extra_specs API allows you to put arbitrary key/value pairs on a flavor. How do operators discover what keys actually work with Nova, so they know to use hw_cpu_max_sockets instead of hw_cpu_maxsockets?

For interoperability, it makes sense to use the Glance metadefs catalog to define the VirtCPUTopology metadata keys and value datatypes. But operators who are doing that now are apparently putting non-effective metadata keys on their images. Would the non-effectiveness be obvious to operators, so that they'd correct the keys manually, or could this go unnoticed?

I'm asking because if you have a large number of images and server snapshot images around, do we expect an operator to migrate the keys on all these images? A better solution might be to change the nova code to accept either keyname. On the other hand, if the ineffectiveness is so obvious that the ops would have corrected their systems already, then maybe we don't have to worry about backward compatability so much. But as Flavio and Erno already commented on the patch, I'm worried about the effect this change will have on current operators.

It's an empirical question, we can go to the ML to ask. I'd like to know about the current situation with respect to discoverability, though.

Revision history for this message
Stephen Finucane (stephenfinucane) wrote :

My 2 cents

> The Compute os-extra_specs API allows you to put arbitrary key/value
> pairs on a flavor. How do operators discover what keys actually work
> with Nova, so they know to use hw_cpu_max_sockets instead of
> hw_cpu_maxsockets?

I guess the documentation would be the best resource here. However,
unlike image metadata, there is no way to validate flavor extra keys.
The only way to max sure the things work is to use exactly what's in
the docs and perhaps validate that things work as expected.

> For interoperability, it makes sense to use the Glance metadefs
> catalog to define the VirtCPUTopology metadata keys and value
> datatypes. But operators who are doing that now are apparently
> putting non-effective metadata keys on their images. Would the
> non-effectiveness be obvious to operators, so that they'd
> correct the keys manually, or could this go unnoticed?

It depends on how important they consider the various properties.
These particular properties are upper limits on how a many
cores/cpus/sockets a user can define, and if these limits are important
then I'd hope the operator would have tested them.

> I'm asking because if you have a large number of images and server
> snapshot images around, do we expect an operator to migrate the keys
> on all these images? A better solution might be to change the nova
> code to accept either keyname.

I think we should expect them to change it, yes. At the moment, the
flavor key provided doesn't do anything: I think it better to keep
doing nothing than possibly break installs.

> On the other hand, if the ineffectiveness is so obvious that the ops
> would have corrected their systems already, then maybe we don't have
> to worry about backward compatability so much. But as Flavio and
> Erno already commented on the patch, I'm worried about the effect
> this change will have on current operators.

Yeah, if operators really care about this then it should have been
validated. If not, see above.

Revision history for this message
Travis Tripp (travis-tripp) wrote :

Many people may not realize that basically there is no effect of the added metadata.

I believe that the correct fix is to do it in Glance. Glance has the ability to migrate metadefs in the catalog, but not a migrate image metadata command. This probably could be added into the glance manage commands for metadefs now and it could also apply the migration to all resource types affected (flavors, volumes (image properties), images, snapshots).

A few reasons this catalog even came into existence:
 * Docs were out of date and inaccurate
 * The prefixes applied vary based on the resource type - e.g. image / volume use _ and flavor uses :
 * No ability for operators to manage visibility of available metadata (public / admin only)
 * Operators needed an ability to define their own metadata and easily correlate between nova scheduler filters (like host aggregates / flavors / images), across regions, and clouds.

It would be better if Nova could publish them to Glance or Glance was able to discover them / validate them from the source service if defined. (for example: [0] [1])

Samples from the original proposal:

[0] https://wiki.openstack.org/wiki/Graffiti/Architecture#Use_Case_Example.CB.90_Compute_Capabilities

[1] https://wiki.openstack.org/wiki/Graffiti/Architecture#Additional_Details

Revision history for this message
Daniel Berrange (berrange) wrote :

> The Compute os-extra_specs API allows you to put arbitrary key/value
> pairs on a flavor. How do operators discover what keys actually work
> with Nova, so they know to use hw_cpu_max_sockets instead of hw_cpu_maxsockets?

Historically there was nothing formal defined - things were just mentioned in docs in an adhoc manner. I don't think we're much better today.

> For interoperability, it makes sense to use the Glance metadefs catalog
> to define the VirtCPUTopology metadata keys and value datatypes. But
> operators who are doing that now are apparently putting non-effective
> metadata keys on their images. Would the non-effectiveness be obvious to
> operators, so that they'd correct the keys manually, or could this go
> unnoticed?

I don't know where the Glance metadefs catalog comes from, but from Nova's
POV it is certainly not the canonical source. Nova has a versioned object
that formally declares what is accepted by Nova, providing reasonably strong
validation on the field data formats. Any other source of information is at
best a duplicate of this object model, or at worse plain wrong.

https://github.com/openstack/nova/blob/master/nova/objects/image_meta.py

What's missing is that we ought to be auto-generating some documentation from
the ImageMeta object model for each relase, so there's clear info on what's
 supported.

> I'm asking because if you have a large number of images and server snapshot
> images around, do we expect an operator to migrate the keys on all these
> images? A better solution might be to change the nova code to accept either
> keyname.

Nova already has back-compat hacks to deal with some old key names, but this is for the case where old versions of nova actually used those old key names. It sounds like here we're talking about things that we just mistakenly documented somewhere and were never actually honoured by nova. For that I don't see any compelling reason to add back compat, as they have would have never worked.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on glance (master)

Change abandoned by Ying Zuo (<email address hidden>) on branch: master
Review: https://review.openstack.org/303658
Reason: doesn't look like we want to move forward with it

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Maciej Kucia (<email address hidden>) on branch: master
Review: https://review.openstack.org/497225
Reason: stale

Revision history for this message
Maciej Kucia (maciejkucia) wrote :

This issue is still alive and kicking in Rocky release.

Revision history for this message
Maciej Kucia (maciejkucia) wrote :

This how to mitigate using kolla-ansible.

ansible/roles/glance/tasks/config.yml

- name: Copying over compute-vcputopology.json
  become: true
  template:
    src: "{{ role_path }}/templates/compute-vcputopology.json.j2"
    dest: "{{ node_config_directory }}/{{ item.key }}/compute-vcputopology.json"
  register: glance_vcpu_confs
  when:
    - item.value.enabled | bool
    - inventory_hostname in groups[item.value.group]
  with_dict: "{{ glance_services }}"
  notify:
    - Restart glance-api container

ansible/roles/glance/handlers/main.yml

- name: Drop Glance metadata from DB
  vars:
    service_name: "glance-api"
    service: "{{ glance_services[service_name] }}"
  command: "docker exec {{ service.container_name }} glance-manage db_unload_metadefs"
  listen: "glance db reload metadefs"
  when:
    - inventory_hostname in groups[service.group][0]

- name: Set Glance metadata in DB
  vars:
    service_name: "glance-api"
    service: "{{ glance_services[service_name] }}"
  command: "docker exec {{ service.container_name }} glance-manage db_load_metadefs"
  listen: "glance db reload metadefs"
  when:
    - inventory_hostname in groups[service.group][0]

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.