cli-overcloud-node-provision not passing capabilities when reserving instances

Bug #1907519 reported by David Vallee Delisle
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
Ironic    Invalid   Undecided    Unassigned
tripleo   Invalid   Undecided    Unassigned

Bug Description

Description
===========
I believe we should pass capabilities when reserving instances with metalsmith [a].

If we don't, we end up with instances scheduled anywhere and not respecting the scheduling hints.

Steps to reproduce
==================
export VIRTHOST=r720-2
export LIBGUESTFS_BACKEND_SETTINGS=network_bridge=virbr0
cloud_config=config/nodes/3ctlr_2comp_3ceph.yml
work_dir=/home/dvd/.quickstart.r720-2
release=master
STANDARD_ARGS="-w $work_dir -R $release --no-clone --tags all --nodes $cloud_config"
bash quickstart.sh $STANDARD_ARGS -p quickstart.yml $VIRTHOST
bash quickstart.sh $STANDARD_ARGS -I --teardown none -p quickstart-extras-undercloud.yml $VIRTHOST
bash quickstart.sh $STANDARD_ARGS -I --teardown none -p quickstart-extras-overcloud-prep.yml $VIRTHOST
bash quickstart.sh $STANDARD_ARGS -I --teardown none -p quickstart-extras-overcloud.yml $VIRTHOST

Expected result
===============
Capabilities should be honored so that each role is scheduled onto a node with the matching profile.

Actual result
=============
Instances are scheduled onto nodes that do not match their role [1].

Environment
===========
Master branch with tripleo-quickstart

Logs & Configs
==============
For example, node compute-1 [2] was imported with profile:compute but was provisioned as a Ceph storage instance (overcloud-cephstorage-0). This is a problem because compute nodes do not have the same disk layout as Ceph nodes.
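
To make the mismatch easier to see, here is a minimal Python sketch that parses the `capabilities` string from the node properties in [2] and prints the profile next to the allocation name. The helper name `parse_capabilities` is hypothetical; only the two input values come from this report.

```python
def parse_capabilities(raw: str) -> dict:
    """Split an Ironic-style 'key:value,key:value' capabilities string."""
    caps = {}
    for item in raw.split(","):
        key, sep, value = item.partition(":")
        if sep:  # skip malformed entries that have no colon
            caps[key.strip()] = value.strip()
    return caps


# Values copied from the node and allocation output in [2].
node_capabilities = (
    "boot_option:local,profile:compute,cpu_vt:true,"
    "cpu_aes:true,cpu_hugepages:true,cpu_hugepages_1g:true"
)
allocation_name = "overcloud-cephstorage-0"

profile = parse_capabilities(node_capabilities).get("profile")
print(f"{allocation_name} landed on a node with profile={profile}")
```

The node advertises `profile:compute`, while the allocation placed on it is named for the CephStorage role.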

[a] https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/playbooks/cli-overcloud-node-provision.yaml#L95-L101

[1]
~~~
Created port overcloud-cephstorage-1-ctlplane (UUID 988fcd9b-162a-4715-b7a4-7b34d7e87ecd) for node ceph-2 (UUID 86c40998-f2a6-4d25-a69e-16509efa633c) with {'network_id': '7d649c94-112c-4bec-bc2a-d2db31313784', 'name': 'overcloud-cephstorage-1-ctlplane'}
Created port overcloud-cephstorage-0-ctlplane (UUID ea60212d-f844-4a88-b89a-6a4328383f08) for node compute-1 (UUID 65ac0317-f8be-4ea0-90ba-e4f265c27128) with {'network_id': '7d649c94-112c-4bec-bc2a-d2db31313784', 'name': 'overcloud-cephstorage-0-ctlplane'}
Created port overcloud-controller-2-ctlplane (UUID b098bcbb-6392-463c-be37-7b65f73cd6fb) for node ceph-1 (UUID ba7e64c7-31f4-4928-a4a7-04acc06830b1) with {'network_id': '7d649c94-112c-4bec-bc2a-d2db31313784', 'name': 'overcloud-controller-2-ctlplane'}
Created port overcloud-cephstorage-2-ctlplane (UUID 8a7bee7e-5f60-48a1-8a42-6a038798b3de) for node compute-0 (UUID c12beec4-51cf-4b73-8906-ffb82db6e429) with {'network_id': '7d649c94-112c-4bec-bc2a-d2db31313784', 'name': 'overcloud-cephstorage-2-ctlplane'}
Created port overcloud-novacompute-0-ctlplane (UUID f1116d34-4aea-4829-9a3f-11f4696f566e) for node ceph-0 (UUID 94fa1ec5-5c63-4d3b-8230-22737e697c90) with {'network_id': '7d649c94-112c-4bec-bc2a-d2db31313784', 'name': 'overcloud-novacompute-0-ctlplane'}
Created port overcloud-controller-1-ctlplane (UUID 61c22794-f74a-46ba-a2c3-2d77352277c0) for node control-1 (UUID e3ca3943-2ac9-4dad-98d4-f846d06091b2) with {'network_id': '7d649c94-112c-4bec-bc2a-d2db31313784', 'name': 'overcloud-controller-1-ctlplane'}
Created port overcloud-novacompute-1-ctlplane (UUID df3abca1-5e33-4a8b-95ca-f7211e628aea) for node control-2 (UUID eeafb6b3-7af5-4b41-ac1d-76530db6b61a) with {'network_id': '7d649c94-112c-4bec-bc2a-d2db31313784', 'name': 'overcloud-novacompute-1-ctlplane'}
Created port overcloud-controller-0-ctlplane (UUID c203c78b-0e82-48b0-bb41-322ae25c414f) for node control-0 (UUID 38dbad09-79d9-4b21-a704-9b7a2e0a603f) with {'network_id': '7d649c94-112c-4bec-bc2a-d2db31313784', 'name': 'overcloud-controller-0-ctlplane'}
Attached port overcloud-cephstorage-1-ctlplane (UUID 988fcd9b-162a-4715-b7a4-7b34d7e87ecd) to node ceph-2 (UUID 86c40998-f2a6-4d25-a69e-16509efa633c)
Attached port overcloud-controller-2-ctlplane (UUID b098bcbb-6392-463c-be37-7b65f73cd6fb) to node ceph-1 (UUID ba7e64c7-31f4-4928-a4a7-04acc06830b1)
Attached port overcloud-cephstorage-0-ctlplane (UUID ea60212d-f844-4a88-b89a-6a4328383f08) to node compute-1 (UUID 65ac0317-f8be-4ea0-90ba-e4f265c27128)
Provisioning started on node ceph-2 (UUID 86c40998-f2a6-4d25-a69e-16509efa633c)
Attached port overcloud-novacompute-0-ctlplane (UUID f1116d34-4aea-4829-9a3f-11f4696f566e) to node ceph-0 (UUID 94fa1ec5-5c63-4d3b-8230-22737e697c90)
Attached port overcloud-controller-1-ctlplane (UUID 61c22794-f74a-46ba-a2c3-2d77352277c0) to node control-1 (UUID e3ca3943-2ac9-4dad-98d4-f846d06091b2)
Attached port overcloud-cephstorage-2-ctlplane (UUID 8a7bee7e-5f60-48a1-8a42-6a038798b3de) to node compute-0 (UUID c12beec4-51cf-4b73-8906-ffb82db6e429)
Attached port overcloud-novacompute-1-ctlplane (UUID df3abca1-5e33-4a8b-95ca-f7211e628aea) to node control-2 (UUID eeafb6b3-7af5-4b41-ac1d-76530db6b61a)
Provisioning started on node ceph-1 (UUID ba7e64c7-31f4-4928-a4a7-04acc06830b1)
Attached port overcloud-controller-0-ctlplane (UUID c203c78b-0e82-48b0-bb41-322ae25c414f) to node control-0 (UUID 38dbad09-79d9-4b21-a704-9b7a2e0a603f)
Provisioning started on node compute-1 (UUID 65ac0317-f8be-4ea0-90ba-e4f265c27128)
Provisioning started on node ceph-0 (UUID 94fa1ec5-5c63-4d3b-8230-22737e697c90)
Provisioning started on node control-1 (UUID e3ca3943-2ac9-4dad-98d4-f846d06091b2)
Provisioning started on node control-2 (UUID eeafb6b3-7af5-4b41-ac1d-76530db6b61a)
Provisioning started on node compute-0 (UUID c12beec4-51cf-4b73-8906-ffb82db6e429)
Provisioning started on node control-0 (UUID 38dbad09-79d9-4b21-a704-9b7a2e0a603f)
~~~
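
The log above can be checked mechanically. The following Python sketch is one way to do it, under a loud assumption: that node names in this quickstart environment encode the intended role (`ceph-*`, `compute-*`, `control-*`). The mapping `ROLE_TO_NODE_PREFIX` and the function `find_mismatches` are hypothetical names introduced for illustration; the sample lines are taken from [1] with the trailing port attributes abbreviated.

```python
import re

# Assumption: instance role (from the port name) -> node-name prefix
# used in this particular quickstart environment.
ROLE_TO_NODE_PREFIX = {
    "cephstorage": "ceph",
    "novacompute": "compute",
    "controller": "control",
}

PORT_LINE = re.compile(
    r"Created port overcloud-(?P<role>[a-z]+)-(?P<index>\d+)-ctlplane "
    r"\(UUID [0-9a-f-]+\) for node (?P<node>[\w-]+) "
)


def find_mismatches(log_text):
    """Return (instance, node) pairs where the node prefix does not fit the role."""
    mismatches = []
    for match in PORT_LINE.finditer(log_text):
        role, node = match["role"], match["node"]
        expected = ROLE_TO_NODE_PREFIX.get(role)
        if expected is not None and not node.startswith(expected + "-"):
            mismatches.append((f"overcloud-{role}-{match['index']}", node))
    return mismatches


# Two lines from [1]; the "with {...}" part is abbreviated here.
sample = (
    "Created port overcloud-cephstorage-0-ctlplane "
    "(UUID ea60212d-f844-4a88-b89a-6a4328383f08) for node compute-1 "
    "(UUID 65ac0317-f8be-4ea0-90ba-e4f265c27128) with {...}\n"
    "Created port overcloud-controller-0-ctlplane "
    "(UUID c203c78b-0e82-48b0-bb41-322ae25c414f) for node control-0 "
    "(UUID 38dbad09-79d9-4b21-a704-9b7a2e0a603f) with {...}\n"
)

for instance, node in find_mismatches(sample):
    print(f"{instance} was placed on {node}")
```

Applied by hand to the eight "Created port" lines in [1], the same rule flags five instances (for example, overcloud-novacompute-1 on control-2) and accepts three.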

[2]
~~~
(undercloud) [stack@undercloud metalsmith]$ openstack baremetal allocation show 226d3b86-58a8-4541-bcd3-6bc75f7b2e44
+-----------------+--------------------------------------+
| Field | Value |
+-----------------+--------------------------------------+
| candidate_nodes | [] |
| created_at | 2020-12-10T02:08:09+00:00 |
| extra | {} |
| last_error | None |
| name | overcloud-cephstorage-0 |
| node_uuid | 65ac0317-f8be-4ea0-90ba-e4f265c27128 |
| owner | None |
| resource_class | baremetal |
| state | active |
| traits | [] |
| updated_at | 2020-12-10T02:08:18+00:00 |
| uuid | 226d3b86-58a8-4541-bcd3-6bc75f7b2e44 |
+-----------------+--------------------------------------+
(undercloud) [stack@undercloud ~]$ openstack baremetal node show 65ac0317-f8be-4ea0-90ba-e4f265c27128| grep cap
| instance_info | {'traits': [], 'capabilities': {'boot_option': 'local'}, 'display_name': 'overcloud-cephstorage-0', 'image_source': 'file:///var/lib/ironic/images/overcloud-full.raw', 'kernel': 'file:///var/lib/ironic/images/overcloud-full.vmlinuz', 'ramdisk': 'file:///var/lib/ironic/images/overcloud-full.initrd', 'root_gb': 48, 'configdrive': '******', 'image_disk_format': 'raw', 'image_checksum': None, 'image_os_hash_algo': 'sha256', 'image_os_hash_value': '9990e38e6de3e143bd3e837f3a31f860df7159a006fb77cdddadba70f839f646', 'image_url': '******', 'image_type': 'partition', 'preserve_ephemeral': False, 'swap_mb': 0, 'ephemeral_gb': 0, 'root_mb': 49152, 'ephemeral_mb': 0, 'ephemeral_format': None} |
| properties | {'cpus': '6', 'memory_mb': '16384', 'local_gb': '49', 'cpu_arch': 'x86_64', 'capabilities': 'boot_option:local,profile:compute,cpu_vt:true,cpu_aes:true,cpu_hugepages:true,cpu_hugepages_1g:true'} |
~~~

tags: added: tripleo-ansible
tags: added: metalsmith
tags: added: deployment quickstart
Changed in tripleo:
status: New → Triaged
milestone: none → wallaby-2
David Vallee Delisle (valleedelisle) wrote :

After looking deeper into this, the cause may be that the scheduling relies on profile capabilities, which are a Nova concept. I'll try with real scheduler hints and see whether that works.

David Vallee Delisle (valleedelisle) wrote :

Apparently, we should move away from capabilities and use resource_class in ironic.

Harald Jensås (harald-jensas) wrote :

Resource class is already supported:
https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/provisioning/baremetal_provision.html#instance-and-defaults-properties

For example:
 - name: Controller
   count: 3
   defaults:
     resource_class: Controller

You can also select nodes by their capabilities profile.

For example:
 - name: Controller
   count: 3
   defaults:
     profile: Controller
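
As a rough illustration of why setting `resource_class` or `profile` in the role defaults changes the outcome, here is a simplified Python model of candidate filtering. It is a sketch under stated assumptions, not metalsmith's actual implementation: the `Node` class and `candidates` function are hypothetical, and the profile values other than `compute` (which appears in [2]) are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    resource_class: str = "baremetal"
    capabilities: dict = field(default_factory=dict)


def candidates(nodes, resource_class=None, profile=None):
    """Return nodes that satisfy every filter that was actually supplied."""
    selected = []
    for node in nodes:
        if resource_class is not None and node.resource_class != resource_class:
            continue
        if profile is not None and node.capabilities.get("profile") != profile:
            continue
        selected.append(node)
    return selected


nodes = [
    Node("control-0", capabilities={"profile": "control"}),      # assumed profile
    Node("compute-1", capabilities={"profile": "compute"}),      # from [2]
    Node("ceph-2", capabilities={"profile": "ceph-storage"}),    # assumed profile
]

# No filter supplied: every node is a candidate, as in the log in [1].
print([n.name for n in candidates(nodes)])

# With a profile in the role defaults, only matching nodes remain.
print([n.name for n in candidates(nodes, profile="compute")])
```

In this model, a role with no `resource_class` or `profile` constraint can land on any available node, which is consistent with the placement seen in this report.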

Changed in tripleo:
status: Triaged → Invalid
Changed in ironic:
status: New → Invalid