VM boot fails with "Insufficient compute resources"

Bug #1854220 reported by Peng Peng on 2019-11-27
This bug affects 1 person
Affects: StarlingX
Importance: Medium
Assigned to: ya.wang

Bug Description

Brief Description
-----------------
Boot 5 or more VMs, then delete them. Immediately afterwards, an attempt to boot another VM fails with the fault message "Insufficient compute resources: Requested instance NUMA topology cannot fit the given host NUMA topology".

Severity
--------
Major

Steps to Reproduce
------------------
1. Boot 5 VMs.
2. Delete the 5 VMs.
3. Boot a VM again.

TC-name:
nova/test_prioritized_evacuation.py::TestPrioritizedVMEvacuation::test_prioritized_vm_evacuations[reboot-False-diff_priority-same_vcpus-same_mem-same_root_disk-same_swap_disk]

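Expected Behavior
------------------
The new VM boots successfully.

Actual Behavior
----------------
The boot fails: the instance goes to ERROR (ResourceInErrorState) with the fault "Insufficient compute resources: Requested instance NUMA topology cannot fit the given host NUMA topology."
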
Reproducibility
---------------
Intermittent (3/5)

System Configuration
--------------------
Multi-node system

Lab-name: WCP_3-6

Branch/Pull Time/Commit
-----------------------
2019-11-21_20-00-00

Last Pass
---------
2019-08-23_20-59-00

Timestamp/Logs
--------------
[2019-11-23 03:21:26,350] 311 DEBUG MainThread ssh.send :: Send 'nova --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne stop 85f81e5b-ca6b-4a85-a67c-d4d952e0499f a2615cc1-7af4-4a9a-98a8-63c0cdfcfc21 795fe009-5da9-4090-a230-3e5ba80f38e8 77a0104a-004d-435e-a467-9b016c1c4815 145cbde0-859e-485c-8304-8dbae0228d1c'

[2019-11-23 03:21:35,686] 311 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne server list --a'
[2019-11-23 03:21:37,584] 433 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------------------+---------+------------------------------------------------------------+-------+--------------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+----------------------+---------+------------------------------------------------------------+-------+--------------+
| 85f81e5b-ca6b-4a85-a67c-d4d952e0499f | tenant1-pve_vm_4-107 | SHUTOFF | tenant1-net0=172.16.0.206; tenant2-mgmt-net=192.168.220.88 | | pve_flavor_4 |
| a2615cc1-7af4-4a9a-98a8-63c0cdfcfc21 | tenant1-pve_vm_3-106 | SHUTOFF | tenant1-net0=172.16.0.170; tenant2-mgmt-net=192.168.220.91 | | pve_flavor_3 |
| 795fe009-5da9-4090-a230-3e5ba80f38e8 | tenant1-pve_vm_2-105 | SHUTOFF | tenant1-net0=172.16.0.223; tenant2-mgmt-net=192.168.220.84 | | pve_flavor_2 |
| 77a0104a-004d-435e-a467-9b016c1c4815 | tenant1-pve_vm_1-104 | SHUTOFF | tenant1-net0=172.16.0.151; tenant2-mgmt-net=192.168.220.74 | | pve_flavor_1 |
| 145cbde0-859e-485c-8304-8dbae0228d1c | tenant1-pve_vm_0-103 | SHUTOFF | tenant1-net0=172.16.0.195; tenant2-mgmt-net=192.168.220.79 | | pve_flavor_0 |
+--------------------------------------+----------------------+---------+------------------------------------------------------------+-------+--------------+
controller-0:~$

# delete all VMs

[2019-11-23 03:21:53,271] 311 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne server list --a'
[2019-11-23 03:21:54,955] 433 DEBUG MainThread ssh.expect :: Output:

controller-0:~$
[2019-11-23 03:21:55,011] 225 DEBUG MainThread table_parser.table :: No table returned
[2019-11-23 03:21:55,012] 2901 INFO MainThread vm_helper.delete_vms:: VM(s) deleted successfully: ['795fe009-5da9-4090-a230-3e5ba80f38e8', '145cbde0-859e-485c-8304-8dbae0228d1c', '85f81e5b-ca6b-4a85-a67c-d4d952e0499f', '77a0104a-004d-435e-a467-9b016c1c4815', 'a2615cc1-7af4-4a9a-98a8-63c0cdfcfc21']

[2019-11-23 03:22:49,700] 476 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2019-11-23 03:22:49,700] 311 DEBUG MainThread ssh.send :: Send 'nova --os-username 'tenant1' --os-password 'Li69nux*' --os-project-name tenant1 --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne boot --poll --boot-volume=d364267b-9d15-4535-80f1-66f6e1013e11 --flavor=f52b4a9d-9cf8-4b72-ad9b-5756f3197f23 --key-name=keypair-tenant1 --availability-zone=cgcsauto:compute-1 --nic net-id=4831cc88-1d09-4e47-b81d-697a064a1ef5 --nic net-id=36498e81-3f8e-4b0a-815b-1ccc308c4bf4 tenant1-pve_vm_0-108'
[2019-11-23 03:22:57,115] 433 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+-------------------------------------------------+
| Property | Value |
+--------------------------------------+-------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | cgcsauto |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hostname | tenant1-pve-vm-0-108 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | |
| OS-EXT-SRV-ATTR:kernel_id | |
| OS-EXT-SRV-ATTR:launch_index | 0 |
| OS-EXT-SRV-ATTR:ramdisk_id | |
| OS-EXT-SRV-ATTR:reservation_id | r-s3efz9sf |
| OS-EXT-SRV-ATTR:root_device_name | - |
| OS-EXT-SRV-ATTR:user_data | - |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | 2SQ3c5u5rVZC |
| config_drive | |
| created | 2019-11-23T03:25:32Z |
| description | - |
| flavor:disk | 1 |
| flavor:ephemeral | 0 |
| flavor:extra_specs | {"hw:mem_page_size": "large"} |
| flavor:original_name | pve_flavor_0 |
| flavor:ram | 1536 |
| flavor:swap | 0 |
| flavor:vcpus | 3 |
| hostId | |
| host_status | |
| id | 4ae6d555-562a-4bae-8983-14a15ee4f8e5 |
| image | Attempt to boot from volume - no image supplied |
| key_name | keypair-tenant1 |
| locked | False |
| metadata | {} |
| name | tenant1-pve_vm_0-108 |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| security_groups | default |
| server_groups | [] |
| status | BUILD |
| tags | [] |
| tenant_id | e6eda99366be44938afcdbbb941447f0 |
| trusted_image_certificates | - |
| updated | 2019-11-23T03:25:32Z |
| user_id | 6ba2a9882ccb476cb1b74f99a92daef7 |
+--------------------------------------+-------------------------------------------------+

Server building... 0% complete
Error building server
ERROR (ResourceInErrorState):
controller-0:~$

[2019-11-23 03:22:57,172] 311 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne server show 4ae6d555-562a-4bae-8983-14a15ee4f8e5'
| created | 2019-11-23T03:25:32Z |
| fault | {u'message': u'Insufficient compute resources: Requested instance NUMA topology cannot fit the given host NUMA topology.', u'code': 500, u'details': u' File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 1984, in _do_build_and_run_instance\n filter_properties, request_spec)\n File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 2280, in _build_and_run_instance\n instance_uuid=instance.uuid, reason=e.format_message())\n', u'created': u'2019-11-23T03:25:33Z'} |
| flavor | pve_flavor_0 (f52b4a9d-9cf8-4b72-ad9b-5756f3197f23) |

Test Activity
-------------
Regression Testing

Peng Peng (ppeng) wrote :
Peng Peng (ppeng) wrote :
Ghada Khalil (gkhalil) on 2019-11-27
tags: added: stx.distro.openstack
Changed in starlingx:
assignee: nobody → yong hu (yhu6)
Yang Liu (yliu12) on 2019-11-29
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

@Yong, please triage and decide whether this is gating for stx.3.0 or not.

yong hu (yhu6) wrote :

@shuquan, could you assign someone to work on this LP?

Changed in starlingx:
assignee: yong hu (yhu6) → Shuquan Huang (shuquan)
tags: added: stx.3.0
yong hu (yhu6) on 2019-12-03
Changed in starlingx:
importance: Undecided → Medium
ya.wang (ya.wang) on 2019-12-03
Changed in starlingx:
assignee: Shuquan Huang (shuquan) → ya.wang (ya.wang)
ya.wang (ya.wang) wrote :

1. Lines 85-88 of the log "ALL_NODES_20191123.070135/compute-1_20191123.070135/var/log/containersddd/nova-compute-compute-1-532206f8-bvgtw_openstack_nova-compute-223a37792011a6a56606409ab5b0e22b99cc9a27fef7478630a8242ea78b119d.log" show that compute-1 has 67099940 KiB of memory, with 6004581 pages of 4 KiB, 0 pages of 2048 KiB (2 MiB) and 41 pages of 1048576 KiB (1 GiB).

2. Lines 329098 and 332060 of "TIS_AUTOMATION.log" show that all flavors had 1 vCPU and 1 GiB of memory when first created; in the second creation, the flavor pve_flavor_0 (the one whose boot failed) has 1 vCPU and 1.5 GiB (1536 MiB) of memory.
All of these flavors set the extra spec "hw:mem_page_size=large", which implicitly gives the instance a NUMA topology [1].

3. When nova creates a NUMA instance, it tries to fit the instance NUMA topology onto the host NUMA topology [2]. Here the fit must also honor the requested page size: for a given page size, the free memory in pages of that size must be at least the instance memory (flavor RAM), and the instance memory must be an exact multiple of that page size [3][4].
"hw:mem_page_size=large" requests huge pages, so the 4 KiB pages are not considered. compute-1 has no 2 MiB pages, and the instance memory of 1.5 GiB is not a multiple of 1 GiB, so neither huge page size fits and the boot fails.
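
The page-size fit rule described in item 3 can be sketched in Python. This is a simplified model, not nova's code: the function names are hypothetical, only a single host NUMA cell is modelled, and "large" is treated as "try every page size except the smallest, largest first", which is our reading of [3] and [4]. The page counts come from item 1 and are assumed to be entirely free, which may not match the real free counts on compute-1.

```python
def can_fit_pagesize(free_pages, pagesize_kb, memory_kb):
    """True if memory_kb fits into free pages of pagesize_kb and is an exact multiple of it."""
    avail_kb = free_pages.get(pagesize_kb, 0) * pagesize_kb
    return memory_kb <= avail_kb and memory_kb % pagesize_kb == 0


def pick_large_pagesize(free_pages, memory_kb):
    """Model of hw:mem_page_size=large: skip the smallest page size, try the rest largest first."""
    sizes = sorted(free_pages, reverse=True)
    for size_kb in sizes[:-1]:  # drop the smallest (4 KiB) page size
        if can_fit_pagesize(free_pages, size_kb, memory_kb):
            return size_kb
    return None  # no page size fits: the NUMA topology cannot fit the host


# compute-1 page inventory from item 1 (page size in KiB -> page count),
# assumed here to be all free.
compute1_pages = {4: 6004581, 2048: 0, 1048576: 41}

for ram_mib in (1024, 1536, 2048):
    print(ram_mib, pick_large_pagesize(compute1_pages, ram_mib * 1024))
# 1024 1048576
# 1536 None
# 2048 1048576
```

With this inventory, 1536 MiB fails for both huge page sizes: 1 GiB pages because of the 512 MiB remainder, and 2 MiB pages because compute-1 has none.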

Suggestion: set the flavor RAM to a whole number of GiB (a multiple of 1024 MiB, e.g. 1024 or 2048), so that it can be backed by 1 GiB pages.
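
The requirement that flavor RAM be divisible by 1 GiB can be checked, or enforced by rounding, with a small helper. The helper names are hypothetical; RAM is given in MiB, as flavor:ram is reported. Rounding up rather than down is a choice made here so the guest never gets less memory than requested.

```python
GIB_IN_MIB = 1024


def is_gib_multiple(ram_mib: int) -> bool:
    """True if the flavor RAM (MiB) can be backed exactly by 1 GiB pages."""
    return ram_mib % GIB_IN_MIB == 0


def round_up_to_gib(ram_mib: int) -> int:
    """Return the smallest multiple of 1024 MiB that is >= ram_mib."""
    if ram_mib <= 0:
        raise ValueError("flavor RAM must be a positive number of MiB")
    return -(-ram_mib // GIB_IN_MIB) * GIB_IN_MIB  # ceiling division


print(is_gib_multiple(1536))   # False
print(round_up_to_gib(1536))   # 2048
print(is_gib_multiple(3072))   # True
```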

[1]: https://docs.openstack.org/nova/train/admin/huge-pages.html#customizing-instance-huge-pages-allocations
[2]: https://github.com/openstack/nova/blob/stable%2Ftrain/nova/compute/claims.py#L139
[3]: https://github.com/openstack/nova/blob/stable%2Ftrain/nova/virt/hardware.py#L674
[4]: https://github.com/openstack/nova/blob/stable%2Ftrain/nova/objects/numa.py#L134

ya.wang (ya.wang) wrote :

Normally, nova checks the NUMA topology during scheduling and filters out compute nodes that cannot fit the instance.

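In this case, however, the boot request specified '--availability-zone=cgcsauto:compute-1' (L329460 in TIS_AUTOMATION.log). Nova translates this into 'force_host=compute-1' and skips the filters [1][2], so the mismatch is only detected on compute-1 itself, where the build fails with the NUMA fault from the description.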
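
The effect of the forced host can be illustrated with a toy scheduler. This is not nova's code: parse_az, select_hosts, numa_fits and the host records are hypothetical, the parsing only mirrors the "zone:host[:node]" form seen in this report, and compute-0 is an invented host that does fit. It shows why compute-1 is returned even though the NUMA check would reject it.

```python
from typing import Callable, Dict, List, Optional, Tuple

HostFilter = Callable[[Dict], bool]


def parse_az(value: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'zone[:host[:node]]' into (zone, host, node) (simplified)."""
    parts = value.split(":")
    zone = parts[0]
    host = parts[1] if len(parts) > 1 and parts[1] else None
    node = parts[2] if len(parts) > 2 and parts[2] else None
    return zone, host, node


def select_hosts(hosts: List[Dict], filters: List[HostFilter],
                 force_host: Optional[str]) -> List[Dict]:
    """Return candidate hosts; a forced host bypasses every filter."""
    if force_host is not None:
        return [h for h in hosts if h["name"] == force_host]
    return [h for h in hosts if all(f(h) for f in filters)]


def numa_fits(host: Dict) -> bool:
    # Stand-in for the real NUMA topology check.
    return host["numa_fits"]


hosts = [{"name": "compute-0", "numa_fits": True},   # hypothetical host
         {"name": "compute-1", "numa_fits": False}]
zone, forced, _ = parse_az("cgcsauto:compute-1")

print([h["name"] for h in select_hosts(hosts, [numa_fits], None)])    # ['compute-0']
print([h["name"] for h in select_hosts(hosts, [numa_fits], forced)])  # ['compute-1']
```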

[1] https://github.com/openstack/nova/blob/master/nova/api/openstack/compute/servers.py#L663
[2] https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L579
