ReshapeForPCPUsTest.test_vcpu_to_pcpu_reshape intermittently fails with "Cannot 'migrate' instance while it is in vm_state building"

Bug #1850514 reported by Matt Riedemann on 2019-10-29
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Medium
Matt Riedemann
Train
Medium
Unassigned

Bug Description

Seen here:

https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f05/691390/6/check/nova-tox-functional/f059dc0/testr_results.html.gz

ft1.1: nova.tests.functional.libvirt.test_numa_servers.ReshapeForPCPUsTest.test_vcpu_to_pcpu_reshapetesttools.testresult.real._StringException: pythonlogging:'': {{{
2019-10-29 17:33:28,372 WARNING [placement.db_api] TransactionFactory already started, not reconfiguring.
2019-10-29 17:33:28,423 INFO [nova.service] Starting conductor node (version 20.1.0)
2019-10-29 17:33:28,478 INFO [nova.service] Starting scheduler node (version 20.1.0)
2019-10-29 17:33:29,023 INFO [nova.virt.driver] Loading compute driver 'libvirt.LibvirtDriver'
2019-10-29 17:33:29,026 WARNING [os_brick.initiator.connectors.remotefs] Connection details not present. RemoteFsClient may not initialize properly.
2019-10-29 17:33:29,028 INFO [nova.service] Starting compute node (version 20.1.0)
2019-10-29 17:33:29,030 WARNING [nova.virt.libvirt.driver] The 'vcpu_pin_set' config option has been deprecated and will be removed in a future release. When defined, 'vcpu_pin_set' will be used to calculate 'VCPU' inventory and schedule instances that have 'VCPU' allocations. If you wish to define specific host CPUs to be used for 'VCPU' or 'PCPU' inventory, you must migrate the 'vcpu_pin_set' config option value to '[compute] cpu_shared_set' and '[compute] cpu_dedicated_set', respectively, and undefine 'vcpu_pin_set'.
2019-10-29 17:33:29,038 WARNING [nova.virt.libvirt.driver] my_ip address (38.108.68.36) was not found on any of the interfaces:
2019-10-29 17:33:29,039 WARNING [nova.virt.libvirt.driver] Running Nova with a libvirt version less than 4.0.0 is deprecated. The required minimum version of libvirt will be raised to 4.0.0 in the next release.
2019-10-29 17:33:29,039 WARNING [nova.virt.libvirt.driver] Running Nova with a QEMU version less than 2.11.0 is deprecated. The required minimum version of QEMU will be raised to 2.11.0 in the next release.
2019-10-29 17:33:29,059 WARNING [nova.compute.manager] Compute node test_compute0 not found in the database. If this is the first time this service is starting on this host, then you can ignore this warning.
2019-10-29 17:33:29,066 INFO [nova.compute.manager] Looking for unclaimed instances stuck in BUILDING status for nodes managed by this host
2019-10-29 17:33:29,081 WARNING [nova.compute.manager] No compute node record found for host test_compute0. If this is the first time this service is starting on this host, then you can ignore this warning.
2019-10-29 17:33:29,094 WARNING [nova.compute.resource_tracker] No compute node record for test_compute0:test_compute0
2019-10-29 17:33:29,101 INFO [nova.compute.resource_tracker] Compute node record created for test_compute0:test_compute0 with uuid: cada8f90-3f3d-4f22-8312-770a0a818828
2019-10-29 17:33:29,179 INFO [placement.requestlog] 127.0.0.1 "GET /placement/resource_providers?in_tree=cada8f90-3f3d-4f22-8312-770a0a818828" status: 200 len: 26 microversion: 1.14
2019-10-29 17:33:29,192 INFO [placement.requestlog] 127.0.0.1 "POST /placement/resource_providers" status: 200 len: 836 microversion: 1.20
2019-10-29 17:33:29,193 INFO [nova.scheduler.client.report] [req-3fabe3ee-b8f4-4d5e-9c62-185c0ae18c74] Created resource provider record via placement API for resource provider with UUID cada8f90-3f3d-4f22-8312-770a0a818828 and name test_compute0.
2019-10-29 17:33:29,195 INFO [nova.virt.libvirt.host] kernel doesn't support AMD SEV
2019-10-29 17:33:29,227 INFO [placement.requestlog] 127.0.0.1 "PUT /placement/resource_providers/cada8f90-3f3d-4f22-8312-770a0a818828/inventories" status: 200 len: 405 microversion: 1.26
2019-10-29 17:33:29,241 INFO [placement.requestlog] 127.0.0.1 "GET /placement/traits?name=in:COMPUTE_IMAGE_TYPE_ARI,COMPUTE_IMAGE_TYPE_AKI,COMPUTE_VOLUME_EXTEND,COMPUTE_IMAGE_TYPE_AMI,COMPUTE_IMAGE_TYPE_RAW,HW_CPU_X86_VMX,COMPUTE_NET_ATTACH_INTERFACE,COMPUTE_VOLUME_MULTI_ATTACH,HW_CPU_X86_AESNI,HW_CPU_HYPERTHREADING,COMPUTE_IMAGE_TYPE_QCOW2,COMPUTE_VOLUME_ATTACH_WITH_TAG,COMPUTE_NET_ATTACH_INTERFACE_WITH_TAG,COMPUTE_IMAGE_TYPE_ISO,COMPUTE_TRUSTED_CERTS,COMPUTE_DEVICE_TAGGING" status: 200 len: 447 microversion: 1.6
2019-10-29 17:33:29,266 INFO [placement.requestlog] 127.0.0.1 "PUT /placement/resource_providers/cada8f90-3f3d-4f22-8312-770a0a818828/traits" status: 200 len: 482 microversion: 1.6
2019-10-29 17:33:29,284 INFO [placement.requestlog] 127.0.0.1 "GET /placement/resource_providers?name=test_compute0" status: 200 len: 440 microversion: 1.0
2019-10-29 17:33:29,328 INFO [nova.virt.driver] Loading compute driver 'libvirt.LibvirtDriver'
2019-10-29 17:33:29,332 WARNING [os_brick.initiator.connectors.remotefs] Connection details not present. RemoteFsClient may not initialize properly.
2019-10-29 17:33:29,334 INFO [nova.service] Starting compute node (version 20.1.0)
2019-10-29 17:33:29,336 WARNING [nova.virt.libvirt.driver] The 'vcpu_pin_set' config option has been deprecated and will be removed in a future release. When defined, 'vcpu_pin_set' will be used to calculate 'VCPU' inventory and schedule instances that have 'VCPU' allocations. If you wish to define specific host CPUs to be used for 'VCPU' or 'PCPU' inventory, you must migrate the 'vcpu_pin_set' config option value to '[compute] cpu_shared_set' and '[compute] cpu_dedicated_set', respectively, and undefine 'vcpu_pin_set'.
2019-10-29 17:33:29,342 WARNING [nova.virt.libvirt.driver] my_ip address (38.108.68.36) was not found on any of the interfaces:
2019-10-29 17:33:29,343 WARNING [nova.virt.libvirt.driver] Running Nova with a libvirt version less than 4.0.0 is deprecated. The required minimum version of libvirt will be raised to 4.0.0 in the next release.
2019-10-29 17:33:29,343 WARNING [nova.virt.libvirt.driver] Running Nova with a QEMU version less than 2.11.0 is deprecated. The required minimum version of QEMU will be raised to 2.11.0 in the next release.
2019-10-29 17:33:29,363 WARNING [nova.compute.manager] Compute node test_compute1 not found in the database. If this is the first time this service is starting on this host, then you can ignore this warning.
2019-10-29 17:33:29,368 INFO [nova.compute.manager] Looking for unclaimed instances stuck in BUILDING status for nodes managed by this host
2019-10-29 17:33:29,388 WARNING [nova.compute.manager] No compute node record found for host test_compute1. If this is the first time this service is starting on this host, then you can ignore this warning.
2019-10-29 17:33:29,397 WARNING [nova.compute.resource_tracker] No compute node record for test_compute1:test_compute1
2019-10-29 17:33:29,401 INFO [nova.compute.resource_tracker] Compute node record created for test_compute1:test_compute1 with uuid: 3b8479be-2d4d-4ccc-8f31-c7bc31e4b8e7
2019-10-29 17:33:29,466 INFO [placement.requestlog] 127.0.0.1 "GET /placement/resource_providers?in_tree=3b8479be-2d4d-4ccc-8f31-c7bc31e4b8e7" status: 200 len: 26 microversion: 1.14
2019-10-29 17:33:29,479 INFO [placement.requestlog] 127.0.0.1 "POST /placement/resource_providers" status: 200 len: 836 microversion: 1.20
2019-10-29 17:33:29,480 INFO [nova.scheduler.client.report] [req-3f530fdb-07b5-42e6-b147-968b7df51b8d] Created resource provider record via placement API for resource provider with UUID 3b8479be-2d4d-4ccc-8f31-c7bc31e4b8e7 and name test_compute1.
2019-10-29 17:33:29,481 INFO [nova.virt.libvirt.host] kernel doesn't support AMD SEV
2019-10-29 17:33:29,506 INFO [placement.requestlog] 127.0.0.1 "PUT /placement/resource_providers/3b8479be-2d4d-4ccc-8f31-c7bc31e4b8e7/inventories" status: 200 len: 405 microversion: 1.26
2019-10-29 17:33:29,518 INFO [placement.requestlog] 127.0.0.1 "GET /placement/traits?name=in:COMPUTE_IMAGE_TYPE_ARI,COMPUTE_IMAGE_TYPE_AKI,COMPUTE_VOLUME_EXTEND,COMPUTE_IMAGE_TYPE_AMI,COMPUTE_IMAGE_TYPE_RAW,HW_CPU_X86_VMX,COMPUTE_NET_ATTACH_INTERFACE,COMPUTE_VOLUME_MULTI_ATTACH,HW_CPU_X86_AESNI,HW_CPU_HYPERTHREADING,COMPUTE_IMAGE_TYPE_QCOW2,COMPUTE_VOLUME_ATTACH_WITH_TAG,COMPUTE_NET_ATTACH_INTERFACE_WITH_TAG,COMPUTE_IMAGE_TYPE_ISO,COMPUTE_TRUSTED_CERTS,COMPUTE_DEVICE_TAGGING" status: 200 len: 447 microversion: 1.6
2019-10-29 17:33:29,543 INFO [placement.requestlog] 127.0.0.1 "PUT /placement/resource_providers/3b8479be-2d4d-4ccc-8f31-c7bc31e4b8e7/traits" status: 200 len: 482 microversion: 1.6
2019-10-29 17:33:29,556 INFO [placement.requestlog] 127.0.0.1 "GET /placement/resource_providers?name=test_compute1" status: 200 len: 440 microversion: 1.0
2019-10-29 17:33:29,567 INFO [placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/3b8479be-2d4d-4ccc-8f31-c7bc31e4b8e7/inventories" status: 200 len: 405 microversion: 1.0
2019-10-29 17:33:29,578 INFO [placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/cada8f90-3f3d-4f22-8312-770a0a818828/inventories" status: 200 len: 405 microversion: 1.0
2019-10-29 17:33:29,582 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1" status: 204 len: 0 microversion: - time: 0.001150
2019-10-29 17:33:29,737 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/flavors/detail" status: 200 len: 3007 microversion: 2.74 time: 0.150618
2019-10-29 17:33:29,774 INFO [nova.api.openstack.requestlog] 127.0.0.1 "POST /v2.1/6f70656e737461636b20342065766572/flavors" status: 200 len: 533 microversion: 2.74 time: 0.032605
2019-10-29 17:33:29,804 INFO [nova.api.openstack.requestlog] 127.0.0.1 "POST /v2.1/6f70656e737461636b20342065766572/flavors/11878059/os-extra_specs" status: 200 len: 47 microversion: 2.74 time: 0.026474
2019-10-29 17:33:29,811 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/images/detail" status: 200 len: 5635 microversion: 2.35 time: 0.003158
2019-10-29 17:33:29,827 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/flavors/11878059" status: 200 len: 561 microversion: 2.74 time: 0.011782
2019-10-29 17:33:29,945 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/servers/detail" status: 200 len: 15 microversion: 2.74 time: 0.114258
2019-10-29 17:33:30,178 INFO [placement.requestlog] 127.0.0.1 "GET /placement/allocation_candidates?in_tree=cada8f90-3f3d-4f22-8312-770a0a818828&limit=1000&required=%21COMPUTE_STATUS_DISABLED&resources=DISK_GB%3A20%2CMEMORY_MB%3A2048%2CPCPU%3A2" status: 200 len: 53 microversion: 1.32
2019-10-29 17:33:30,255 INFO [placement.requestlog] 127.0.0.1 "GET /placement/allocation_candidates?in_tree=cada8f90-3f3d-4f22-8312-770a0a818828&limit=1000&required=%21COMPUTE_STATUS_DISABLED&resources=DISK_GB%3A20%2CMEMORY_MB%3A2048%2CVCPU%3A2" status: 200 len: 893 microversion: 1.32
2019-10-29 17:33:30,298 INFO [nova.scheduler.host_manager] Host filter only checking host test_compute0 and node test_compute0
2019-10-29 17:33:30,304 INFO [nova.virt.hardware] Computed NUMA topology CPU pinning: usable pCPUs: [[0, 1], [2, 3]], vCPUs mapping: [(0, 0), (1, 1)]
2019-10-29 17:33:30,316 INFO [placement.requestlog] 127.0.0.1 "GET /placement/allocations/1dbe015f-9d41-40fe-a556-7fa17b9927e3" status: 200 len: 19 microversion: 1.28
2019-10-29 17:33:30,350 INFO [placement.requestlog] 127.0.0.1 "PUT /placement/allocations/1dbe015f-9d41-40fe-a556-7fa17b9927e3" status: 204 len: 0 microversion: 1.32
2019-10-29 17:33:30,353 INFO [nova.virt.hardware] Computed NUMA topology CPU pinning: usable pCPUs: [[0, 1], [2, 3]], vCPUs mapping: [(0, 0), (1, 1)]
2019-10-29 17:33:30,491 INFO [nova.api.openstack.requestlog] 127.0.0.1 "POST /v2.1/6f70656e737461636b20342065766572/servers" status: 202 len: 478 microversion: 2.74 time: 0.541017
2019-10-29 17:33:30,616 INFO [placement.requestlog] 127.0.0.1 "GET /placement/allocations/1dbe015f-9d41-40fe-a556-7fa17b9927e3" status: 200 len: 232 microversion: 1.28
2019-10-29 17:33:30,618 ERROR [nova.virt.libvirt.host] Hostname has changed from test_compute0 to test_compute1. A restart is required to take effect.
2019-10-29 17:33:30,644 INFO [nova.virt.hardware] Computed NUMA topology CPU pinning: usable pCPUs: [[0, 1], [2, 3]], vCPUs mapping: [(0, 0), (1, 1)]
2019-10-29 17:33:30,644 INFO [nova.compute.claims] Claim successful on node test_compute0
2019-10-29 17:33:30,722 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/servers/1dbe015f-9d41-40fe-a556-7fa17b9927e3" status: 200 len: 1944 microversion: 2.74 time: 0.227641
2019-10-29 17:33:31,300 INFO [nova.virt.libvirt.driver] Instance spawned successfully.
2019-10-29 17:33:31,303 INFO [nova.compute.manager] Took 0.17 seconds to spawn the instance on the hypervisor.
2019-10-29 17:33:31,450 INFO [nova.scheduler.host_manager] Received an update from an unknown host 'test_compute0'. Re-created its InstanceList.
2019-10-29 17:33:31,457 INFO [nova.compute.manager] Took 0.90 seconds to build instance.
2019-10-29 17:33:31,600 INFO [placement.requestlog] 127.0.0.1 "GET /placement/allocation_candidates?in_tree=cada8f90-3f3d-4f22-8312-770a0a818828&limit=1000&required=%21COMPUTE_STATUS_DISABLED&resources=DISK_GB%3A20%2CMEMORY_MB%3A2048%2CPCPU%3A2" status: 200 len: 53 microversion: 1.32
2019-10-29 17:33:31,678 INFO [placement.requestlog] 127.0.0.1 "GET /placement/allocation_candidates?in_tree=cada8f90-3f3d-4f22-8312-770a0a818828&limit=1000&required=%21COMPUTE_STATUS_DISABLED&resources=DISK_GB%3A20%2CMEMORY_MB%3A2048%2CVCPU%3A2" status: 200 len: 897 microversion: 1.32
2019-10-29 17:33:31,708 INFO [nova.scheduler.host_manager] Host filter only checking host test_compute0 and node test_compute0
2019-10-29 17:33:31,720 INFO [nova.virt.hardware] Computed NUMA topology CPU pinning: usable pCPUs: [[2, 3]], vCPUs mapping: [(0, 2), (1, 3)]
2019-10-29 17:33:31,748 INFO [placement.requestlog] 127.0.0.1 "GET /placement/allocations/15246707-924c-4d39-b746-4499ec174f90" status: 200 len: 19 microversion: 1.28
2019-10-29 17:33:31,806 INFO [placement.requestlog] 127.0.0.1 "PUT /placement/allocations/15246707-924c-4d39-b746-4499ec174f90" status: 204 len: 0 microversion: 1.32
2019-10-29 17:33:31,811 INFO [nova.virt.hardware] Computed NUMA topology CPU pinning: usable pCPUs: [[2, 3]], vCPUs mapping: [(0, 2), (1, 3)]
2019-10-29 17:33:32,050 INFO [nova.api.openstack.requestlog] 127.0.0.1 "POST /v2.1/6f70656e737461636b20342065766572/servers" status: 202 len: 478 microversion: 2.74 time: 1.322930
2019-10-29 17:33:32,229 INFO [placement.requestlog] 127.0.0.1 "GET /placement/allocations/15246707-924c-4d39-b746-4499ec174f90" status: 200 len: 232 microversion: 1.28
2019-10-29 17:33:32,232 ERROR [nova.virt.libvirt.host] Hostname has changed from test_compute0 to test_compute1. A restart is required to take effect.
2019-10-29 17:33:32,248 INFO [nova.virt.hardware] Computed NUMA topology CPU pinning: usable pCPUs: [[2, 3]], vCPUs mapping: [(0, 2), (1, 3)]
2019-10-29 17:33:32,248 INFO [nova.compute.claims] Claim successful on node test_compute0
2019-10-29 17:33:32,312 INFO [nova.api.openstack.requestlog] 127.0.0.1 "GET /v2.1/6f70656e737461636b20342065766572/servers/15246707-924c-4d39-b746-4499ec174f90" status: 200 len: 1944 microversion: 2.74 time: 0.255498
2019-10-29 17:33:32,365 INFO [placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/cada8f90-3f3d-4f22-8312-770a0a818828/inventories" status: 200 len: 405 microversion: 1.0
2019-10-29 17:33:32,438 INFO [placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/cada8f90-3f3d-4f22-8312-770a0a818828/usages" status: 200 len: 92 microversion: 1.0
2019-10-29 17:33:32,489 INFO [placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/3b8479be-2d4d-4ccc-8f31-c7bc31e4b8e7/inventories" status: 200 len: 405 microversion: 1.0
2019-10-29 17:33:32,562 INFO [placement.requestlog] 127.0.0.1 "GET /placement/resource_providers/3b8479be-2d4d-4ccc-8f31-c7bc31e4b8e7/usages" status: 200 len: 88 microversion: 1.0
2019-10-29 17:33:33,382 INFO [nova.api.openstack.wsgi] HTTP exception thrown: Cannot 'migrate' instance 15246707-924c-4d39-b746-4499ec174f90 while it is in vm_state building
2019-10-29 17:33:33,385 INFO [nova.api.openstack.requestlog] 127.0.0.1 "POST /v2.1/6f70656e737461636b20342065766572/servers/15246707-924c-4d39-b746-4499ec174f90/action" status: 409 len: 147 microversion: 2.74 time: 0.158307
2019-10-29 17:33:33,397 INFO [nova.virt.libvirt.driver] Instance spawned successfully.
2019-10-29 17:33:33,398 INFO [nova.compute.manager] Took 0.15 seconds to spawn the instance on the hypervisor.
2019-10-29 17:33:33,538 INFO [nova.compute.manager] Took 1.42 seconds to build instance.
2019-10-29 17:33:33,630 INFO [oslo_db.sqlalchemy.provision] DROP ALL OBJECTS, BACKEND sqlite://
2019-10-29 17:33:33,659 INFO [oslo_db.sqlalchemy.provision] DISPOSE ENGINE Engine(sqlite://)
}}}

Traceback (most recent call last):
  File "nova/tests/functional/libvirt/test_numa_servers.py", line 625, in test_vcpu_to_pcpu_reshape
    self.api.post_server_action(server2['id'], post)
  File "nova/tests/functional/api/client.py", line 294, in post_server_action
    '/servers/%s/action' % server_id, data, **kwargs).body
  File "nova/tests/functional/api/client.py", line 235, in api_post
    return APIResponse(self.api_request(relative_uri, **kwargs))
  File "nova/tests/functional/api/client.py", line 213, in api_request
    response=response)
nova.tests.functional.api.client.OpenStackApiException: Unexpected status code: {"conflictingRequest": {"message": "Cannot 'migrate' instance 15246707-924c-4d39-b746-4499ec174f90 while it is in vm_state building", "code": 409}}

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22%7B%5C%5C%5C%22conflictingRequest%5C%5C%5C%22%3A%20%7B%5C%5C%5C%22message%5C%5C%5C%22%3A%20%5C%5C%5C%22Cannot%20'migrate'%20instance%5C%22%20AND%20message%3A%5C%22while%20it%20is%20in%20vm_state%20building%5C%5C%5C%22%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d

4 hits in 7 days, check and gate, all failures. I wouldn't be surprised if this is a test that is using the ServersTestBase version of _wait_for_state_change which waits for the status to change from what is passed in rather than waiting for a terminal status.

Fix proposed to branch: master
Review: https://review.opendev.org/691941

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: Confirmed → In Progress

Reviewed: https://review.opendev.org/691941
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=56a391aafc1d75e33197423b0f1bb540c6dd2673
Submitter: Zuul
Branch: master

commit 56a391aafc1d75e33197423b0f1bb540c6dd2673
Author: Matt Riedemann <email address hidden>
Date: Tue Oct 29 14:29:52 2019 -0400

    Fix race in test_vcpu_to_pcpu_reshape

    This test uses the ServersTestBase._wait_for_state_change method
    which waits for the status to change *from* what is provided, so
    when creating a server and waiting for the status to change from
    ACTIVE makes _wait_for_state_change return immediately since the
    status starts as BUILD. This can lead to a failure when the test
    tries to migrate a server that is in BUILD status rather than
    ACTIVE status.

    This fixes the test by using this version of _wait_for_state_change
    correctly, not to be confused with the same method in
    InstanceHelperMixin which is more accurate (it waits for the
    terminal status of the server operation).

    Change-Id: I56ff050194d0eb465b8c41795fdea2a8b0d764d6
    Closes-Bug: #1850514

Changed in nova:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers