Overcloud deploy fails on vexxhost while passing on rdo-cloud:
Passing rdo-cloud job: https://review.rdoproject.org/zuul/build/11a0c21f4bcb4d93ac28c097ea59e702
Failing vexxhost job: https://review.rdoproject.org/zuul/build/3d429434e77648648d3ca51160642e07
https://logserver.rdoproject.org/32/728132/5/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-vexxhost/3d42943/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
openstack.exceptions.ResourceFailure: Allocation 266e1270-9da0-4cd7-af95-f48b96d099b5 failed: Failed to process allocation 266e1270-9da0-4cd7-af95-f48b96d099b5: none of the requested nodes are available and match the resource class baremetal.
https://logserver.rdoproject.org/32/728132/5/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-vexxhost/3d42943/logs/undercloud/var/log/containers/neutron/server.log.txt.gz
2020-05-18 16:45:17.311 GET failed (client error): The resource could not be found.
2020-05-18 16:45:17.312 192.168.24.1 "GET /v2.0/networks/ctlplane HTTP/1.1" status: 404 len: 292 time: 0.0605259
2020-05-18 16:45:17.314 GET failed (client error): The resource could not be found.
2020-05-18 16:45:17.315 192.168.24.1 "GET /v2.0/networks/ctlplane HTTP/1.1" status: 404 len: 292 time: 0.0609751
2020-05-18 16:45:17.343 GET failed (client error): The resource could not be found.
2020-05-18 16:45:17.344 192.168.24.1 "GET /v2.0/networks/ctlplane HTTP/1.1" status: 404 len: 292 time: 0.0470848
2020-05-18 16:45:17.422 38 DEBUG neutron.wsgi [-] (38) accepted ('192.168.24.1', 49262) server /usr/lib/python3.6/site-packages/eventlet/wsgi.py:985
2020-05-18 16:45:17.425 38 INFO neutron.wsgi [-] 192.168.24.1 "OPTIONS / HTTP/1.0" status: 200 len: 249 time: 0.0028582
2020-05-18 16:45:17.496 Attributes excluded by policy engine: ['vlan_transparent'] _exclude_attributes_by_policy /usr/lib/python3.6/site-packages/neutron/pecan_wsgi/hooks/policy_enforcement.py:256
2020-05-18 16:45:17.498 192.168.24.1 "GET /v2.0/networks?name=ctlplane HTTP/1.1" status: 200 len: 925 time: 0.1783998
2020-05-18 16:45:17.509 Request body: {'port': {'network_id': 'd001679f-f792-4cf4-84f0-068e958a1f0b', 'name': 'overcloud-controller-1-ctlplane'}} prepare_request_body /usr/lib/python3.6/site-packages/neutron/api/v2/base.py:719
2020-05-18 16:45:17.515 Resources subnetpool,port have unlimited quota limit. It is not required to calculate headroom make_reservation /usr/lib/python3.6/site-packages/neutron/db/quota/driver.py:221
2020-05-18 16:45:17.517 Attributes excluded by policy engine: ['vlan_transparent'] _exclude_attributes_by_policy /usr/lib/python3.6/site-packages/neutron/pecan_wsgi/hooks/policy_enforcement.py:256
2020-05-18 16:45:17.518 192.168.24.1 "GET /v2.0/networks?name=ctlplane HTTP/1.1" status: 200 len: 925 time: 0.2029653
2020-05-18 16:45:17.531 Request body: {'port': {'network_id': 'd001679f-f792-4cf4-84f0-068e958a1f0b', 'name': 'overcloud-controller-0-ctlplane'}} prepare_request_body /usr/lib/python3.6/site-packages/neutron/api/v2/base.py:719
2020-05-18 16:45:17.535 Made reservation on behalf of 7d28c8c3e9df46dba4a84c716ddb1b33 for: {'port': 1} before /usr/lib/python3.6/site-packages/neutron/pecan_wsgi/hooks/quota_enforcement.py:55
2020-05-18 16:45:17.537 Notify callbacks ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-1631485'] for port, before_create _notify_loop /usr/lib/python3.6/site-packages/neutron_lib/callbacks/manager.py:193
2020-05-18 16:45:17.542 Resources subnetpool,port have unlimited quota limit. It is not required to calculate headroom make_reservation /usr/lib/python3.6/site-packages/neutron/db/quota/driver.py:221
https://logserver.rdoproject.org/32/728132/5/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-vexxhost/3d42943/logs/undercloud/var/log/containers/ironic/ironic-conductor.log.txt.gz
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils [req-f4babce5-8a94-4ecd-95e4-e313d8fc7227 48650596bc9948bd9822a9f73440a836 7d28c8c3e9df46dba4a84c716ddb1b33 - default default] Node 00ea8aec-c3bb-451a-b4c5-076c377fd83c failed deploy step {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: Unexpected error while running command.
Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.part /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.converted
Exit code: -6
Stdout: ''
Stderr: 'qemu: qemu_thread_create: Resource temporarily unavailable\n': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.part /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.converted
Exit code: -6
Stdout: ''
Stderr: 'qemu: qemu_thread_create: Resource temporarily unavailable\n'
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils Traceback (most recent call last):
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/deployments.py", line 245, in do_next_deploy_step
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils result = interface.execute_deploy_step(task, step)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 59, in wrapped
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils result = f(*args, **kwargs)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/agent_base.py", line 804, in execute_deploy_step
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils task, step)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 359, in execute_deploy_step
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils return self._execute_step(task, step)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 282, in _execute_step
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils return getattr(self, step['step'])(task)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 59, in wrapped
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils result = f(*args, **kwargs)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils return f(*args, **kwargs)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 689, in deploy
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils deploy_utils.cache_instance_image(task.context, node)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 59, in wrapped
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils result = f(*args, **kwargs)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 858, in cache_instance_image
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils force_raw)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 235, in fetch_images
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils cache.fetch_image(href, path, ctx=ctx, force_raw=force_raw)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 135, in fetch_image
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils href, master_path, dest_path, ctx=ctx, force_raw=force_raw)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 162, in _download_image
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils _fetch(ctx, href, tmp_path, force_raw)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 319, in _fetch
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils images.image_to_raw(image_href, path, path_tmp)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 439, in image_to_raw
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils disk_utils.convert_image(path_tmp, staged, 'raw')
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py", line 381, in convert_image
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils utils.execute(*cmd, run_as_root=run_as_root, prlimit=QEMU_IMG_LIMITS)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/utils.py", line 99, in execute
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils result = processutils.execute(*cmd, **kwargs)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 424, in execute
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils cmd=sanitized_cmd)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.part /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.converted
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils Exit code: -6
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils Stdout: ''
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils Stderr: 'qemu: qemu_thread_create: Resource temporarily unavailable\n'
2020-05-18 16:44:07.367 8 WARNING ironic.drivers.modules.agent_base [req-c2afee82-3e1d-4d52-bd9b-c5a7137ad82a - - - - -] Failed to soft power off node 4b2b6f87-3c4b-4770-8686-29641ecbbe4c in at least 30 seconds. RetryError: RetryError[Attempts: 7, Value: power on]: retrying.RetryError: RetryError[Attempts: 7, Value: power on]
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base Traceback (most recent call last):
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/agent_base.py", line 1015, in reboot_and_finish_deploy
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base _wait_until_powered_off(task)
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base File "/usr/lib/python3.6/site-packages/retrying.py", line 68, in wrapped_f
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base return Retrying(*dargs, **dkw).call(f, *args, **kw)
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base File "/usr/lib/python3.6/site-packages/retrying.py", line 231, in call
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base raise RetryError(attempt)
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base retrying.RetryError: RetryError[Attempts: 7, Value: power on]
2020-05-18 16:45:17.190 8 ERROR ironic.conductor.allocations [-] Failed to process allocation f586e937-13f9-4905-9bb2-7908b14dcd1f: none of the requested nodes are available and match the resource class baremetal.: ironic.common.exception.AllocationFailed: Failed to process allocation f586e937-13f9-4905-9bb2-7908b14dcd1f: none of the requested nodes are available and match the resource class baremetal.
https://logserver.rdoproject.org/32/728132/5/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-vexxhost/3d42943/logs/undercloud/var/log/extra/baremetal_list.txt.gz
| d45d73d4-1f22-4e90-ac4f-a1e2583d6908 | baremetal-728132-5-63091-0 | 524aae70-4dd4-413a-8c65-94129d0c09d7 | power on | active | False |
| c1bb7b4a-4e89-4ae3-a2f0-0660b9fd9962 | baremetal-728132-5-63091-1 | f8a16fdd-70cb-4d37-b17b-a4fc0a917e19 | power on | active | False |
| 00ea8aec-c3bb-451a-b4c5-076c377fd83c | baremetal-728132-5-63091-2 | None | power off | deploy failed | False |
| 4b2b6f87-3c4b-4770-8686-29641ecbbe4c | baremetal-728132-5-63091-3 | 5744145e-6b9b-4965-96bb-410285a08a1b | power on | active | False |
Is this happening regularly, or just a one-time event?
Looking at the logs, the 'qemu: qemu_thread_create: Resource temporarily unavailable' error points to resource exhaustion on the hypervisor: qemu-img aborted because it could not create a thread. If this is a recurring issue, we may need to reduce the number of instances being spawned on vexxhost, since we may be hitting host limits.
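For context on the qemu error above: exit code -6 is SIGABRT, and qemu aborts with "qemu_thread_create: Resource temporarily unavailable" when pthread_create() fails with EAGAIN. Ironic already runs qemu-img under oslo_concurrency.prlimit --as=1073741824 (a 1 GiB address-space cap), so an overloaded hypervisor close to its thread or memory limits can tip it over. A quick diagnostic sketch, using standard Linux interfaces, that could be run on the affected host (the interpretation of the numbers is ours, not from the logs):

```shell
# pthread_create() returns EAGAIN when either per-process limits
# (e.g. the 1 GiB address-space cap from oslo_concurrency.prlimit)
# or host-wide thread limits are exhausted. Quick checks:
ulimit -u                                # max user processes/threads
cat /proc/sys/kernel/threads-max         # system-wide thread cap
cat /proc/sys/kernel/pid_max             # kernel PID cap
# Current total threads on the host, to compare against threads-max:
awk '/^Threads:/ {sum += $2} END {print sum}' /proc/[0-9]*/status 2>/dev/null
```

If the current thread count is close to threads-max, that would support throttling the number of concurrent instances on vexxhost as a mitigation.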