OVB failures on Vexx cloud: none of the requested nodes are available and match the resource class baremetal

Bug #1879379 reported by Sagi (Sergey) Shnaidman
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

Overcloud deployment fails on vexxhost while the same job passes on rdo-cloud:

passing rdo-cloud job: https://review.rdoproject.org/zuul/build/11a0c21f4bcb4d93ac28c097ea59e702
failing vexx job: https://review.rdoproject.org/zuul/build/3d429434e77648648d3ca51160642e07

https://logserver.rdoproject.org/32/728132/5/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-vexxhost/3d42943/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

openstack.exceptions.ResourceFailure: Allocation 266e1270-9da0-4cd7-af95-f48b96d099b5 failed: Failed to process allocation 266e1270-9da0-4cd7-af95-f48b96d099b5: none of the requested nodes are available and match the resource class baremetal.
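
For context, the Ironic allocation API only matches nodes that are in the 'available' provision state and carry the requested resource class ('baremetal' here). A minimal debugging sketch with openstacksdk (the cloud name is a placeholder, not something from this report):

import openstack

# Placeholder cloud name; in a real undercloud this would come from clouds.yaml.
conn = openstack.connect(cloud='undercloud')

# An allocation can only be satisfied by nodes that are 'available' and
# whose resource_class matches the request.
for node in conn.baremetal.nodes(details=True):
    candidate = (node.provision_state == 'available'
                 and node.resource_class == 'baremetal')
    print(node.name, node.provision_state, node.resource_class, candidate)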

https://logserver.rdoproject.org/32/728132/5/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-vexxhost/3d42943/logs/undercloud/var/log/containers/neutron/server.log.txt.gz

2020-05-18 16:45:17.311 GET failed (client error): The resource could not be found.
2020-05-18 16:45:17.312 192.168.24.1 "GET /v2.0/networks/ctlplane HTTP/1.1" status: 404 len: 292 time: 0.0605259
2020-05-18 16:45:17.314 GET failed (client error): The resource could not be found.
2020-05-18 16:45:17.315 192.168.24.1 "GET /v2.0/networks/ctlplane HTTP/1.1" status: 404 len: 292 time: 0.0609751
2020-05-18 16:45:17.343 GET failed (client error): The resource could not be found.
2020-05-18 16:45:17.344 192.168.24.1 "GET /v2.0/networks/ctlplane HTTP/1.1" status: 404 len: 292 time: 0.0470848
2020-05-18 16:45:17.422 38 DEBUG neutron.wsgi [-] (38) accepted ('192.168.24.1', 49262) server /usr/lib/python3.6/site-packages/eventlet/wsgi.py:985
2020-05-18 16:45:17.425 38 INFO neutron.wsgi [-] 192.168.24.1 "OPTIONS / HTTP/1.0" status: 200 len: 249 time: 0.0028582
2020-05-18 16:45:17.496 Attributes excluded by policy engine: ['vlan_transparent'] _exclude_attributes_by_policy /usr/lib/python3.6/site-packages/neutron/pecan_wsgi/hooks/policy_enforcement.py:256
2020-05-18 16:45:17.498 192.168.24.1 "GET /v2.0/networks?name=ctlplane HTTP/1.1" status: 200 len: 925 time: 0.1783998
2020-05-18 16:45:17.509 Request body: {'port': {'network_id': 'd001679f-f792-4cf4-84f0-068e958a1f0b', 'name': 'overcloud-controller-1-ctlplane'}} prepare_request_body /usr/lib/python3.6/site-packages/neutron/api/v2/base.py:719
2020-05-18 16:45:17.515 Resources subnetpool,port have unlimited quota limit. It is not required to calculate headroom make_reservation /usr/lib/python3.6/site-packages/neutron/db/quota/driver.py:221
2020-05-18 16:45:17.517 Attributes excluded by policy engine: ['vlan_transparent'] _exclude_attributes_by_policy /usr/lib/python3.6/site-packages/neutron/pecan_wsgi/hooks/policy_enforcement.py:256
2020-05-18 16:45:17.518 192.168.24.1 "GET /v2.0/networks?name=ctlplane HTTP/1.1" status: 200 len: 925 time: 0.2029653
2020-05-18 16:45:17.531 Request body: {'port': {'network_id': 'd001679f-f792-4cf4-84f0-068e958a1f0b', 'name': 'overcloud-controller-0-ctlplane'}} prepare_request_body /usr/lib/python3.6/site-packages/neutron/api/v2/base.py:719
2020-05-18 16:45:17.535 Made reservation on behalf of 7d28c8c3e9df46dba4a84c716ddb1b33 for: {'port': 1} before /usr/lib/python3.6/site-packages/neutron/pecan_wsgi/hooks/quota_enforcement.py:55
2020-05-18 16:45:17.537 Notify callbacks ['neutron.plugins.ml2.plugin.SecurityGroupDbMixin._ensure_default_security_group_handler-1631485'] for port, before_create _notify_loop /usr/lib/python3.6/site-packages/neutron_lib/callbacks/manager.py:193
2020-05-18 16:45:17.542 Resources subnetpool,port have unlimited quota limit. It is not required to calculate headroom make_reservation /usr/lib/python3.6/site-packages/neutron/db/quota/driver.py:221
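
The repeated 404s above are not failures in themselves: a find-by-name helper first tries the value as an ID (GET /v2.0/networks/ctlplane, which returns 404 because 'ctlplane' is a name, not a UUID) and then falls back to filtering by name (GET /v2.0/networks?name=ctlplane, which returns 200). A sketch of the same lookup pattern with openstacksdk (placeholder cloud name again):

import openstack

conn = openstack.connect(cloud='undercloud')  # placeholder cloud name

# find_network() first issues GET /v2.0/networks/<value> (a 404 when the
# value is a name rather than a UUID), then retries with ?name=<value>,
# which is exactly the 404-then-200 pair in the server log above.
net = conn.network.find_network('ctlplane')
print(net.id if net else 'not found')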

https://logserver.rdoproject.org/32/728132/5/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-vexxhost/3d42943/logs/undercloud/var/log/containers/ironic/ironic-conductor.log.txt.gz

2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils [req-f4babce5-8a94-4ecd-95e4-e313d8fc7227 48650596bc9948bd9822a9f73440a836 7d28c8c3e9df46dba4a84c716ddb1b33 - default default] Node 00ea8aec-c3bb-451a-b4c5-076c377fd83c failed deploy step {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} with unexpected error: Unexpected error while running command.
Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.part /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.converted
Exit code: -6
Stdout: ''
Stderr: 'qemu: qemu_thread_create: Resource temporarily unavailable\n': oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.part /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.converted
Exit code: -6
Stdout: ''
Stderr: 'qemu: qemu_thread_create: Resource temporarily unavailable\n'
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils Traceback (most recent call last):
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/deployments.py", line 245, in do_next_deploy_step
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils result = interface.execute_deploy_step(task, step)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 59, in wrapped
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils result = f(*args, **kwargs)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/agent_base.py", line 804, in execute_deploy_step
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils task, step)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 359, in execute_deploy_step
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils return self._execute_step(task, step)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/base.py", line 282, in _execute_step
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils return getattr(self, step['step'])(task)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 59, in wrapped
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils result = f(*args, **kwargs)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/conductor/task_manager.py", line 148, in wrapper
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils return f(*args, **kwargs)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/iscsi_deploy.py", line 689, in deploy
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils deploy_utils.cache_instance_image(task.context, node)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/metrics.py", line 59, in wrapped
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils result = f(*args, **kwargs)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 858, in cache_instance_image
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils force_raw)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/deploy_utils.py", line 235, in fetch_images
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils cache.fetch_image(href, path, ctx=ctx, force_raw=force_raw)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 135, in fetch_image
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils href, master_path, dest_path, ctx=ctx, force_raw=force_raw)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 162, in _download_image
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils _fetch(ctx, href, tmp_path, force_raw)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/image_cache.py", line 319, in _fetch
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils images.image_to_raw(image_href, path, path_tmp)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic/common/images.py", line 439, in image_to_raw
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils disk_utils.convert_image(path_tmp, staged, 'raw')
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/disk_utils.py", line 381, in convert_image
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils utils.execute(*cmd, run_as_root=run_as_root, prlimit=QEMU_IMG_LIMITS)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/ironic_lib/utils.py", line 99, in execute
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils result = processutils.execute(*cmd, **kwargs)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils File "/usr/lib/python3.6/site-packages/oslo_concurrency/processutils.py", line 424, in execute
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils cmd=sanitized_cmd)
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils Command: /usr/bin/python3 -m oslo_concurrency.prlimit --as=1073741824 -- qemu-img convert -O raw /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.part /var/lib/ironic/master_images/tmpct0mu5cy/ee96d1d1-5c4d-4cf0-ba30-4b6db054c96d.converted
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils Exit code: -6
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils Stdout: ''
2020-05-18 16:39:16.010 8 ERROR ironic.conductor.utils Stderr: 'qemu: qemu_thread_create: Resource temporarily unavailable\n'
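
For what it's worth, exit code -6 means the child died on SIGABRT: oslo.concurrency's prlimit wrapper caps the address space (RLIMIT_AS) of qemu-img at 1 GiB (--as=1073741824), and when qemu-img cannot allocate thread stacks under that cap, qemu_thread_create fails and qemu aborts. A rough sketch of the same mechanism (the limit value is taken from the command line above; the helper itself is illustrative):

import resource
import subprocess

AS_LIMIT = 1073741824  # 1 GiB, the --as value from the failing command

def run_with_as_limit(cmd, limit=AS_LIMIT):
    # Cap RLIMIT_AS in the child before exec, roughly what
    # oslo_concurrency.prlimit does for qemu-img.
    def preexec():
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
    # A negative returncode means the process was killed by that signal
    # number: -6 is SIGABRT.
    return subprocess.run(cmd, preexec_fn=preexec, capture_output=True)

# Hypothetical reproduction (paths are placeholders):
# result = run_with_as_limit(['qemu-img', 'convert', '-O', 'raw',
#                             'image.part', 'image.converted'])
# print(result.returncode, result.stderr)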

2020-05-18 16:44:07.367 8 WARNING ironic.drivers.modules.agent_base [req-c2afee82-3e1d-4d52-bd9b-c5a7137ad82a - - - - -] Failed to soft power off node 4b2b6f87-3c4b-4770-8686-29641ecbbe4c in at least 30 seconds. RetryError: RetryError[Attempts: 7, Value: power on]: retrying.RetryError: RetryError[Attempts: 7, Value: power on]
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base Traceback (most recent call last):
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base File "/usr/lib/python3.6/site-packages/ironic/drivers/modules/agent_base.py", line 1015, in reboot_and_finish_deploy
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base _wait_until_powered_off(task)
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base File "/usr/lib/python3.6/site-packages/retrying.py", line 68, in wrapped_f
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base return Retrying(*dargs, **dkw).call(f, *args, **kw)
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base File "/usr/lib/python3.6/site-packages/retrying.py", line 231, in call
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base raise RetryError(attempt)
2020-05-18 16:44:07.367 8 ERROR ironic.drivers.modules.agent_base retrying.RetryError: RetryError[Attempts: 7, Value: power on]

2020-05-18 16:45:17.190 8 ERROR ironic.conductor.allocations [-] Failed to process allocation f586e937-13f9-4905-9bb2-7908b14dcd1f: none of the requested nodes are available and match the resource class baremetal.: ironic.common.exception.AllocationFailed: Failed to process allocation f586e937-13f9-4905-9bb2-7908b14dcd1f: none of the requested nodes are available and match the resource class baremetal.

https://logserver.rdoproject.org/32/728132/5/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-vexxhost/3d42943/logs/undercloud/var/log/extra/baremetal_list.txt.gz

| UUID                                 | Name                       | Instance UUID                        | Power State | Provisioning State | Maintenance |
| d45d73d4-1f22-4e90-ac4f-a1e2583d6908 | baremetal-728132-5-63091-0 | 524aae70-4dd4-413a-8c65-94129d0c09d7 | power on    | active             | False       |
| c1bb7b4a-4e89-4ae3-a2f0-0660b9fd9962 | baremetal-728132-5-63091-1 | f8a16fdd-70cb-4d37-b17b-a4fc0a917e19 | power on    | active             | False       |
| 00ea8aec-c3bb-451a-b4c5-076c377fd83c | baremetal-728132-5-63091-2 | None                                 | power off   | deploy failed      | False       |
| 4b2b6f87-3c4b-4770-8686-29641ecbbe4c | baremetal-728132-5-63091-3 | 5744145e-6b9b-4965-96bb-410285a08a1b | power on    | active             | False       |
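
The table makes the allocation error above mechanical: three nodes are 'active' (already consumed by instances) and the fourth, 00ea8aec-c3bb-451a-b4c5-076c377fd83c, is stuck in 'deploy failed', so no node is left in 'available' to satisfy a new allocation. A sketch that would surface the same picture via openstacksdk (same placeholder cloud name as above):

import openstack

conn = openstack.connect(cloud='undercloud')  # placeholder cloud name

# Allocations record why they failed; correlating them with node
# provision states shows whether any 'available' node was left at all.
for alloc in conn.baremetal.allocations():
    print(alloc.id, alloc.state, alloc.last_error)
for node in conn.baremetal.nodes():
    print(node.id, node.name, node.provision_state)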

wes hayutin (weshayutin)
tags: added: promotion-blocker
Javier Peña (jpena-c) wrote :

Is this happening regularly, or just a one-time event?

Looking at the logs, the 'qemu: qemu_thread_create: Resource temporarily unavailable\n' error points to a resource issue on one hypervisor. If this is a recurring issue, we may have to reduce the number of instances being spawned on Vexxhost, since we may be hitting some limits.

wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Alan Pevec (apevec) wrote :

This is qemu-img convert failing in the undercloud Ironic driver, so it is unlikely to be a quota issue on the hosting cloud's infrastructure.
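
A quick way to confirm this is a per-process cap rather than a host-level shortage is to inspect the limits the conductor runs under: 'Resource temporarily unavailable' from qemu_thread_create usually means pthread_create hit an address-space or thread cap rather than the host running out of memory. An illustrative check (running it inside the ironic conductor container is an assumption on my part):

import resource

# RLIMIT_AS is the cap that oslo_concurrency.prlimit tightens further
# for qemu-img; RLIMIT_NPROC bounds how many threads/processes the user
# may create. Either can make pthread_create fail with EAGAIN.
for name in ('RLIMIT_AS', 'RLIMIT_NPROC', 'RLIMIT_STACK'):
    soft, hard = resource.getrlimit(getattr(resource, name))
    print(name, soft, hard)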

Bob Fournier (bfournie) wrote :

There is a patch to Ussuri that should help: https://review.opendev.org/#/c/727678/. It's currently failing an upstream job.

Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
status: Triaged → Fix Released