qemu 6.1 in CentOS Stream 8/9 regression in q35, unable to add more than 15 pcie-root-ports

Bug #1950916 reported by Ronelle Landy
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: yoga-1

Bug Description

tripleo-ci-centos-8-standalone (along with other check/gate/periodic tests) is failing tempest:

tempest.scenario.test_network_basic_ops.TestNetworkBasicOps

tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern

The failure reports show connectivity issues:

For tempest.scenario.test_network_basic_ops.TestNetworkBasicOps:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_basic_ops.py", line 436, in test_network_basic_ops
    self._check_public_network_connectivity(should_connect=True)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_basic_ops.py", line 214, in _check_public_network_connectivity
    message, server, mtu=mtu)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/manager.py", line 951, in check_vm_connectivity
    msg=msg)
  File "/usr/lib/python3.6/site-packages/unittest2/case.py", line 705, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true : Public network connectivity check failed
Timed out waiting for 192.168.24.187 to become reachable

For tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern:

2021-11-14 05:26:11,087 201102 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@192.168.24.164 after 4 attempts. Proxy client: no proxy client
2021-11-14 05:26:11.087 201102 ERROR tempest.lib.common.ssh Traceback (most recent call last):
2021-11-14 05:26:11.087 201102 ERROR tempest.lib.common.ssh File "/usr/lib/python3.6/site-packages/tempest/lib/common/ssh.py", line 112, in _get_ssh_connection
2021-11-14 05:26:11.087 201102 ERROR tempest.lib.common.ssh sock=proxy_chan)
2021-11-14 05:26:11.087 201102 ERROR tempest.lib.common.ssh File "/usr/lib/python3.6/site-packages/paramiko/client.py", line 349, in connect
2021-11-14 05:26:11.087 201102 ERROR tempest.lib.common.ssh retry_on_signal(lambda: sock.connect(addr))
2021-11-14 05:26:11.087 201102 ERROR tempest.lib.common.ssh File "/usr/lib/python3.6/site-packages/paramiko/util.py", line 283, in retry_on_signal
2021-11-14 05:26:11.087 201102 ERROR tempest.lib.common.ssh return function()
2021-11-14 05:26:11.087 201102 ERROR tempest.lib.common.ssh File "/usr/lib/python3.6/site-packages/paramiko/client.py", line 349, in <lambda>
2021-11-14 05:26:11.087 201102 ERROR tempest.lib.common.ssh retry_on_signal(lambda: sock.connect(addr))
2021-11-14 05:26:11.087 201102 ERROR tempest.lib.common.ssh socket.timeout: timed out

tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.164 via SSH timed out.
User: cirros, Password: None

Example log is included below:

https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_3c4/periodic/opendev.org/openstack/tripleo-ci/master/tripleo-ci-centos-8-standalone/3c4b9f5/logs/undercloud/var/log/tempest/stestr_results.html

Revision history for this message
Ronelle Landy (rlandy) wrote :

logs/undercloud/var/log/extra/errors.txt shows:

2021-11-14 05:26:11.097 ERROR /var/log/containers/nova/nova-api.log: 11 ERROR oslo.messaging._drivers.impl_rabbit [-] [03e67fe0-daf8-4ac3-8bda-beae53b27e78] AMQP server on standalone.ctlplane.localdomain:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno 104] Connection reset by peer
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova Traceback (most recent call last):
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova File "/usr/bin/nova-conductor", line 10, in <module>
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova sys.exit(main())
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova File "/usr/lib/python3.6/site-packages/nova/cmd/conductor.py", line 46, in main
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova topic=rpcapi.RPC_TOPIC)
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova File "/usr/lib/python3.6/site-packages/nova/service.py", line 256, in create
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova periodic_interval_max=periodic_interval_max)
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova File "/usr/lib/python3.6/site-packages/nova/service.py", line 116, in __init__
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova self.manager = manager_class(host=self.host, *args, **kwargs)
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova File "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 119, in __init__
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova self.compute_task_mgr = ComputeTaskManager()
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova File "/usr/lib/python3.6/site-packages/nova/conductor/manager.py", line 244, in __init__
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova self.report_client = report.SchedulerReportClient()
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 188, in __init__
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova self._client = self._create_client()
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova File "/usr/lib/python3.6/site-packages/nova/scheduler/client/report.py", line 231, in _create_client
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova client = self._adapter or utils.get_sdk_adapter('placement')
2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova File "/usr/lib/python3.6/site-packages/nova/utils.py", line 984, in get_sdk_adapter
2021-11-14 04:48:11.612 ERROR /var/log/con...


Changed in tripleo:
milestone: none → yoga-1
importance: Undecided → Critical
status: New → Triaged
tags: added: ci promotion-blocker
Revision history for this message
Ronelle Landy (rlandy) wrote :

Extracting the error:

2021-11-14 04:48:11.612 ERROR /var/log/containers/nova/nova-conductor.log: 7 ERROR nova openstack.exceptions.NotSupported: The placement service for 192.168.24.3:regionOne exists but does not have any supported versions.

Revision history for this message
Ronelle Landy (rlandy) wrote :

https://trunk.rdoproject.org/centos8-master/report.html

dlrn openstack/neutron is also failing since 11/13

Ronelle Landy (rlandy)
summary: - master check/gate/periodic tests are failing tempest - connectivity
- issues
+ master and wallaby check/gate/periodic tests are failing tempest -
+ connectivity issues
Revision history for this message
Ronelle Landy (rlandy) wrote : Re: master and wallaby check/gate/periodic tests are failing tempest - connectivity issues

https://logserver.rdoproject.org/56/36256/18/check/periodic-tripleo-ci-centos-8-scenario001-standalone-wallaby/beef770/logs/undercloud/var/log/extra/podman/containers/neutron_api/log/neutron/server.log.txt.gz

2021-11-14 20:09:03.802 29 ERROR neutron_lib.callbacks.manager [req-a0e7bce2-e36b-4eef-bfac-eb0deff38b82 368880513bcf4433b237d4a88fb84ffb 5be0afd7bbfb4cfdaaeccca5d14e055c - default default] Error during notification for neutron.services.logapi.logging_plugin.LoggingPlugin._clean_security_group_logs-72805 security_group, after_delete: TypeError: _clean_security_group_logs() got an unexpected keyword argument 'context'
2021-11-14 20:09:03.802 29 ERROR neutron_lib.callbacks.manager Traceback (most recent call last):
2021-11-14 20:09:03.802 29 ERROR neutron_lib.callbacks.manager File "/usr/lib/python3.6/site-packages/neutron_lib/callbacks/manager.py", line 197, in _notify_loop
2021-11-14 20:09:03.802 29 ERROR neutron_lib.callbacks.manager callback(resource, event, trigger, **kwargs)
2021-11-14 20:09:03.802 29 ERROR neutron_lib.callbacks.manager TypeError: _clean_security_group_logs() got an unexpected keyword argument 'context'
2021-11-14 20:09:03.802 29 ERROR neutron_lib.callbacks.manager

Revision history for this message
Amol Kahat (amolkahat) wrote :

Tempest test found the router:
2021-11-15 01:57:27,120 158054 INFO [tempest.lib.common.rest_client] Request (TestNetworkBasicOps:test_network_basic_ops): 201 POST http://192.168.24.3:9696/v2.0/routers 3.188s
2021-11-15 01:57:27,121 158054 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
        Body: {"router": {"name": "tempest-TestNetworkBasicOps-router-1232673168", "admin_state_up": true, "project_id": "08157c21409d440dba084ccabed97461", "external_gateway_info": {"network_id": "aa047c87-117f-404e-98ff-5ccb675ff93a"}}}
    Response - Headers: {'content-type': 'application/json', 'content-length': '690', 'x-openstack-request-id': 'req-bf5c0242-4cea-4480-9dc7-f75cefbb98e2', 'date': 'Mon, 15 Nov 2021 01:57:27 GMT', 'connection': 'close', 'status': '201', 'content-location': 'http://192.168.24.3:9696/v2.0/routers'}
        Body: b'{"router": {"id": "93922895-a67b-4e2a-abee-ddff06583470", "name": "tempest-TestNetworkBasicOps-router-1232673168", "tenant_id": "08157c21409d440dba084ccabed97461", "admin_state_up": true, "status": "ACTIVE", "external_gateway_info": {"network_id": "aa047c87-117f-404e-98ff-5ccb675ff93a", "external_fixed_ips": [{"subnet_id": "34588ca7-5b76-4dee-9909-3926138ce219", "ip_address": "192.168.24.181"}], "enable_snat": true}, "description": "", "availability_zones": [], "availability_zone_hints": [], "routes": [], "flavor_id": null, "tags": [], "created_at": "2021-11-15T01:57:24Z", "updated_at": "2021-11-15T01:57:25Z", "revision_number": 3, "project_id": "08157c21409d440dba084ccabed97461"}}'
2021-11-15 01:57:27,480 158054 INFO [tempest.lib.common.rest_client] Request (TestNetworkBasicOps:test_network_basic_ops): 200 GET http://192.168.24.3:9696/v2.0/subnets?project_id=08157c21409d440dba084ccabed97461&cidr=10.100.0.0%2F28 0.358s
2021-11-15 01:57:27,481 158054 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}

- https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario007-standalone-master/8664fc6/logs/undercloud/var/log/tempest/stestr_results.html.gz

At the same time, on the neutron side, the router is missing:
2021-11-15 01:57:26.855 129917 DEBUG neutron.agent.l3.agent [req-bf5c0242-4cea-4480-9dc7-f75cefbb98e2 c7a8f9be62a04c3d8a1407a13c46b26a 08157c21409d440dba084ccabed97461 - - -] Got routers updated notification :['93922895-a67b-4e2a-abee-ddff06583470'] routers_updated /usr/lib/python3.6/site-packages/neutron/agent/l3/agent.py:582
2021-11-15 01:57:26.857 129917 INFO neutron.agent.l3.agent [-] Starting processing update 93922895-a67b-4e2a-abee-ddff06583470, action 3, priority 1, update_id bd1a5495-227d-45a6-91d8-303df87f84fb. Wait time elapsed: 0.001
2021-11-15 01:57:26.858 129917 INFO neutron.agent.l3.agent [-] Starting router update for 93922895-a67b-4e2a-abee-ddff06583470, action 3, priority 1, update_id bd1a5495-227d-45a6-91d8-303df87f84fb. Wait time elapsed: 0.001
2021-11-15 01:57:26.858 129917 DEBUG neutron.common.utils [-] Time...


Revision history for this message
Ronelle Landy (rlandy) wrote :

<slaweq> but I don't think that this is an issue really
<slaweq> it seems for me that vms created there aren't started
<slaweq> if You check console log of any of them VMs there, it is always empty
<slaweq> I booted one VM manually and its console log is empty also
<slaweq> I think that someone from the compute team should take a look into that first

Revision history for this message
Ronelle Landy (rlandy) wrote :

This failure started on 11/13, impacting only wallaby and master; train, ussuri and victoria are still running, as are all the component lines.

Revision history for this message
Ronelle Landy (rlandy) wrote :

https://91ddfc31996f32105a65-e86ceb7af3919e0f43bac254438f726c.ssl.cf5.rackcdn.com/817548/1/gate/tripleo-ci-centos-8-standalone/822b8ee/logs/undercloud/var/log/containers/nova/nova-compute.log

shows:

2021-11-15 11:59:02.487 7 WARNING os_brick.initiator.connectors.nvmeof [req-8e3ef8ff-57bc-462f-99b0-3887fc839c88 72284933f018441c9ac44e33f762c911 fa49b82a9af1415ca9bf83c9e8baa080 - default default] Process execution error in _get_host_uuid: Unexpected error while running command.
Command: blkid overlay -s UUID -o value
Exit code: 2
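
For context: inside the containerized nova-compute the root filesystem device is the "overlay" mount rather than a real block device, so this probe cannot succeed, and blkid exits 2 when the requested token is not found. A quick reproduction sketch (run inside any overlayfs-rooted container):

    $ blkid overlay -s UUID -o value   # 'overlay' is not a block device
    $ echo $?
    2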

Revision history for this message
Artom Lifshitz (notartom) wrote :

Using zuul@38.102.83.147 as an investigation environment, I looked at the failed test_snapshot_pattern test, which has the following traceback:

    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
        return f(*func_args, **func_kwargs)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/test_snapshot_pattern.py", line 63, in test_snapshot_pattern
        server=server)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/manager.py", line 1068, in create_timestamp
        username=username)
      File "/usr/lib/python3.6/site-packages/tempest/scenario/manager.py", line 726, in get_remote_client
        linux_client.validate_authentication()
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 31, in wrapper
        return function(self, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 114, in validate_authentication
        self.ssh_client.test_connection_auth()
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/ssh.py", line 216, in test_connection_auth
        connection = self._get_ssh_connection()
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/ssh.py", line 128, in _get_ssh_connection
        password=self.password)
    tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.152 via SSH timed out.
    User: cirros, Password: None

That test's boot request was:

    2021-11-15 12:02:30,178 315743 INFO [tempest.lib.common.rest_client] Request (TestSnapshotPattern:test_snapshot_pattern): 202 POST http://192.168.24.3:8774/v2.1/servers 2.413s

With request ID req-301843dd-1ad1-4548-a5d4-d1f10f57b5bb:

        Response - Headers: {'date': 'Mon, 15 Nov 2021 12:02:27 GMT', 'server': 'Apache', 'content-length': '400', 'location': 'http://192.168.24.3:8774/v2.1/servers/e0021d5d-daa0-40bb-9be6-f2d203dc222e', 'openstack-api-version': 'compute 2.1', 'x-openstack-nova-api-version': '2.1', 'vary': 'OpenStack-API-Version,X-OpenStack-Nova-API-Version', 'x-openstack-request-id': 'req-301843dd-1ad1-4548-a5d4-d1f10f57b5bb', 'x-compute-request-id': 'req-301843dd-1ad1-4548-a5d4-d1f10f57b5bb', 'connection': 'close', 'content-type': 'application/json', 'status': '202', 'content-location': 'http://192.168.24.3:8774/v2.1/servers'}

The instance ID is e0021d5d-daa0-40bb-9be6-f2d203dc222e:

2021-11-15 12:02:31.607 7 INFO nova.compute.claims [req-301843dd-1ad1-4548-a5d4-d1f10f57b5bb 091e701229054f2793ed6f17dc4cf675 8ee1e30c6e7f4fa0a346af8667b600db - default default] [instance: e0021d5d-daa0-40bb-9be6-f2d203dc222e] Claim successful on node standalone.localdomain

As far as nova-compute knows, it built just fine:

2021-11-15 12:02:45.126 7 INFO nova.compute.manager [req-301843dd-1ad1-4548-a5d4-d1f10f57b5bb 091e701229054f2793ed6f17dc4cf675 8ee1e30c6e7f4fa0a346af8667b600db - default default] [instance:
e0021d5d-daa0-40bb-9be6-f2d203dc222e] Took 13.59 seconds to build instance.

Nothing suspicious in the instance qemu log at qemu/instance-00000001.log either:

[...]
char device redirected to /dev/pts/0 (label charserial...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart/+/818043

Revision history for this message
Marios Andreou (marios-b) wrote : Re: master and wallaby check/gate/periodic tests are failing tempest - connectivity issues

ykarel's patch in comment #13 is trying to pin to the 'older' (working) versions of libvirt/qemu

10:55 < marios> ykarel: so you think it is to do with some conflicting versions of those packages ? coming from different repos?
10:56 < ykarel> marios, i just think it's those newer versions that caused the issue, the proposed fix is just a workaround to clear gates until an actual fix is available
10:57 < ykarel> based on some of my local tests this should clear the ci
10:57 < ykarel> just waiting for ci result, but i believe it will pass

hopefully that can unblock us until the issue with the newer versions is resolved; waiting for ykarel to confirm his tests, then we can try to merge it

Revision history for this message
yatin (yatinkarel) wrote :

https://review.opendev.org/c/openstack/tripleo-quickstart/+/818043 is a temporary patch to exclude libvirt/qemu from the AppStream repo to unblock the CI.

Based on some local tests, excluding qemu-kvm alone was enough, but it's fine to exclude libvirt too, considering the workaround is temporary.

Good version: qemu-kvm-6.0.0-33.el8s.x86_64
Bad version: qemu-kvm-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64

Also, checking further, it seems the issue is specific to the q35 machine type, which has been set for TripleO deployments[1] since wallaby. Just to test, I tried manually changing the machine type to the victoria equivalent and the instance booted fine. Also NOTE: the issue is not seen in devstack/packstack/puppet jobs, as those don't have the q35 machine type set.

I think someone from Compute will have more insights on why the latest qemu-kvm has issues with q35 machine type instances. The workaround can be reverted once the issue is root-caused and fixed properly.

[1] https://github.com/openstack/tripleo-heat-templates/commit/4efd15e15a2db4307bed625cf85973ef093a2b6e
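
For reference, a minimal sketch of how this machine type is wired up (the heat parameter name is an assumption based on [1]; the nova.conf rendering is illustrative):

    # TripleO default since wallaby, per [1]:
    parameter_defaults:
      NovaHWMachineType: x86_64=q35

    # which renders on the computes as:
    [libvirt]
    hw_machine_type = x86_64=q35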

Revision history for this message
Alfredo Moralejo (amoralej) wrote :

After digging into this issue, I have some thoughts. Some facts:

- The new version of qemu-kvm pushed to CentOS Stream 8 breaks oooq jobs, as detailed in previous comments.
- If I understood correctly from conversation in #oooq, all reviews in tripleo repos create new containers from the current-tripleo dlrn repo + the current state of deps, OS, and other dependencies.
- In this case, recreating containers on each review made all jobs fail, not only promotion ones but also check and gate ones.
- As per conversations, this is a design decision, where rebuilding and serving containers (from a provider job) was preferred over pulling and doing a controlled update (only the packages being reviewed) because of problems with rate limiting in the registry.

And now, just my opinion (as an external non-CI-team guy):

- Rebuilding the containers on check/gate jobs breaks the idea of promoting content, and limits it to only dlrn content.
- One of the advantages of including containers in the promotions was to extend the isolation of changes beyond the dlrn content to the whole content of the containers. This can still have issues with packages installed on the hosts themselves, but provides better protection against OS or deps changes.

I understand that this was evaluated when making the decision to rebuild containers on each review, so this behaviour was expected, but it may still be good to think about alternatives (using a different registry?); there may be a viable one.

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

@Alfredo, before we rebuilt containers, they were pulled from docker.io and updated on the host with the same deps and OS packages. It's not a question of rebuilding them or pulling from a registry, but of how they are updated after the host gets all of them.
We provide a list of repos that should be active when updating the containers, so that they can include the built packages and the newest tripleo code. This was always the way, and Stream is known for its CI breakages.
We have an option to disable OS repos (or any others) when updating the containers, but I'm afraid it wouldn't be supported in the community.

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

note: there should be no libvirt on hosts, only in containers

Ronelle Landy (rlandy)
summary: master and wallaby check/gate/periodic tests are failing tempest -
- connectivity issues
+ updated libvirt/qemu to 6.1 from CentOS appstream
Revision history for this message
Kashyap Chamarthy (kashyapc) wrote : Re: master and wallaby check/gate/periodic tests are failing tempest - updated libvirt/qemu to 6.1 from CentOS appstream

Please attach these things to the bug:

- The buggy libvirt guest XML
- The full QEMU command-line of the above guest XML (/var/log/libvirt/qemu/instance-yyyyyyy.log)

So, I was told that with QEMU 6.0 the issue goes away (i.e. the console log is created); however, with QEMU 6.1 the console log is not created.

If we know the buggy QEMU version and the working QEMU version, the QEMU folks can `git bisect` QEMU to find the problem commit.
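
For illustration, a sketch of such a bisect (the repository URL and release tags are upstream QEMU's; the per-step reproduction is an assumption):

    git clone https://gitlab.com/qemu-project/qemu.git && cd qemu
    git bisect start
    git bisect bad v6.1.0     # known-bad version from this bug
    git bisect good v6.0.0    # known-good version
    # at each step: build, boot a q35 guest that reproduces the failure,
    # then mark the result until git prints the first bad commit:
    git bisect good           # or: git bisect bad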

Revision history for this message
Alfredo Moralejo (amoralej) wrote :

My understanding is that, before moving to rebuilding all containers, the package update was done in a way that updated only the packages in the temporary repo built by dlrn in the same job and the tripleo packages in delorean-current.

There was some logic in the tooling to not update deps and OS packages in the container. I remember we hit some issues with some of the tooling, but I'm pretty sure the update was done in a controlled way.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-quickstart/+/818043
Committed: https://opendev.org/openstack/tripleo-quickstart/commit/1d5262a32a2cd8d2b6a3e4659812641f49e17658
Submitter: "Zuul (22348)"
Branch: master

commit 1d5262a32a2cd8d2b6a3e4659812641f49e17658
Author: yatinkarel <email address hidden>
Date: Tue Nov 16 10:41:05 2021 +0530

    Exclude libvirt/qemu from AppStream repo

    libvirt/qemu are rebased to a newer version in
    the AppStream repo than advanced-virtualization, and
    likely due to that, jobs started failing. Somehow
    jobs are passing in releases before wallaby even
    with the newer version.

    Until it gets root-caused and fixed, temporarily
    exclude libvirt/qemu from the AppStream repo.

    Related-Bug: #1950916
    Change-Id: Ia6a9e01ca2adbde1e7a0b7cce9fe5842a0d54b1b
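
The exclusion itself is a dnf repo-level package exclude; a minimal sketch of the effect on the repo configuration (repo id and file name are illustrative, not the patch's actual implementation):

    # /etc/yum.repos.d/CentOS-Stream-AppStream.repo (sketch)
    [appstream]
    name=CentOS Stream 8 - AppStream
    ...
    excludepkgs=libvirt* qemu-kvm*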

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ci (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/818139

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ci (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ci/+/818139
Committed: https://opendev.org/openstack/tripleo-ci/commit/dd077fbaaf5cd90345e2daa8fbe510dec80059d9
Submitter: "Zuul (22348)"
Branch: master

commit dd077fbaaf5cd90345e2daa8fbe510dec80059d9
Author: Ronelle Landy <email address hidden>
Date: Tue Nov 16 15:47:28 2021 -0500

    Exclude libvirt/qemu from AppStream: container

    libvirt/qemu are rebased to a newer version in
    the AppStream repo than advanced-virtualization, and
    likely due to that, jobs started failing. Somehow
    jobs are passing in releases before wallaby even
    with the newer version.

    This patch is a temporary fix to
    exclude libvirt/qemu from the AppStream repo.

    Change-Id: Iac4425d655df2f519f5b467e3754dad08ac9b1f5
    Related-Bug: #1950916

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)
Revision history for this message
Ronelle Landy (rlandy) wrote : Re: master and wallaby check/gate/periodic tests are failing tempest - updated libvirt/qemu to 6.1 from CentOS appstream

A second fix was required for container builds: https://review.opendev.org/c/openstack/tripleo-ci/+/818139

Revision history for this message
Lee Yarwood (lyarwood) wrote :

Along with using q35, these jobs are also using accel=tcg ([libvirt]virt_type=qemu) and might be seeing increased memory pressure during tempest runs due to https://bugs.launchpad.net/nova/+bug/1949606 (QEMU >= 5.0.0 with -accel tcg uses a tb-size of 1GB, causing OOM issues in CI). Others have already ruled that out as an underlying cause, but I think we should keep it in mind, especially when combined with q35.
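
For reference, the combination described above corresponds roughly to this nova.conf on the computes (a sketch; the tb-size cap is a QEMU command-line property, not something these jobs set):

    [libvirt]
    virt_type = qemu              # plain QEMU, i.e. accel=tcg, no KVM
    hw_machine_type = x86_64=q35

    # QEMU's TCG translation buffer can be capped on its own command
    # line, e.g. (size in MiB, value illustrative):
    #   qemu-system-x86_64 -machine q35 -accel tcg,tb-size=128 ...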

Revision history for this message
yatin (yatinkarel) wrote :

<< nvm it looks like this is https://bugzilla.redhat.com/show_bug.cgi?id=2006409 with another workaround being tested here https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/818229

The workaround(NovaLibvirtNumPciePorts: 12) works fine[1] with qemu-kvm-6.1.0 + q35 instances for CentOS8 stream too.

[1] https://logserver.rdoproject.org/72/36772/6/check/tripleo-ci-centos-8-standalone/b6c7cfd/logs/undercloud/var/log/tempest/stestr_results.html.gz
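
A sketch of the workaround as a TripleO environment entry; it maps to nova's [libvirt]num_pcie_ports, keeping guests below the 15-port threshold where the regression bites:

    parameter_defaults:
      NovaLibvirtNumPciePorts: 12

    # equivalent nova.conf on the computes:
    [libvirt]
    num_pcie_ports = 12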

Revision history for this message
Alan Pevec (apevec) wrote :

An existing RHEL 8 rhbz is https://bugzilla.redhat.com/show_bug.cgi?id=2007129;
the new one filed by oVirt is likely a dup.

summary: - master and wallaby check/gate/periodic tests are failing tempest -
- updated libvirt/qemu to 6.1 from CentOS appstream
+ qemu 6.1 in CentOS Stream 8/9 regression in q35, unable to add more than
+ 15 pcie-root-ports
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/822344
Committed: https://opendev.org/openstack/tripleo-quickstart-extras/commit/37a64ea0ca7d5f409dd46b38a534a4a863041ce9
Submitter: "Zuul (22348)"
Branch: master

commit 37a64ea0ca7d5f409dd46b38a534a4a863041ce9
Author: Amol Kahat <email address hidden>
Date: Mon Dec 20 22:32:50 2021 +0530

    set NovaLibvirtNumPciePorts to 12

    The new qemu-kvm-6.1.0 version is buggy and was excluded in master
    and wallaby (Ia6a9e01ca2adbde1e7a0b7cce9fe5842a0d54b1b,
    Iac4425d655df2f519f5b467e3754dad08ac9b1f5).
    The new version impacted q35 machines on CentOS Stream 8 and 9.

    qemu-kvm-6.1.0 + NovaLibvirtNumPciePorts: 12 + q35 instances
    on CentOS 8 and Stream work fine.

    Related-Bug: #1950916
    Signed-off-by: Amol Kahat <email address hidden>
    Change-Id: I39c00edd97e321963d1ced74dcf5a2a27fa032cc

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by "Takashi Kajinami <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/818229
Reason: This looks like a stale DNM patch. Abandoning it to clean up our queue.

Revision history for this message
Alan Pevec (apevec) wrote :

The CS9 fix is qemu-kvm-6.2.0-1.el9.

Alan Pevec (apevec)
Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ci (master)

Change abandoned by "Ghanshyam <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/818904
Reason: The TripleO project is retiring now; for details, please see https://review.opendev.org/c/openstack/governance/+/905145 or reach out to the OpenStack TC.

