landscape-package-reporter causes CPU load spikes

Bug #1999671 reported by Bas de Bruijne
This bug affects 1 person
Affects                                 Status    Importance   Assigned to   Milestone
Landscape Client                        New       Undecided    Unassigned
OpenStack Neutron API Charm             Invalid   Undecided    Unassigned
OpenStack Nova Cloud Controller Charm   Invalid   Undecided    Unassigned
OpenStack Nova Compute Charm            Invalid   Undecided    Unassigned

Bug Description

In testrun https://solutions.qa.canonical.com/v2/testruns/dca52771-8fb8-4206-85ab-bb558341fe09, tempest fails with a failure we see regularly:

```
Traceback (most recent call last):
  File "/snap/fcbtest/35/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/snap/fcbtest/35/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.10/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.10/ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.10/ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/snap/fcbtest/35/.rally/verification/verifier-1bc0bb7e-b194-41dc-8479-3ec9aac7ed28/repo/tempest/api/network/test_dhcp_ipv6.py", line 261, in test_dhcpv6_64_subnets
    subnet_dhcp = self.create_subnet(
  File "/home/ubuntu/snap/fcbtest/35/.rally/verification/verifier-1bc0bb7e-b194-41dc-8479-3ec9aac7ed28/repo/tempest/api/network/base.py", line 141, in create_subnet
    body = client.create_subnet(
  File "/home/ubuntu/snap/fcbtest/35/.rally/verification/verifier-1bc0bb7e-b194-41dc-8479-3ec9aac7ed28/repo/tempest/lib/services/network/subnets_client.py", line 27, in create_subnet
    return self.create_resource(uri, post_data)
  File "/home/ubuntu/snap/fcbtest/35/.rally/verification/verifier-1bc0bb7e-b194-41dc-8479-3ec9aac7ed28/repo/tempest/lib/services/network/base.py", line 62, in create_resource
    resp, body = self.post(req_uri, req_post_data)
  File "/home/ubuntu/snap/fcbtest/35/.rally/verification/verifier-1bc0bb7e-b194-41dc-8479-3ec9aac7ed28/repo/tempest/lib/common/rest_client.py", line 299, in post
    return self.request('POST', url, extra_headers, headers, body, chunked)
  File "/home/ubuntu/snap/fcbtest/35/.rally/verification/verifier-1bc0bb7e-b194-41dc-8479-3ec9aac7ed28/repo/tempest/lib/common/rest_client.py", line 704, in request
    resp, resp_body = self._request(method, url, headers=headers,
  File "/home/ubuntu/snap/fcbtest/35/.rally/verification/verifier-1bc0bb7e-b194-41dc-8479-3ec9aac7ed28/repo/tempest/lib/common/rest_client.py", line 583, in _request
    resp, resp_body = self.raw_request(
  File "/home/ubuntu/snap/fcbtest/35/.rally/verification/verifier-1bc0bb7e-b194-41dc-8479-3ec9aac7ed28/repo/tempest/lib/common/rest_client.py", line 623, in raw_request
    resp, resp_body = self.http_obj.request(
  File "/home/ubuntu/snap/fcbtest/35/.rally/verification/verifier-1bc0bb7e-b194-41dc-8479-3ec9aac7ed28/repo/tempest/lib/common/http.py", line 110, in request
    r = super(ClosingHttp, self).request(method, url, retries=retry,
  File "/snap/fcbtest/35/lib/python3.10/site-packages/urllib3/request.py", line 78, in request
    return self.request_encode_body(
  File "/snap/fcbtest/35/lib/python3.10/site-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/snap/fcbtest/35/lib/python3.10/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/snap/fcbtest/35/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/snap/fcbtest/35/lib/python3.10/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/snap/fcbtest/35/lib/python3.10/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/snap/fcbtest/35/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/snap/fcbtest/35/lib/python3.10/site-packages/urllib3/connectionpool.py", line 451, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/snap/fcbtest/35/lib/python3.10/site-packages/urllib3/connectionpool.py", line 340, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='neutron-api.silo2.lab0.solutionsqa', port=9696): Read timed out. (read timeout=60)
```

In the neutron logs, we see:
```
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/managers.py", line 498, in _call_on_drivers
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context)
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 877, in update_port_postcommit
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers self._ovn_update_port(context._plugin_context, port, original_port,
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 763, in _ovn_update_port
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers self._ovn_client.update_port(plugin_context, port,
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py", line 668, in update_port
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers ovn_port = self._nb_idl.lookup('Logical_Switch_Port', port['id'])
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/impl_idl_ovn.py", line 207, in lookup
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers return super().lookup(table, record, default=default, timeout=timeout,
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 208, in lookup
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers return self._lookup(table, record)
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 268, in _lookup
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers row = idlutils.row_by_value(self, rl.table, rl.column, record)
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers File "/usr/lib/python3/dist-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 114, in row_by_value
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers raise RowNotFound(table=table, col=column, match=match)
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=be6adb03-68d5-4045-bf69-8f69ea255c57
neutron/neutron-server.log:2022-12-13 23:51:37.458 213474 ERROR neutron.plugins.ml2.managers
```

And
```
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event Traceback (most recent call last):
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron/db/db_base_plugin_common.py", line 289, in _get_port
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event port = model_query.get_by_id(context, models_v2.Port, id,
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron_lib/db/model_query.py", line 169, in get_by_id
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return query.filter(model.id == object_id).one()
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/sqlalchemy/orm/query.py", line 2856, in one
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return self._iter().one()
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/sqlalchemy/engine/result.py", line 1407, in one
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return self._only_one_row(
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/sqlalchemy/engine/result.py", line 561, in _only_one_row
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event raise exc.NoResultFound(
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event sqlalchemy.exc.NoResultFound: No row was found when one was required
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event During handling of the above exception, another exception occurred:
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event Traceback (most recent call last):
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/ovsdbapp/event.py", line 177, in notify_loop
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event match.run(event, row, updates)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py", line 431, in run
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event self.l3_plugin.update_router_gateway_port_bindings(
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron/services/ovn_l3/plugin.py", line 348, in update_router_gateway_port_bindings
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event port = self._plugin.update_port(
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron/common/utils.py", line 701, in inner
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return f(*args, **kwargs)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 218, in wrapped
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return method(*args, **kwargs)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 139, in wrapped
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event setattr(e, '_RETRY_EXCEEDED', True)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event self.force_reraise()
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event raise self.value
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 135, in wrapped
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return f(*args, **kwargs)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 154, in wrapper
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event ectxt.value = e.inner_exc
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event self.force_reraise()
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event raise self.value
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 142, in wrapper
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return f(*args, **kwargs)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 183, in wrapped
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event LOG.debug("Retry wrapper got retriable exception: %s", e)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event self.force_reraise()
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event raise self.value
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 179, in wrapped
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return f(*dup_args, **dup_kwargs)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron/plugins/ml2/plugin.py", line 1770, in update_port
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event original_port = self.get_port(context, id)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 218, in wrapped
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return method(*args, **kwargs)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 139, in wrapped
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event setattr(e, '_RETRY_EXCEEDED', True)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event self.force_reraise()
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event raise self.value
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 135, in wrapped
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return f(*args, **kwargs)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 154, in wrapper
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event ectxt.value = e.inner_exc
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event self.force_reraise()
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event raise self.value
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_db/api.py", line 142, in wrapper
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return f(*args, **kwargs)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 183, in wrapped
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event LOG.debug("Retry wrapper got retriable exception: %s", e)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 227, in __exit__
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event self.force_reraise()
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 200, in force_reraise
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event raise self.value
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron_lib/db/api.py", line 179, in wrapped
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return f(*dup_args, **dup_kwargs)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1010, in wrapper
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event return fn(*args, **kwargs)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron/db/db_base_plugin_v2.py", line 1558, in get_port
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event port = self._get_port(context, id, lazy_fields=lazy_fields)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event File "/usr/lib/python3/dist-packages/neutron/db/db_base_plugin_common.py", line 292, in _get_port
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event raise exceptions.PortNotFound(port_id=id)
neutron/neutron-server.log:2022-12-13 23:47:15.031 213475 ERROR ovsdbapp.event neutron_lib.exceptions.PortNotFound: Port f3ad8f89-8209-470c-93bf-b2bb69f3b701 could not be found.
```

I'm not sure which parts are relevant here.

Crashdumps and configs can be found here:
https://oil-jenkins.canonical.com/artifacts/dca52771-8fb8-4206-85ab-bb558341fe09/index.html

tags: added: cdo-qa cdo-tempest foundations-engine
summary: - Tempest times out reaching neutron-api
+ [jammy][yoga] Tempest times out reaching neutron-api
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote : Re: [jammy][yoga] Tempest times out reaching neutron-api

I'm not able to triage just yet, but I have a feeling that this might be a bug related to synchronisation between the OVN and neutron backends: OVN maintains its own (internal) database of what's connected, and this is synced to the neutron database (on MySQL). It may be that the sync was missed or delayed (e.g. the failed port look-up was about a minute behind the port being assigned). So this could also be resource contention if the neutron servers (or the machines they run on) are heavily loaded; this is fairly typical in the SQA lab.

If the above is true, then it's more a neutron/OVN bug than a charm bug, and we may need to reallocate it. To rule out resource contention (and thus a race-hazard type issue), it would be good to capture the load stats for the machine in question, which may not currently be recorded.
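
For reference, a minimal sketch of what capturing those load stats could look like, assuming nothing about the lab's actual tooling (the output path and sampling interval below are arbitrary): periodically append timestamped 1/5/15-minute load averages to a CSV so they can later be lined up against the tempest failure times.

```
#!/usr/bin/env python3
# Illustrative load-average sampler (not what SQA records today).
import csv
import os
import time
from datetime import datetime, timezone

OUTPUT = "loadavg.csv"   # hypothetical output file
INTERVAL = 30            # seconds between samples

with open(OUTPUT, "a", newline="") as f:
    writer = csv.writer(f)
    while True:
        load1, load5, load15 = os.getloadavg()
        writer.writerow([datetime.now(timezone.utc).isoformat(),
                         f"{load1:.2f}", f"{load5:.2f}", f"{load15:.2f}"])
        f.flush()
        time.sleep(INTERVAL)
```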

Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

The load stats have recently started being recorded using Landscape, but we don't yet export them anywhere. I will look into making that happen.

In the meantime, the number of occurrences of this bug is increasing. Maybe it's possible to extract some extra information from the other occurrences? An overview can be found here: https://solutions.qa.canonical.com/bugs/1999671

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

I think it would be good to rule out resource contention/loading, if possible, please.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

We're going to need more than just links to more failed runs; please add some load figures from landscape correlated with the tempest errors.

Changed in charm-neutron-api:
status: New → Incomplete
Changed in charm-nova-compute:
status: New → Incomplete
Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

We have some load figures for testrun eff35558-063b-4313-8ee6-30a406a8782d:

Build output (juju status at the bottom): https://oil-jenkins.canonical.com/job/fce_build/27515//console
Load figures: https://solutions.qa.canonical.com/grafana/d/r73je3CVz/test-run-metrics?orgId=1&var-testrun_uuid=eff35558-063b-4313-8ee6-30a406a8782d&var-metric=mem_free&var-juju_model=openstack&from=1691713088000&to=1691727038000

This test run is using yoga/jammy. 39 tempest tests failed, and they are mostly related to timeouts reaching neutron. The load figures show that memory runs very low towards the end, and the neutron leader machine (server-10) starts using swap. I suppose that could lead to very slow responses.

Please let me know what you think. Also, if you have different metrics you wish to see we can update the automation to collect more.

Changed in charm-neutron-api:
status: Incomplete → New
Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

I did some more investigation on our systems. It turns out that while memory does run low, the bottleneck is actually the CPU. I noticed a spike to 100% CPU usage every 30 minutes, and htop revealed that this is due to the landscape-package-reporter process.

The landscape-client charm reports the packages on all 13 machines (12 LXD machines + the baremetal machine itself) at the same time every 30 minutes, which results in a CPU overload that lasts about 2 minutes. This leaves the neutron server responding too slowly to requests, which causes tempest to fail.

Sure enough, after removing the landscape-client charm all tempest tests passed without errors. I also realize now that we only started seeing this bug frequently once we added landscape-client to all our OpenStack SKUs.

I'm moving this bug to the landscape client. I think it would be very helpful if the various checks that landscape runs were staggered across the different machines, rather than all running at the same time. I did find an option in the landscape UI to stagger the package updates, but I couldn't find a similar option for the package-reporter.

Changed in charm-neutron-api:
status: New → Invalid
Changed in charm-nova-cloud-controller:
status: New → Invalid
Changed in charm-nova-compute:
status: Incomplete → Invalid
summary: - [jammy][yoga] Tempest times out reaching neutron-api
+ landscape-package-reporter causes CPU load spikes
Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

Hi Bas; great sleuthing! I want to say thanks for your detailed investigation into this issue. It's incredibly helpful to understand how CPU contention can cause an OpenStack system to start failing, so this is a really good datapoint! There's not a great deal we can do about the neutron-api slowdown, except to note that it needs sufficient CPU headroom to operate; again, very useful information for deployment scenarios, and great that we can catch this in testing. Nicely done!

Revision history for this message
Mitch Burton (mitchburton) wrote :

Hi Bas. You also have my compliments on the investigation.

Might I suggest a couple of prospective workarounds that could help with the CPU spikes (a combined example follows this list):
  1. Configure the landscape-client instances, via /etc/landscape/client.conf, with a startup stagger:
         stagger_launch = 0.5
     The stagger value is between 0 and 1 (defaults to 0) and is multiplied by the run interval of a monitor (together with a random factor) to delay the start-up. In the case of PackageMonitor, this would delay the startup by 1800 * <stagger_launch> * <random value [0-1]> for each client.

  2. Manually configure the package_monitor_interval to be different for each client (also in the client config):
         package_monitor_interval = <number of seconds>
     This gives you a bit more control over when they run.
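
For concreteness, a minimal /etc/landscape/client.conf sketch combining both options; the [client] section name is assumed from a stock client config, and 1800 seconds is simply the 30-minute cycle mentioned above:

```
[client]
# Delay each monitor's start-up by up to <interval> * 0.5, randomised per client
stagger_launch = 0.5
# Run the package monitor every 30 minutes (vary this per machine for option 2)
package_monitor_interval = 1800
```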

Let me know if either of these seems appealing. If neither does, I can maybe dig a bit deeper.

Revision history for this message
Bas de Bruijne (basdbruijne) wrote :

Hi Mitch, thanks for the info. I think increasing the stagger_launch option would indeed help a lot. The default of 0.1 may be a bit low, but we also have unusually small systems for a production environment.

Is there a way to set these options from the charm config? I see an "additional_service_config" option for the landscape-server, but not a similar option for the landscape-client charm.
