Hi Alexandru, thanks for checking it on a virtual environment. I also used a virtual AIO-SX to test the reapply after lock/unlock and was not able to reproduce the apply failure using the master build (20220512T035413Z).
Was this failure 100% reproducible for you? I ask because I can see two alarms related to high resource usage on your system at the same time the apply failed (see the fm commands sketched after the list to re-check them):
* There was an instance trying to reboot on your active controller:
| 700.005 | Instance admin-vm-1 owned by admin is rebooting on host controller-0 |
* There was a peak of memory usage:
| 100.103 | Memory threshold exceeded ; threshold 90.00%, actual 111.50% | host=controller-1.memory=platform |
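Both of these should still be visible in the alarm/event history on your system. Something along these lines should show them (a sketch, assuming the fm CLI is available on the controller; exact options may differ):
fm alarm-list
fm event-list | grep -E '700.005|100.103'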
So I wonder whether this was a problem caused by physical resource constraints at the time of the reapply. From your logs I can see that the reapply timed out waiting for the "nova-api-proxy-cdffff877-fc6fb" pod around 2022-05-18 09:02:23:
2022-05-18 09:02:23.142 454 ERROR armada.handlers.wait [-] [chart=openstack-nova-api-proxy]: Timed out waiting for pods (namespace=openstack, labels=(release_group=osh-openstack-nova-api-proxy)). These pods were not ready=['nova-api-proxy-cdffff877-fc6fb']
2022-05-18 09:02:23.143 454 ERROR armada.handlers.armada [-] Chart deploy [openstack-nova-api-proxy] failed: armada.exceptions.k8s_exceptions.KubernetesWatchTimeoutException: Timed out waiting for pods (namespace=openstack, labels=(release_group=osh-openstack-nova-api-proxy)). These pods were not ready=['nova-api-proxy-cdffff877-fc6fb']
2022-05-18 09:02:23.143 454 ERROR armada.handlers.armada Traceback (most recent call last)
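In case you hit this again, these are the kinds of queries I would run right after the timeout to confirm whether the pod is just slow to become ready (a sketch; the namespace and label selector come from the armada message above, and the pod name from your log, so it may already have been replaced by the time you check):
kubectl -n openstack get pods -l release_group=osh-openstack-nova-api-proxy -o wide
kubectl -n openstack describe pod nova-api-proxy-cdffff877-fc6fb
kubectl -n openstack get events --sort-by=.lastTimestamp | grep nova-api-proxy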
But I can also see that this pod was able to come up later, roughly 5 minutes after the reapply timeout. From containerization_kube.info:
Wed May 18 09:37:28 UTC 2022 : : kubectl describe nodes
...
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
openstack nova-api-proxy-cdffff877-6lkxm 0 (0%) 0 (0%) 0 (0%) 0 (0%) 32m
09:37:28 - 32m ~= 09:05:28
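(If it helps, the exact start time can also be read directly from the pod instead of being approximated from the Age column, e.g. kubectl -n openstack get pod nova-api-proxy-cdffff877-6lkxm -o jsonpath='{.status.startTime}', assuming the pod is still around when you check.)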
I'll try to reproduce it again in my environment later today; I just wanted to point out the resource constraints on your virtual environment that could be causing the apply failure.
Anyways, thanks for updating your test results here!