Hi Alexandru, thanks for checking it on a virtual environment. I also used a virtual AIO-SX to test the reapply after lock/unlock and was not able to reproduce the apply failure using the master build (20220512T035413Z).
Was this failure 100% reproducible for you? I ask because I can see two alarms related to high resource usage on your system at the same time the apply failed (see the fm commands sketched after the list to re-check them):
* There was an instance trying to reboot on your active controller:
| 700.005 | Instance admin-vm-1 owned by admin is rebooting on host controller-0 |
* There was a peak of memory usage:
| 100.103 | Memory threshold exceeded ; threshold 90.00%, actual 111.50% | host=controller-1.memory=platform |
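Both of these should still be visible in the alarm/event history on your system. Something along these lines should show them (a sketch, assuming the fm CLI is available on the controller; exact options may differ):
fm alarm-list
fm event-list | grep -E '700.005|100.103'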
So I wonder whether this was a problem caused by physical resource constraints at the time of the reapply. From your logs I can see that the reapply timed out waiting for the "nova-api-proxy-cdffff877-fc6fb" pod around 2022-05-18 09:02:23:
2022-05-18 09:02:23.142 454 ERROR armada.handlers.wait [-] [chart=openstack-nova-api-proxy]: Timed out waiting for pods (namespace=openstack, labels=(release_group=osh-openstack-nova-api-proxy)). These pods were not ready=['nova-api-proxy-cdffff877-fc6fb']
2022-05-18 09:02:23.143 454 ERROR armada.handlers.armada [-] Chart deploy [openstack-nova-api-proxy] failed: armada.exceptions.k8s_exceptions.KubernetesWatchTimeoutException: Timed out waiting for pods (namespace=openstack, labels=(release_group=osh-openstack-nova-api-proxy)). These pods were not ready=['nova-api-proxy-cdffff877-fc6fb']
2022-05-18 09:02:23.143 454 ERROR armada.handlers.armada Traceback (most recent call last)
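In case you hit this again, these are the kinds of queries I would run right after the timeout to confirm whether the pod is just slow to become ready (a sketch; the namespace and label selector come from the armada message above, and the pod name from your log, so it may already have been replaced by the time you check):
kubectl -n openstack get pods -l release_group=osh-openstack-nova-api-proxy -o wide
kubectl -n openstack describe pod nova-api-proxy-cdffff877-fc6fb
kubectl -n openstack get events --sort-by=.lastTimestamp | grep nova-api-proxy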
But I can also see that this pod was able to come up later, roughly 5 minutes after the reapply timeout. From containerization_kube.info:
Wed May 18 09:37:28 UTC 2022 : : kubectl describe nodes
...
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
openstack nova-api-proxy-cdffff877-6lkxm 0 (0%) 0 (0%) 0 (0%) 0 (0%) 32m
09:37:28 - 32m ~= 09:05:28
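(If it helps, the exact start time can also be read directly from the pod instead of being approximated from the Age column, e.g. kubectl -n openstack get pod nova-api-proxy-cdffff877-6lkxm -o jsonpath='{.status.startTime}', assuming the pod is still around when you check.)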
I'll try to reproduce it again in my environment later today; I just wanted to point out the resource constraints on your virtual environment that could be causing the apply failure.
Anyways, thanks for updating your test results here!