commit 17c1b8894deeb973dfb29a5fcac9fd630591b649
Author: Robert Church <email address hidden>
Date: Wed Sep 2 00:59:44 2020 -0400
Introduce k8s pod recovery service
Add a recovery service, started by systemd at host boot, that waits
for pod transitions to stabilize and then takes corrective action for
the following set of conditions:
- Delete to restart pods stuck in an Unknown or Init:Unknown state for
the 'openstack' and 'monitor' namespaces.
- Delete to restart Failed pods stuck in a NodeAffinity state that occur
in any namespace.
- Delete to restart the libvirt pod in the 'openstack' namespace when
any of its conditions (Initialized, Ready, ContainersReady,
PodScheduled) are not True.
This will only recover pods specific to the host where the service is
installed.
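The three conditions above, together with the per-host restriction, can be sketched as a small classifier. This is only an illustration under stated assumptions, not the service's actual implementation: the `PodSummary` record, the `needs_restart` and `pods_to_delete` helpers, the reason strings, and the rule that the libvirt pod is recognised by a name starting with "libvirt" are all hypothetical. `display_status` is assumed to hold the STATUS column as printed by `kubectl get pods`.

```python
from dataclasses import dataclass, field

# Condition types named in the commit message.
REQUIRED_CONDITIONS = ("Initialized", "Ready", "ContainersReady", "PodScheduled")
# Namespaces in which Unknown / Init:Unknown pods are restarted.
UNKNOWN_NAMESPACES = {"openstack", "monitor"}


@dataclass
class PodSummary:
    """Hypothetical flattened view of one pod (not a Kubernetes API type)."""
    namespace: str
    name: str
    node: str
    phase: str            # e.g. "Running", "Failed"
    display_status: str   # assumed: STATUS column, e.g. "Unknown", "NodeAffinity"
    conditions: dict = field(default_factory=dict)  # condition type -> "True"/"False"


def needs_restart(pod, host):
    """Return a reason string if the pod matches a recovery condition, else None."""
    # Only pods scheduled on this host are candidates.
    if pod.node != host:
        return None
    # Condition 1: Unknown or Init:Unknown in 'openstack' or 'monitor'.
    if (pod.namespace in UNKNOWN_NAMESPACES
            and pod.display_status in ("Unknown", "Init:Unknown")):
        return "unknown-state"
    # Condition 2: Failed pods in a NodeAffinity state, any namespace.
    if pod.phase == "Failed" and pod.display_status == "NodeAffinity":
        return "node-affinity"
    # Condition 3: libvirt pod in 'openstack' with any condition not True.
    # Assumption: the libvirt pod is identified by its name prefix.
    if pod.namespace == "openstack" and pod.name.startswith("libvirt"):
        if any(pod.conditions.get(c) != "True" for c in REQUIRED_CONDITIONS):
            return "libvirt-not-ready"
    return None


def pods_to_delete(pods, host):
    """Collect (namespace, name, reason) for every pod that should be deleted."""
    selected = []
    for pod in pods:
        reason = needs_restart(pod, host)
        if reason is not None:
            selected.append((pod.namespace, pod.name, reason))
    return selected
```

A caller would build the `PodSummary` list from the cluster state once pod transitions have settled, then delete each returned pod so that its controller recreates it.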
This service is installed on all controller types. There is currently no
evidence that we need this on dedicated worker nodes.
Each of these conditions should be re-evaluated after the next k8s
component rebase to determine whether any of these recovery actions can
be removed.
Change-Id: I0e304d1a2b0425624881f3b2d9c77f6568844196
Closes-Bug: #1893977
Signed-off-by: Robert Church <email address hidden>
Reviewed: https://review.opendev.org/749634
Committed: https://git.openstack.org/cgit/starlingx/integ/commit/?id=17c1b8894deeb973dfb29a5fcac9fd630591b649
Submitter: Zuul
Branch: master