Recently, it has been observed that, on systems with multiple controller
nodes, the `clients` pods are failing to initialize on standby
controllers due to the absence of their respective working directories.
In the past, this wasn't a problem because the working directory was
originally mounted with type `DirectoryOrCreate`, that is, K8s was
responsible for ensuring that this directory existed during `clients`
pod initialization.
However, the problem with this parameter is that it creates directories
with `root:root` permissions, which isn't ideal for setups where
multiple users need access to them.
At the time, we solved this problem by simply moving the working
directory creation logic to the application's lifecycle code, as seen in
[1].
This turned out to have side effects on systems with multiple
controller nodes, however, as not all lifecycle hooks run on standby
controllers. Consequently, the working directories weren't being created
on these nodes.
Simply put, we can solve the pod initialization problem by mounting the
directories with `DirectoryOrCreate` (again). However, we must ensure
that these directories will have the right permissions when a host
swacts, and that's exactly what this change is aimed at.
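The combined approach can be sketched as follows. This is a minimal
illustration, not the actual chart/lifecycle code: the path, owner, and
mode below are assumptions made for the example, and only the
`openstack` group name comes from this change's description.

```python
import grp
import os
import pwd

# Hypothetical values -- the real charts/lifecycle code define their own.
WORKING_DIR = "/var/opt/openstack/clients"  # assumed path, for illustration
OWNER = "sysadmin"                          # assumed owner, for illustration
GROUP = "openstack"                         # group name from the description


def ensure_working_dir(path=WORKING_DIR, owner=OWNER, group=GROUP, mode=0o770):
    """Create the clients' working directory (if K8s hasn't already done
    so via `DirectoryOrCreate`) and fix its ownership and permissions.

    Since `DirectoryOrCreate` creates the directory as `root:root`, the
    ownership fixup must also run on a former standby controller after a
    swact, not only in lifecycle hooks on the active controller.
    """
    os.makedirs(path, mode=mode, exist_ok=True)
    uid = pwd.getpwnam(owner).pw_uid
    gid = grp.getgrnam(group).gr_gid
    # `os` builtins in place of the old change_file_owner()/
    # change_file_mode() utility functions.
    os.chown(path, uid, gid)
    os.chmod(path, mode)
```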
This change also improves the code by:
* Replacing the `change_file_mode()` and `change_file_owner()` utility
functions with `os` builtins;
* Synchronizing LDAP groups with Linux groups.
- In some scenarios, e.g., multiple "applies followed by removes",
the `openstack` LDAP group was created with a different GID than
the `openstack` Linux group, which caused issues with checking
the clients' working directory permissions.
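The GID-synchronization fix can be illustrated with a small consistency
check. The LDAP side of the lookup is elided here, since the exact query
the application performs isn't shown in this description; only the
Linux-side lookup via the standard `grp` module is concrete.

```python
import grp


def gids_in_sync(group_name, ldap_gid):
    """Return True when the LDAP group's GID matches the Linux group's GID.

    `ldap_gid` would come from querying the LDAP backend (elided here).
    A mismatch -- e.g. after multiple "applies followed by removes" --
    breaks permission checks on the clients' working directory, so the
    groups must be kept in sync.
    """
    try:
        linux_gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # Linux group does not exist yet
    return linux_gid == ldap_gid
```

When the check fails, the remediation would be to re-create one of the
groups so both sides agree on a single GID.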
Test Plan (on AIO-DX):
PASS - Build python3-k8sapp-openstack package
PASS - Build stx-openstack-helm-fluxcd package
PASS - Build stx-openstack helm charts
PASS - Upload/apply stx-openstack
PASS - Verify that all `clients` pods are running
On active controller:
PASS - Verify that the `clients` working directory has the right permissions
On standby controller:
PASS - Verify that the `clients` working directory *does not* have the
right permissions
PASS - Perform a host swact
PASS - Verify that the `clients` working directory has the right permissions on the former standby controller
PASS - Remove/delete stx-openstack
PASS - Repeat the test plan from upload/apply step successfully
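The "right permissions" checks in the plan above can be scripted. A
minimal sketch follows; which owner, group, and mode values count as
"right" is chart-specific and therefore assumed by the caller:

```python
import grp
import os
import pwd
import stat


def describe_dir(path):
    """Return (owner, group, mode) for a directory, in the spirit of the
    `ls -ld` inspection used to verify the clients' working directory."""
    st = os.stat(path)
    owner = pwd.getpwuid(st.st_uid).pw_name
    group = grp.getgrgid(st.st_gid).gr_name
    mode = stat.S_IMODE(st.st_mode)
    return owner, group, mode
```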
Reviewed: https://review.opendev.org/c/starlingx/openstack-armada-app/+/896523
Committed: https://opendev.org/starlingx/openstack-armada-app/commit/979572890aaaf2914a2ebb62ac35c7d2b0476bf0
Submitter: "Zuul (22348)"
Branch: f/antelope
commit 979572890aaaf2914a2ebb62ac35c7d2b0476bf0
Author: Luan Nunes Utimura <email address hidden>
Date: Fri Aug 11 09:09:38 2023 -0300
clients: Fix dir. creation on standby controllers
[1] https://opendev.org/starlingx/openstack-armada-app/src/commit/b2e10bfc5f25b3a7d2ed4d4c29cc67bf1dea3bdd/python3-k8sapp-openstack/k8sapp_openstack/k8sapp_openstack/lifecycle/lifecycle_openstack.py#L310
Closes-Bug: 2031058
Change-Id: Ic75d1d0d60855e7956b797079313fa369049384b
Signed-off-by: Luan Nunes Utimura <email address hidden>
(cherry picked from commit dca8b7519244149e28b9dbfbef1e86ba8993942e)