Comment 4 for bug 2031058

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-armada-app (f/antelope)

Reviewed: https://review.opendev.org/c/starlingx/openstack-armada-app/+/896523
Committed: https://opendev.org/starlingx/openstack-armada-app/commit/979572890aaaf2914a2ebb62ac35c7d2b0476bf0
Submitter: "Zuul (22348)"
Branch: f/antelope

commit 979572890aaaf2914a2ebb62ac35c7d2b0476bf0
Author: Luan Nunes Utimura <email address hidden>
Date: Fri Aug 11 09:09:38 2023 -0300

    clients: Fix dir. creation on standby controllers

    Recently, it has been observed that, on systems with multiple controller
    nodes, the `clients` pods are failing to initialize on standby
    controllers due to the absence of their respective working directories.

    In the past, this wasn't a problem because the working directory was
    originally mounted with type `DirectoryOrCreate`, that is, K8s was
    responsible for ensuring that this directory existed during `clients`
    pods initialization.

    However, the problem with this type is that K8s creates the directory
    owned by `root:root`, which isn't ideal for system setups in which
    multiple users need access.
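
For illustration, a minimal sketch of what such a volume definition looks like, written as a Python dict that mirrors the Kubernetes manifest. The helper name, the volume name, and the path in the usage note are hypothetical and are not taken from the chart.

```python
def clients_hostpath_volume(path, name="clients-working-directory"):
    """Return a hostPath volume entry as it would appear in a pod spec.

    With type `DirectoryOrCreate`, the kubelet creates an empty directory
    at `path` if nothing exists there; it does not apply any
    application-specific owner or group.
    """
    return {
        "name": name,  # hypothetical volume name
        "hostPath": {
            "path": path,
            "type": "DirectoryOrCreate",
        },
    }
```

Calling `clients_hostpath_volume("/tmp/clients-example")` yields the entry that would sit under `volumes:` in the pod spec; the path here is only a placeholder.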

    At the time, we solved this problem by simply moving the working
    directory creation logic to the application's lifecycle code, as seen in
    [1].

    This turned out to have side effects on systems with multiple
    controller nodes, however, as not all lifecycle hooks run on standby
    controllers. Consequently, the working directories weren't being created
    on these nodes.

    Simply put, we can solve the pod initialization problem by mounting the
    directories with `DirectoryOrCreate` (again). However, we must ensure
    that these directories will have the right permissions when a host
    swacts, and that's exactly what this change is aimed at.

    This change also improves the code by:
      * Replacing the `change_file_mode()` and `change_file_owner()` utility
        functions with `os` module functions;
      * Synchronizing LDAP groups with Linux groups.
          - In some scenarios, e.g., after multiple apply/remove cycles,
            the `openstack` LDAP group was created with a different GID than
            the `openstack` Linux group, which caused issues when checking
            the clients' working directory permissions.
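
As a rough sketch of the first point, a directory's owner, group, and mode can be checked and corrected with the standard `os` and `stat` modules alone. The function name `ensure_dir_permissions` and the default mode `0o770` are assumptions made for this example; the actual change may use different names and values.

```python
import os
import stat


def ensure_dir_permissions(path, uid, gid, mode=0o770):
    """Create `path` if needed, then enforce owner, group and mode.

    Returns True if ownership or mode had to be changed, False otherwise.
    The default mode is an assumption made for this sketch.
    """
    os.makedirs(path, exist_ok=True)
    st = os.stat(path)
    changed = False

    # Fix ownership only when it differs (e.g. a `root:root` directory
    # created by K8s through `DirectoryOrCreate`).
    if (st.st_uid, st.st_gid) != (uid, gid):
        os.chown(path, uid, gid)
        changed = True

    # Compare permission bits only, ignoring the file-type bits.
    if stat.S_IMODE(st.st_mode) != mode:
        os.chmod(path, mode)
        changed = True

    return changed
```

Running such a check after a swact would repair a directory that the kubelet created on the former standby controller. Changing ownership to another user or group generally requires elevated privileges.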

    [1] https://opendev.org/starlingx/openstack-armada-app/src/commit/b2e10bfc5f25b3a7d2ed4d4c29cc67bf1dea3bdd/python3-k8sapp-openstack/k8sapp_openstack/k8sapp_openstack/lifecycle/lifecycle_openstack.py#L310

    Test Plan (on AIO-DX):
    PASS - Build python3-k8sapp-openstack package
    PASS - Build stx-openstack-helm-fluxcd package
    PASS - Build stx-openstack helm charts
    PASS - Upload/apply stx-openstack
    PASS - Verify that all `clients` pods are running

    On active controller:
      PASS - Verify that the `clients` working directory has the right
             permissions

    On standby controller:
      PASS - Verify that the `clients` working directory *does not* have the
             right permissions

    PASS - Perform a host swact
    PASS - Verify that the `clients` working directory has the right
           permissions on the former standby controller
    PASS - Remove/delete stx-openstack
    PASS - Repeat the test plan from upload/apply step successfully

    Closes-Bug: 2031058

    Change-Id: Ic75d1d0d60855e7956b797079313fa369049384b
    Signed-off-by: Luan Nunes Utimura <email address hidden>
    (cherry picked from commit dca8b7519244149e28b9dbfbef1e86ba8993942e)