StarlingX

Bug #2031058
Comment 4 for bug 2031058

OpenStack Infra (hudson-openstack) wrote on 2023-09-26: Fix merged to openstack-armada-app (f/antelope)

Reviewed: https://review.opendev.org/c/starlingx/openstack-armada-app/+/896523
Committed: https://opendev.org/starlingx/openstack-armada-app/commit/979572890aaaf2914a2ebb62ac35c7d2b0476bf0
Submitter: "Zuul (22348)"
Branch: f/antelope

commit 979572890aaaf2914a2ebb62ac35c7d2b0476bf0
Author: Luan Nunes Utimura <email address hidden>
Date: Fri Aug 11 09:09:38 2023 -0300

clients: Fix dir. creation on standby controllers

    It has recently been observed that, on systems with multiple controller
    nodes, the `clients` pods fail to initialize on standby controllers
    because their working directories do not exist.

    In the past, this wasn't a problem because the working directory was
    mounted as a hostPath volume of type `DirectoryOrCreate`; that is, K8s
    was responsible for ensuring that the directory existed when the
    `clients` pods initialized.

    However, the problem with this volume type is that it creates missing
    directories owned by `root:root`, which isn't ideal for setups in which
    multiple users need access to them.
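The ownership problem above can be illustrated with a minimal Python sketch of what `DirectoryOrCreate` does on the node, assuming the behavior described in the Kubernetes hostPath documentation: a missing directory is created with mode 0755 and ends up owned by the process that creates it (kubelet, normally running as root). `directory_or_create()` is a hypothetical helper written for illustration only; it is not kubelet code.

```python
import os


def directory_or_create(path: str) -> os.stat_result:
    """Illustrative stand-in for the hostPath `DirectoryOrCreate` type.

    If nothing exists at `path`, an empty directory is created with mode
    0755. Ownership is not set explicitly, so the directory belongs to
    whoever creates it; when kubelet (running as root) does this, the
    result is a `root:root` directory.
    """
    if not os.path.exists(path):
        # Create the directory (and, in this sketch, any missing parents).
        os.makedirs(path)
        # makedirs() honours the process umask, so force 0755 explicitly.
        os.chmod(path, 0o755)
    elif not os.path.isdir(path):
        # Something other than a directory already occupies the path.
        raise NotADirectoryError(path)
    return os.stat(path)
```

Calling it a second time on the same path leaves the existing directory untouched, which is why a directory created this way keeps its `root:root` ownership until something else fixes it.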

    At the time, we solved this problem by simply moving the working
    directory creation logic to the application's lifecycle code, as seen in
    [1].

    However, this turned out to have side effects on systems with multiple
    controller nodes, as not all lifecycle hooks run on standby controllers.
    Consequently, the working directories weren't being created on those
    nodes.

    The pod initialization problem can be solved by mounting the
    directories with `DirectoryOrCreate` again. However, we must then ensure
    that these directories get the right permissions when a swact makes a
    former standby controller active, and that is exactly what this change
    does.
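As a rough sketch of the kind of fix-up this implies, the hypothetical helper below uses only `os` builtins to bring an existing directory to an expected owner and mode, changing only what differs. The helper name, and the idea of calling it on the newly active controller after a swact, are assumptions for illustration; the actual uid, gid and mode used by stx-openstack are not stated in this message.

```python
import os
import stat


def ensure_dir_permissions(path: str, uid: int, gid: int, mode: int) -> list:
    """Bring an existing directory to the expected owner, group and mode.

    Hypothetical helper: intended to fix a directory that
    `DirectoryOrCreate` created as root:root. Returns the list of
    adjustments made ("owner" and/or "mode"); an empty list means the
    directory already matched.
    """
    changes = []
    st = os.stat(path)
    if not stat.S_ISDIR(st.st_mode):
        raise NotADirectoryError(path)
    if (st.st_uid, st.st_gid) != (uid, gid):
        # Changing ownership to another user normally requires root.
        os.chown(path, uid, gid)
        changes.append("owner")
        # Re-read: chown() can clear setuid/setgid bits on some systems.
        st = os.stat(path)
    if stat.S_IMODE(st.st_mode) != mode:
        os.chmod(path, mode)
        changes.append("mode")
    return changes
```

Run as root on the controller that just became active, it would turn a `root:root` directory into one owned by the intended user and group; a second call is a no-op and returns an empty list.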

    This change also improves the code by:
      * Replacing the `change_file_mode()` and `change_file_owner()` utility
        functions with `os` builtins;
      * Synchronizing LDAP groups with Linux groups:
          - In some scenarios (e.g., repeated apply/remove cycles), the
            `openstack` LDAP group was created with a different GID than
            the `openstack` Linux group, which broke the permission checks
            on the clients' working directory.
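The GID drift described in the last item can be detected with a small comparison. In the sketch below, local Linux groups are read directly from an `/etc/group`-format file (so LDAP groups exposed through NSS are not mixed in), while the LDAP side is simply passed in as a name-to-GID mapping, because how stx-openstack queries LDAP is not described here. Both function names are hypothetical.

```python
from typing import Dict, List


def local_group_gids(path: str = "/etc/group") -> Dict[str, int]:
    """Map local group names to GIDs from an /etc/group-format file."""
    gids = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Format: name:password:gid:members
            fields = line.split(":")
            # Skip malformed or special entries without a numeric GID.
            if len(fields) >= 3 and fields[2].isdigit():
                gids[fields[0]] = int(fields[2])
    return gids


def find_gid_mismatches(linux_gids: Dict[str, int],
                        ldap_gids: Dict[str, int]) -> List[str]:
    """Return names of groups present in both sources with different GIDs."""
    common = linux_gids.keys() & ldap_gids.keys()
    return sorted(name for name in common if linux_gids[name] != ldap_gids[name])
```

For example, with illustrative GIDs, `find_gid_mismatches({"openstack": 1001}, {"openstack": 1002})` returns `["openstack"]`: a directory whose group is GID 1001 would then fail a check that expects GID 1002, even though both groups are named `openstack`.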

    [1] https://opendev.org/starlingx/openstack-armada-app/src/commit/b2e10bfc5f25b3a7d2ed4d4c29cc67bf1dea3bdd/python3-k8sapp-openstack/k8sapp_openstack/k8sapp_openstack/lifecycle/lifecycle_openstack.py#L310

    Test Plan (on AIO-DX):
    PASS - Build python3-k8sapp-openstack package
    PASS - Build stx-openstack-helm-fluxcd package
    PASS - Build stx-openstack helm charts
    PASS - Upload/apply stx-openstack
    PASS - Verify that all `clients` pods are running

    On active controller:
      PASS - Verify that the `clients` working directory has the right
             permissions

    On standby controller:
      PASS - Verify that the `clients` working directory *does not* have the
             right permissions

    PASS - Perform a host swact
    PASS - Verify that the `clients` working directory has the right
           permissions on the former standby controller
    PASS - Remove/delete stx-openstack
    PASS - Repeat the test plan successfully, starting from the upload/apply
           step
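The "right permissions" checks in the test plan amount to comparing a directory's owner, group and mode against expected values. A minimal read-only sketch of such a check follows; `permission_mismatches()` is a hypothetical helper, and the expected values are placeholders, since the actual ones are not given in this message.

```python
import os
import stat
from typing import Dict, Tuple


def permission_mismatches(path: str, uid: int, gid: int,
                          mode: int) -> Dict[str, Tuple[int, int]]:
    """Compare a directory's owner, group and mode with expected values.

    Read-only: nothing is changed. Returns an empty dict when everything
    matches; otherwise maps each mismatching attribute ("uid", "gid",
    "mode") to an (actual, expected) pair.
    """
    st = os.stat(path)
    actual = {"uid": st.st_uid, "gid": st.st_gid,
              "mode": stat.S_IMODE(st.st_mode)}
    expected = {"uid": uid, "gid": gid, "mode": mode}
    return {key: (actual[key], expected[key])
            for key in actual if actual[key] != expected[key]}
```

An empty result corresponds to a PASS on the active controller; before the swact, the same check on the standby controller would be expected to report mismatches, for example a uid and gid of 0 left by `DirectoryOrCreate`.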

    Closes-Bug: 2031058

    Change-Id: Ic75d1d0d60855e7956b797079313fa369049384b
    Signed-off-by: Luan Nunes Utimura <email address hidden>
    (cherry picked from commit dca8b7519244149e28b9dbfbef1e86ba8993942e)
