Comment 3 for bug 2056560

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/912261
Committed: https://opendev.org/starlingx/config/commit/f8d30588ade9469dbbd97bc4b2655b30c19da6bb
Submitter: "Zuul (22348)"
Branch: master

commit f8d30588ade9469dbbd97bc4b2655b30c19da6bb
Author: Steven Webster <email address hidden>
Date: Fri Mar 8 08:30:07 2024 -0500

    Fix LDAP issue for DC subcloud

    This commit fixes an LDAP authentication issue seen on worker nodes
    of a subcloud after a rehoming procedure was performed.

    There are two main parts:

    1. Since every host of a subcloud authenticates with the system
       controller, we need to reconfigure the LDAP URI across all nodes
       of the system when the system controller network changes (upon
       rehome). Currently, it is only being reconfigured on controller
       nodes.

    2. Currently, the system uses an SNAT rule to allow worker/storage
       nodes to authenticate with the system controller when the admin
       network is in use. This is because the admin network only exists
       between controller nodes of a distributed cloud. The SNAT rule
       is needed to allow traffic from the (private) management network
       of the subcloud over the admin network to the system controller
       and back again. If the admin network is _not_ being used,
       worker/storage nodes of the subcloud can authenticate with the
       system controller, but routes must be installed on the
       worker/storage nodes to facilitate this. It becomes tricky to
       manage in certain circumstances of rehoming/network config.
       This traffic really should be treated in the same way as that
       of the admin network.

    This commit addresses the above by:

    1. Reconfiguring the ldap_server config across all nodes upon
       system controller network changes.

    2. Generalizing the current admin network NAT implementation to
       handle the management network as well.

    Test Plan:

    IPv4, IPv6 distributed clouds

    1. Rehome a subcloud to another system controller and back again
       (mgmt network)
    2. Update the subcloud to use the admin network (mgmt -> admin)
    3. Rehome the subcloud to another system controller and back again
       (admin network)
    4. Update the subcloud to use the mgmt network (admin -> mgmt)

    After each of the numbered steps, the following were performed:

    a. Ensure the subcloud could become managed, online, in-sync
    b. Ensure the iptables SNAT rules were installed or updated
       appropriately on the subcloud controller nodes.
    c. Log into a worker node of the subcloud and ensure sudo commands
       could be issued without LDAP timeout.
    d. Log into a worker node as LDAP user X via the console and verify
       that login succeeds.

    In general, tcpdump was also used to ensure the SNAT translation was
    actually happening.

    Partial-Bug: #2056560

    Change-Id: Ia675a4ff3a2cba93e4ef62b27dba91802811e097
    Signed-off-by: Steven Webster <email address hidden>
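
For illustration only, the two parts described in the commit message (an LDAP URI that follows the system controller address, and an SNAT rule for traffic leaving the subcloud's management network) can be sketched in Python. This is not the code from the merged change. The function names, the default "ldap" scheme, the interface name, and all addresses are hypothetical and chosen just to make the example self-contained.

```python
import ipaddress
from typing import List


def ldap_uri(system_controller_ip: str, scheme: str = "ldap") -> str:
    """Build an LDAP URI for a system controller address.

    The scheme default is an assumption. IPv6 literals are wrapped in
    brackets, as required for IP literals inside a URI.
    """
    addr = ipaddress.ip_address(system_controller_ip)
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    return f"{scheme}://{host}"


def snat_rule_args(source_subnet: str, out_interface: str,
                   to_source: str) -> List[str]:
    """Return the argv for an SNAT rule in the nat/POSTROUTING chain.

    Traffic from source_subnet leaving via out_interface has its source
    address rewritten to to_source. The command is only built here,
    never executed.
    """
    net = ipaddress.ip_network(source_subnet)
    src = ipaddress.ip_address(to_source)
    if net.version != src.version:
        raise ValueError("source subnet and SNAT address must share an IP version")
    # Pick the tool that matches the address family.
    binary = "ip6tables" if net.version == 6 else "iptables"
    return [
        binary, "-t", "nat", "-A", "POSTROUTING",
        "-s", str(net), "-o", out_interface,
        "-j", "SNAT", "--to-source", str(src),
    ]


if __name__ == "__main__":
    # Hypothetical addresses and interface name, for demonstration only.
    print(ldap_uri("fd01::2"))
    print(" ".join(snat_rule_args("192.168.204.0/24", "vlan100", "10.20.0.3")))
```

Because the same helper takes the source subnet and outgoing interface as parameters, it covers both the admin network case and the management network case, which is the generalization the commit message describes. Rebuilding the URI on every host whenever the system controller address changes corresponds to part 1.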