Comment 4 for bug 2056560

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/912262
Committed: https://opendev.org/starlingx/stx-puppet/commit/ff0782df3932b38136fd49a22d4e8509e611cd39
Submitter: "Zuul (22348)"
Branch: master

commit ff0782df3932b38136fd49a22d4e8509e611cd39
Author: Steven Webster <email address hidden>
Date: Fri Mar 8 08:38:02 2024 -0500

    Fix LDAP issue for DC subcloud

    This commit fixes an LDAP authentication issue seen on worker nodes
    of a subcloud after a rehoming procedure was performed.

    Currently, the system uses an SNAT rule to allow worker/storage nodes
    to authenticate with the system controller when the admin network is
    in use. The rule is needed because the admin network exists only
    between the controller nodes of a distributed cloud: traffic from the
    (private) management network of the subcloud must be translated to
    travel over the admin network to the system controller and back again.

    If the admin network is _not_ in use, worker/storage nodes of the
    subcloud can still authenticate with the system controller, but only
    if routes are installed on the worker/storage nodes. These routes are
    tricky to manage in certain rehoming circumstances. This traffic
    should instead be treated the same way as admin network traffic.

    This commit addresses the above by generalizing the current admin
    network NAT implementation to handle the management network as well.
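
    As an illustration only, the generalization can be pictured as one
    rule builder that takes the outgoing interface and address of
    whichever network (admin or mgmt) the subcloud controller uses to
    reach the system controller. This is a minimal Python sketch, not the
    stx-puppet implementation: the function name, its parameters, and the
    sample values are all hypothetical, and the real rule may match on
    additional fields.

```python
import ipaddress


def build_snat_rule(source_subnet, out_interface, to_source):
    """Build an SNAT rule as an argument list (hypothetical helper).

    source_subnet: the subcloud's private management subnet.
    out_interface: controller interface on the admin OR mgmt network.
    to_source:     controller address on that same network.
    """
    subnet = ipaddress.ip_network(source_subnet)
    address = ipaddress.ip_address(to_source)
    if subnet.version != address.version:
        raise ValueError("subnet and SNAT address must share an IP version")

    # IPv4 rules go through iptables, IPv6 rules through ip6tables.
    binary = "iptables" if subnet.version == 4 else "ip6tables"
    return [
        binary, "-t", "nat", "-A", "POSTROUTING",
        "-s", str(subnet),
        "-o", out_interface,
        "-j", "SNAT", "--to-source", str(address),
    ]


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(" ".join(build_snat_rule("192.168.204.0/24", "enp0s8", "10.10.10.2")))
```

    The same call serves both cases; only the interface and address
    passed in differ between the admin and mgmt networks.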

    Test Plan:

    IPv4, IPv6 distributed clouds

    1. Rehome a subcloud to another system controller and back again
       (mgmt network)
    2. Update the subcloud to use the admin network (mgmt -> admin)
    3. Rehome the subcloud to another system controller and back again
       (admin network)
    4. Update the subcloud to use the mgmt network (admin -> mgmt)

    After each of the numbered steps, the following were performed:

    a. Ensure the subcloud could become managed, online, in-sync
    b. Ensure the iptables SNAT rules were installed or updated
       appropriately on the subcloud controller nodes.
    c. Log into a worker node of the subcloud and ensure sudo commands
       could be issued without LDAP timeout.

    In general, tcpdump was also used to confirm that the SNAT was
    actually taking place.
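
    Check (b) could be scripted along the following lines. This is a
    hedged sketch, not part of the commit: it assumes the rules were
    captured as text in the format printed by
    `iptables -t nat -S POSTROUTING`, and the helper names and sample
    addresses are hypothetical.

```python
import ipaddress
import shlex


def _option_value(tokens, option):
    """Return the token that follows `option`, or None if it is absent."""
    if option in tokens:
        index = tokens.index(option)
        if index + 1 < len(tokens):
            return tokens[index + 1]
    return None


def has_snat_rule(rules_text, source_subnet, to_source):
    """Report whether any listed rule SNATs source_subnet to to_source."""
    wanted_subnet = ipaddress.ip_network(source_subnet, strict=False)
    wanted_address = ipaddress.ip_address(to_source)
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        if _option_value(tokens, "-j") != "SNAT":
            continue
        subnet = _option_value(tokens, "-s")
        address = _option_value(tokens, "--to-source")
        if subnet is None or address is None:
            continue
        try:
            if (ipaddress.ip_network(subnet, strict=False) == wanted_subnet
                    and ipaddress.ip_address(address) == wanted_address):
                return True
        except ValueError:
            # Skip values this sketch does not parse, such as a
            # --to-source given with a port range.
            continue
    return False


if __name__ == "__main__":
    # Made-up listing for illustration.
    sample = (
        "-P POSTROUTING ACCEPT\n"
        "-A POSTROUTING -s 192.168.204.0/24 -o enp0s8 "
        "-j SNAT --to-source 10.10.10.2\n"
    )
    print(has_snat_rule(sample, "192.168.204.0/24", "10.10.10.2"))
```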

    Closes-Bug: #2056560
    Depends-On: https://review.opendev.org/c/starlingx/config/+/912261

    Change-Id: If583b8eec7a385fb9b38e3ff80d58f5d842fe944
    Signed-off-by: Steven Webster <email address hidden>