Wrong ssh password used in platform::firewall::runtime causes sysadmin linux user to be locked

Bug #2038550 reported by Andre Kantek
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Andre Kantek

Bug Description

Brief Description

Puppet runtime class platform::firewall::runtime uses the script calico_firewall_remote_apply_policy.sh to remotely apply firewall policies in the system controller from a compute node. Currently, the compute node reads OS_PASSWORD from the keyring and uses it to connect to the active controller with ansible, passing it as ansible_ssh_pass.

The issue is that OS_PASSWORD is not the Linux password. During installation the two may be set to the same value, but they diverge as soon as the Linux password is changed with the 'passwd' command.

OS_PASSWORD is actually the keystone password, the same value returned by 'keyring get CGCS admin'.

Severity

Major: System is usable but degraded

Steps to Reproduce

1 - With the master load, install a standard system.
2 - On the active controller, change the sysadmin Linux password to something different from 'Li69nux*'.
3 - Keep track of the failed login attempts with the command below:

[sysadmin@controller-0 ~(keystone_admin)]$ faillock --user sysadmin
sysadmin:
When Type Source Valid
2026-09-11 00:32:45 RHOST 192.169.1.45 V
2026-09-11 00:32:51 RHOST 192.169.1.45 V
Once 5 failed attempts are reached, the sysadmin user is locked and it is no longer possible
to log in to the active controller.
4 - Exit and try to log in to the active controller again; permission is now denied.
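
The check in step 3 can be automated. The following is a minimal Python sketch, not part of the original report, that parses the text printed by `faillock --user sysadmin` and estimates how many failed attempts remain before the lock. The names `FailedLogin`, `parse_faillock`, `attempts_until_lock` and `SAMPLE` are hypothetical, and the threshold of 5 is taken from the description above; the real value depends on the faillock configuration of the system.

```python
from datetime import datetime
from typing import List, NamedTuple

# Threshold quoted in this report; the effective value comes from the
# system's faillock configuration and may differ.
LOCK_THRESHOLD = 5


class FailedLogin(NamedTuple):
    when: datetime
    kind: str     # e.g. "RHOST" for a remote host
    source: str   # source address
    valid: bool   # True when the record is marked "V" (still counted)


def parse_faillock(output: str) -> List[FailedLogin]:
    """Parse the record lines printed by `faillock --user <name>`."""
    records = []
    for line in output.splitlines():
        fields = line.split()
        # Record lines have five fields: DATE TIME TYPE SOURCE VALID.
        if len(fields) != 5:
            continue
        try:
            when = datetime.strptime(
                f"{fields[0]} {fields[1]}", "%Y-%m-%d %H:%M:%S"
            )
        except ValueError:
            continue  # not a record line
        records.append(
            FailedLogin(when, fields[2], fields[3], fields[4] == "V")
        )
    return records


def attempts_until_lock(records: List[FailedLogin],
                        threshold: int = LOCK_THRESHOLD) -> int:
    """Return how many more valid failures fit before the threshold."""
    valid = sum(1 for r in records if r.valid)
    return max(threshold - valid, 0)


# Output copied from step 3 of this report.
SAMPLE = """sysadmin:
When Type Source Valid
2026-09-11 00:32:45 RHOST 192.169.1.45 V
2026-09-11 00:32:51 RHOST 192.169.1.45 V
"""

records = parse_faillock(SAMPLE)
remaining = attempts_until_lock(records)
```

With the two records shown in step 3, `remaining` is 3, so three more failed ansible connections from the compute nodes would lock the user under the assumed threshold.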

Expected Behavior

Script calico_firewall_remote_apply_policy.sh should use the Linux password instead.

Actual Behavior
Script calico_firewall_remote_apply_policy.sh erroneously uses the keystone password, and the resulting failed login attempts lock the sysadmin user on the active controller.

Reproducibility

Seen multiple times, 100%

System Configuration
Multi-node standard config

Load info (eg: 2022-03-10_20-00-07)

All mentioned labs shared the same load:
[sysadmin@controller-0 ~(keystone_admin)]$ cat /etc/build.info
BUILD_ID="2023-09-21_18-00-12"

Any load built after 15/05/2023 should show this behaviour as well.

Timestamp/Logs

Failed logins from both the compute-0 and compute-1 IPs:

root@controller-0:/var/home/sysadmin# grep "\.40\|\.45" /var/log/auth.log | tail
2026-09-11T00:39:46.336 controller-0 sshd[1793416]: info Failed password for sysadmin from 192.169.1.45 port 50276 ssh2
2026-09-11T00:39:47.069 controller-0 sshd[1793416]: info Connection closed by authenticating user sysadmin 192.169.1.45 port 50276 [preauth]
2026-09-11T00:39:53.314 controller-0 sshd[1794506]: info Failed password for sysadmin from 192.169.1.45 port 38120 ssh2
2026-09-11T00:39:55.193 controller-0 sshd[1794506]: info Connection closed by authenticating user sysadmin 192.169.1.45 port 38120 [preauth]
2026-09-11T00:40:00.991 controller-0 sshd[1795182]: info Failed password for sysadmin from 192.169.1.45 port 60418 ssh2
2026-09-11T00:40:01.144 controller-0 sshd[1795182]: info Connection closed by authenticating user sysadmin 192.169.1.45 port 60418 [preauth]
2026-09-11T00:40:08.682 controller-0 sshd[1796436]: info Failed password for sysadmin from 192.169.1.40 port 37330 ssh2
2026-09-11T00:40:10.514 controller-0 sshd[1796436]: info Connection closed by authenticating user sysadmin 192.169.1.40 port 37330 [preauth]
2026-09-11T00:40:16.956 controller-0 sshd[1797758]: info Failed password for sysadmin from 192.169.1.40 port 47582 ssh2
2026-09-11T00:40:18.346 controller-0 sshd[1797758]: info Connection closed by authenticating user sysadmin 192.169.1.40 port 47582 [preauth]
root@controller-0:/var/home/sysadmin#
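
To attribute the failures to individual hosts, the log excerpt above can be summarized per source address. This is a minimal Python sketch under stated assumptions: the function name `failed_logins_by_source` and the regular expression are illustrative, and the pattern only targets the "Failed password for ... from ... port ..." message format seen in this excerpt.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the sshd message format seen in the excerpt above.
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?(\S+) from (\S+) port \d+"
)


def failed_logins_by_source(lines: Iterable[str],
                            user: str = "sysadmin") -> Counter:
    """Count failed password attempts for `user`, grouped by source IP."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match and match.group(1) == user:
            counts[match.group(2)] += 1
    return counts


# Lines copied from the auth.log excerpt in this report.
LOG_LINES = [
    "2026-09-11T00:39:46.336 controller-0 sshd[1793416]: info Failed password for sysadmin from 192.169.1.45 port 50276 ssh2",
    "2026-09-11T00:39:47.069 controller-0 sshd[1793416]: info Connection closed by authenticating user sysadmin 192.169.1.45 port 50276 [preauth]",
    "2026-09-11T00:39:53.314 controller-0 sshd[1794506]: info Failed password for sysadmin from 192.169.1.45 port 38120 ssh2",
    "2026-09-11T00:40:00.991 controller-0 sshd[1795182]: info Failed password for sysadmin from 192.169.1.45 port 60418 ssh2",
    "2026-09-11T00:40:08.682 controller-0 sshd[1796436]: info Failed password for sysadmin from 192.169.1.40 port 37330 ssh2",
    "2026-09-11T00:40:16.956 controller-0 sshd[1797758]: info Failed password for sysadmin from 192.169.1.40 port 47582 ssh2",
]

counts = failed_logins_by_source(LOG_LINES)
for source, total in counts.most_common():
    print(f"{source}: {total} failed attempts")
```

On a live system the same function could be fed the lines of /var/log/auth.log instead of the embedded sample.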

Alarms
N/A

Test Activity

DEV testing

Workaround

No workaround known so far

Impacted Regression TCs

N/A

Andre Kantek (akantek)
Changed in starlingx:
assignee: nobody → Andre Kantek (akantek)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/897467

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/897467
Committed: https://opendev.org/starlingx/stx-puppet/commit/eebf90d20e34ab43feb36471113787ba7ce3df82
Submitter: "Zuul (22348)"
Branch: master

commit eebf90d20e34ab43feb36471113787ba7ce3df82
Author: Andre Kantek <email address hidden>
Date: Wed Oct 4 08:06:47 2023 -0300

    Remove worker remote firewall scripts

    The implementation for worker firewall avoided using local kubectl
    commands. This required access to the keyring for remote ansible
    ad-hoc commands and leaves the /opt/platform/.config mounted on the
    worker.

    Use kubectl command with /etc/kubernetes/kubelet.conf instead, so we
    can refrain from mounting /opt/platform/.config

    Since all firewall data is generated in the host's hieradata file, the
    worker node needs to be able to access the calico firewall resources.
    To achieve that a ClusterRole and ClusterRoleBinding are added, via
    the controller node, allowing access to only the necessary resources.

    Test Plan:
    [PASS] Install a standard setup and validate the worker node firewall
            configuration
    [PASS] Execute a DOR test in the cluster and check if the worker nodes
            install the firewall GNP and HE
    [PASS] Execute worker node lock/unlock and check if the worker nodes
            install the firewall GNP and HE

    Closes-Bug: 2038550

    Change-Id: Icf31b513427120fe81c53be21b8d8a81a8e323f8
    Signed-off-by: Andre Kantek <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.networking stx.security
Ghada Khalil (gkhalil) wrote:

Re-opening as this change is causing an issue where controller-1 is failing to unlock. An incremental fix is required.

Changed in starlingx:
status: Fix Released → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/897873

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/897873
Committed: https://opendev.org/starlingx/stx-puppet/commit/58581b88e9c21c990dd0219bd2278fc40b42ed8f
Submitter: "Zuul (22348)"
Branch: master

commit 58581b88e9c21c990dd0219bd2278fc40b42ed8f
Author: Andre Kantek <email address hidden>
Date: Tue Oct 10 17:07:40 2023 -0300

    Add k8s cfg file to the OAM firewall script

    In the change
    https://review.opendev.org/c/starlingx/stx-puppet/+/897467 the OAM
    firewall was not updated to pass the k8s config file as an argument
    to calico_firewall_apply_policy.sh. This caused an error that
    prevented the global network policy from being created, making the OAM
    interface block all traffic except for failsafed traffic.

    This change corrects that.

    Test Plan
    [PASS] In AIO-DX remove the current OAM GNP and execute lock/unlock
            on one of the controllers, verify the OAM GNP is recreated.
    [PASS] In AIO-DX remove the current OAM GNP and force the runtime
            execution by creating the file
            /etc/platform/.platform_firewall_config_required and observe
            the request to recreate the OAM GNP

    Closes-Bug: 2038550

    Change-Id: Ica03dbf6ffd9f6f592fa53efa40293191203377a
    Signed-off-by: Andre Kantek <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released