sssd service file has Restart set to on-failure

Bug #2023421 reported by Andy
This bug affects 1 person
Affects      Status         Importance   Assigned to   Milestone
StarlingX    Fix Released   Low          Andy

Bug Description

Brief Description
-----------------
The sssd service file sets Restart=on-failure, so systemd tries to restart the sssd service whenever it fails. sssd is also monitored by pmon, which tries to restart it as well. This contention between systemd and pmon sometimes produces noisy errors in pmon logs.

Any processes monitored by pmon should have Restart set to "no".

Severity
--------
Minor: System/Feature is usable with minor issue

Steps to Reproduce
------------------
On a live system, run "systemctl cat sssd".

Expected Behavior
------------------
"systemctl cat sssd" shows Restart=no.

Actual Behavior
----------------
"systemctl cat sssd" shows Restart=on-failure.
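To check the effective value programmatically rather than by eye, the output of "systemctl cat" can be parsed. Below is a minimal Python sketch, assuming that the last Restart= assignment inside a [Service] section wins and that "systemctl cat" prints the main unit file before its drop-ins. The function names read_unit and effective_restart are hypothetical, and line continuations (a trailing backslash) are not handled.

```python
import subprocess


def read_unit(unit: str) -> str:
    """Return the text printed by "systemctl cat <unit>" (needs a live systemd host)."""
    result = subprocess.run(
        ["systemctl", "cat", unit], capture_output=True, text=True, check=True
    )
    return result.stdout


def effective_restart(unit_text: str) -> str:
    """Return the Restart= value that applies after all drop-ins are merged.

    Assumption: the last Restart= assignment inside a [Service] section wins,
    and "systemctl cat" lists the main unit file before its drop-ins.
    """
    value = "no"  # systemd's default when Restart= is never set
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            # Skip blank lines, comments, and the "# /path/to/file" headers.
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, val = line.partition("=")
        if section == "Service" and sep and key.strip() == "Restart":
            value = val.strip()
    return value
```

On a live host, effective_restart(read_unit("sssd")) would be expected to return "on-failure" on an affected system and "no" once an override like the one in the fix is in place.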

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
STX master latest.

Last Pass
---------
Unknown

Timestamp/Logs
--------------
pmon.log sometimes shows sssd respawn failure errors such as:

2023-06-08T21:37:14.323 [66337.07535] controller-0 pmond mon pmonHdlr.cpp ( 292) manage_process_failure :Error : sssd failed (4017152) (p:1 a:0)
2023-06-08T21:37:14.327 [66337.07536] controller-0 pmond com nodeUtil.cpp (1898) get_system_state : Info : systemctl reports host in 'degraded' state (0)
2023-06-08T21:37:14.327 [66337.07537] controller-0 pmond mon pmonFsm.cpp ( 512) pmon_passive_handler : Info : sssd Sending Log Event to Maintenance
2023-06-08T21:37:14.327 [66337.07538] controller-0 pmond mon pmonHdlr.cpp (1529) manage_alarm : Info : sssd process has failed ; Auto recovery in progress.
2023-06-08T21:37:14.327 [66337.07539] controller-0 pmond mon pmonMsg.cpp ( 327) pmon_send_event : Info : controller-0 pmon log sent
2023-06-08T21:37:14.405 [66337.07540] controller-0 pmond mon pmonHdlr.cpp ( 990) process_running : Info : sssd process not running
2023-06-08T21:37:14.405 [66337.07541] controller-0 pmond mon pmonHdlr.cpp (1294) respawn_process : Info : sssd Spawn (4019729)
2023-06-08T21:37:19.905 [66337.07542] controller-0 pmond mon pmonFsm.cpp ( 624) pmon_passive_handler : Info : sssd Monitor (4019730)

2023-06-08T21:37:33.681 [66337.07543] controller-0 pmond mon pmonHdlr.cpp ( 990) process_running : Info : sssd process not running
2023-06-08T21:37:33.681 [66337.07544] controller-0 pmond mon pmonFsm.cpp ( 645) pmon_passive_handler : Warn : sssd Respawn Monitor Failed (1 of 3), retrying in (5 secs)
2023-06-08T21:37:40.406 [66337.07545] controller-0 pmond mon pmonHdlr.cpp ( 990) process_running : Info : sssd process not running
2023-06-08T21:37:40.406 [66337.07546] controller-0 pmond mon pmonHdlr.cpp (1294) respawn_process : Info : sssd Spawn (4021081)
2023-06-08T21:37:45.409 [66337.07547] controller-0 pmond mon pmonFsm.cpp ( 624) pmon_passive_handler : Info : sssd Monitor (4021082)
2023-06-08T21:38:07.904 [66337.07548] controller-0 pmond mon pmonFsm.cpp ( 659) pmon_passive_handler : Info : sssd Stable (4021082)
2023-06-08T21:38:08.405 [66337.07549] controller-0 pmond mon pmonFsm.cpp ( 731) pmon_passive_handler : Info : sssd Recovered (4021082)
2023-06-08T21:38:08.405 [66337.07550] controller-0 pmond mon pmonHdlr.cpp (1125) register_process : Info : sssd Registered (4021082)
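When triaging this kind of systemd/pmon contention, it can help to count the relevant pmond events instead of reading pmon.log line by line. Below is a minimal Python sketch. The line layout ("<function> :<Level> : <process> <message>") is inferred only from the excerpt above and may not match other pmond versions, and the event labels are hypothetical names chosen for this example.

```python
import re
from collections import Counter

# Matches the tail of a pmond line such as
#   "... pmon_passive_handler : Warn : sssd Respawn Monitor Failed (1 of 3), ..."
# Layout inferred from the pmon.log excerpt; other pmond versions may differ.
EVENT_RE = re.compile(r":\s*(Info|Warn|Error)\s*:\s*(\S+)\s+(.*)$")

# Hypothetical labels for message prefixes seen in the excerpt.
PREFIXES = {
    "failed": "failed",
    "Respawn Monitor Failed": "respawn_monitor_failed",
    "Spawn": "spawn",
    "Recovered": "recovered",
}


def summarize(lines, process="sssd"):
    """Count failure, respawn, and recovery events for one process."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if not match or match.group(2) != process:
            continue
        message = match.group(3)
        for prefix, label in PREFIXES.items():
            if message.startswith(prefix):
                counts[label] += 1
                break
    return counts
```

For example, summarize(open("pmon.log")) returns a Counter of these events for sssd. Applied to the excerpt above, it counts one "failed", one "respawn_monitor_failed", two "spawn" and one "recovered" event.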

Test Activity
-------------
Developer Testing

Workaround
----------
N/A

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config-files (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote :
OpenStack Infra (hudson-openstack) wrote : Change abandoned on config-files (master)

Change abandoned by "Andy Ning <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/config-files/+/885907

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Andy Ning <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/config-files/+/885906
Reason: This will be implemented in puppet.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/886257

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/886257
Committed: https://opendev.org/starlingx/stx-puppet/commit/559b79b72eb4f1bd70c6d24749f5cad3b946b36c
Submitter: "Zuul (22348)"
Branch: master

commit 559b79b72eb4f1bd70c6d24749f5cad3b946b36c
Author: Andy Ning <email address hidden>
Date: Fri Jun 16 09:06:33 2023 -0400

    Add sssd systemd service file override

    sssd is monitored by pmon. But currently the Restart option in its
    systemd service file is set to on-failure. This sometimes causes
    systemd and pmon to fight to restart the service when it fails. All
    processes monitored by pmon should have Restart set to "no".

    This change added a systemd override file to set Restart to "no" for
    sssd service.

    Test Plan:
    PASS: Standard system deployment.
    PASS: Check sssd Restart option using "systemctl cat sssd", verify
          Restart option is set to "no", as following:

          # /etc/systemd/system/sssd.service.d/sssd-stx-override.conf
          [Service]
          # pmond monitors sssd service
          Restart=no
    PASS: Kill sssd process, verify pmon restart it successfully by
          tailing pmon.log, and verify sssd is running by "systemctl
          status sssd" command.

    Closes-Bug: 2023421
    Signed-off-by: Andy Ning <email address hidden>
    Change-Id: I84521caf3745122492afe9ef4a251e42129b29b0
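The merged fix installs this drop-in through stx-puppet. Purely to illustrate what the override amounts to on disk, here is a minimal Python sketch that writes an equivalent drop-in under a configurable root directory. The function name write_restart_override and its root and name parameters are hypothetical; this is not the stx-puppet implementation.

```python
from pathlib import Path
from typing import Optional

# Same content as the override shown in the commit's test plan.
OVERRIDE_BODY = "[Service]\n# pmond monitors {unit} service\nRestart=no\n"


def write_restart_override(unit: str, root: str = "/", name: Optional[str] = None) -> Path:
    """Write <root>/etc/systemd/system/<unit>.service.d/<name> with Restart=no.

    Hypothetical helper: the default file name mirrors sssd-stx-override.conf
    from the fix. Re-running it overwrites the same file, so it is idempotent.
    """
    name = name or f"{unit}-stx-override.conf"
    dropin_dir = Path(root) / "etc" / "systemd" / "system" / f"{unit}.service.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / name
    path.write_text(OVERRIDE_BODY.format(unit=unit))
    return path
```

On a real host this would run as root with root="/", followed by "systemctl daemon-reload" so that systemd rereads its unit files; "systemctl cat sssd" should then list the drop-in after the main unit file.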

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Andy (andy.wrs)
importance: Undecided → Low
tags: added: stx.config stx.sec
tags: added: stx.9.0 stx.security
removed: stx.sec