Systemd does not respawn sssd on failure

Bug #1821927 reported by Bogdan on 2019-03-27
This bug affects 2 people
Affects: sssd (Ubuntu)
Importance: Undecided
Assigned to: Andreas Hasenack

Bug Description

sssd crashed recently (probably due to a lack of free memory; another process was killed by the OOM killer at the same time). However, it was not respawned automatically because no restart policy is configured in the systemd unit file.

$ lsb_release -rd
Description: Ubuntu 16.04.5 LTS
Release: 16.04

$ apt-cache policy sssd
sssd:
  Installed: 1.13.4-1ubuntu1.12
  Candidate: 1.13.4-1ubuntu1.12
  Version table:
 *** 1.13.4-1ubuntu1.12 500
        500 http://usa.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1.13.4-1ubuntu1.10 500
        500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
     1.13.4-1ubuntu1 500
        500 http://usa.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

$ cat /lib/systemd/system/sssd.service
[Unit]
Description=System Security Services Daemon
# SSSD must be running before we permit user sessions
Before=systemd-user-sessions.service nss-user-lookup.target autofs.service
Wants=nss-user-lookup.target

[Service]
ExecStart=/usr/sbin/sssd -i -f
Type=notify
NotifyAccess=main
PIDFile=/var/run/sssd.pid

[Install]
WantedBy=multi-user.target

Expected behavior: systemd attempts to restart the service

Thank you.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in sssd (Ubuntu):
status: New → Confirmed
tags: added: server-triage-discuss
tags: removed: server-triage-discuss

We discussed this yesterday and think this is a valid bug report - thanks!
It spawned a spin-off discussion about how we want to handle auto-restart in general, which will go on for a while.

For this particular case Andreas wanted to take a look at the details and maybe bring the suggestion also upstream (to avoid there being known reasons for not restarting it).
To mark that, I'll subscribe the team and assign it to Andreas (for now).

tags: added: server-next
Changed in sssd (Ubuntu):
assignee: nobody → Andreas Hasenack (ahasenack)

It seems this fell through the cracks, sorry for that.
While cleaning up bugs and tasks I found it and filed an upstream discussion [1].
Depending on the outcome of this we can make the modification (or not).

[1]: https://pagure.io/SSSD/sssd/issue/4040

Changed in sssd (Ubuntu):
assignee: Andreas Hasenack (ahasenack) → nobody
Changed in sssd (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Andreas Hasenack (ahasenack)
Robie Basak (racb) wrote :

I'm not keen on doing something special just for sssd. Either all "server-y" packages should be Restart=on-failure, or none of them, or we should have clear criteria. It's the job of the distribution to provide consistency across packages here.

In this case, Robie, we are following upstream, where we suggested the change because of this bug.
And while I agree with making things consistent across packages, we have talked about it before and nothing has happened yet. IMHO it is correct to say "let us use this as a reminder to reconsider an effort across packages", but not "hold this one back as it might make it different (others are already)".

Robie Basak (racb) wrote :

Surely if we're in agreement that we should be consistent, that means that we either patch to make them one way, or we patch to make them consistent the other way. Patching to make them inconsistent doesn't follow, and nor does not patching an exception to leave that one different from all the others.

> hold this one back as it might make it different

Might is wrong. It _will_ make it different, and I don't think that's right.

FWIW, users already have the option to easily override this behaviour for any set of services without having to worry about stepping on future packaging changes thanks to systemd's override features in /etc (systemd-system.conf(5)).
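
As an illustration of the override mechanism mentioned above, here is a minimal sketch of a drop-in that adds Restart=on-failure to sssd.service without touching the packaged unit file. The helper name write_restart_dropin and the file name restart.conf are hypothetical choices for this example, and the second argument exists only so the sketch can be tried against a scratch directory. Drop-in directories of the form <unit>.d/ are described in systemd.unit(5).

```shell
#!/bin/sh
# Sketch only: write a systemd drop-in that sets Restart=on-failure for a unit.
# write_restart_dropin is a hypothetical helper, not part of systemd or sssd.
#   $1 = unit name (e.g. sssd.service)
#   $2 = base directory (optional, defaults to /etc/systemd/system)
write_restart_dropin() {
    unit="$1"
    base="${2:-/etc/systemd/system}"
    dropin_dir="$base/$unit.d"
    mkdir -p "$dropin_dir" || return 1
    # Settings in a drop-in are merged on top of the packaged unit file,
    # so only the option being changed needs to appear here.
    printf '[Service]\nRestart=on-failure\n' > "$dropin_dir/restart.conf"
}

# Typical use, as root:
#   write_restart_dropin sssd.service
#   systemctl daemon-reload
#   systemctl show -p Restart sssd.service
```

Because the drop-in lives under /etc, it should survive package upgrades that replace /lib/systemd/system/sssd.service. `systemctl edit sssd.service` creates the same kind of override interactively.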

On Thu, Aug 1, 2019 at 8:50 AM Robie Basak <email address hidden> wrote:
>
> Surely if we're in agreement that we should be consistent, that means
> that we either patch to make them one way, or we patch to make them
> consistent the other way.

Umm this is derailing I think - the last time we talked as a bigger
group IIRC the realization was that there exist different "types" of
services which would eventually need different characteristics.
Some are meant to be more "always up whatever happens" (like this case
for auth) while others would want to stop providing any service like
"before it's wrong I better stop".

The end of the discussion back then was that there - will be
differences - and that it might end up that one needs to identify the
different type of services that will exist and then group all packages
into one of these groups.
To then resolve any mismatch of "decision for group" vs "current behavior".

Since none of the above exists yet, so far this decision is made on a
per-package basis like here.

Robie Basak (racb) wrote :

See also bug 1838380. Perhaps these should all be merged into the same bug.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sssd - 2.2.0-4ubuntu1

---------------
sssd (2.2.0-4ubuntu1) eoan; urgency=medium

  * Merge with Debian unstable. Remaining changes:
    - Switch sss_obfuscate shebang to python3.
  * Dropped:
    - Fix build with newer samba (4.10+):
      + d/p/build-newer-samba.patch: replace ARRAY_SIZE with N_ELEMENTS, since
        the former is no longer available.
        [Fixed upstream]
      + d/p/make-n_elements-public.patch: make N_ELEMENTS public
        [Fixed upstream]
    - d/p/GPO_CROND-customization.patch: Set GPO_CROND to cron instead of
      crond for Debian and Ubuntu (LP #1572908)
      [Fixed upstream]
  * Added:
    - d/p/restart-on-failure.patch: add Restart=on-failure to sssd.service
      (LP: #1821927)

sssd (2.2.0-4) unstable; urgency=medium

  [ Sam Morris ]
  * fix-have-systemd.patch: correct detection of systemd.pc
    (Closes: #932080)
  * default-to-socket-activated-services.diff: rely on socket activation
    to spawn nss and pam responders

sssd (2.2.0-3) unstable; urgency=medium

  * common/ipa/krb5-common/proxy.postinst: Use libexec path. (Closes:
    #931859)

sssd (2.2.0-2) unstable; urgency=medium

  * rules: Override dh_installman, let dh_install handle installing
    manpages too.

sssd (2.2.0-1) unstable; urgency=medium

  * New upstream release.
  * control: Bump policy to 4.4.0.
  * control, compat, rules: Bump debhelper to 12.
  * *.install: Updated, some files moved to /usr/libexec.

sssd (2.1.0-1) experimental; urgency=medium

  * New upstream release.
  * sssd-tools.install: Local domain support is deprecated and not
    built by default anymore, so drop the files.
  * control, sssd-common.install: Secrets responder is dropped, deprecated.
  * control: Add ldap-utils to build-depends, tests need it.
  * sssd-common.install: Add new internal libs for iface/sbus.
  * fix-whitespace-test.diff: Fix ignoring the debian dir.
  * rules: Update the clean target.

sssd (1.16.4-1~exp1) experimental; urgency=medium

  [ Timo Aaltonen ]
  * New upstream release. (LP: #1572908)
  * Drop patches, all upstream.
  * Enable systemd responders. (Closes: #925026, #923882)

  [ Dominik George ]
  * Acknowledge NMU.
  * Add myself to Uploaders.

 -- Andreas Hasenack <email address hidden> Mon, 29 Jul 2019 18:09:16 -0300

Changed in sssd (Ubuntu):
status: In Progress → Fix Released