Winbind failing to start leads to postinst erroring out

Bug #1818431 reported by Noel McLoughlin
This bug affects 1 person

Affects          Status    Importance   Assigned to   Milestone
samba (Ubuntu)   Invalid   Undecided    Unassigned

Bug Description

Ubuntu 16.04 just works: Winbind was a smooth experience. We install `winbind`, `libnss-winbind`, and `libpam-winbind` using APT successfully, join the Domain/Realm, and start winbind with systemd!

Ubuntu 18.04 is a regression: the `winbind` package breaks the APT/dpkg package manager because `/var/lib/dpkg/info/winbind.postinst` tries to start the service. That's a bad regression.

I need the package manager (apt/dpkg) to handle packages and the service manager (systemd/upstart) to manage services. Can the winbind package maintainer do anything to reverse the regression?

I have been reading up on debhelper but cannot find a way to prevent breaking apt.

```
~$ dpkg-query --list | grep winbind
iU libnss-winbind:amd64 2:4.7.6+dfsg~ubuntu-0ubuntu2.6 amd64 Samba nameservice integration plugins
iU libpam-winbind:amd64 2:4.7.6+dfsg~ubuntu-0ubuntu2.6 amd64 Windows domain authentication integration plugin
ii libwbclient0:amd64 2:4.7.6+dfsg~ubuntu-0ubuntu2.6 amd64 Samba winbind client library
iF winbind 2:4.7.6+dfsg~ubuntu-0ubuntu2.6 amd64 service to resolve user and gro

... trace ...

Active: failed (Result: exit-code) since Sun 2019-03-03 11:30:11 MST; 19ms ago
     Docs: man:winbindd(8)
           man:samba(7)
           man:smb.conf(5)
  Process: 43699 ExecStart=/usr/sbin/winbindd --foreground --no-process-group $WINBINDOPTIONS (code=exited, status=1/FAILURE)
 Main PID: 43699 (code=exited, status=1/FAILURE)

Mar 03 11:30:11 myhost1 systemd[1]: Starting Samba Winbind Daemon...
Mar 03 11:30:11 myhost1 winbindd[43699]: [2019/03/03 11:30:11.597251, 0] ../source3/winbindd/winbindd_cache.c:3170(initialize_winbindd_cache)
Mar 03 11:30:11 myhost1 winbindd[43699]: initialize_winbindd_cache: clearing cache and re-creating with version number 2
Mar 03 11:30:11 myhost1 winbindd[43699]: [2019/03/03 11:30:11.600710, 0] ../source3/winbindd/winbindd_util.c:891(init_domain_list)
Mar 03 11:30:11 myhost1 winbindd[43699]: Could not fetch our SID - did we join?
Mar 03 11:30:11 myhost1 winbindd[43699]: [2019/03/03 11:30:11.600854, 0] ../source3/winbindd/winbindd.c:1366(winbindd_register_handlers)
Mar 03 11:30:11 myhost1 winbindd[43699]: unable to initialize domain list
Mar 03 11:30:11 myhost1 systemd[1]: winbind.service: Main process exited, code=exited, status=1/FAILURE
Mar 03 11:30:11 myhost1 systemd[1]: winbind.service: Failed with result 'exit-code'.
Mar 03 11:30:11 myhost1 systemd[1]: Failed to start Samba Winbind Daemon.
dpkg: error processing package winbind (--configure):
 installed winbind package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of libpam-winbind:amd64:
 libpam-winbind:amd64 depends on winbind (= 2:4.7.6+dfsg~ubuntu-0ubuntu2.6); however:
  Package winbind is not configured yet.

dpkg: error processing package libpam-winbind:amd64 (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of libnss-winbind:amd64:
 libnss-winbind:amd64 depends on winbind (= 2:4.7.6+dfsg~ubuntu-0ubuntu2.6); however:
  Package winbind is not configured yet.

dpkg: error processing package libnss-winbind:amd64 (--configure):
 dependency problems - leaving unconfigured
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Errors were encountered while processing:
 winbind
 libpam-winbind:amd64
 libnss-winbind:amd64
```
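
For reading the listing above: the first column of `dpkg-query --list` is a short state code (desired action, then current status), where `ii` means installed and configured, `iU` means unpacked but not yet configured, and `iF` means half-configured after a failed postinst. A minimal sketch that filters for anything other than `ii`; the function name `unconfigured_packages` is hypothetical, and the sample lines are abbreviated from this report:

```shell
#!/bin/sh
# Print "state package" for every entry whose dpkg state is not "ii".
# Reads `dpkg-query --list`-style lines on stdin; header lines are skipped
# because their first field is not a two- or three-letter state code.
unconfigured_packages() {
    awk '$1 ~ /^[a-z][A-Za-z][A-Za-z]?$/ && $1 != "ii" { print $1, $2 }'
}

# Sample input abbreviated from the listing in this report.
printf '%s\n' \
    'iU libnss-winbind:amd64 2:4.7.6+dfsg~ubuntu-0ubuntu2.6 amd64 Samba nameservice integration plugins' \
    'ii libwbclient0:amd64 2:4.7.6+dfsg~ubuntu-0ubuntu2.6 amd64 Samba winbind client library' \
    'iF winbind 2:4.7.6+dfsg~ubuntu-0ubuntu2.6 amd64 service to resolve user and gro' \
    | unconfigured_packages
```

On a live system the input would come from `dpkg-query --list` itself. Once the underlying start failure is resolved, `dpkg --configure -a` retries the pending configuration steps.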

affects: ubuntu-packaging-guide → samba (Ubuntu)
Noel McLoughlin (noelmcloughlin) wrote :

I retested on Ubuntu 16.04 (Samba/Winbind packages/services, join Domain/Realm, start winbind) and it works fine there.

Sebastien Bacher (seb128) wrote :

There are a bunch of closed reports of similar issues where the maintainer focused on why the job is failing. I think the bottom line is that package installation shouldn't fail, or leave the packaging system in a corrupted state, just because a service fails to start.

Changed in samba (Ubuntu):
importance: Undecided → High
summary: - Regression in winbind package postinstall script
+ Winbind failing to start leads to postinst erroring out
Noel McLoughlin (noelmcloughlin) wrote :

It seems to be a pattern in Ubuntu that package maintainers can break the OS:

https://bugs.launchpad.net/nginx/+bug/1512344?comments=all

How can this be escalated if maintainers do not address this?

Andreas Hasenack (ahasenack) wrote :

Hello Noel, thanks for filing this bug in Ubuntu.

Both Debian and Ubuntu like to install services with a working default configuration, and it is expected that a service is running after it is installed. That's why winbind is started right after it is installed.

During upgrades, the same principle applies: in order to have the new version of the service available after an upgrade, it must be restarted. If the restart fails, it should be investigated.

When you say this:
"""
Ubuntu 18.04 is a regression: the `winbind` package breaks the APT/dpkg package manager because `/var/lib/dpkg/info/winbind.postinst` tries to start the service. That's a bad regression.
"""

Could you elaborate a bit on which steps you took for the winbind service to fail to run? The logs show it is complaining that it didn't join the domain, or somehow lost the secret.

As an example, I just did the following on a fresh bionic container:

sudo apt update
sudo apt dist-upgrade -y
sudo apt install samba winbind -y

And it worked just fine:
root@bionic-winbind:~# systemctl status winbind
● winbind.service - Samba Winbind Daemon
   Loaded: loaded (/lib/systemd/system/winbind.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2019-03-07 20:09:32 UTC; 18s ago
     Docs: man:winbindd(8)
           man:samba(7)
           man:smb.conf(5)
 Main PID: 2793 (winbindd)
   Status: "winbindd: ready to serve connections..."
    Tasks: 4 (limit: 4915)
   CGroup: /system.slice/winbind.service
           ├─2793 /usr/sbin/winbindd --foreground --no-process-group
           ├─2795 /usr/sbin/winbindd --foreground --no-process-group
           ├─2960 /usr/sbin/winbindd --foreground --no-process-group
           └─2961 /usr/sbin/winbindd --foreground --no-process-group

Can you please share your /etc/samba/smb.conf? The logs from /var/log/samba/log* would also help.

Changed in samba (Ubuntu):
status: New → Incomplete
importance: High → Undecided
Noel McLoughlin (noelmcloughlin) wrote : Re: [Bug 1818431] Re: Winbind failing to start leads to postinst erroring out

Hi Andreas,

> Both Debian and Ubuntu like to install services with a working default
> configuration, and it is expected that a service is running after it is
> installed. That's why winbind is started right after it is installed.

I have three concerns with apt interfering with services:
- Systems theory suggests package and service managers have different
functions.
- Mission-critical systems design expects this.
- The pillar of Reactive Systems theory demands *non-blocking*
communication between subsystems - no polling!

The expectation that "a service is running after it is installed" is not in
question - it is the principle of least surprise.
I dislike the fact that apt is moving into the user-space daemon management
space. If a user-space daemon fails - who cares! If a system management
technology breaks - everyone cares - it is a devastating event.

Regarding the specific Bug I will gather the information you requested.

Here the context is an Infrastructure-as-code (
https://github.com/saltstack-formulas/samba-formula) installation which is
heavily automated but joining the domain, for security reasons, is not
automated. Please let me know if anything jumps out. I will try to retest
when I get time.

thanks
Noel

--- <INFRA AS CODE start> ---
... automate dns
... automate chrony
... automate nss
... automate kerberos
... etc..

apt install -y samba
systemctl start samba
cp smb.conf.custom /etc/samba/smb.conf
systemctl start samba
apt-get install -y samba-client

apt install -y samba-winbind
apt install -y libpam-winbind smbldap-tools cifs-utils
cp winbind.conf.custom /etc/samba/winbind.conf
systemctl start winbind    # fails on 16.04, but I have control and the package manager is not broken

systemctl start nmb
systemctl start samba
systemctl start nmb
systemctl start winbind    # never works

--- <INFRA AS CODE end> ---

net ads join EXAMPLE.COM -U domainadmin
kinit -k HOST\$@EXAMPLE.COM
systemctl restart winbind    # always works

--- <INFRA AS CODE start> ---
automate Active Directory pam/nss
automate Citrix Linux VDA
--- <INFRA AS CODE end> ---

Andreas Hasenack (ahasenack) wrote :

About the package vs system managers debate, I'll just comment that it's not that simple, and that a package that is yet to be installed might need a service running already, and if that service failed to start, what should it do? apt stopped (it didn't "break"), so that the admin could figure out what is wrong.

Regarding the steps you took, I can't troubleshoot your deployment system, but I do see that you are starting services several times, and updating config files in between. I would suggest the following:
- install packages you need
- update config files as you need them
- join domain if needed
- then (re)start services
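
The ordering above can be sketched as a small script. This is only an illustration under assumptions: the package names, config file, and join command are the ones mentioned earlier in this bug, while `run`, `provision_winbind`, and the `DRY_RUN` switch are hypothetical helpers. By default it only prints the plan.

```shell
#!/bin/sh
# Sketch of the suggested order: install, configure, join, then restart.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Echo the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

provision_winbind() {
    # 1. install the packages you need
    run apt-get install -y samba winbind libnss-winbind libpam-winbind
    # 2. update config files as you need them
    run cp smb.conf.custom /etc/samba/smb.conf
    # 3. join the domain (done by hand in the reporter's deployment)
    run net ads join EXAMPLE.COM -U domainadmin
    # 4. only now (re)start the service, so winbind finds the domain secret
    run systemctl restart winbind
}

provision_winbind
```

The point of the ordering is that winbind is never restarted against a domain-member configuration before the join has stored the machine secret.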

What you appear to be doing (I don't have your config files at hand to confirm) is restarting the services with a configuration that is not yet valid. In particular, winbind is expecting to be joined to a domain already in the config you give it, but when it looks for the secret, it can't find it.

Perhaps another option for you is to use policy-rc.d where you can create a policy of when services should be (re)started. See the "INIT SCRIPT POLICY" section in invoke-rc.d(8) manpage, or the /usr/share/doc/init-system-helpers/README.policy-rc.d.gz file (installed by the init-system-helpers package).
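
As a rough illustration of the policy-rc.d approach: invoke-rc.d(8) documents exit status 101 from the policy script as "action forbidden by policy", so a policy script that always exits 101 denies every start or restart requested through it. The sketch below writes such a script to a scratch file and checks its exit status; the helper name `write_deny_policy` is hypothetical, and a blanket deny is an assumption about what an automated deployment would want, not a recommendation for every system.

```shell
#!/bin/sh
# Write a policy script that denies every init action.
# The target path is a parameter so this can be tried on a scratch file;
# the location consulted by invoke-rc.d is /usr/sbin/policy-rc.d.
write_deny_policy() {
    target="$1"
    cat > "$target" <<'EOF'
#!/bin/sh
# 101 = action forbidden by policy (see invoke-rc.d(8)).
exit 101
EOF
    chmod 755 "$target"
}

# Try it on a temporary file rather than the live system.
policy_file="$(mktemp)"
write_deny_policy "$policy_file"
rc=0
sh "$policy_file" winbind start || rc=$?
echo "policy exit status: $rc"
rm -f "$policy_file"
```

In a deployment, the same script would be placed at /usr/sbin/policy-rc.d before the package install step and removed again once configuration and the domain join are done, so that services can be started normally afterwards.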

Noel McLoughlin (noelmcloughlin) wrote :

Hi Andreas,

If for ANY reason the winbind service is not startable, APT will be broken -
that's the regression this ticket must address.

The samba-winbind package has introduced a hard dependency on external
systems during the installation process:
1. systemd - a failure to start the service breaks the apt package manager.
2. PDC/BDC - a timeout or connection reset breaks the apt package manager.
3. Network availability

There is no debate - the reactive manifesto is clear that samba-winbind
should be designed per best practice.

https://www.reactivemanifesto.org/glossary#Non-Blocking
A non-blocking API to a resource allows the caller the option to do other
work rather than be blocked waiting on the resource to become available.

https://www.reactivemanifesto.org/glossary#Message-Driven
Responding to the failure of a component in order to restore its proper
function, on the other hand, requires a treatment of these failures that is
not tied to ephemeral client requests, but that responds to the overall
component health state

Please fix the samba-winbind package to send a non-blocking request to
systemd to start winbind.
Breaking apt is irresponsible (and legally unwise, I imagine).
Start the winbind service in a non-blocking fashion so the system can
continue to function.

best regards,
Noel

Robie Basak (racb) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.

I accept that it is a problem that situations can arise when a service will fail to start (for example by system misconfiguration, or as a result of necessary changes to local customisations triggered by a release upgrade), and this causes apt to fail when maintainer scripts fail to start such a service.

The underlying principle that leads to this behaviour is the long standing philosophy that if a user installs a package, it is presumed that the user intends to use it, and so after installation is complete the service for which the user installed the package should be active with sensible defaults. This is why packages that provide services configure and start them by default.

As Andreas points out, you can control this behaviour using policy-rc.d, which is the mechanism provided to allow sysadmin override of service control behaviour.

In my view this policy causes difficulties in three cases:

1) On servers, a package's default behaviour often isn't useful, and it is expected that the user is a sysadmin who will configure the daemon. In this case, an automatic service start on package install just gets in the way. In my opinion, management tools (chef, puppet, ansible, etc) should therefore automatically use policy-rc.d on Debian and derivatives to provide more sensible default behaviour in their automation case, but they currently do not. You mention saltstack so that sounds like this problem applies here. This point however is a tangent to your assertions about service and package manager integration.

2) On server packages, it is fairly common for users to end up in a misconfigured state where a service cannot be started. This causes problems on package upgrades, since sometimes a user isn't impacted by the service start failure, but does get impacted by a subsequent service restart attempt on package upgrade unwinding through to an apt failure.

3) On release upgrades, intentional changes to how packages operate may invalidate previous local customisations, breaking services until the customisations are updated manually. This can be mitigated somewhat by smarter package upgrade paths, but fundamentally cannot be eliminated in the general case.

I therefore accept that "something needs to be done". However it is clearly a general problem and not one specific to winbind, and the right way to solve this is to find a general solution for individual packages to implement. The solution needs to come from the top, not from individual package maintenance changes on a piecemeal basis; otherwise we'll just end up with inconsistent behaviour and confuse users further.

In the meantime, nothing in particular has changed in the way Debian and Ubuntu work. Under the current design, it is still a local configuration problem that a service is enabled but fails to start. If this happens by default, then it is a bug in packaging. If it happens because of some local event, then it is something that is expected behaviour that needs to be addressed by the sysadmin. If your automation is not using policy-rc.d, then under the current distribution design your automation is ...


Changed in samba (Ubuntu):
status: Incomplete → Invalid
Noel McLoughlin (noelmcloughlin) wrote :

Hi Andreas and Robie,

Thank you for thoroughly evaluating and investigating this ticket.

We both accept that situations arise when a service cannot start after
package install/upgrade. For a kernel-space service I think the maintainer
script should poll and abort in that case. For user-space daemons, ideally
the maintainer scripts are cognizant of the reality that a failed service,
while not ideal, is an acceptable state for a user-space daemon. I fully
support the underlying philosophy that both you and Andreas have
communicated - ideally the service gets started. There is at least one
important exception to this philosophy: when we want to migrate from
provider A to provider B (e.g. apache to nginx) in a rolling fashion, the
workflow may be a stepped process: install package B while service A is
still running (and locking resources needed for service B), migrate the
configuration, stop service A, start service B, uninstall package A. This
is a corner case which I encountered with legacy Linux systems.

I presume the policy-rc.d mechanism holds across Debian-derivative systems
- we have a specific saltstack bug tracking this issue - so this is a
possible workaround. I need to check that this mechanism is well
documented, because I was unable to find this API when searching for a
workaround to this behaviour. I never saw this issue with Samba-Winbind in
Ubuntu 16.04.

We both accept that this policy causes difficulties in at least the three
cases you have listed. Thanks for the detailed summary of these
issues.

On the basis that this is a general issue, I accept that technically marking
this ticket (and a corresponding nginx ticket) as invalid works at the
maintainer and package level. However, please understand this behaviour can
be frustrating in the wider scheme of things when we want to design
resilient systems where failures are expected. I have not looked at the
implementation in detail with DEB packages but would prefer and encourage
non-blocking checks by maintainer scripts so apt continues to work.

I wanted to ask if there is a Technical Steering Committee (TSC) this
matter could be raised with. I lack the bandwidth to pursue this matter at
the TSC right now - my FOSS work is voluntary - but for the future, what is
the best way to NOT forget this issue? It's frustrating from an engineer's
perspective.

thanks
Noel
