winbind does not work after reboot on Mint 19 / Ubuntu 18.04

Bug #1789097 reported by Rene Herman
This bug affects 2 people
Affects         Status    Importance  Assigned to  Milestone
samba           Unknown   Unknown
samba (Ubuntu)  Triaged   Medium      Unassigned

Bug Description

[copied from the Linux Mint forum]

Just installed Mint 19 and noticed a WINS name resolution buglet. I assume this applies to Ubuntu 18.04 as well.

On Linux, Windows (i.e., NetBIOS) name resolution is provided by the "winbindd" daemon, part of the Samba suite. You do not need either of the other two Samba daemons, "smbd" and "nmbd", when NetBIOS name resolution is all you need: running "sudo apt-get install libnss-winbind" (which additionally pulls in "winbind" itself) and adding "wins" before "dns" on the "hosts" line of /etc/nsswitch.conf is enough.
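The nsswitch.conf change described above can be scripted. What follows is only a sketch: the function name add_wins_before_dns is made up for illustration, and it takes the file path as an argument so that it can be tried on a copy before touching /etc/nsswitch.conf.

```shell
#!/bin/sh
# Sketch: insert "wins" before "dns" on the "hosts:" line of an
# nsswitch.conf-style file. The function name is illustrative only.
add_wins_before_dns() {
    conf="$1"
    # Nothing to do if the hosts line already lists wins.
    if grep -Eq '^hosts:.*[[:space:]]wins([[:space:]]|$)' "$conf"; then
        return 0
    fi
    tmp="${conf}.tmp.$$"
    # Only touch the hosts line; put "wins" in front of the first standalone
    # "dns" word (so "mdns4_minimal" is left alone). Writing back with cat
    # keeps the original file's owner and permissions.
    sed -E '/^hosts:/ s/(^|[[:space:]])dns([[:space:]]|$)/\1wins dns\2/' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf" &&
        rm -f "$tmp"
}
```

Run as root against /etc/nsswitch.conf after installing libnss-winbind, this would turn a hosts line such as "files mdns4_minimal [NOTFOUND=return] dns myhostname" into one ending in "wins dns myhostname". Running it a second time changes nothing.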

While this works fine directly after installation, it no longer does after a reboot, due to a systemd unit file dependency issue. The standard /lib/systemd/system/winbind.service orders itself after "network.target" and "nmbd.service", which, if you do not have "samba" (and hence "nmbd.service") installed, amounts to "network.target" only. It needs "network-online.target", however, which is itself a dependency of "smbd.service" if you do have that installed as well.

You can solve things by copying the unit file to its corresponding directory under /etc and editing it,

  sudo cp /{lib,etc}/systemd/system/winbind.service
  xed admin:/etc/systemd/system/winbind.service

to change the line

  After=network.target nmbd.service

to the two lines

  After=network-online.target nmbd.service
  Wants=network-online.target

After save and reboot you will have NetBIOS name resolution functional without all of the rest of Samba running.
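A possible alternative to copying the whole unit file is a systemd drop-in that only adds the two dependency settings. This is a sketch, not something tested in this report: the function name and the drop-in file name "10-network-online.conf" are arbitrary choices, and the base directory is a parameter so the function can be tried on a scratch directory first.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that adds the network-online.target
# ordering to winbind.service without copying the packaged unit file.
write_winbind_dropin() {
    # Base directory for local unit overrides (defaults to the system one).
    base="${1:-/etc/systemd/system}"
    dir="$base/winbind.service.d"
    mkdir -p "$dir" || return 1
    # After= and Wants= are list settings, so these lines add to the values
    # in the packaged unit rather than replacing them.
    cat > "$dir/10-network-online.conf" <<'EOF'
[Unit]
After=network-online.target
Wants=network-online.target
EOF
}
```

Called as root without arguments, this writes /etc/systemd/system/winbind.service.d/10-network-online.conf; a "systemctl daemon-reload" or a reboot is then needed for systemd to pick it up.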

Note that although this fix is not needed when you do have "samba" itself installed the above is still fine also the

Andreas Hasenack (ahasenack) wrote :

Thanks for filing this bug in Ubuntu.

What do you mean by "does not work", exactly?

Doesn't the network come up eventually? Or is this a desktop machine where the network is wireless and only available after someone logs in?

Please show exactly what is not working, your configuration files, and also please provide logs.

Changed in samba (Ubuntu):
status: New → Incomplete

Rene Herman (rene.herman) wrote :

Yes, the network itself comes up fine. "Does not work" simply means that NetBIOS name resolution is unavailable even after it does; that e.g. "ping NAS" for "NAS" the NetBIOS name of my, well, NAS, tells me the name cannot be resolved.

There would not appear to be relevant logs or configuration files. The issue is simply that with "After=network.target" rather than "After=network-online.target" the winbindd daemon is started too early: it does start, but does not work, and needs to be restarted before it starts working.

I am as mentioned on Linux Mint 19 but would be surprised if this were not the case on Ubuntu 18.04 itself as well. This is a standard, wired desktop without even wireless available.

It is easy to reproduce: have the "winbind" and "libnss-winbind" packages installed but not "samba" itself, enable WINS name resolution in /etc/nsswitch.conf as detailed and as normal, and reboot.

As also mentioned, installing "samba" itself hides the issue by winbind having a dependency on nmbd and nmbd in turn on network-online.target; without "samba", we appear to have a simple systemd bootup race, solved by having it depend on network-online.target itself.
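One way to see which ordering is actually in effect on a given machine is to look at the unit's After= list. A minimal sketch follows; the function name is illustrative, and the list is passed in as a string so the check can be tried without systemd.

```shell
#!/bin/sh
# Sketch: report whether a space-separated After= list contains
# network-online.target.
ordered_after_online() {
    # Split the list into one unit per line and look for an exact match.
    printf '%s\n' "$1" | tr -s ' ' '\n' | grep -qx 'network-online.target'
}

# On a live system the list would come from systemd itself, for example:
#   ordered_after_online "$(systemctl show winbind -p After --value)"
```

The function returns success only if network-online.target appears as a whole entry, so "network.target nmbd.service" alone does not count.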

Rene Herman (rene.herman) wrote :

Just to add: the issue's the same and fully constant on an Intel Xeon E5606 @ 2.13GHz with 8GiB running Mint 19 Cinnamon 64-bit, and an Intel Core 2 Duo E8400 @ 3GHz with 4GiB running Mint 19 Xfce 64-bit.

That is... it's not a subtle race and I assume anyone will see this if installing only "winbind" and not the rest of "samba".

Andreas Hasenack (ahasenack) wrote :

I'm trying to check why winbind won't recover from the network being down, so I set up a lxd container where I installed just winbind and the nss module.

My /etc/nsswitch.conf reads:
hosts: files wins dns

(I would have ordinarily put dns before wins, but ok, let's try to reproduce this)

smb.conf has:
[global]
...
        wins server = 10.10.222.254

That IP is another samba server I have.

I then disabled dhcp on eth0, and rebooted the container. It came back up with no IP on eth0. winbind was running, and name resolution via wins was obviously not working. (I logged in using "lxc exec <containername> bash" instead of ssh.)

I then just ran "dhclient eth0", and retried the name resolution (ping -c <netbiosname> was my test), and it worked right away. So it was able to use the newly functional interface.

I didn't see anything out of the ordinary in winbind's logs.

I'll try again with a vm, which should be a bit slower than a lxd container. And maybe also try without a network card at all, then add one, and see if winbind needs to be restarted.

In the meantime, can you perhaps grab me some winbind logs during a machine boot showing the problem? Maybe add "debug level = 3" or 5 to smb.conf as well.
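Raising the debug level amounts to adding one line to the [global] section of smb.conf. The following is a sketch only: the function name is hypothetical, the file path is an argument so it can be tried on a copy, and it does not check whether a debug level or log level line is already present.

```shell
#!/bin/sh
# Sketch: add a "debug level" line directly under the first [global] header
# of an smb.conf-style file.
set_debug_level() {
    conf="$1"
    level="$2"
    tmp="${conf}.tmp.$$"
    # Print every line; right after the first [global] header, emit the setting.
    awk -v lvl="$level" '
        { print }
        !seen && /^[ \t]*\[global\][ \t]*$/ {
            print "   debug level = " lvl
            seen = 1
        }
    ' "$conf" > "$tmp" && cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

For example, "set_debug_level /etc/samba/smb.conf 5" run as root, followed by a reboot, would produce winbind logs at level 5.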

Rene Herman (rene.herman) wrote :

Thanks much for checking. I see; I was expecting this to be an "immediately obvious" sort of thing to someone who, unlike me, has an actual clue about samba/winbind, but with the above guidance I have now supplied more information.

Specifically, it would appear that your explicitly mentioned "wins server" configuration will be the difference. I never expected my /etc/samba/smb.conf to be relevant since mine is fully vanilla, as provided by the "samba-common" package. It is also attached, but:

rene@t5500:~$ debsums samba-common | grep /usr/share/samba/smb.conf
/usr/share/samba/smb.conf OK
rene@t5500:~$ cmp /usr/share/samba/smb.conf /etc/samba/smb.conf
rene@t5500:~$

This is to say that I do not in fact have a WINS server specified; that judging by the logs my winbind is relying on broadcasts, and bombing out in that case if started before network-online.target.

There are two logs attached with "debug level = 5", log.winbindd-nonworking and log.winbindd-working, where "nonworking" is without /etc/systemd/system/winbind.service (i.e., the default) and "working" is with it.

In the nonworking situation the log shows:

[2018/09/05 22:44:21.132692, 0] ../lib/util/become_daemon.c:124(daemon_ready)
  STATUS=daemon 'winbindd' finished starting up and ready to serve connections
[2018/09/05 22:44:42.487584, 3] ../source3/winbindd/winbindd_misc.c:395(winbindd_interface_version)
  [ 1917]: request interface version (version = 29)
[2018/09/05 22:44:42.487775, 3] ../source3/winbindd/winbindd_misc.c:428(winbindd_priv_pipe_dir)
  [ 1917]: request location of privileged pipe
[2018/09/05 22:44:42.487995, 3] ../source3/winbindd/winbindd_wins_byname.c:56(winbindd_wins_byname_send)
  [ 1917]: wins_byname WD-NETCENTER
[2018/09/05 22:44:42.488050, 3] ../source3/libsmb/namequery.c:2142(resolve_wins_send)
  resolve_wins: WINS server resolution selected and no WINS servers listed.
[2018/09/05 22:44:42.488086, 3] ../source3/libsmb/namequery.c:1880(name_resolve_bcast_send)
  name_resolve_bcast: Attempting broadcast lookup for name WD-NETCENTER<0x20>

.... and silence after that, whereas in the working situation:

[2018/09/05 22:39:51.454142, 0] ../lib/util/become_daemon.c:124(daemon_ready)
  STATUS=daemon 'winbindd' finished starting up and ready to serve connections
[2018/09/05 22:40:41.498139, 3] ../source3/winbindd/winbindd_misc.c:395(winbindd_interface_version)
  [ 1992]: request interface version (version = 29)
[2018/09/05 22:40:41.498346, 3] ../source3/winbindd/winbindd_misc.c:428(winbindd_priv_pipe_dir)
  [ 1992]: request location of privileged pipe
[2018/09/05 22:40:41.498570, 3] ../source3/winbindd/winbindd_wins_byname.c:56(winbindd_wins_byname_send)
  [ 1992]: wins_byname WD-NETCENTER
[2018/09/05 22:40:41.498627, 3] ../source3/libsmb/namequery.c:2142(resolve_wins_send)
  resolve_wins: WINS server resolution selected and no WINS servers listed.
[2018/09/05 22:40:41.498676, 3] ../source3/libsmb/namequery.c:1880(name_resolve_bcast_send)
  name_resolve_bcast: Attempting broadcast lookup for name WD-NETCENTER<0x20>
[2018/09/05 22:40:41.499722, 4] ../source3/libsmb/nmblib.c:108(debug_nmb_packet)
  nmb pac...
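The difference between the two logs can be checked mechanically. This is a rough sketch with a hypothetical function name; it assumes the log was taken at debug level 4 or higher (the debug_nmb_packet line above is logged at level 4) and relies only on the two function names visible in the excerpts.

```shell
#!/bin/sh
# Sketch: succeed if a winbindd log shows a NetBIOS packet dump after the
# first broadcast lookup, fail if the broadcast is followed by silence.
bcast_followed_by_packet() {
    awk '
        /name_resolve_bcast_send/  { bcast = 1; next }
        bcast && /debug_nmb_packet/ { found = 1; exit }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

With the two attached logs, "bcast_followed_by_packet log.winbindd-working" would be expected to succeed and the same call on log.winbindd-nonworking to fail.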


Rene Herman (rene.herman) wrote :

Oh, and the mentioned vanilla smb.conf...

Andreas Hasenack (ahasenack) wrote :

> This is to say that I do not in fact have a WINS server specified; that judging by the logs
> my winbind is relying on broadcasts, and bombing out in that case if started before
> network-online.target.

Ok, reproduced. When there is no wins server specified, somehow winbind doesn't "detect" the interface when it comes up, and the broadcasts it sends go nowhere. I'll check if there is something upstream about this behavior.

Changed in samba (Ubuntu):
status: Incomplete → Triaged
importance: Undecided → Medium

Andreas Hasenack (ahasenack) wrote :

Basically, if the interface is available when winbind starts, winbind keeps working if it's brought down and up again. But if winbind starts when the interface isn't available, then it won't recover without a restart. That's my finding so far.

Andreas Hasenack (ahasenack) wrote :

Ah, and that's only in the "no wins specified" case, otherwise it recovers just fine.

Rene Herman (rene.herman) wrote :

As to the reply you got on https://bugzilla.samba.org/show_bug.cgi?id=13607: I can confirm that a "net cache flush" does nothing to get WINS name resolution going for me when booted with the original service file:

rene@t5500:~$ ping WD-NETCENTER
ping: WD-NETCENTER: Name or service not known
rene@t5500:~$ net cache flush
rene@t5500:~$ ping WD-NETCENTER
ping: WD-NETCENTER: Name or service not known

Andreas Hasenack (ahasenack) wrote :

As a workaround, could you try creating this file: /usr/lib/networkd-dispatcher/routable.d/10-winbind

with these contents:

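#!/bin/sh
# Send SIGHUP to winbindd, if it is running, so that it rescans the interfaces.

pid=$(systemctl show winbind -p MainPID --value)
# MainPID is 0 when the service has no running main process.
if [ -n "$pid" ] && [ "$pid" -ne 0 ]; then
    kill -HUP "$pid"
fi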

Then make it executable: sudo chmod +x /usr/lib/networkd-dispatcher/routable.d/10-winbind

That's assuming you have the networkd-dispatcher package installed. It's not mandatory, but likely that you have it. If you don't, then please install it.

Rene Herman (rene.herman) wrote :

Actually, no, that doesn't work for me. Was expecting it would given the comments on the samba bug, but, well, no. It *does* work to send HUP once booted:

rene@t5500:~$ ping WD-NETCENTER
ping: WD-NETCENTER: Name or service not known
rene@t5500:~$ sudo kill -HUP $(systemctl show winbind -p MainPID --value)
rene@t5500:~$ ping WD-NETCENTER
PING WD-NETCENTER (192.168.1.33) 56(84) bytes of data.
64 bytes from fs7-netcenter (192.168.1.33): icmp_seq=1 ttl=64 time=1.86 ms

It seems the networkd-dispatcher script is still too early.

Must by the way also admit I'd maybe not consider it a better workaround, even if it did work, than just waiting for network-online.target; nmbd as mentioned already does as well, so only people with minimal Windows-networking needs or wants would get any potential benefit from NOT simply waiting for network-online.target.

Andreas Hasenack (ahasenack) wrote :

Sure.

Let's see what upstream comes up with after the last round of comments there. If it's a patch that can be backported to the bionic version, we might go that route. If not, then adding the network-online target it is.

Rene Herman (rene.herman) wrote :

Upstream seems silent; bug remains in NEW state and no follow-up.

Rene Herman (rene.herman) wrote :

So glad I took the time to report this.

Christian Ehrhardt  (paelzer) wrote :

Hi,
I'm coming by to try to close or revive bugs that have been dormant for too long.

As Rene already quotes - upstream seems to have dropped the ball.
Together with Andreas they have found that a SIGHUP will make it rescan the interfaces, which makes it work. But the follow-up that upstream intended ("If that works, it should be possible to add a netlink listener that triggers the scan whenever interfaces come and go.") didn't seem to happen.

FYI - for some remaining hope I also pinged the upstream case.

P.S. In addition, this is another aspect of the general (and never-ending) online ordering context; tagging it as that [1].

[1]: https://bugs.launchpad.net/ubuntu/+bugs?field.tag=network-online-ordering

tags: added: network-online-ordering