winbind forgets uid/name and gid/name mappings at regular intervals

Bug #207791 reported by Torsten Krah
This bug affects 2 people
Affects         Status     Importance  Assigned to  Milestone
samba (CentOS)  New        Undecided   Unassigned
samba (Ubuntu)  Confirmed  Medium      Unassigned

Bug Description

Binary package hint: winbind

Version 3.0.26a-1ubuntu2.3, Gutsy.

I am a member of a domain; wbinfo -u and wbinfo -g display users and groups fine.
nsswitch.conf is configured like this:

passwd: compat winbind
group: compat winbind
shadow: compat winbind

Using getent passwd "domainuser" does succeed, id "domainuser" too.

However, after about 10 minutes (the time varies), "ls -l file" shows only the numeric uid/gid of the user/group; the names no longer persist.
I then have to refresh manually with getent or id (they succeed on the first try), after which user and group names are shown again.
That's really annoying, because many tools and daemons rely on persistent user/group names, not only the uids.
I don't know why the names get lost from time to time, but it happens every time.
Using nscd does not help.
My workaround for now is to keep a local copy of the entries in /etc/passwd (but not in shadow) to get a persistent uid/name mapping.
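
For readers wondering why "ls -l" falls back to numbers: tools generally ask the system name service (NSS) to translate a uid into a name and print the raw number when that lookup returns nothing. A minimal Python sketch of that fallback, which assumes nothing about winbind internals (the helper name owner_label is made up for illustration):

```python
import pwd


def owner_label(uid, lookup=pwd.getpwuid):
    """Return the user name for uid, or the number as text if NSS has no answer."""
    try:
        return lookup(uid).pw_name
    except KeyError:
        # pwd.getpwuid raises KeyError when no passwd entry is found;
        # this mirrors the numeric fallback seen in the ls -l output.
        return str(uid)
```

Calling owner_label(os.stat("file").st_uid) in a loop would show the same switch from name to number that the report describes, if the uid-to-name lookup really is what stops working.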

smb.conf winbind parameters:

        winbind cache time = 10
        winbind refresh tickets = true
        winbind enum users = yes
        winbind enum groups = yes
        winbind use default domain = Yes
        winbind offline logon = true

Is there anything I can do: tweaks, configuration changes or something else?

kind regards

Torsten

Chuck Short (zulcss) wrote :

Have you considered upgrading to Hardy and using likewise-open?

Thanks
chuck

Changed in samba:
status: New → Incomplete
Torsten Krah (tkrah) wrote :

Once Hardy is released this may be an option, but not yet. What can I do until then?

Peter Parzer (peter-parzer) wrote :

The bug is still there in Hardy-beta and likewise-open

Torsten Krah (tkrah) wrote :

Bug is still there in Hardy Release - it doesn't matter if winbind or likewise-open is used.

Brandon Perry (bperry-volatile) wrote :

This bug is still present in Intrepid as of 10/08/2008.

I have no name!@station-17:~$ dpkg -l | grep winbind
ii libwbclient0 2:3.2.3-1ubuntu2 client library for interfacing with winbind
ii winbind 2:3.2.3-1ubuntu2 service to resolve user and group informatio
[1]+ Done epiphany-webkit
I have no name!@station-17:~$ apt-cache policy winbind
winbind:
  Installed: 2:3.2.3-1ubuntu2
  Candidate: 2:3.2.3-1ubuntu2
  Version table:
 *** 2:3.2.3-1ubuntu2 0
        500 http://us.archive.ubuntu.com intrepid/main Packages
        100 /var/lib/dpkg/status
I have no name!@station-17:~$

wiggles (tim-wielgos) wrote :

I'm having similar issues with RHEL and CentOS as well.

In my logs, I'm seeing messages like this, over and over again, every 10 minutes. It makes me wonder if this is part of the problem:

[2008/10/15 20:26:44, 0] lib/util_sid.c:string_to_sid(242)
  string_to_sid: Sid S-0-0 is not in a valid format.
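
To check whether such messages really recur at the cache interval, the timestamps can be pulled out of the log headers and the gaps between them compared with "winbind cache time". A rough sketch, assuming the header format shown above (newer Samba versions may format the timestamp differently); the function name gaps_in_seconds is illustrative:

```python
import re
from datetime import datetime

# Matches headers such as "[2008/10/15 20:26:44, 0] lib/util_sid.c:string_to_sid(242)"
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}),\s*\d+\]\s+(\S+)")


def gaps_in_seconds(lines, location_substring):
    """Seconds between successive log headers whose location contains the substring."""
    stamps = []
    for line in lines:
        match = HEADER.match(line)
        if match and location_substring in match.group(2):
            stamps.append(datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S"))
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]
```

Gaps that cluster around the configured cache time would support the suspicion that the message is tied to cache expiry; irregular gaps would point elsewhere.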

Torsten Krah (tkrah) wrote :

I did not see such messages in my logs but still have this issue with intrepid.

Jerome Haltom (wasabi) wrote :

For those of you having this issue: are you by chance using the SFU idmap? I believe I've isolated my problem down to that.

If not, see if you can grab the stack trace. The samba panic action should put it in the log files.

Torsten Krah (tkrah) wrote :

I did not install SFU on my AD Domain Controller.
I am not using samba/winbindd but likewise-open, as suggested. How can I produce a stack trace if I cannot trigger this failure with a command?

Heikki Manninen (hma-iki) wrote :

I'm having the same problem too. Now with Intrepid; I had it with Hardy as well. Using AD 2003 + SFU.

At the moment I'm using the Samba 3.2.4 packages from Debian Sid to work around the winbind crashing problem that is still present in Intrepid's 3.2.3.

This problem exists with all three Samba versions (Hardy, Intrepid, Sid).

Heikki Manninen (hma-iki) wrote :

Still having this problem with a freshly installed & fully up to date Jaunty. AD2k3 + SFU + idmap_ad.

Thierry Carrez (ttx)
Changed in samba (Ubuntu):
status: Incomplete → New
Chuck Short (zulcss)
Changed in samba (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Drew Scott Daniels (drewdaniels) wrote :

uid/name gid/name mappings depend on the backend chosen. If the backend doesn't cache or calculate the uids and gids then it's likely trying to query a Domain Controller.

   1. Check your idmap backend setup in /etc/samba/smb.conf
   2. Check /var/log/samba/log.winbind* for relevant errors/warnings
   3. If using rid or ads as the backend, try to find out if you can still query the domain controller with wbinfo -u and wbinfo -g. You may need to check klist, net ads status, net ads info to see if your kerberos key didn't get renewed. Some of this should be run under sudo with an Active Directory (AD) authenticated user.
   4. Try re-logging in with an AD user and see if the problem is fixed. If so, it might be that a new key was issued.
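
The command-line checks in step 3 can be collected in one run, which makes it easier to compare results before and after the mapping is lost. A small sketch, assuming the tools are on PATH and that exit codes are a useful first signal; the function name run_checks is hypothetical, and several of these commands need root or a valid Kerberos ticket to return anything meaningful:

```python
import shutil
import subprocess

# Commands named in the checklist above.
CHECKS = [
    ["wbinfo", "-u"],
    ["wbinfo", "-g"],
    ["klist"],
    ["net", "ads", "status"],
    ["net", "ads", "info"],
]


def run_checks(checks=CHECKS, which=shutil.which, run=subprocess.run):
    """Run each command and map it to its exit code, or None if it is not installed."""
    results = {}
    for cmd in checks:
        name = " ".join(cmd)
        if which(cmd[0]) is None:
            results[name] = None
            continue
        # stdin is closed so a command that asks for a password fails instead of hanging.
        proc = run(cmd, stdin=subprocess.DEVNULL, capture_output=True, text=True)
        results[name] = proc.returncode
    return results
```

Printing run_checks() once while names resolve and once while they do not gives two snapshots that can be pasted into the bug.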

I kind of wonder if the winbind refresh tickets option isn't working for some reason.

     Drew Daniels
Resume: http://www.boxheap.net/ddaniels/resume.html

Torsten Krah (tkrah) wrote :

   1. Check your idmap backend setup in /etc/samba/smb.conf

Checked - ok here.

   2. Check /var/log/samba/log.winbind* for relevant errors/warnings

No errors or warnings.

   3. If using rid or ads as the backend, try to find out if you can still query the domain controller with wbinfo -u and wbinfo -g. You may need to check klist, net ads status, net ads info to see if your kerberos key didn't get renewed. Some of this should be run under sudo with an Active Directory (AD) authenticated user.

Yes, I can query, and after issuing wbinfo -u, wbinfo -g or getent, my name/uid mappings are known again. But that is the problem: this information should not be "lost" in the first place (see the original report above). My ticket was still valid for more than 9 hours.

   4. Try re-logging in with an AD user and see if the problem is fixed. If so, it might be that a new key was issued.

Yes, of course logging in again fixes it, but it happens again after a short time, so that does not fix the reported issue.

Drew Scott Daniels (drewdaniels) wrote :

Hi Torsten Krah,
Thanks for your friendly reply. Part of the reason I asked the questions was for other readers of the bug to be able to diagnose similar problems.

Also, for more detailed debugging, here's a link to the current development version 3 source code:
http://gitweb.samba.org/?p=samba.git;a=tree;f=source3/winbindd;hb=HEAD

I don't know what's going to happen to winbindd with samba4's new code.

I'm guessing that the expiration and failure to re-get the UID mapping can be seen at the top level in:
http://gitweb.samba.org/?p=samba.git;a=blob;f=source3/winbindd/wb_uid2sid.c;hb=HEAD
The only thing that might show up in a log at this level (or from this file anyway) looks to be:
  DEBUG(10, ("idmap_cache_find_uid2sid found %d%s\n",
             (int)uid, expired ? " (expired)": ""));

Any chance you see the above log line?
Unless I misunderstand, the cache is expired, and requesting a mapping fails. That means the following is executed:

  for (domain = domain_list(); domain != NULL; domain = domain->next) {
          if (domain->have_idmap_config
              && (uid >= domain->id_range_low)
              && (uid <= domain->id_range_high)) {
                  state->dom_name = domain->name;
                  break;
          }
  }

  child = idmap_child();

  subreq = rpccli_wbint_Uid2Sid_send(
          state, ev, child->rpccli, state->dom_name,
          uid, &state->sid);
  if (tevent_req_nomem(subreq, req)) {
          return tevent_req_post(req, ev);
  }
  tevent_req_set_callback(subreq, wb_uid2sid_done, req);
  return req;
}

So I'm guessing that means that the domain name isn't found, or there's a problem with rpccli_wbint_Uid2Sid_send(), but there are a few other possibilities. The next steps would be to:
   * Check if upstream's got any new information
   * Look at rpccli_wbint_Uid2Sid_send()
   * Check how the cache is initially populated and check if it's different code than the above code.
   * Test with caching disabled if possible.
   * Test in "offline" mode if possible.
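
The suspected failure mode can be illustrated with a toy model: a cache entry that was stored by some other path answers until it expires, and after that the uid lookup depends entirely on a backend call that may fail. This is only a sketch of the hypothesis, not Samba's code; the class and method names are invented for illustration:

```python
import time


class ExpiringMap:
    """Toy uid -> name cache: entries expire after ttl seconds and are re-fetched."""

    def __init__(self, backend, ttl, clock=time.monotonic):
        self.backend = backend  # callable uid -> name, returns None on failure
        self.ttl = ttl
        self.clock = clock
        self.entries = {}       # uid -> (name, time stored)

    def seed(self, uid, name):
        """Store a mapping learned some other way, e.g. from a lookup by name."""
        self.entries[uid] = (name, self.clock())

    def lookup(self, uid):
        hit = self.entries.get(uid)
        if hit is not None and self.clock() - hit[1] < self.ttl:
            return hit[0]
        name = self.backend(uid)  # expired or missing: ask the backend
        if name is None:
            return None           # a caller like ls would now print the number
        self.entries[uid] = (name, self.clock())
        return name
```

With a backend that always fails, lookup() succeeds only for ttl seconds after each seed() call, which matches the pattern described in this bug: names appear after a manual getent or id and disappear once the cache time has passed.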

I haven't looked to see if this bug is filed upstream, or if it's mentioned on a mailing list of theirs. If it is, then any links would be nice.

     Drew Daniels
Resume: http://www.boxheap.net/ddaniels/resume.html

Chuck Short (zulcss) wrote :

Hi,

I was wondering if you were still having this problem.

Regards
chuck

Peter Parzer (peter-parzer) wrote :

I no longer have this problem with Ubuntu 9.10 and winbind 2:3.4.0-3ubuntu5.4.

Thierry Carrez (ttx) wrote :

Closing based on the previous comment. Please reopen if you can reproduce this in the current development version.

Changed in samba (Ubuntu):
status: Confirmed → Fix Released
Drew Scott Daniels (drewdaniels) wrote :

I would like to know how the issue can be fixed (package upgrade to what version?).

Torsten Krah indicated this was a problem as of 2010-02-23, there has been no new release of winbind since 2010-02-02, and that release didn't mention any issues like this.

In my research, I also found that some idmap backends don't support certain features. There was an upstream documentation bug discussed recently.

Thanks,

     Drew Daniels
Resume: http://www.boxheap.net/ddaniels/resume.html

Thierry Carrez (ttx) wrote :

Torsten reported the issue against hardy and was still having it as of 2010-02-23, since the hardy package wasn't updated. The FixReleased status echoes the fact that the current development version (Lucid) is believed to be fixed.

Heikki Manninen (hma-iki) wrote :

Winbind still fails on 9.04, 9.10 and 10.04 beta 1.

Meanwhile, everything works just fine on RHEL/CentOS 5.4 with Samba/Winbind 3.4.7.

Drew Scott Daniels (drewdaniels) wrote :

To people saying they have related bugs:

This bug is for the idmap working fine on initial login, but the mapping being lost after the cache time is exceeded (with no other related winbind activity like authentications/logins). The id mapping can be seen (or seen as failed) using "ls" in a directory with files owned by an active directory user (success shows names, failure shows numbers).

Preferably in a single comment please answer the following questions/steps:
0. Please file a separate bug if it's a separate issue.
1. If you find a configuration or specific set of versions that work, please list all the differences.
2. Please try everything below with a fresh install of lucid on a separate non-production system, if possible.
3. Please list the most recent package versions (e.g.: "dpkg -l|grep -i samba", and maybe other packages. ubuntu-bug might help). It'd also be useful to list your distribution even if it's clear from the version numbers, just to save time looking it up.
4. Please list relevant configuration options (e.g. both winbind and idmap sections of /etc/samba/smb.conf and maybe more. ubuntu-bug might post the entire configuration file).
5. Check the log files for related information. /var/log/samba/log.winbind* might be more useful than some of the other log files. Post anything that might be relevant.
6. If using rid or ads as the backend, try to find out if you can still query the domain controller with wbinfo -u and wbinfo -g. You may need to check klist, net ads status, net ads info to see if your kerberos key didn't get renewed. Some of this should be run under sudo with an Active Directory (AD) authenticated user. Consider posting some of the output.
7. Try disabling the cache. Maybe try both "winbind cache time = 0" in smb.conf and with the line missing if you're not sure which disables the cache. Post to the bug the results of trying to get the mapping (e.g. by ls on a file owned by an Active Directory mapped user).
8. Try "winbind offline logon = false" in smb.conf and post the results of before and after cache timeout.
9. Post any information you can about the cache and mapping files. This could be a tdb file. The log files might give some information about this.
10. List whether you did a fresh install of Ubuntu or an upgrade. If it was an upgrade, what version(s) did you upgrade from?
11. Did you try any other idmap backends? If so, please list which ones and what order. I believe there might be a bug on switching backends without deleting a mapping file.
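
To report how long a mapping survives, it can help to poll the uid-to-name lookup and note when its state changes. A small sketch, assuming that Python's pwd.getpwuid goes through the same NSS lookup that ls uses and, unlike "id username", does not by itself refresh the mapping (both are assumptions to verify); the function name watch_uid is made up for illustration:

```python
import pwd
import time


def watch_uid(uid, interval=30, rounds=None, lookup=pwd.getpwuid,
              sleep=time.sleep, now=time.time):
    """Poll the uid -> name lookup and return (timestamp, state) for each change.

    With rounds=None the loop runs until interrupted and never returns.
    """
    changes = []
    last = None
    done = 0
    while rounds is None or done < rounds:
        try:
            lookup(uid)
            state = "resolved"
        except KeyError:
            state = "unresolved"
        if state != last:
            changes.append((now(), state))
            print(changes[-1])
            last = state
        done += 1
        if rounds is None or done < rounds:
            sleep(interval)
    return changes
```

For example, watch_uid(11107, interval=30, rounds=120) would cover an hour; the time between a "resolved" and the next "unresolved" entry can then be compared with the configured winbind cache time.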

Thanks,

     Drew Daniels
Resume: http://www.boxheap.net/ddaniels/resume.html

Torsten Krah (tkrah) wrote :

2. Please try everything below with a fresh install of lucid on a separate non-production system, if possible.

I tried a fresh install of Natty (11.04) and it's even worse now: I am seeing this more than 10 times a day and it's "nearly" reproducible. All I have to do is open, say, 20 gnome terminals; the last one will show this:

Ich habe keinen Benutzernamen!@sf050:~$    (German locale; in English: "I have no name!")

3. Please list the most recent package versions (e.g.: "dpkg -l|grep -i samba", and maybe other packages. ubuntu-bug might help). It'd also be useful to list your distribution even if it's clear from the version numbers, just to save time looking it up.

ii samba 2:3.5.8~dfsg-1ubuntu2.2 SMB/CIFS file, print, and login server for Unix
ii samba-common 2:3.5.8~dfsg-1ubuntu2.2 common files used by both the Samba server and client
ii samba-common-bin 2:3.5.8~dfsg-1ubuntu2.2 common files used by both the Samba server and client
ii samba-tools 2:3.5.8~dfsg-1ubuntu2.2 Samba testing utilities
krah@sf050:~$ dpkg -l | grep winbind
ii libwbclient0 2:3.5.8~dfsg-1ubuntu2.2 Samba winbind client library
ii winbind 2:3.5.8~dfsg-1ubuntu2.2 Samba nameservice integration server

4. Please list relevant configuration options (e.g. both winbind and idmap sections of /etc/samba/smb.conf and maybe more. ubuntu-bug might post the entire configuration file).

Same configuration as in the initial report (excerpt):

        security = ADS
        idmap backend = rid:FRIENDS=10000-20000
        idmap uid = 10000-20000
        idmap gid = 10000-20000
        winbind cache time = 300
        winbind refresh tickets = true
        winbind enum users = yes
        winbind enum groups = yes
        winbind use default domain = Yes
        winbind offline logon = true

5. Check the log files for related information. /var/log/samba/log.winbind* might be more useful than some of the other log files. Post anything that might be relevant.

Log is empty - at least with configuration above.

6. If using rid or ads as the backend, try to find out if you can still query the domain controller with wbinfo -u and wbinfo -g.
You may need to check klist, net ads status, net ads info to see if your kerberos key didn't get renewed. Some of this should be run under sudo with an Active Directory (AD) authenticated user. Consider posting some of the output.

Yes, I can still query the AD. wbinfo -u and wbinfo -g work, as do getent and id.

7. Try disabling the cache. Maybe try both "winbind cache time = 0" in smb.conf and with the line missing if you're not sure which disables the cache. Post to the bug the results of trying to get the mapping (e.g. by ls on a file owned by an Active Directory mapped user).

With cache time = 0, wbinfo -u and wbinfo -g fail completely ("Error looking up domain users"). If I set it to at least 1, winbind is able to get the mappings (strange)!
The only message in the logs I've seen so far is:

[2011/05/19 ...


Changed in samba (Ubuntu):
status: Fix Released → Confirmed
Torsten Krah (tkrah) wrote :

Besides Natty, I am seeing this on a fresh Lucid install too.

Error view (winbind has forgotten the name):

-rw-rwxr-- 1 11107 praktikanten 63K 2007-01-19 17:11 2006.xls
-rw-rwxr-- 1 11107 praktikanten 115K 2007-06-01 12:01 2007.xls

After refreshing the information via `id myuser` (the user with UID 11107):

-rw-rwxr-- 1 myuser praktikanten 63K 2007-01-19 17:11 2006.xls
-rw-rwxr-- 1 myuser praktikanten 115K 2007-06-01 12:01 2007.xls

Torsten

Adam Mielke (f96x) wrote :

This thread hasn't seen any activity in several months, but I discovered it today after suffering from the same problem for the past few weeks on Lucid. After some trial and error I was able to resolve it. Winbind was unable to translate uids/gids into SIDs, but it could convert usernames into SIDs. The fix was to modify the idmap config from this syntax:

idmap backend = rid:FRIENDS=10000-20000

To this syntax:

idmap backend = tdb
idmap config FRIENDS : backend = rid
idmap config FRIENDS : range = 10000-20000

And voila, winbind works correctly.
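
For anyone with several domains to convert, the rewrite from the old single-line form to the per-domain form shown above is mechanical. A small sketch, assuming the old line always has the shape "backend:DOMAIN=low-high" as in this thread; the function name per_domain_idmap is made up, and the result should be checked against the smb.conf documentation for the installed Samba version:

```python
import re

# Old style as used in this thread: idmap backend = rid:FRIENDS=10000-20000
OLD_STYLE = re.compile(r"^\s*idmap backend\s*=\s*(\w+):([^=\s]+)=(\d+)-(\d+)\s*$")


def per_domain_idmap(line, default_backend="tdb"):
    """Rewrite an old-style idmap backend line into per-domain idmap config lines."""
    match = OLD_STYLE.match(line)
    if match is None:
        raise ValueError(f"not an old-style idmap backend line: {line!r}")
    backend, domain, low, high = match.groups()
    return [
        f"idmap backend = {default_backend}",
        f"idmap config {domain} : backend = {backend}",
        f"idmap config {domain} : range = {low}-{high}",
    ]
```

Passing the line from the earlier configuration excerpt returns exactly the three lines of the working configuration.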

Cheers,

Adam Mielke
Research Computing and Engineering
College of Liberal Arts
University of Minnesota
