[SRU] winbind coredumps when encountering a group with over 1000 members

Bug #970679 reported by Seb Harrington on 2012-04-01
This bug affects 6 people
Affects: samba | Status: Fix Released | Importance: High
Affects: samba (Ubuntu) | Importance: High | Assigned to: James Page
Affects: Precise | Importance: High | Assigned to: James Page
Affects: Quantal | Importance: High | Assigned to: James Page

Bug Description

Impact:
winbind coredumps when it encounters a group with more than 1000 members. This renders winbind unusable in deployments with more than 1000 users in a single group.

Development Fix:
Cherry picked patch from upstream VCS - this fix should be included in 3.6.6.
The fix ensures that the hunks of 1000 entries processed by winbind line up with talloc memory handling, which prevents the crash.
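
As a rough illustration of the batching idea only (this is not the upstream patch, which is C code using talloc), the following Python sketch resolves a long list of SIDs in hunks of at most 1000 and keeps the combined result aligned with the input. The names `lookup_in_hunks`, `lookup` and `HUNK_SIZE` are hypothetical and exist only for this example.

```python
from typing import Callable, List, Sequence

HUNK_SIZE = 1000  # batch size named in the bug description


def lookup_in_hunks(
    sids: Sequence[str],
    lookup: Callable[[Sequence[str]], List[str]],
    hunk_size: int = HUNK_SIZE,
) -> List[str]:
    """Resolve all SIDs by calling `lookup` on slices of at most hunk_size."""
    names: List[str] = []
    for start in range(0, len(sids), hunk_size):
        hunk = sids[start:start + hunk_size]
        resolved = lookup(hunk)
        # Each hunk must yield exactly one result per input entry, so the
        # combined list stays aligned with the original SID order.
        if len(resolved) != len(hunk):
            raise ValueError("lookup returned a result of the wrong length")
        names.extend(resolved)
    return names
```

With 2500 input entries, `lookup` would be called three times, with 1000, 1000 and 500 entries.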

Stable Fix:
Cherry picked patch from upstream VCS - see comments in Development fix.

Test Case:
NOTE - hard to reproduce, as it requires a deployment with a large number of users/groups.
Configure winbind to communicate with a Domain Controller with more than 1000 users
getent group groupWithLessThan1000Members - OK
getent group groupWithMoreThan1000Members - HANGS (coredumps recorded in syslog).
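
The check above could be scripted along these lines. This is a minimal sketch, assuming Python 3.7+ and that `getent` prints the usual `name:passwd:gid:members` line; the helper names `parse_group_members` and `count_group_members` and the 60 second timeout are assumptions made for this example.

```python
import subprocess
from typing import List


def parse_group_members(getent_line: str) -> List[str]:
    """Return the member list from one `name:passwd:gid:members` line."""
    fields = getent_line.rstrip("\n").split(":")
    if len(fields) != 4:
        raise ValueError("expected 4 colon-separated fields")
    # An empty member field means the group was returned without members.
    return [member for member in fields[3].split(",") if member]


def count_group_members(group: str, timeout: float = 60.0) -> int:
    """Run `getent group <group>` and count the members it reports.

    Raises subprocess.TimeoutExpired if getent hangs, and
    subprocess.CalledProcessError if the group cannot be resolved.
    """
    result = subprocess.run(
        ["getent", "group", group],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return len(parse_group_members(result.stdout.splitlines()[0]))
```

On an affected system, calling `count_group_members` for a group with more than 1000 members would be expected to hit the timeout rather than return a count.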

Regression Potential:
Minimal - patch has been committed upstream and should be released in Samba 3.6.6.

Original Bug Report:

Samba 3.6.3 precise

winbind works as expected with groups of fewer than 1000 members, but core dumps when it encounters groups with more than 1000 members.

e.g. getent group groupWithLessThan1000Members returns expected results

getent group groupWithMoreThan1000Members hangs at the CLI while winbind coredumps in the background, and eventually returns nothing. However, the following can be found in syslog:

Apr 1 02:00:56 fs1 winbindd[1506]: [2012/04/01 02:00:56.252483, 0] ../lib/util/debug.c:413(talloc_log_fn)
Apr 1 02:00:56 fs1 winbindd[1506]: Bad talloc magic value - unknown value
Apr 1 02:00:56 fs1 winbindd[1506]: [2012/04/01 02:00:56.255072, 0] lib/util.c:1117(smb_panic)
Apr 1 02:00:56 fs1 winbindd[1506]: PANIC (pid 1506): Bad talloc magic value - unknown value
Apr 1 02:00:56 fs1 winbindd[1506]: [2012/04/01 02:00:56.282138, 0] lib/util.c:1221(log_stack_trace)
Apr 1 02:00:56 fs1 winbindd[1506]: BACKTRACE: 20 stack frames:
Apr 1 02:00:56 fs1 winbindd[1506]: #0 /usr/sbin/winbindd(log_stack_trace+0x1a) [0x7f4dab7704ca]
Apr 1 02:00:56 fs1 winbindd[1506]: #1 /usr/sbin/winbindd(smb_panic+0x25) [0x7f4dab7705a5]
Apr 1 02:00:56 fs1 winbindd[1506]: #2 /usr/lib/x86_64-linux-gnu/libtalloc.so.2(talloc_strdup+0x299) [0x7f4da95ab429]
Apr 1 02:00:56 fs1 winbindd[1506]: #3 /usr/sbin/winbindd(+0x4edb5d) [0x7f4dabab9b5d]
Apr 1 02:00:56 fs1 winbindd[1506]: #4 /usr/sbin/winbindd(dcerpc_lsa_lookup_sids3+0x2e) [0x7f4dababa24e]
Apr 1 02:00:56 fs1 winbindd[1506]: #5 /usr/sbin/winbindd(winbindd_lookup_sids+0x116) [0x7f4dab6b7306]
Apr 1 02:00:56 fs1 winbindd[1506]: #6 /usr/sbin/winbindd(+0xeefa2) [0x7f4dab6bafa2]
Apr 1 02:00:56 fs1 winbindd[1506]: #7 /usr/sbin/winbindd(+0xd9be2) [0x7f4dab6a5be2]
Apr 1 02:00:56 fs1 winbindd[1506]: #8 /usr/sbin/winbindd(_wbint_LookupGroupMembers+0x5e) [0x7f4dab6c497e]
Apr 1 02:00:56 fs1 winbindd[1506]: #9 /usr/sbin/winbindd(+0x1029b4) [0x7f4dab6ce9b4]
Apr 1 02:00:56 fs1 winbindd[1506]: #10 /usr/sbin/winbindd(winbindd_dual_ndrcmd+0xbc) [0x7f4dab6c3f6c]
Apr 1 02:00:56 fs1 winbindd[1506]: #11 /usr/sbin/winbindd(+0xf6cb4) [0x7f4dab6c2cb4]
Apr 1 02:00:56 fs1 winbindd[1506]: #12 /usr/sbin/winbindd(+0xf7765) [0x7f4dab6c3765]
Apr 1 02:00:56 fs1 winbindd[1506]: #13 /usr/sbin/winbindd(tevent_common_loop_immediate+0xe2) [0x7f4dab781e92]
Apr 1 02:00:56 fs1 winbindd[1506]: #14 /usr/sbin/winbindd(run_events_poll+0x48) [0x7f4dab77ff88]
Apr 1 02:00:56 fs1 winbindd[1506]: #15 /usr/sbin/winbindd(+0x1b43a6) [0x7f4dab7803a6]
Apr 1 02:00:56 fs1 winbindd[1506]: #16 /usr/sbin/winbindd(_tevent_loop_once+0x90) [0x7f4dab780fb0]
Apr 1 02:00:56 fs1 winbindd[1506]: #17 /usr/sbin/winbindd(main+0x78b) [0x7f4dab699a3b]
Apr 1 02:00:56 fs1 winbindd[1506]: #18 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f4da8bd376d]
Apr 1 02:00:56 fs1 winbindd[1506]: #19 /usr/sbin/winbindd(+0xcde91) [0x7f4dab699e91]
Apr 1 02:00:56 fs1 winbindd[1506]: [2012/04/01 02:00:56.282756, 0] lib/fault.c:372(dump_core)
Apr 1 02:00:56 fs1 winbindd[1506]: dumping core in /var/log/samba/cores/winbindd
Apr 1 02:00:56 fs1 winbindd[1506]:
Apr 1 02:03:57 fs1 winbindd[1163]: [2012/04/01 02:03:57.387585, 0] winbindd/winbindd_util.c:330(trustdom_list_done)
Apr 1 02:03:57 fs1 winbindd[1163]: Got invalid trustdom response
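
To check whether a machine has hit this crash, the syslog can be scanned for the PANIC line shown above. The following is a small sketch, assuming the message format is exactly as in this log excerpt; `PANIC_RE` and `find_winbindd_panics` are names made up for this example.

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as:
#   ... winbindd[1506]: PANIC (pid 1506): Bad talloc magic value - unknown value
PANIC_RE = re.compile(r"winbindd\[(\d+)\]: PANIC \(pid \d+\): (.*)$")


def find_winbindd_panics(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (pid, message) for every winbindd PANIC line in `lines`."""
    panics = []
    for line in lines:
        match = PANIC_RE.search(line.rstrip("\n"))
        if match:
            panics.append((int(match.group(1)), match.group(2)))
    return panics
```

It can be fed an open log file directly, for example `find_winbindd_panics(open("/var/log/syslog"))`, where the log path is an assumption that depends on the system's syslog configuration.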

Fix submitted here: https://bugzilla.samba.org/show_bug.cgi?id=8807 (the Ubuntu version probably just needs patching).

Changed in samba (Ubuntu):
importance: Undecided → High
status: New → Triaged

Because Domain Users is the default primary group in Active Directory, any domain with more than 1000 users will contain such a group and be affected by this.

James Page (james-page) on 2012-05-02
Changed in samba (Ubuntu Precise):
status: New → Triaged
importance: Undecided → High

I applied https://attachments.samba.org/attachment.cgi?id=7381 against the samba 3.6.3-2ubuntu1 src deb and can confirm it does fix the problem.

I did notice that running wbinfo -u against our AD with 50k+ users crashes winbind, but I believe that is a separate issue.

Seb Harrington (sebharrington) wrote :

With this confirmed and the patch confirmed as fixing the issue, is there any chance of getting the fix pushed out?

Regards,

Seb

Quoting Seb Harrington (<email address hidden>):
> With this confirmed and the patch confirmed as fixing the issue, is
> there any chance of getting the fix pushed out?

How about someone confirming this to upstream, in the upstream bug log,
and suggesting that the fix be pushed for 3.6.6 (if not done already:
I'm not in a position to check upstream's BTS right now).

James Page (james-page) on 2012-05-16
Changed in samba (Ubuntu Quantal):
assignee: nobody → James Page (james-page)
status: Triaged → In Progress
James Page (james-page) on 2012-05-16
description: updated
summary: - winbind coredumps when encountering a group with over 1000 members
+ [SRU] winbind coredumps when encountering a group with over 1000 members
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package samba - 2:3.6.5-2ubuntu2

---------------
samba (2:3.6.5-2ubuntu2) quantal; urgency=low

  * d/patches/lp_970679_fix-large-groups.patch: Cherry picked patch from
    upstream VCS to resolve issue with winbind crashing with groups
    containing more than 1000 members (LP: #970679).
  * d/control: Fixup Breaks/Replaces on libnss-winbind so that upgrades
    from libpam-winbind don't break. Thanks to Colin Watson for identifying
    this issue.
 -- James Page <email address hidden> Wed, 16 May 2012 12:07:40 +0100

Changed in samba (Ubuntu Quantal):
status: In Progress → Fix Released
James Page (james-page) on 2012-05-16
description: updated
Seb Harrington (sebharrington) wrote :

@james-page is there any way of releasing the fix for those of us on 12.04?

James Page (james-page) wrote :

@Seb - working on the SRU now for 12.04

Changed in samba (Ubuntu Precise):
assignee: nobody → James Page (james-page)
status: Triaged → In Progress

I just installed the samba/2:3.6.5-2ubuntu2 build for quantal over a precise install. getent group <large group> still works :-)

Hello Seb, or anyone else affected,

Accepted samba into precise-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in samba (Ubuntu Precise):
status: In Progress → Fix Committed
tags: added: verification-needed
Seb Harrington (sebharrington) wrote :

Very pleased to report that groups of over 1000 users are no longer making winbind core dump and samba appears to be operating as it should.

Thanks very much

Martin Pitt (pitti) on 2012-05-19
tags: added: verification-done
removed: verification-needed

Ditto here. With the samba version from proposed, getent group <large group> now works for various groups with more than 1000 members and does not crash winbind.

getent group "Domain Users" now only returns

#getent group "Domain Users"
domain users:x:5000513:

But this does not crash winbind, which is the main thing.

wbinfo -u still crashes winbind, but I see that as a separate issue, and it does not hinder normal operations.

With this patch we can join precise systems to our domain again. Thanks!

Peter Parzer (peter-parzer) wrote :

The bug is also solved for me. Thanks.

James Page (james-page) on 2012-05-31
Changed in samba (Ubuntu Precise):
milestone: none → ubuntu-12.04.1
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package samba - 2:3.6.3-2ubuntu2.2

---------------
samba (2:3.6.3-2ubuntu2.2) precise-proposed; urgency=low

  * Fix issue with winbind crashing when trying to access groups containing
    more than 1000 members (LP: #970679):
    - d/patches/lp_970679_fix-large-groups.patch: Cherry picked patch from
      upstream VCS which ensures that large hunk handling in winbind works
      with talloc preventing crashes.
  * d/samba.install: Restore missing ufw profile (LP: #999764).
  * d/samba-common-bin.install: Restore missing apport hook (LP: #999764).
 -- James Page <email address hidden> Wed, 16 May 2012 13:10:02 +0100

Changed in samba (Ubuntu Precise):
status: Fix Committed → Fix Released
Changed in samba:
importance: Unknown → High
status: Unknown → Fix Released