slapd goes into endless sched_yield() loop

Bug #15272 reported by Debian Bug Importer
This bug report is a duplicate of: Bug #15270: slapd/slapcat hang in endless loops.
Affects: openldap2.2 (Debian) | Status: Fix Released | Importance: Unknown
Affects: openldap2.2 (Ubuntu) | Status: Invalid | Importance: High | Assigned to: Unassigned

Bug Description

Automatically imported from Debian bug report #303057 http://bugs.debian.org/303057

Debian Bug Importer (debzilla) wrote :

Message-Id: <email address hidden>
Date: Mon, 04 Apr 2005 17:25:35 +0200
From: Wolfgang Kohnen <email address hidden>
To: Debian Bug Tracking System <email address hidden>
Subject: slapd goes into endless sched_yield() loop

Package: slapd
Version: 2.1.30-3
Severity: important

Sometimes all OpenLDAP programs (slapd, slapcat, slapindex) that want to
access my bdb backend eat up all CPU cycles and don't react anymore,
except to signals 2 and 4 (not 15; I didn't check any others).

Increasing the loglevel didn't show anything interesting
(to me; I am no programmer). I ran strace on slapindex and slapcat,
and both times it showed an endless series of sched_yield() calls.
A strace of slapcat can be found here:

 http://duplo.lis.bremen.de/~wollie/slapcat.strace
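To illustrate why strace shows nothing but sched_yield() while the CPU is pegged, here is a minimal Python sketch of a yield-based busy-wait. It is not slapd or Berkeley DB code; the function name spin_until and its max_spins cap are hypothetical. A waiter that polls a condition and calls sched_yield() between checks never sleeps in the kernel, so it stays runnable and burns CPU for as long as the condition stays false (os.sched_yield is Unix-only).

```python
import os


def spin_until(ready, max_spins=None):
    """Busy-wait until ready() returns True, yielding the CPU between checks.

    Returns the number of sched_yield() calls made, or -1 if max_spins
    (a hypothetical safety cap for this sketch) was reached first.
    """
    spins = 0
    while not ready():
        if max_spins is not None and spins >= max_spins:
            return -1
        # sched_yield() lets other runnable threads go first but leaves the
        # caller runnable: it never sleeps, so a wait that can never be
        # satisfied shows up in strace as an endless stream of sched_yield().
        os.sched_yield()
        spins += 1
    return spins
```

With max_spins=None and a condition that never becomes true, the loop runs forever, which matches the symptom described above; the cap exists only so the sketch can be exercised safely.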

Please drop me a line if you would like to have a look at my exact
configuration. Maybe the index lines in slapd.conf are interesting (I
am using gosa):

index default sub
index uid,mail eq
index gosaMailAlternateAddress,gosaMailForwardingAddress eq
index cn,sn,givenName,ou pres,eq,sub
index objectClass pres,eq
index uidNumber,gidNumber,memberuid eq
index gosaSubtreeACL,gosaObject,gosaUser pres,eq

Greets,
Wollie

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.8
Locale: LANG=de_DE@euro, LC_CTYPE=de_DE@euro (charmap=ISO-8859-15)

Versions of packages slapd depends on:
ii coreutils [fileutils] 5.2.1-2 The GNU core utilities
ii debconf 1.4.30.11 Debian configuration management sy
ii libc6 2.3.2.ds1-20 GNU C Library: Shared libraries an
ii libdb4.2 4.2.52-18 Berkeley v4.2 Database Libraries [
ii libgcrypt11 1.2.0-4 LGPL Crypto library - runtime libr
ii libgnutls11 1.0.16-9 GNU TLS library - runtime library
ii libgpg-error0 1.0-1 library for common error values an
ii libiodbc2 3.52.2-3 iODBC Driver Manager
ii libldap2 2.1.30-3 OpenLDAP libraries
ii libltdl3 1.5.6-4 A system independent dlopen wrappe
ii libsasl2 2.1.19-1.5 Authentication abstraction library
ii libslp1 1.0.11a-2 OpenSLP libraries
ii libwrap0 7.6.dbs-8 Wietse Venema's TCP wrappers libra
ii perl [libmime-base64-perl] 5.8.4-8 Larry Wall's Practical Extraction
ii psmisc 21.5-1 Utilities that use the proc filesy
ii zlib1g 1:1.2.2-3 compression library - runtime

-- debconf information:
  slapd/password_mismatch:
  slapd/fix_directory: true
  slapd/invalid_config: true
* shared/organization: sub.example.com
  slapd/upgrade_slapcat_failure:
  slapd/upgrade_slapadd_failure:
  slapd/backend: BDB
* slapd/allow_ldap_v2: false
  slapd/no_configuration: false
  slapd/move_old_database: true
  slapd/suffix_change: false
  slapd/slave_databases_require_updateref:
  slapd/autoconf_modules: true
  slapd/purge_database: false
  slapd/admin:
* slapd/domain: sub.exam...

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Mon, 4 Apr 2005 18:31:48 +0200
From: Torsten Landschoff <email address hidden>
To: Wolfgang Kohnen <email address hidden>, <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

Hi Wolfgang,

On Mon, Apr 04, 2005 at 05:25:35PM +0200, Wolfgang Kohnen wrote:
> Sometimes all OpenLDAP programs (slapd, slapcat, slapindex) that want to
> access my bdb backend eat up all CPU cycles and don't react anymore,
> except to signals 2 and 4 (not 15; I didn't check any others).

Known problem. Please try upgrading to 2.2.23. 2.1.30 will not ship with
sarge because of these and other problems. Running db4.2_recover in the
database directory may temporarily fix those problems but they are going
to strike again.
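The recovery cycle mentioned above (stop slapd, run db4.2_recover on the database environment, start slapd again) can be sketched in Python as follows. The paths are assumptions: /var/lib/ldap as the database directory and /etc/init.d/slapd as the init script, as on a Debian system of that time. db4.2_recover is Debian's name for Berkeley DB's db_recover, whose -h option names the environment home. By default the sketch only prints the commands.

```python
import shlex
import subprocess

DB_HOME = "/var/lib/ldap"          # assumed database directory
INIT_SCRIPT = "/etc/init.d/slapd"  # assumed init script path


def recovery_steps(db_home=DB_HOME):
    """Commands for one stop / recover / start cycle, in order."""
    return [
        [INIT_SCRIPT, "stop"],
        ["db4.2_recover", "-h", db_home],  # -h: database environment home
        [INIT_SCRIPT, "start"],
    ]


def run_recovery(db_home=DB_HOME, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for argv in recovery_steps(db_home):
        print(shlex.join(argv))
        if not dry_run:
            # check=True aborts the cycle if a step fails, so slapd is not
            # restarted on top of a failed recovery.
            subprocess.run(argv, check=True)
```

As Torsten says, this only buys time. Also, the original report says the hung processes ignored signal 15, so a plain `stop` may not be enough to get slapd out of the loop.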

2.2.23 is only in unstable for now - sorry. Not sure if the dependencies
are fulfilled in testing already.

Greetings

 Torsten

Debian Bug Importer (debzilla) wrote :

Message-Id: <email address hidden>
Date: Mon, 4 Apr 2005 18:32:14 +0200 (CEST)
From: <email address hidden> (Torsten Landschoff)
To: <email address hidden>
Subject: severity of 303057 is normal

severity 303057 normal

Debian Bug Importer (debzilla) wrote :

Message-Id: <email address hidden>
Date: Mon, 4 Apr 2005 18:32:29 +0200 (CEST)
From: <email address hidden> (Torsten Landschoff)
To: <email address hidden>
Subject: merging 303057 302992

merge 303057 302992

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Tue, 05 Apr 2005 13:53:26 +0200
From: Sven Hartge <email address hidden>
To: <email address hidden>
Subject: Re: slapd goes into endless sched_yield() loop

> On Mon, Apr 04, 2005 at 05:25:35PM +0200, Wolfgang Kohnen wrote:
>> Sometimes all OpenLDAP programs (slapd, slapcat, slapindex) that want to
>> access my bdb backend eat up all CPU cycles and don't react anymore,
>> except to signals 2 and 4 (not 15; I didn't check any others).

> Known problem. Please try upgrading to 2.2.23. 2.1.30 will not ship
> with sarge because of these and other problems. Running db4.2_recover
> in the database directory may temporarily fix those problems but they
> are going to strike again.

> 2.2.23 is only in unstable for now - sorry. Not sure if the
> dependencies are fulfilled in testing already.

I am deeply sorry, but I have to report 2.2.23 hitting the same problem
for me.

How can this be? I thought all those bugs were eliminated in 2.2 using db4.2?

Regards,
Sven.

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Wed, 6 Apr 2005 15:22:45 +0200
From: Torsten Landschoff <email address hidden>
To: Sven Hartge <email address hidden>,
 <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

Hi Sven,

On Tue, Apr 05, 2005 at 01:53:26PM +0200, Sven Hartge wrote:
> >Known problem. Please try upgrading to 2.2.23. 2.1.30 will not ship
> >with sarge because of these and other problems. Running db4.2_recover
> >in the database directory may temporarily fix those problems but they
> >are going to strike again.
>
> I am deeply sorry, but I have to report 2.2.23 hitting the same problem
> for me.
>
> How can this be? I thought all those bugs were eliminated in 2.2 using db4.2?

I thought so as well. Did you use a DB_CONFIG file suited for your
setup? Upstream keeps talking about the problems not being caused by
corruption but by thrashing. Probably in that case the maintainer
scripts /MUST/ install a basic DB_CONFIG which will work for most cases.
I am thinking along the lines of 8MB of caches or something.

I would be very grateful if you could try that on your setup. See
/usr/share/doc/slapd/examples/DB_CONFIG for a template.
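Torsten's idea of having the maintainer scripts install a basic DB_CONFIG could look roughly like the following Python sketch. Only the 8 MB cache figure comes from his message (taken here as 8 MiB); the function name and the rule of never overwriting an existing file are assumptions. Since, as reported later in this thread, a DB_CONFIG added after the database was created may have no effect, such a step would have to run before the first slapadd.

```python
from pathlib import Path

# Baseline suggested in the thread: about 8 MB of cache.
# Format: set_cachesize <gbytes> <bytes> <ncache>
BASIC_DB_CONFIG = "set_cachesize 0 8388608 0\n"


def install_basic_db_config(db_dir):
    """Write DB_CONFIG into db_dir unless one is already there.

    Returns True if a file was written, False if an existing file was kept.
    """
    path = Path(db_dir) / "DB_CONFIG"
    if path.exists():
        return False  # keep any tuning the administrator already did
    path.write_text(BASIC_DB_CONFIG)
    return True
```

Refusing to overwrite keeps hand-tuned files like the one Sven posts below intact across package upgrades.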

Thanks

 Torsten

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Wed, 06 Apr 2005 15:32:11 +0200
From: Sven Hartge <email address hidden>
To: Torsten Landschoff <email address hidden>
CC: <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

Torsten Landschoff wrote:

>> I am deeply sorry, but I have to report 2.2.23 hitting the same problem
>> for me.

>> How can this be? I thought all those bugs were eliminated in 2.2 using db4.2?

> I thought so as well. Did you use a DB_CONFIG file suited for your
> setup?

Yes, of course.

> I am thinking along the lines of 8MB of caches or something.

#txn_checkpoint 128 15 1
set_cachesize 0 252428800 0
set_lk_max_objects 100000
set_lk_max_locks 100000
set_lg_regionmax 1048576
set_lg_max 8388608
set_lg_bsize 2097152
set_lg_dir /var/lib/ldap/logs/
#set_lk_detect DB_LOCK_DEFAULT
set_tmp_dir /tmp/
#set_flags DB_TXN_NOSYNC
#set_flags DB_TXN_NOT_DURABLE

These enormous lock limits were set after one of my
replicas complained about running out of locks and the OpenLDAP mailing
list suggested increasing this number. Since I really need those servers,
I went a little over the top with this value.

(txn_checkpoint does not work, I don't know why. bdb4.2 does not
recognize it.)
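For comparison with the roughly 8 MB Torsten had in mind: set_cachesize takes gigabytes, bytes and a cache count, so the total is gbytes × 2^30 + bytes. The small Python helper below (hypothetical, for illustration only) shows that Sven's setting amounts to about 240.7 MiB.

```python
GIB = 2 ** 30
MIB = 2 ** 20


def cachesize_bytes(line):
    """Total cache requested by a 'set_cachesize <gbytes> <bytes> <ncache>' line.

    ncache only splits the total into separate regions; it does not
    change the total.
    """
    keyword, gbytes, nbytes, _ncache = line.split()
    if keyword != "set_cachesize":
        raise ValueError(f"not a set_cachesize line: {line!r}")
    return int(gbytes) * GIB + int(nbytes)


sven = cachesize_bytes("set_cachesize 0 252428800 0")
torsten = cachesize_bytes("set_cachesize 0 8388608 0")  # "8MB", read as 8 MiB
print(f"{sven / MIB:.1f} MiB vs {torsten / MIB:.1f} MiB")  # 240.7 MiB vs 8.0 MiB
```

So the cache here is about 30 times the suggested baseline, which makes cache size an unlikely explanation for this particular setup.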

Regards,

--
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Wed, 6 Apr 2005 21:01:31 +0200
From: Torsten Landschoff <email address hidden>
To: Sven Hartge <email address hidden>, <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

Hi Sven,

On Wed, Apr 06, 2005 at 03:32:11PM +0200, Sven Hartge wrote:

> >I thought so as well. Did you use a DB_CONFIG file suited for your
> >setup?
>
> Yes, of course.

> #txn_checkpoint 128 15 1
> set_cachesize 0 252428800 0
> set_lk_max_objects 100000
> set_lk_max_locks 100000
> set_lg_regionmax 1048576
> set_lg_max 8388608
> set_lg_bsize 2097152
> set_lg_dir /var/lib/ldap/logs/
> #set_lk_detect DB_LOCK_DEFAULT
> set_tmp_dir /tmp/
> #set_flags DB_TXN_NOSYNC
> #set_flags DB_TXN_NOT_DURABLE

Hmm, very interesting. I wonder why it works at Stanford and apparently
nowhere else. I was notified today that putting a DB_CONFIG file into
the directory has no effect after the initial database was created, so
I'd like to ask if you did it "in time".

Sorry for all the problems, I'd really like to have these problems
fixed!

Thanks for the quick feedback in any case.

Greetings

 Torsten

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Wed, 6 Apr 2005 21:27:19 +0200 (CEST)
From: Sven Hartge <email address hidden>
To: Torsten Landschoff <email address hidden>
cc: <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

At 21:01 on 06.04.05, Torsten Landschoff wrote:
> On Wed, Apr 06, 2005 at 03:32:11PM +0200, Sven Hartge wrote:

>>> I thought so as well. Did you use a DB_CONFIG file suited for your
>>> setup?

>> Yes, of course.

>> #txn_checkpoint 128 15 1
>> set_cachesize 0 252428800 0
>> set_lk_max_objects 100000
>> set_lk_max_locks 100000
>> set_lg_regionmax 1048576
>> set_lg_max 8388608
>> set_lg_bsize 2097152
>> set_lg_dir /var/lib/ldap/logs/
>> #set_lk_detect DB_LOCK_DEFAULT
>> set_tmp_dir /tmp/
>> #set_flags DB_TXN_NOSYNC
>> #set_flags DB_TXN_NOT_DURABLE

> Hmm, very interesting. I wonder why it works at Stanford and apparently
> nowhere else. I was notified today that putting a DB_CONFIG file into
> the directory has no effect after the initial database was created, so
> I'd like to ask if you did it "in time".

Of course. First I copied this DB_CONFIG into /var/lib/ldap, then I
uncommented the last two lines, slapadd'ed my data, re-commented the last
two lines and fired up slapd.

> Sorry for all the problems, I'd really like to have these problems
> fixed!

Right now I am running with LD_ASSUME_KERNEL=2.4.1 as suggested in
$the_other_bug. So far no problems, but as this sched_yield() problem
takes some time to show up, I don't know if this really is the solution
or if I am just lucky right now.

Regards,
Sven.

--
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Wed, 6 Apr 2005 22:08:31 +0200 (CEST)
From: Sven Hartge <email address hidden>
To: Torsten Landschoff <email address hidden>
cc: <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

At 21:01 on 06.04.05, Torsten Landschoff wrote:

> Hmm, very interesting. I wonder why it works at Stanford and apparently
> nowhere else. I was notified today that putting a DB_CONFIG file into
> the directory has no effect after the initial database was created, so
> I'd like to ask if you did it "in time".

BTW: I also have two NetBSD 2.0 machines running a replica of the tree,
and those two have never had any problems so far, neither with
2.2.19/db4.2 nor with 2.2.20/db4.3.

It is totally frustrating not to be able to reproduce this bug in a
timely manner.

Regards,
Sven.

--
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Mon, 11 Apr 2005 09:24:27 +0200
From: Torsten Landschoff <email address hidden>
To: Sven Hartge <email address hidden>, <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

Hi Sven,

On Wed, Apr 06, 2005 at 09:27:19PM +0200, Sven Hartge wrote:

> Right now I am running with LD_ASSUME_KERNEL=2.4.1 as suggested in
> $the_other_bug. So far no problems, but as this sched_yield() problem
> takes some time to show up, I don't know if this really is the solution
> or if I am just lucky right now.

Has the problem shown up by now, or does LD_ASSUME_KERNEL=2.4.1 really
help? If it helps, I am thinking about adding it to slapd.init as a
workaround...

Greetings

 Torsten

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Mon, 11 Apr 2005 11:21:54 +0200 (CEST)
From: Sven Hartge <email address hidden>
To: Torsten Landschoff <email address hidden>
cc: <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

At 09:24 on 11.04.05, Torsten Landschoff wrote:
> On Wed, Apr 06, 2005 at 09:27:19PM +0200, Sven Hartge wrote:

>> Right now I am running with LD_ASSUME_KERNEL=2.4.1 as suggested in
>> $the_other_bug. So far no problems, but as this sched_yield() problem
>> takes some time to show up, I don't know if this really is the solution
>> or if I am just lucky right now.

> Has the problem shown up by now, or does LD_ASSUME_KERNEL=2.4.1 really
> help? If it helps, I am thinking about adding it to slapd.init as a
> workaround...

Nope, didn't help.

S°

--
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Mon, 11 Apr 2005 23:05:21 +0200 (CEST)
From: Sven Hartge <email address hidden>
To: Torsten Landschoff <email address hidden>
cc: <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

Hi.

Could we please raise this bug to at least "important"? Every day at
least one of my 8 replicas goes belly up with the sched_yield() loop.
Right now I even consider this bug RC-worthy.

Regards,
Sven.

--
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/

Debian Bug Importer (debzilla) wrote :

Message-Id: <email address hidden>
Date: Mon, 11 Apr 2005 23:19:19 +0200 (CEST)
From: <email address hidden> (Torsten Landschoff)
To: <email address hidden>
Subject: severity of 303057 is serious

severity 303057 serious

Debian Bug Importer (debzilla) wrote :

Marking as duplicate based on debbugs merge (255276,303057)

This bug has been marked as a duplicate of bug 15270.

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Mon, 11 Apr 2005 23:56:23 +0200 (CEST)
From: Sven Hartge <email address hidden>
To: Torsten Landschoff <email address hidden>
cc: <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

Hi.

After the last episode of "killall slapd; db4.2_recover; /etc/init.d/slapd start"
my DB_CONFIG looks like this:

#txn_checkpoint 128 15 1
set_cachesize 0 252428800 0
set_lk_max_objects 100000
set_lk_max_locks 100000
#
set_lk_max_lockers 100000
#
set_lg_regionmax 1048576
set_lg_max 8388608
set_lg_bsize 2097152
set_lg_dir /var/lib/ldap/logs/
#set_lk_detect DB_LOCK_DEFAULT
set_tmp_dir /tmp/
#set_flags DB_TXN_NOSYNC
#set_flags DB_TXN_NOT_DURABLE

Note the enormous number of possible locks, lockers and lockable objects.
So far, only increasing these limits seems to circumvent the
dreaded sched_yield() loop.

My last change is

  set_lk_max_lockers 100000

which was still at its default and seems to be the _real_ culprit, as per
ITS#2030:

http://www.openldap.org/its/index.cgi/Software%20Bugs?id=2030;selectid=2030;usearchives=1
(Note how old this bug report to the OpenLDAP ITS is.)

Right now I am running some stress tests on my systems, but those %$/&%$§
bastards never lock up when I am actively trying to push them over the
edge.

It would be really helpful if anybody experiencing these problems could
check whether they still occur with increased locker settings (as seen
above).

Regards,
Sven.

--
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Tue, 12 Apr 2005 00:58:12 +0200 (CEST)
From: Sven Hartge <email address hidden>
To: Torsten Landschoff <email address hidden>
cc: <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

At 23:56 on 11.04.05, Sven Hartge wrote:

[Sorry for spamming this bug report, but I _really_ need to get this going
*fast*.]

> My last change is
>
> set_lk_max_lockers 100000
>
> which was still at its default and seems to be the _real_ culprit, as per
> ITS#2030:
>
> http://www.openldap.org/its/index.cgi/Software%20Bugs?id=2030;selectid=2030;usearchives=1
> (Note how old this bug report to the OpenLDAP ITS is.)

After torturing my setup with different scripts that try to recreate
the normal workload, and after having run

  watch -n1 -d "db4.2_stat -c"

in parallel for some time, I am confident that the sched_yield() loop
occurs because the bdb backend runs out of lockers. At least this is what
I get from ITS#2030 and from various other resources (mostly
documentation from Sleepycat).
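Sven's monitoring approach (re-running db4.2_stat -c every second and highlighting what changed) can be approximated with the Python sketch below. It deliberately does not parse the statistics, since their output format is not shown in this thread; it only reports lines that changed between runs, roughly like watch -d. The function names and the -h /var/lib/ldap argument (db_stat's option for the environment home) are assumptions.

```python
import subprocess
import time


def changed_lines(previous, current):
    """Indices of lines in `current` that differ from `previous`."""
    old = previous.splitlines()
    new = current.splitlines()
    return [i for i, line in enumerate(new) if i >= len(old) or old[i] != line]


def watch_changes(cmd, rounds, interval=1.0):
    """Run cmd `rounds` times, printing only lines that changed since the last run."""
    previous = ""
    for n in range(rounds):
        current = subprocess.run(
            cmd, capture_output=True, text=True, check=True
        ).stdout
        lines = current.splitlines()
        for i in changed_lines(previous, current):
            print(f"[{n}] {lines[i]}")
        previous = current
        time.sleep(interval)


# Example (assumed paths, not run here):
# watch_changes(["db4.2_stat", "-c", "-h", "/var/lib/ldap"], rounds=60)
```

The lines worth watching are the locker, lock and lock object counts; the point of the exercise is to see them approach the configured maximum under load.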

So I suggest that the package create a DB_CONFIG with _at least_ 5000
lockers, locks and lock objects. The default of 1000 is just too low and
will be exhausted in no time, even by a small database.
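Sven's suggestion can be turned into a quick check of an existing DB_CONFIG. The Python sketch below reads the three lock directives and lists those whose effective value is under 5000. The threshold is the one proposed above; the fallback of 1000 for a missing or commented-out directive is the default quoted in this thread, not something the sketch verifies against Berkeley DB. Function names are hypothetical.

```python
LOCK_DIRECTIVES = ("set_lk_max_locks", "set_lk_max_lockers", "set_lk_max_objects")
ASSUMED_DEFAULT = 1000  # default quoted in the thread for Berkeley DB 4.2


def lock_limits(db_config_text):
    """Effective lock limits from DB_CONFIG text.

    Lines starting with '#' are comments; missing directives fall back
    to ASSUMED_DEFAULT.
    """
    limits = dict.fromkeys(LOCK_DIRECTIVES, ASSUMED_DEFAULT)
    for raw in db_config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) == 2 and fields[0] in limits:
            limits[fields[0]] = int(fields[1])
    return limits


def too_low(db_config_text, minimum=5000):
    """Lock directives whose effective value is below `minimum`, sorted by name."""
    return sorted(k for k, v in lock_limits(db_config_text).items() if v < minimum)


# Sven's earlier DB_CONFIG, before set_lk_max_lockers was added (abridged).
before = """\
set_cachesize 0 252428800 0
set_lk_max_objects 100000
set_lk_max_locks 100000
"""
print(too_low(before))  # ['set_lk_max_lockers']
```

A fixed threshold does not address Steve's point below that the right value depends on usage; it only flags the default that Sven found too low.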

(Don't close this bug yet, further observation has to happen.)

Regards,
Sven.

--
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Sat, 16 Apr 2005 02:29:50 -0700
From: Steve Langasek <email address hidden>
To: Sven Hartge <email address hidden>, <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

Hi Sven,

On Mon, Apr 11, 2005 at 11:56:23PM +0200, Sven Hartge wrote:

> After the last episode of "killall slapd; db4.2_recover; /etc/init.d/slapd start"
> my DB_CONFIG looks like this:

> #txn_checkpoint 128 15 1
> set_cachesize 0 252428800 0
> set_lk_max_objects 100000
> set_lk_max_locks 100000
> #
> set_lk_max_lockers 100000
> #
> set_lg_regionmax 1048576
> set_lg_max 8388608
> set_lg_bsize 2097152
> set_lg_dir /var/lib/ldap/logs/
> #set_lk_detect DB_LOCK_DEFAULT
> set_tmp_dir /tmp/
> #set_flags DB_TXN_NOSYNC
> #set_flags DB_TXN_NOT_DURABLE

> Note the enormous number of possible locks, lockers and lockable objects.
> So far, only increasing these limits seems to circumvent the
> dreaded sched_yield() loop.

> My last change is

> set_lk_max_lockers 100000

> which was still at its default and seems to be the _real_ culprit, as per
> ITS#2030:

> http://www.openldap.org/its/index.cgi/Software%20Bugs?id=2030;selectid=2030;usearchives=1
> (Note how old this bug report to the OpenLDAP ITS is.)

The last follow-up in that ITS points to
<http://www.sleepycat.com/docs/ref/lock/max.html>, which gives guidelines
about how to tune the number of available locks and lockers. Is this what
you did? Has the database held up under stress after making these changes?

I'm not sure how useful it is to set a fixed number of lockers by default,
since the optimal value depends on usage statistics; but bumping from 1000
to 5000 doesn't seem like it can hurt much.

Thanks,
--
Steve Langasek
postmodern programmer

Debian Bug Importer (debzilla) wrote :

Message-ID: <email address hidden>
Date: Sat, 16 Apr 2005 16:08:51 +0200 (CEST)
From: Sven Hartge <email address hidden>
To: Steve Langasek <email address hidden>
cc: <email address hidden>
Subject: Re: Bug#303057: slapd goes into endless sched_yield() loop

At 02:29 on 16.04.05, Steve Langasek wrote:

>> #txn_checkpoint 128 15 1
>> set_cachesize 0 252428800 0
>> set_lk_max_objects 100000
>> set_lk_max_locks 100000
>> #
>> set_lk_max_lockers 100000
>> #
>> set_lg_regionmax 1048576
>> set_lg_max 8388608
>> set_lg_bsize 2097152
>> set_lg_dir /var/lib/ldap/logs/
>> #set_lk_detect DB_LOCK_DEFAULT
>> set_tmp_dir /tmp/
>> #set_flags DB_TXN_NOSYNC
>> #set_flags DB_TXN_NOT_DURABLE

>> Note the enormous number of possible locks, lockers and lockable objects.
>> So far, only increasing these limits seems to circumvent the
>> dreaded sched_yield() loop.

>> http://www.openldap.org/its/index.cgi/Software%20Bugs?id=2030;selectid=2030;usearchives=1
>> (Note how old this bug report to the OpenLDAP ITS is.)

> The last follow-up in that ITS points to
> <http://www.sleepycat.com/docs/ref/lock/max.html>, which gives
> guidelines about how to tune the number of available locks and lockers.
> Is this what you did?

Correct. But since I desperately needed the databases (after all, these
are my production LDAP servers), I raised the number to this very high
value so they would hold up in any case, without me manually rebuilding
the database every 6 to 12 hours.

I am about to lower this to 10000, since 100000 is just too high for my
workload and consumes too much memory.

> Has the database held up under stress after making these changes?

Yes, after the changes I experienced no more lockups or database
corruption.

> I'm not sure how useful it is to set a fixed number of lockers by
> default, since the optimal value depends on usage statistics; but
> bumping from 1000 to 5000 doesn't seem like it can hurt much.

1000 seems too low in most cases; at least my experience shows this.

But: I still consider it a grave bug in db4.2 if running out of lockers
corrupts the database. And I consider it a bug in slapd if it runs into a
busy-waiting loop when something inside the database goes wrong.

Regards,
Sven.

--
Sven Hartge -- professional Unix geek and all-time nerd
My thoughts on the net: http://sven.formvision.de/blog/

Changed in openldap2.2:
status: Unknown → Fix Released