Comment 8 for bug 1921562

In , Kartik Subbarao (subbarao) wrote :

Hi Ryan,

I'm running into a problem with slapd 2.4.46 hanging on Ubuntu 18.04,
which seems to be a side effect of the ITS#8650 patch:

https://github.com/openldap/openldap/commit/7b5181da8cdd47a13041f9ee36fa9590a0fa6e48

slapd runs fine for a while, but during some periods of high traffic it
hangs: it pegs the CPU at 100% and stops responding to new LDAP
connections. After some time it resumes working, but overall it is
fairly unreliable.

strace on slapd during the hang shows that it's constantly making read()
calls that return EAGAIN. A gdb stack trace of slapd showed that these
read() calls come from the busy-wait for loop in tlsg_session_accept(),
which calls gnutls_handshake() again each time it returns EAGAIN. When
slapd recovers from this hang state, the first message it prints is a
TLS negotiation failure error on the culprit file descriptor.
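
For illustration, the strace pattern described above can be reproduced
outside slapd with a few lines of C. This is only a sketch of the
general mechanism, not the slapd or GnuTLS code: set_nonblocking() and
spin_on_eagain() are hypothetical helpers, and a plain read() stands in
for the read that gnutls_handshake() performs internally. On a
non-blocking socket with no pending data, every retry returns EAGAIN
immediately, so a loop that retries without waiting burns CPU until the
peer finally sends something.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: retry read() immediately whenever it fails with
 * EAGAIN, the way a busy-wait loop does. Returns how many EAGAIN results
 * were seen before read() returned anything else, or before max_tries
 * was reached (the cap exists only so the demo terminates). */
static long spin_on_eagain(int fd, long max_tries)
{
    char buf[1];
    long spins = 0;

    assert(max_tries >= 0);
    while (spins < max_tries) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            spins++;        /* nothing to read yet: retry at once */
            continue;
        }
        break;              /* data, EOF, or a different error */
    }
    return spins;
}
```

With an idle peer, spin_on_eagain() always runs to its cap; without the
cap it would spin for as long as the peer stays silent, which matches
the 100% CPU symptom.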

If I back out the ITS#8650 patch, the problem goes away. If I insert
sleep(1) in the for loop, slapd no longer pegs the CPU at 100%, but it
still becomes unresponsive during these high-traffic periods.

I don't know what's happening during these high-traffic periods that
causes the TLS negotiation to go astray. Unfortunately it's not easy to
reproduce this problem outside of this production environment, given the
diversity of clients running different OSes with various versions of SSL
libraries.

Is there a better way to handle EAGAIN returned from gnutls_handshake()
than the busy-wait in ITS#8650, or my simplistic attempt at inserting a
sleep() call, which doesn't really seem to help? I'm wondering how the
GnuTLS developers intend for people to use gnutls_handshake() properly,
so as to gracefully handle sessions that involve long packets on the one
hand, without opening up a way for a client to chew up lots of system
resources on the other hand.
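
To make the question concrete, one pattern commonly used with
non-blocking sockets is to wait in poll() until the descriptor is ready
and to give up after a deadline, rather than retrying immediately. The
sketch below is an assumption about what such a loop could look like,
not the OpenLDAP or GnuTLS implementation. The names step_result,
step_fn, demo_step() and handshake_with_deadline() are hypothetical;
demo_step() stands in for a single gnutls_handshake() call and only
reads one byte. In real GnuTLS code, gnutls_record_get_direction()
reports whether an interrupted call was waiting to read (0) or to
write (1), which is what STEP_WANT_READ and STEP_WANT_WRITE model here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical outcome of one handshake step. */
enum step_result { STEP_DONE = 0, STEP_WANT_READ, STEP_WANT_WRITE, STEP_FATAL };
typedef enum step_result (*step_fn)(int fd, void *ctx);

/* Monotonic clock in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

/* Stand-in for one handshake attempt: succeeds once a byte can be read.
 * ctx points to an int that counts how often the step was called. */
static enum step_result demo_step(int fd, void *ctx)
{
    int *calls = ctx;
    char b;
    ssize_t n;

    (*calls)++;
    n = read(fd, &b, 1);
    if (n == 1)
        return STEP_DONE;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return STEP_WANT_READ;
    return STEP_FATAL;
}

/* Run step() until it finishes, sleeping in poll() between attempts.
 * Returns 0 on success, -1 on a fatal or poll() error, -2 on timeout. */
static int handshake_with_deadline(int fd, step_fn step, void *ctx,
                                   int timeout_ms)
{
    long long deadline = now_ms() + timeout_ms;

    assert(step != NULL);
    for (;;) {
        enum step_result r = step(fd, ctx);
        long long remaining;
        struct pollfd pfd;
        int rc;

        if (r == STEP_DONE)
            return 0;
        if (r == STEP_FATAL)
            return -1;

        remaining = deadline - now_ms();
        if (remaining <= 0)
            return -2;                  /* peer took too long */

        pfd.fd = fd;
        pfd.events = (r == STEP_WANT_READ) ? POLLIN : POLLOUT;
        pfd.revents = 0;
        rc = poll(&pfd, 1, (int)remaining);  /* sleeps, no CPU spin */
        if (rc == 0)
            return -2;                  /* deadline passed while waiting */
        if (rc < 0 && errno != EINTR)
            return -1;
        /* fd is ready (or poll was interrupted): try the step again */
    }
}
```

This removes the CPU spin and bounds how long a stalled client can hold
things up, but the calling thread is still blocked until the deadline.
In an event-driven server it may be preferable to return to the main
event loop on EAGAIN and resume the handshake when the descriptor
becomes ready; whether that fits slapd's connection handling is a
question for the developers.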

Regards,

     -Kartik