[upstream] ReferenceError: processLDAPValues is not defined

Bug #1854363 reported by Ingar Smedstad
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mozilla Thunderbird
Fix Released
Medium
thunderbird (Ubuntu)
Fix Released
High
Olivier Tilloy
Bionic
Fix Released
High
Olivier Tilloy

Bug Description

The latest thunderbird (1:68.2.1+build1-0ubuntu0.18.04.1) introduces a typo that destroys our autoconfiguration.

Error messages:
Netscape.cfg/AutoConfig failed. Please contact your system administrator.
 Error: getLDAPAttibutes failed: ReferenceError: processLDAPValues is not defined

Netscape.cfg/AutoConfig failed. Please contact your system administrator.
 Error: authconfig.js failed failed: TypeError: userInfo.mail is undefined

Ref: https://bugzilla.mozilla.org/show_bug.cgi?id=1592922

I have downgraded thunderbird and confirmed that this happen when upgrading to 1:68.2.1+build1

My system:
Description: Ubuntu 18.04.3 LTS
Release: 18.04

apt policy thunderbird
thunderbird:
  Installed: 1:52.7.0+build1-0ubuntu1
  Candidate: 1:68.2.1+build1-0ubuntu0.18.04.1
  Version table:
     1:68.2.1+build1-0ubuntu0.18.04.1 500
        500 http://ubuntu.uib.no/archive bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
 *** 1:52.7.0+build1-0ubuntu1 500
        500 http://ubuntu.uib.no/archive bionic/main amd64 Packages
        100 /var/lib/dpkg/status

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Set up an LDAP server as described in bug 1574712 comment #0. Tick the SSL box.

You can look for "Alvarez" in the LDAP address book, but recipient auto-complete gives
Component returned failure code: 0x80590051 [nsIAbDirectoryQuery.doQuery] 2 nsAbLDAPAutoCompleteSearch.js:261

Alice, can you check for us when that stopped working. Same regression range as bug 1574712? Or did that stop working earlier?

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

> You can look for "Alvarez" in the LDAP address book ...

I got that wrong, that doesn't work either. Maybe Adams don't do SSL?
EDIT: Apparently they do: https://support.microfocus.com/kb/doc.php?id=7940314#

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Yes, TB 60.8 works with Adams + SSL for both address book search and auto-complete.

Revision history for this message
In , Alice0775-t (alice0775-t) wrote :

Strange thing I found.

On Daily70.0a1 20190824095533 Windows10,
A. Restart TB > Open Compose Window > Type "Alvarez" in "To:" field > (GOOD)
B. Restart TB > Open Compose Window > Type "Alvarez" in "To:" field > (GOOD) > Search "Alvarez" in Contacts Sidebar(F9) > (BAD)
C. Restart TB > Open Compose Window > Search "Alvarez" in Contacts Sidebar(F9) > (GOOD) > Type "Alvarez" in "To:" field > (BAD)

Regression window:
https://hg.mozilla.org/comm-central/pushloghtml?startdate=2018-05-08+00%3A00%3A00&enddate=2018-05-11+11%3A00%3A00
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=59005ba3cd3e7b3f9e8804bea881bf4c3a755d7c&tochange=21f09d7e7214eaebf1e0980494159bd846e1bdd9

Unfortunately, the taskcluster builds are not available. (it expired) :(

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :
Download full text (5.5 KiB)

Thank you so much, Alice. So a regression from May 2018 in the 62 cycle. No one noticed in over a year :-(

So at the beginning of your comment #3 you're saying that LDAP lookup works in Daily with SSL enabled? At least the first time?

When I try it in a debug build, I crash:
```
xul.dll!DoOCSPRequest(const nsTString<char> & aiaLocation, const mozilla::OriginAttributes & originAttributes, unsigned char[127] & ocspRequest, unsigned __int64 ocspRequestLength, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> timeout, mozilla::Vector<unsigned char,0,mozilla::MallocAllocPolicy> & result) Line 435 C++
xul.dll!mozilla::psm::NSSCertDBTrustDomain::SynchronousCheckRevocationWithServer(const mozilla::pkix::CertID & certID, const nsTString<char> & aiaLocation, mozilla::pkix::Time time, unsigned short maxOCSPLifetimeInDays, const mozilla::pkix::Result cachedResponseResult, const mozilla::pkix::Result stapledOCSPResponseResult) Line 766 C++
xul.dll!mozilla::psm::NSSCertDBTrustDomain::CheckRevocation(mozilla::pkix::EndEntityOrCA endEntityOrCA, const mozilla::pkix::CertID & certID, mozilla::pkix::Time time, mozilla::pkix::Duration validityDuration, const mozilla::pkix::Input * stapledOCSPResponse, const mozilla::pkix::Input * aiaExtension) Line 742 C++
xul.dll!mozilla::pkix::PathBuildingStep::Check(mozilla::pkix::Input potentialIssuerDER, const mozilla::pkix::Input * additionalNameConstraints, bool & keepGoing) Line 257 C++
xul.dll!mozilla::psm::CheckCandidates(mozilla::pkix::TrustDomain & trustDomain, mozilla::pkix::TrustDomain::IssuerChecker & checker, mozilla::Vector<mozilla::pkix::Input,0,mozilla::MallocAllocPolicy> & candidates, mozilla::pkix::Input * nameConstraintsInputPtr, bool & keepGoing) Line 187 C++
xul.dll!mozilla::psm::NSSCertDBTrustDomain::FindIssuer(mozilla::pkix::Input encodedIssuerName, mozilla::pkix::TrustDomain::IssuerChecker & checker, mozilla::pkix::Time) Line 332 C++
xul.dll!mozilla::pkix::BuildForward(mozilla::pkix::TrustDomain & trustDomain, const mozilla::pkix::BackCert & subject, mozilla::pkix::Time time, mozilla::pkix::KeyUsage requiredKeyUsageIfPresent, mozilla::pkix::KeyPurposeId requiredEKUIfPresent, const mozilla::pkix::CertPolicyId & requiredPolicy, const mozilla::pkix::Input * stapledOCSPResponse, unsigned int subCACount, unsigned int & buildForwardCallBudget) Line 364 C++
xul.dll!mozilla::pkix::BuildCertChain(mozilla::pkix::TrustDomain & trustDomain, mozilla::pkix::Input certDER, mozilla::pkix::Time time, mozilla::pkix::EndEntityOrCA endEntityOrCA, mozilla::pkix::KeyUsage requiredKeyUsageIfPresent, mozilla::pkix::KeyPurposeId requiredEKUIfPresent, const mozilla::pkix::CertPolicyId & requiredPolicy, const mozilla::pkix::Input * stapledOCSPResponse) Line 413 C++
xul.dll!mozilla::psm::BuildCertChainForOneKeyUsage(mozilla::psm::NSSCertDBTrustDomain & trustDomain, mozilla::pkix::Input certDER, mozilla::pkix::Time time, mozilla::pkix::KeyUsage ku1, mozilla::pkix::KeyUsage ku2, mozilla::pkix::KeyUsage ku3, mozilla::pkix::KeyPurposeId eku, const mozilla::pkix::CertPolicyId & requiredPolicy, const mozilla::pkix::Input * stapledOCSPResponse, mozilla::psm::CertVerifier::OCSPStaplingStatus * ocspStaplingStatus) Line 207...

Read more...

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Oh, I got it, the first search with SSL works, in the AB, in the contacts side panel or auto-complete. The second search fails.

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :
Revision history for this message
In , Kai Engert (kaie) wrote :

Yes, but 1456489 is the reason for the LDAP/SSL failures.

Most SSL servers will use a certificiate with OCSP information. OCSP querying (checking for revoked certificates) is enabled by default.

However, in bug 1456489, the maintainers of the Firefox network engine have decided that such requests MUST NOT be done on the primary thread.

I see Thunderbird use a synchronous query to the LDAP server, directly from the main thread. This causes all subsequent I/O to happen on the main thread too, which the Firefox core code refuses to execute, and tells the caller that it's a bad certificate - which causes the LDAP/SSL connection to fail.

The solution is to move LDAP querying to a background thread, and report the results to the main thread.

Call stack subset from the main thread:
#30 0x00007ffff7a7835a in nsldapi_send_server_request (ld=0x7fffd452e000, ber=0x7fffd4553800, msgid=1, parentreq=0x0, srvlist=0x0, lc=0x7fffd43cf240, bindreqdn=0x7fffd8ee7f58 "", bind=0)
    at /home/user/moz/comm-esr68/mozilla/comm/ldap/c-sdk/libraries/libldap/request.c:319
#31 0x00007ffff7a76fe9 in nsldapi_send_initial_request (ld=0x7fffd452e000, msgid=1, msgtype=96, dn=0x7fffe6fb7636 <gNullChar> "", ber=0x7fffd4553800) at /home/user/moz/comm-esr68/mozilla/comm/ldap/c-sdk/libraries/libldap/request.c:136
#32 0x00007ffff7a85003 in simple_bind_nolock (ld=0x7fffd452e000, dn=0x7fffe6fb7636 <gNullChar> "", passwd=0x7fffe6fb7636 <gNullChar> "", unlock_permitted=1) at /home/user/moz/comm-esr68/mozilla/comm/ldap/c-sdk/libraries/libldap/sbind.c:145
#33 0x00007ffff7a84625 in ldap_simple_bind (ld=0x7fffd452e000, dn=0x7fffe6fb7636 <gNullChar> "", passwd=0x7fffe6fb7636 <gNullChar> "") at /home/user/moz/comm-esr68/mozilla/comm/ldap/c-sdk/libraries/libldap/sbind.c:79
#34 0x00007fffea41c9ed in nsLDAPOperation::SimpleBind(nsTSubstring<char> const&) (this=0x7fffd42eb500, passwd=...) at /home/user/moz/comm-esr68/mozilla/comm/ldap/xpcom/src/nsLDAPOperation.cpp:270
#35 0x00007fffea48f7f4 in nsAbLDAPListenerBase::OnLDAPInit(nsILDAPConnection*, nsresult) (this=0x7fffd45e34a0, aConn=0x7fffd7bc7ce0, aStatus=nsresult::NS_OK) at /home/user/moz/comm-esr68/mozilla/comm/mailnews/addrbook/src/nsAbLDAPListenerBase.cpp:270
#36 0x00007fffea410466 in nsLDAPConnection::OnLookupComplete(nsICancelable*, nsIDNSRecord*, nsresult) (this=0x7fffd7bc7ce0, aRequest=0x7fffd42f5f00, aRecord=0x7fffd42f11f0, aStatus=nsresult::NS_OK)
#37 0x00007fffeafe1a4b in mozilla::net::DNSListenerProxy::OnLookupCompleteRunnable::Run() (this=0x7fffd47ee8c0) at /home/user/moz/comm-esr68/mozilla/netwerk/dns/DNSListenerProxy.cpp:35

Thunderbird should have automated tests for LDAP/SSL querying, to ensure we don't regress again in the future. I've filed bug 1576901 to track that.

Revision history for this message
In , Kai Engert (kaie) wrote :

Insecure workaround: Disable OCSP (preferences / advanced / certificates, disable "query OCSP".

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

*** Bug 1574248 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

*** Bug 1583775 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Mkmelin+mozilla (mkmelin+mozilla) wrote :
Revision history for this message
In , Benc-q (benc-q) wrote :

No patch yet, but here are my thoughts so far:
I've now gone through the code all the way from `nsAbLDAPDirectory::StartSearch()`, out to the final `OnSearchFinished()` callback, and have mapped out the flow on paper. So many moving parts, but I think I've got a grasp on it now.

The `nsLDAPConnection` already has a thread for handling blocking ldap operations, however a lot of the logic still happens out in the main thread (in `nsLDAPOperation`) before handing off to the worker thread. Including the bind operation, which is where our problem happens (the bind immediately goes off and tries to sort out the SSL cert, which fails because we're still on the main thread).
My plan is to run more of the ldap operation code into the worker thread, which should be simple enough (although the logic is split up over a bunch of different objects which obfuscates things a bit). Hopefully have a patch to try out tomorrow.

Revision history for this message
In , Benc-q (benc-q) wrote :

Quick update:
It's not that the SSL handshake must be performed off the main thread, it's that it must be done specifically on the SocketTransportService thread.
I've now got the bind operation (which ends up performing the SSL handshake) doing that, and it no longer asserts. But it now fails to connect to the server (sighs). So I'm picking my way through the various layers to figure out why that is, and how to fix it.

Revision history for this message
In , Benc-q (benc-q) wrote :

The more I dig into this, the more I think I should probably be working on Bug 826662 instead...
LDAP over SSL (LDAPS) seems to be deprecated in favour of the StartTLS extension.
(but see also this [blog post](https://forum.forgerock.com/2015/04/ldaps-or-starttls-that-is-the-question/) which argues in favour of LDAPS).
But really, we should be supporting both.

Anyway.

Getting the handshake running on the STS thread is easy enough, but I'm still trying to figure out why my connections are failing.
(debugging is tricky - there are loads of layers to step through, and on a real network connection it'll time out anyway while you're inspecting it!)

Just to help me mentally organise my findings so far (I don't expect any of this to make much sense, but I find trying to write it down helps!):
- The C++ LDAP code uses [`libprldap`](https://searchfox.org/comm-central/search?q=libprldap&case=false&regexp=false&path=) to adapt the C ldap API to use NSPR sockets, using an IO function table (connect, read, write,poll, close etc...).
- When in SSL mode, some of those IO functions are replaced.
- The replacement SSL connect function first calls the original plaintext connect function, then uses the TLS socket provider (nsTLSSocketProvider, called "starttls") to upgrade the still-unused socket to a secure one (but nothing happens immediately - the negotiation is deferred until the first read/write).
- when an LDAP operation is performed, the first data write triggers the TLS/SSL negotiation.

One of the really confusing things was that it uses the "starttls" socketprovider to set up the encryption. But this is totally unrelated to the 'startTLS' extension in LDAPv3. Apparently some LDAP servers don't support SSLv3, and "starttls" provides a way to make sure SSLv2 is used.

From the C++ side of things:

- `nsLDAPConnection` defers setting up the LDAP stuff until the host name is resolved ([`OnLookupComplete`](https://searchfox.org/comm-central/source/ldap/xpcom/src/nsLDAPConnection.cpp#434) is called).
- The c-api LDAP is set up, [still in `OnLookupComplete`](https://searchfox.org/comm-central/source/ldap/xpcom/src/nsLDAPConnection.cpp#506)
- If we're using SSL, then [`nsLDAPInstallSSL`](https://searchfox.org/comm-central/search?q=nsLDAPInstallSSL&path=) is called to wrap the IO functions with the TLS-aware ones.
- At some point, a new `nsLDAPOperation` will be created and invoked.
- The operation will call a C API ldap function which will cause the first data to be sent, which will in turn trigger the TLS handshaking.
- the operation will be passed to [`nsLDAPConnection::AddPendingOperation()`](https://searchfox.org/comm-central/search?q=AddPendingOperation&path=), which polls for responses on a separate thread.

I'm feeling a bit fried on this one. But I've still got lots of avenues to explore. Weekend now. Fresh look at it on Monday.

Revision history for this message
In , Benc-q (benc-q) wrote :

Created attachment 9102037
LDAP_SSL_fiddling.patch

This is my hacky little patch I've been using to shuffle the SSL negotiation off onto the STS thread.
I implemented a special nsIRunnable for the SimpleBind method, because I erroneously thought that was a special case which triggered the SSL negotiation (it's not - it's just whatever operation first runs over the connection). But it's enough to work on a proof of concept. A proper fix would generalise it (and likely clarify a lot of the logic currently smeared across nsLDAPConnection and nsLDAPOperation).

Needless to say, it still doesn't work: the connection fails.

I've got the ldap.adams.edu LDAP server configured, and I open the addressbook and start entering a name to look up.
ldap.adams.edu is the only public LDAP server I've found which listens on both plain (port 389) and SSL (port 636) LDAP ports.

Revision history for this message
In , Kai Engert (kaie) wrote :

Just a few explanations regarding the use of the terms SSL and TLS in our code, as it can be confusing.

nsTLSSocketProvider is misnamed. It should have been called nsStartTLSSocketProvider.
That class does a socket with "initial plaintext, after some protocol specific handshake, agree to raise the socket into speaking SSL/TLS".

nsSSLSocketProvider is ambiguous. It should have been called nsSSLorTLSSocketProvider.
This is a socket with "speak the SSL/TLS protocol immediately, wrap the inner protocol, and the inner protocol doesn't have to know anything about it".

The versions for SSL and TLS went from:
- SSL 2
- SSL 3
- TLS 1.0
- TLS 1.1
- TLS 1.2
- TLS 1.3

So SSL and TLS are just old and newer versions of each other, and at some time some people decided to rename it, very confusing.

At the time StartTLS was introduced, the most recent available version had already arrived at TLS, which is why it wasn't called StartSSL, but StartTLS.

I'm not sure why you talked about SSL v2. That's very old, and nobody should be using it any more. Even SSL v3 shouldn't be used anymore.

If you had seen some code mentioning SSL v2, it might have been because of another detail.

If a client and a server start to speak with each other about SSL/TLS (either immediately with plain SSL/TLS, or at the time they upgrade to plaintext socket to SSL/TLS in the StartTLS scenario), they exchange an initial handshake message.

That handshake message can either be "modern" with a list of flexible extensions.

Or, that handshake can be "SSL v2 compatible". Some servers simply gave strange errors when talking to them with the initial modern handshake, so a handshake that's backwards comaptible was tried in some scenarios.

Most of what I wrote here shouldn't matter much for the scope of this bug, but might help you understand better what's happening.

However, depending on the server port you're connecting to, you get one code path or the other.

If you connect to a server port that expects to speak SSL/TLS immediately (LDAPS), with an outer SSL/TLS layer, and an inner plaintext LDAP protocol layer, then you should see that the nsSSLSocketProvider code is reached. In this scenario, I assume the SSL/TLS handshake is executed on the thread that initiates the socket connection.

If you connect to a plaintext server port, that optionally can upgrade to SSL/TLS by using StartTLS, then the SSL/TLS handshake will happen later than the initial connect. That SSL/TLS handshake happening at StartTLS upgrade time is probably executed on the thread that handles the data flow on the socket. Depending on the implementation, that might not be the same thread that trigger the initial socket connection.

For the scope of this bug, I'd focus on the code that worked in the past. If so far we had only supported SSL/TLS, I'd try to get that work first, and worry about StartTLS support later.

Revision history for this message
In , Kai Engert (kaie) wrote :

Looking at nsLDAPSSLConnect in nsLDAPSecurityGlue.cpp, it's confusing that it calls
  sps->GetSocketProvider("starttls", getter_AddRefs(tlsSocketProvider))

I would have expected that code to do ...
  sps->GetSocketProvider("ssl", getter_AddRefs(sslSocketProvider))
... given that it doesn't claim to support StartTLS, but only support SSL/TLS.

However, what they do here, they immediately call ->StartTLS() on that socket. They don't wait for some plaintext back and forth talk to negotiate when the time has come to do that, they do it immediately.

So you could say, they do the functional equivalent of a SSL/TLS socket provider, and they don't need the ability to delay the upgrade of the socket to a later time.

That LDAP glue code also has a very old comment, that explains the motivation for doing that:
  // If possible we want to avoid using SSLv2, as this can confuse
  // some directory servers (notably the netscape 4.1 ds). The only
  // way that PSM provides for us to do this is to use a socket that can
  // be used for the STARTTLS protocol, because the STARTTLS protocol disallows
  // the use of SSLv2.

This means, that code was written at a time when SSL v2 was still widely used. But they considered it insecure and didn't want to use it for LDAP. They didn't find another mechanism from within the LDAP code to disable SSL v2. That's why they abused that alternative TLSSocketProvider == StartTLSSocketProvider, to trick the underlying code into disabling the older SSL v2 protocol.

Nowadays we should no longer need that "trick". That code here should be changed to call
  sps->GetSocketProvider("ssl", getter_AddRefs(sslSocketProvider))
and the call to ->StartTLS should be removed, as it's unnecessary.

This might not be helpful for this bug here, but it would be a correctness cleanup.

Later, once we get to fix bug 826662, we'd introduce an alternative code path that calls GetSocketProvider("startssl", ...), but that would be based on a new checkbox added to the user interface, which would allow the user to explicitly request the StartTLS upgrade with the server, instead of speaking the immediate SSL/TLS. We already offer that choice for mail servers (connection security: either none, or SSL/TLS, or StartTLS). Today, the UI for LDAP servers only offers SSL/TLS.

Revision history for this message
In , Benc-q (benc-q) wrote :

Thanks very much Kai - much appreciated! I figured the "starttls" usage was probably a workaround, but was never sure it wasn't just me missing something important.
Is there still any reason for the LDAP SSL (ie port 636) path to open a plain socket, then upgrade it to SSL/TLS, rather than just asking the "ssl" socket provider to open a SSL/TLS connection in the first place?

Revision history for this message
In , Kai Engert (kaie) wrote :

(In reply to Ben Campbell from comment #18)
> Is there still any reason for the LDAP SSL (ie port 636) path to open a plain socket, then upgrade it to SSL/TLS, rather than just asking the "ssl" socket provider to open a SSL/TLS connection in the first place?

If we ever want to support StartTLS, you'll need to use the approach "create socket, do plaintext negotiation, then upgrade to SSL/TLS".

That means we'll eventually need a code path that is similar to the one today. But for now, if it means simplification, it seems fine to open the SSL/TLS socket directly.

Revision history for this message
In , Benc-q (benc-q) wrote :

Created attachment 9102870
1576364-ssl-ldap-fix-1.patch

This is getting really infuriating. This patch ditches the startTLS-immediately-to-get-sslv2 hack, and uses the TLS/SSL socketprovider instead. The socketprovider seems happy, but the first traffic over the connection fails (no wrong-thread asserts though!). I assume the handshaking is failing somewhere, but I haven't had a chance to debug down through all the layers yet, but that's the next thing I'll do.

I'm uploading this patch now to see if there's anything I'm obviously doing wrong.

Just to confirm, the adams LDAP server (which I'm using to test against) definitely supports LDAP-over-SSL:

```
     $ ldapsearch -v -x -H "ldaps://ldap.adams.edu" -b ou=people,dc=adams,dc=edu -s sub "(cn=*)" cn mail sn
```
Seems to work fine for me, over port 636.

More details on the patch:
I'm not so happy with the way it's still opening a plain TCP connection then attempting to upgrade it via nsISocketProvider::AddToSocket(). I'd rather directly open the socket using the TLS/SSL version of nsISocketProvider::NewSocket(), but it's complicated by all the layers of indirection we're going through.

We use libprldap to provide the IO functions (and DNS hooks and mutex primitives) to interface ldap with the NSPR sockets, but libprldap doesn't know about nsISocketProvider. So we've got helpers in `ldap/xpcom/src/nsLDAPSecurityGlue.cpp` which monkey-patch the libprldap functions and upgrade the connection. Ideally, it'd just be a matter of replacing the "connect" function with a TLS/SSL one, but the function does a whole lot of (potential) DNS lookup stuff synchronously (we've already done an async DNS lookup in advance), so instead we call the original function then call AddToSocket() upon it.

I think it'd be much cleaner to ditch libprldap entirely, and provide the interfacing on the C++ side (ie in nsLDAPSecurityGlue.cpp) where we have access to `nsISockerProvider`. But that means moving all the libprldap functionality out into nsLDAPSecurityGlue. It'd be a lot less code overall, but is way more intrusive than I'd like, especially just for this bug!

Revision history for this message
In , Kai Engert (kaie) wrote :

I suggest to trace where it's stuck, might be quicker than following through the debugger.

export NSPR_LOG_MODULES="pipnss:5"

Compare what's happening without and with your change, so you could see which events are missing.

Revision history for this message
In , Kai Engert (kaie) wrote :

I didn't expect that fixing this bug would require digging deeply into the C/C++ network code. And I wasn't aware how special our ldap network code is.

My explanations from comment 16 and comment 17 weren't a suggestion how to fix this bug. I only wanted to give you more background information, because you were wondering about some of this details in your comment before it.

Your latest patch does very little, and only changes that socket provider detail. That will be insufficient to fix this bug.

You noticed how the socket code uses layers of functions. Yes, it makes use of NSPR file descriptor layering. It allows to implement a wrapped connection, where an outer protocol and an innner protocol is spoken. One layer represents the plain socket that is accessed by the application protocol code (here: LDAP). When using SSL/TLS, this isn't the usual plain implementation of a socket. Instead, it uses a socket implementation from mozilla/security/manager, that provides special implementations for the I/O functions. That's implemented in nsNSSIOLayer.cpp, and that code module is called "PSM".

That application level layer glues the I/O functions to the raw underlying implementation of SSL/TLS, which is provided by the NSS library.

If the application code wants to connect/read/write, the call is sent to a function from nsNSSIOLayer (e.g. PSMSend), which in turn redirects to ssl_* functions. Ldap code -> PSM code -> NSS code.

The NSS socket layer keep tracks of the additional handshaking and other stuff that needs to be exchanged on the raw socket, in order to make the socket secure.

The first part of that is to initiate a handshake between the server and the client. That is, the first time you read or write to the socket, the NSS layer will detect that. It will start communicating with the other side to exchange a few message in both directions, until both sides are happy with the connection. Only after this has happened, the NSS layer will accept the raw application data (LDAP) and encapsulate it into encrypted packages for secure transport.

The first read/write activity by the application should trigger the handshake. Because that initial handshake can take some time, the usual approach is to use async network code. The LDAP application should use async reading and writing on the socket, until it succeeds.

Revision history for this message
In , Kai Engert (kaie) wrote :

(In reply to Ben Campbell from comment #13)
> It's not that the SSL handshake must be performed off the main thread, it's that it must be done specifically on the SocketTransportService thread.

Why?

Revision history for this message
In , Kai Engert (kaie) wrote :

Created attachment 9102983
ldap-stack.txt

just for reference, the full stack of the LDAP call and the failing assert.

Revision history for this message
In , Kai Engert (kaie) wrote :

I found that the existing code, which creates a plain socket, then adds the StartTLS provider, then calls StartTLS, works around another problem.

Usually, the "connect" call on the socket would be done late, only after having layering SSL/TLS into it.

However, prldap functions prldap_connect -> prldap_try_one_address hide the steps of:
- get socket fd
- set socket options (such as blocking or nonblocking)
- call PR_Connect on that socket (which doesn't have the SSL/TLS layer yet)

This means, unless we want to rewrite the above prldap functions, we'd have to stick with the "immediate StartTLS" approach, as that offers a mechanism to reset the socket, and trigger the handshake (it calls SSL_ResetHandshake, which does a similar SSL/TLS socket setup than the call to PR_Connect).

In other words, I've found out the second patch is useless right now, and actually counterproductive, unless we rewrite parts of prldap.

Revision history for this message
In , Benc-q (benc-q) wrote :

(In reply to Kai Engert (:kaie:) from comment #24)
> just for reference, the full stack of the LDAP call and the failing assert.

Ahh, now I see. I was getting a different assert (checking specifically for the STS thread rather than anything-but-mainthread):
https://searchfox.org/mozilla-central/source/security/manager/ssl/SSLServerCertVerification.cpp#1513

Revision history for this message
In , Benc-q (benc-q) wrote :

(In reply to Kai Engert (:kaie:) from comment #25)
> I found that the existing code, which creates a plain socket, then adds the StartTLS provider, then calls StartTLS, works around another problem.
> Usually, the "connect" call on the socket would be done late, only after having layering SSL/TLS into it.

Doh. I'd missed the fact that the SSL/TLS AddToSocket() layering doesn't work upon an already-connected socket. Too much StartTLS on the brain :-)

> However, prldap functions prldap_connect -> prldap_try_one_address hide the steps of:
> - get socket fd
> - set socket options (such as blocking or nonblocking)
> - call PR_Connect on that socket (which doesn't have the SSL/TLS layer yet)

prldap_connect() also does a bunch of host lookup stuff, which makes it trickier for the nsLDAPSSLSecurityGlue.cpp code to just patch in a replacement connect function. (And the existing LDAP C++ code does it's own host resolution in advance anyway. But only on the first host, if there are multiple. There are other complicating factors too, of course. Sighs).

> In other words, I've found out the second patch is useless right now, and actually counterproductive, unless we rewrite parts of prldap.

Agreed. I actually think it'd tidy things up a lot if I essentially merged prldap and nsLDAPSSLSecurityGlue.cpp, and provided three separate connect() functions:
1) plain text: calls PR_OpenTCPSocket()
2) LDAPS: calls NewSocket() on the SSL/TLS socket provider
3) startTLS: calls NewSocket() on the startTLS socket provider (upgrading it via startTLS(), in sync with sending the appropriate LDAP protocol request to switch to TLS).

On the LDAP side, anything that can block (ie prettymuch anything involving sockets :- ) needs moving off the main thread. It's kind of half way there - it kicks requests off on the main thread, then dispatches to a separate thread to await the result.

Revision history for this message
In , Kai Engert (kaie) wrote :
Revision history for this message
In , Kai Engert (kaie) wrote :

FYI, I did some experiments and tracing.

Using your first patch, I see that the handshake starts, but eventually the code in NSS stops with error code PR_WOULD_BLOCK_ERROR.
(In ssl_SecureSend(), inside the call to ssl_Do1stHandshake().) Some parts of that code might not be prepared to work with a blocking socket. As changing that would probably require changes to the NSS library, and get such changes upstreamed, it's probably better to avoid that.

It's probably better if we change our ldap implementation to use nonblocking sockets (async i/o). As a quick experiment I changed prldap to request a nonblocking socket (see below), but that fails earlier. We'd need to change our ldap code to expect the typical "would block" scenario from sockets, and try again.

```
diff --git a/ldap/xpcom/src/nsLDAPSecurityGlue.cpp b/ldap/xpcom/src/nsLDAPSecurityGlue.cpp
--- a/ldap/xpcom/src/nsLDAPSecurityGlue.cpp
+++ b/ldap/xpcom/src/nsLDAPSecurityGlue.cpp
@@ -103,6 +103,8 @@ extern "C" int LDAP_CALLBACK nsLDAPSSLCo
   NS_ASSERTION(options & LDAP_X_EXTIOF_OPT_SECURE,
                "nsLDAPSSLConnect(): called for non-secure connection");
   options &= ~LDAP_X_EXTIOF_OPT_SECURE;
+
+ options |= LDAP_X_EXTIOF_OPT_NONBLOCKING;

   // Retrieve session info. so we can store a pointer to our session info.
   // in our socket info. later.
```

Revision history for this message
In , Benc-q (benc-q) wrote :

(In reply to Kai Engert (:kaie:) from comment #29)
> It's probably better if we change our ldap implementation to use nonblocking sockets (async i/o).

I think this is probably the Right Thing (tm) to do anyway: the libldap API seems to support async IO pretty comprehensively. A little bugzilla archeology suggests that libldap had troubles with async IO (on Linux, particularly) up to about 2002, but it _looks_ like it's been addressed (fingers crossed!).
Annoyingly, the C++ LDAP code calls only the blocking versions of the libldap APIs. But usually with a zero timeout, so it can poll repeatedly and use them in a kind-of async manner.

> As a quick experiment I changed prldap to request a nonblocking socket (see below), but that fails earlier. We'd need to change our ldap code to expect the typical "would block" scenario from sockets, and try again.

Yes it needs to be handled higher up - setting the libldap session to async, and calling the async variants of the API.
The `LDAP_X_EXTIOF_OPT_NONBLOCKING` is (I think) just used to pass into hooked IO functions, so they know libldap wants a non-blocking socket.

Revision history for this message
In , Benc-q (benc-q) wrote :

(In reply to Ben Campbell from comment #30)
> Annoyingly, the C++ LDAP code calls only the blocking versions of the libldap APIs. But usually with a zero timeout, so it can poll repeatedly and use them in a kind-of async manner.

I take that back. It does indeed already use the asynchronous libldap API calls.

Revision history for this message
In , Benc-q (benc-q) wrote :

I think I've figured out a major problem with async mode in ldap: the PR_ERROR_WOULD_BLOCK wasn't correctly being communicated from libprldap back to libldap. Sends in NSS layer would correctly return PR_ERROR_WOULD_BLOCK while waiting for handshaking to complete, but libldap wouldn't realise, and would just bail out.

I've written it up and posted a fix (Bug 1590976).

It's not the complete solution - there's still some extra async setup to do on the C++ `nsLDAPConnection` side of things, which I'll sort out a patch for tomorrow (today my LDAP code is thoroughly riddled with experimental hacks and printfs, so there's cleanup to do first :- )

Revision history for this message
In , Benc-q (benc-q) wrote :
Download full text (5.8 KiB)

So, I've got my socket set to `PR_SockOpt_Nonblocking` and I perform my first send - on my connection thread, not the main thread.
It's still doing the silly open-a-plain-TCP-socket-and-immediately-call-startTLS trick for now, but I'm pretty sure that's fine (and I think the SSL/TLS socketprovider does something similar internally).

But when during the handshake, I always get the not-on-the-STS-thread assert:

`Assertion failure: onSTSThread, at /fast/ben/tb/mozilla/security/manager/ssl/SSLServerCertVerification.cpp:1513`
(https://searchfox.org/mozilla-central/source/security/manager/ssl/SSLServerCertVerification.cpp#1513)

What's going wrong?
Should I be fetching the SocketTransportService and dispatching all my IO upon it? (that just seems... very wrong).
I feel like I've missed something.
Anything about this jump out at you, Kai?

My stacks traces look like this:
```
#0 0x00007ffff00196c8 in mozilla::psm::AuthCertificateHookInternal(mozilla::psm::TransportSecurityInfo*, void const*, std::unique_ptr<CERTCertificateStr, mozilla::UniqueCERTCertificateDeletePolicy> const&, std::unique_ptr<CERTCertListStr, mozilla::UniqueCERTCertListDeletePolicy>&, mozilla::Maybe<nsTArray<unsigned char> >&, mozilla::Maybe<nsTArray<unsigned char> >&, unsigned int, unsigned int) () at /fast/ben/tb/mozilla/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so
#1 0x00007ffff0019aac in mozilla::psm::AuthCertificateHook(void*, PRFileDesc*, int, int) () at /fast/ben/tb/mozilla/obj-x86_64-pc-linux-gnu/dist/bin/libxul.so
#2 0x00007ffff7508fe6 in ssl3_AuthCertificate (ss=0x7fffd709b000) at /fast/ben/tb/mozilla/security/nss/lib/ssl/ssl3con.c:11067
#3 0x00007ffff7508f73 in ssl3_CompleteHandleCertificate (ss=0x7fffd709b000, b=<optimized out>, length=<optimized out>)
    at /fast/ben/tb/mozilla/security/nss/lib/ssl/ssl3con.c:10962
#4 0x00007ffff750d259 in ssl3_HandleCertificate (ss=0x7fffd709b000, b=0x7fffd87d1004 "", length=2556) at /fast/ben/tb/mozilla/security/nss/lib/ssl/ssl3con.c:10830
#5 0x00007ffff750d259 in ssl3_HandlePostHelloHandshakeMessage (ss=0x7fffd709b000, b=0x7fffd87d1004 "", length=2556)
    at /fast/ben/tb/mozilla/security/nss/lib/ssl/ssl3con.c:12056
#6 0x00007ffff750a5c7 in ssl3_HandleHandshakeMessage (ss=0x7fffd709b000, b=0x7fffd87d1004 "", length=2556, endOfRecord=1)
    at /fast/ben/tb/mozilla/security/nss/lib/ssl/ssl3con.c:11999
#7 0x00007ffff750ed79 in ssl3_HandleHandshake (ss=0x7fffd709b000, origBuf=<optimized out>) at /fast/ben/tb/mozilla/security/nss/lib/ssl/ssl3con.c:12173
#8 0x00007ffff750ed79 in ssl3_HandleNonApplicationData (ss=0x7fffd709b000, rType=ssl_ct_handshake, epoch=<optimized out>, seqNum=<optimized out>, databuf=<optimized out>)
    at /fast/ben/tb/mozilla/security/nss/lib/ssl/ssl3con.c:12692
#9 0x00007ffff750fd17 in ssl3_HandleRecord (ss=0x7fffd709b000, cText=0x7fffd7319730) at /fast/ben/tb/mozilla/security/nss/lib/ssl/ssl3con.c:12974
#10 0x00007ffff7518798 in ssl3_GatherCompleteHandshake (ss=0x7fffd709b000, flags=<optimized out>) at /fast/ben/tb/mozilla/security/nss/lib/ssl/ssl3gthr.c:512
#11 0x00007ffff751a903 in ssl_GatherRecord1stHandshake (ss=0x7fffd709b000) at /fast/ben/tb/mozilla/security/nss/lib/ssl/sslcon.c:74
#12 0x00007ffff...

Read more...

Revision history for this message
In , Kai Engert (kaie) wrote :

(In reply to Ben Campbell from comment #33)
> But when during the handshake, I always get the not-on-the-STS-thread assert:
>
> `Assertion failure: onSTSThread, at /fast/ben/tb/mozilla/security/manager/ssl/SSLServerCertVerification.cpp:1513`
> (https://searchfox.org/mozilla-central/source/security/manager/ssl/SSLServerCertVerification.cpp#1513)

On which thread does this happen? Is it a thread different from the main thread?
(If it isn't on the main thread: I'm curious if it works if you temporarily remove the assertion.)

Revision history for this message
In , Kai Engert (kaie) wrote :

Dana, AuthCertificateHookInternal asserts that it's running specifically on the STS thread.

Is it necessary to be that strict? Would it be OK to run it on some other background thread?
See comment 33.

Revision history for this message
In , Dkeeler (dkeeler) wrote :

Yes, it is necessary. The certificate verification implementation relies on the fact that it's blocking the socket thread in some places.

Revision history for this message
In , Benc-q (benc-q) wrote :

(In reply to Kai Engert (:kaie:) from comment #34)
> On which thread does this happen? Is it a thread different from the main thread?
> (If it isn't on the main thread: I'm curious if it works if you temporarily remove the assertion.)

Yes, I'm running it on a background thread (our LDAPConnection creates one to dispatch IO to. It used to start operations on the main thread and use the background thread to handle responses, but I'm moving all the IO onto the background thread).
Haven't tried removing the assert yet, but will give it a go when I get a chance.

Revision history for this message
In , Benc-q (benc-q) wrote :

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #36)
> Yes, it is necessary. The certificate verification implementation relies on the fact that it's blocking the socket thread in some places.

So, from the outside:
as someone just trying to open a secure socket connection (using the SSL/TLS or StartTLS socketproviders), on my own background thread, what am I doing wrong that causes me to hit this assertion?
Should I be dispatching all my IO on the SocketTransportService? (I mean, I could do that without too much trouble, but it seems very odd...)

Revision history for this message
In , Dkeeler (dkeeler) wrote :

What network APIs are you using? `nsIChannel.asyncOpen` should already dispatch to the network thread (my understanding is that has to be called from the main thread) (incidentally, we're trying to avoid creating too many threads, so it might be worthwhile to investigate if you can avoid creating a new background thread).

Revision history for this message
In , Benc-q (benc-q) wrote :

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #39)
> What network APIs are you using? `nsIChannel.asyncOpen` should already dispatch to the network thread (my understanding is that has to be called from the main thread) (incidentally, we're trying to avoid creating too many threads, so it might be worthwhile to investigate if you can avoid creating a new background thread).

It's currently opening `PRFileDesc`-based sockets using `PR_OpenTCPSocket` for the plain connection. Which works fine.
TLS/SSL used to work, using PR_OpenTCPSocket() followed by an immediate `nsTLSSocketProvider::AddToSocket()` and a `startTLS()`. This is what I'm trying to get working again.

(I'd also like to implement proper startTLS protocol (there's an LDAP operation to tell the server to switch), but that's a separate bug, and if I can get this TLS/SSL connection working then startTLS should work fine too).

I was under the impression that `nsIChannel` was for dealing with protocol-based connections (ie implemented in a nsIProtocolHandler), rather than plain socket connections? That doesn't seem applicable here, right?

Anyway. the LDAP stuff is old code, with some... idiosyncratic aspects. So if the preferred way of doing network IO has changed over the years, then I imagine this code hasn't yet heard the news :-)

Revision history for this message
In , Dkeeler (dkeeler) wrote :

I see. In that case, my understanding is all of that socket work should be done on the socket thread.

Revision history for this message
In , Benc-q (benc-q) wrote :

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #41)
> I see. In that case, my understanding is all of that socket work should be done on the socket thread.

Ahh, that's what I needed to know - I've now got it working.
Thanks for all your information Dana and Kai - it was very much appreciated!

Revision history for this message
In , Benc-q (benc-q) wrote :

Created attachment 9105105
1576364-dispatch-ldap-on-sts-1.patch

This patch gets LDAP up and running again over SSL/TLS connections (ie with the "SSL" checkbox ticked in the LDAP setup). It:

1. ensures libldap is placed into non-blocking mode
2. dispatches the ldap_simple_bind operation on the socket thread (rather than mainthread)
3. listens for LDAP responses in the socket thread (rather than on a custom thread)
4. moves a bunch of replicated error handling out of `nsLDAPOperation` and into `nsLDAPConnection::AddPendingOperation()`.

Item 2 is important, as this is where the SSL/TLS handshaking is done. The SSL/TLS socket layer defers this security negotiation until the first send occurs, and for this code that means in ldap_simple_bind.

The intention is to move all the other operations to dispatch on the socket thread too, but for practical purposes this fixes the bug. (The other operations will issue on the main thread, without blocking, but their responses will be handled on the socket thread).
Moving the rest of the operations over can be done in a follow-up patch - there's enough else going on here already.
(and better still would be to get rid of `nsLDAPOperation` altogether, but that I will write up in another bug).

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Comment on attachment 9105105
1576364-dispatch-ldap-on-sts-1.patch

Thanks for the patch, I'm happy to see this fixed so we can finally ship it in TB 68.2.x.

Looks like Kai is more familiar with the "plumbing" here, so I'm adding him as reviewer, but I'll take a look myself.

Revision history for this message
In , Kai Engert (kaie) wrote :

The patch doesn't seem to work for me on Linux. I tried on both comm-central and comm-esr68 (merged). With the patch applied, I never see a search result displayed. Ben, on which platform have you tested? Did you use the Adams LDAP server for testing, or something else?

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :
Download full text (3.5 KiB)

Comment on attachment 9105105
1576364-dispatch-ldap-on-sts-1.patch

Hmm, I'm afraid that Kai is right. No SSL still works, but SSL doesn't. In the AB search nothing comes up and using auto-complete, I crash on MOZ_ASSERT(onSTSThread); in a debug build:

xul.dll!mozilla::psm::AuthCertificateHookInternal(mozilla::psm::TransportSecurityInfo * infoObject, const void * aPtrForLogging, const std::unique_ptr<CERTCertificateStr,mozilla::UniqueCERTCertificateDeletePolicy> & serverCert, std::unique_ptr<CERTCertListStr,mozilla::UniqueCERTCertListDeletePolicy> & peerCertChain, mozilla::Maybe<nsTArray<unsigned char> > & stapledOCSPResponse, mozilla::Maybe<nsTArray<unsigned char> > & sctsFromTLSExtension, unsigned int providerFlags, unsigned int certVerifierFlags) Line 1525 C++
xul.dll!mozilla::psm::AuthCertificateHook(void * arg, PRFileDesc * fd, int checkSig, int isServer) Line 1611 C++
nss3.dll!ssl3_AuthCertificate(sslSocketStr * ss) Line 11069 C
nss3.dll!ssl3_CompleteHandleCertificate(sslSocketStr * ss, unsigned char * b, unsigned int length) Line 10962 C
nss3.dll!ssl3_HandleHandshakeMessage(sslSocketStr * ss, unsigned char * b, unsigned int length, int endOfRecord) Line 11999 C
[Inline Frame] nss3.dll!ssl3_HandleHandshake(sslSocketStr * ss, sslBufferStr * origBuf) Line 12173 C
nss3.dll!ssl3_HandleNonApplicationData(sslSocketStr * ss, <unnamed-tag> rType, unsigned short epoch, unsigned __int64 seqNum, sslBufferStr * databuf) Line 12692 C
nss3.dll!ssl3_HandleRecord(sslSocketStr * ss, SSL3Ciphertext * cText) Line 12974 C
nss3.dll!ssl3_GatherCompleteHandshake(sslSocketStr * ss, int flags) Line 514 C
nss3.dll!ssl_GatherRecord1stHandshake(sslSocketStr * ss) Line 74 C
nss3.dll!ssl_Do1stHandshake(sslSocketStr * ss) Line 41 C
nss3.dll!ssl_SecureSend(sslSocketStr * ss, const unsigned char * buf, int len, int flags) Line 920 C
nss3.dll!ssl_Send(PRFileDesc * fd, const void * buf, int len, int flags, unsigned int timeout) Line 3125 C
xul.dll!PSMSend(PRFileDesc * fd, const void * buf, int amount, int flags, unsigned int timeout) Line 1205 C++
prldap60.dll!prldap_write(int s, const void * buf, int len, lextiof_socket_private * socketarg) Line 204 C
ldap60.dll!ber_flush(sockbuf * sb, berelement * ber, int freeit) Line 0 C
ldap60.dll!nsldapi_send_ber_message(ldap * ld, sockbuf * sb, berelement * ber, int freeit, int epipe_handler) Line 450 C
ldap60.dll!nsldapi_send_server_request(ldap * ld, berelement * ber, int msgid, ldapreq * parentreq, ldap_server * srvlist, ldap_conn * lc, char * bindreqdn, int bind) Line 319 C
ldap60.dll!nsldapi_send_initial_request(ldap * ld, int msgid, unsigned long msgtype, char * dn, berelement * ber) Line 136 C
ldap60.dll!nsldapi_search(ldap * ld, const char * base, int scope, const char * filter, char * * attrs, int attrsonly, ldapcontrol * * serverctrls, ldapcontrol * * clientctrls, int timelimit, int sizelimit, int * msgidp) Line 202 C
ldap60.dll!ldap_search_ext(ldap * ld, const char * base, int scope, const char * filter, char * * attrs, int attrsonly, ldapcontrol * * serverctrls, ldapcontrol * * clientctrls, timeval * timeoutp, int sizelimit, int * msgidp) Line 129 C
xul.dll!nsLDAPOperation::SearchExt(const nsTSubstring<char> & aBaseDn...

Read more...

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

P.S.: Yes, using ldap.adams.edu (Base DN: ou=people,dc=adams,dc=edu).

Revision history for this message
In , Benc-q (benc-q) wrote :

Gah. Will get back into it.
Yes, I was using ldap.adams.edu.

I _think_ it might be because I only moved the `SimpleBind` operation to dispatch to STS. That's the first operation which performs a IO Send, and it's the first Send which kicks off the handshaking. I left the others because I figured it'd be better land the least-disruptive fix possible first.
But Jorg's trace has the handshaking happening during the `SearchExt` operation. I suspect it's either a timing thing, or depends on how you initiate the lookup (maybe some paths don't bother performing the simplebind if the username/password are empty).
Either way, the other operations should be moved onto the STS thread anyway, so I'll start on that.

Revision history for this message
In , Mojmir21 (mojmir21) wrote :

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0

Steps to reproduce:

Mozilla.cfg:
getLDAPAttributes("ldap.***.***.cz","ou=people,dc=fio,dc=cz","uid=" + env_user,"uid,cn,mail,labeledURI");

Actual results:

Error: getLDAPAttibutes fail: ReferenceError: processLDAPValues is not defined

Expected results:

It should work with "getLDAPAttributes", not "getLDAPAttibutes". Last version not afected is 60.9.0

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Mike, looks like a typo here in Mozilla core code:
https://searchfox.org/comm-central/rev/a1c52b2c68bc40266e42e193a785106e767db946/mozilla/extensions/pref/autoconfig/src/prefcalls.js#158

I'll do a patch and you can find the right component for it.

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Created attachment 9105527
Bug 1592922 - Fix typo in autoconfig's prefcalls.js. r=mkaply DONTBUILD

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

So what's the issue here? That there is a typo in the error message or that the function doesn't work?

Revision history for this message
In , Mojmir21 (mojmir21) wrote :

(In reply to Jorg K (GMT+2) from comment #3)
> So what's the issue here? That there is a typo in the error message or that the function doesn't work?

The function doesn't work.

Revision history for this message
In , Pulsebot (pulsebot) wrote :

Pushed by <email address hidden>:
https://hg.mozilla.org/integration/autoland/rev/eecadaaac3e5
Fix typo in autoconfig's prefcalls.js. r=mkaply DONTBUILD

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Mike, thanks for the review. I fully agree about the DONTBUILD, although I've see M-C developers push comment changes of a few lines causing an entire build.

Now that we fixed the typo, what can be said about the malfunction? What is that LDAP stuff residing in M-C? Does that call into TB's LDAP handling? It does, incredible:
https://searchfox.org/comm-central/rev/a1c52b2c68bc40266e42e193a785106e767db946/mozilla/extensions/pref/autoconfig/src/prefcalls.js#146
https://searchfox.org/comm-central/search?q=nsILDAPURL&case=false&regexp=false&path=
https://searchfox.org/comm-central/search?q=nsILDAPSyncQuery&case=false&regexp=false&path=

Note that TB's LDAP handling with SSL is broken in TB 68, see bug 1576364.

Ben, more LDAP chagrin.

Revision history for this message
In , Benc-q (benc-q) wrote :

Hmm.

1. The offending function, `processLDAPValues` doesn't appear to exist anywhere:
https://searchfox.org/comm-central/search?q=processLDAPValues&path=
Comment just above where it's called says "user supplied method".
I had to look it what the autoconfig module actually is - it's [MCD, Mission Control Desktop, AKA AutoConfig](https://developer.mozilla.org/en-US/docs/Archive/Misc_top_level/MCD,_Mission_Control_Desktop_AKA_AutoConfig)). The linked page has an example `processLDAPValues` function.

2. I really don't know what the status of the "autoconfig" module is. Even supplying a `processLDAPValues` function, I suspect there might be issues...

3. If it's supposed to work under both TB and Firefox, presumably its LDAP support should be disabled under Firefox, given that the LDAP support won't even be available? If it's TB-specific, then presumably the whole module should be moved out into comm-central.

Revision history for this message
In , Shindli (shindli) wrote :
Revision history for this message
In , Benc-q (benc-q) wrote :

Created attachment 9105715
1576364-dispatch-ldap-on-sts-2.patch

Not a 100% completed patch yet, but hitting the weekend here, so I'm uploading what I've got - it works for me as is (addressbook lookup via SSL connection to the Adams server) so I'm hoping it'll work for someone else too!

It moves the bulk of the LDAP operations onto the socket thread.
(still got to move the two-part BindSASL/StepSASL over, but I don't think it'll stop anything working right now if you're just using the adams ldap server to test against with no fancy login credentials).
I also need to tidy it up and review it thoroughly myself - there's a lot of churn, so lots of scope for screwing something up with a typo.

Revision history for this message
In , Mkmelin+mozilla (mkmelin+mozilla) wrote :

I can confirm this patch seems to work.

Revision history for this message
In , Kai Engert (kaie) wrote :

Only a subset of operations works for me.

In address book, if the left side stays with the default "all address books", then no search results are shown.
However, if I explicitly select the configured LDAP directory, then I get search results.

The same can be seen in composer, clicking F9 to open the search pane.

I get the same successes and failures regardless of the OCSP enabled/disabled setting, that's an improvement.

Revision history for this message
In , Mkmelin+mozilla (mkmelin+mozilla) wrote :

I think that has been flakey earlier too. Do you see it without SSL?

Revision history for this message
In , Kai Engert (kaie) wrote :

After additional testing, the behavior I reported in comment 51 looks like a separate regression after ESR68.

Using TB 68, with SSL disabled, the lookup in addressbook works in both scenarios, either having selected "all address books" or "Adams" specifically. Same behaviour in composer F9.

Using latest c-c (TB 72), without Ben's patch, with SSL disabled, only selecting "Adams" gives me results, and nothing for "all".
I think we can handle that as a separate regression, I'll file a bug.

Once we're ready to backport Ben's patch to TB 68, we'll have to test if it works for the "all address books" scenarios, too.

In any case, the latest patch is indeed an improvement, it works for me with both SSL and OCSP enabled, even with OCSP required.

I understand that Ben wants to do cleanup prior to requesting review, so I'll wait for him to request it.

Revision history for this message
In , Benc-q (benc-q) wrote :

Created attachment 9106096
1576364-dispatch-ldap-on-sts-3.patch

This one has the last couple of operations shifted off the main thread, and it all still seems to fix the bug for me.
I can't claim it does much for code readability... but ideally it'll all get swept away by bug 1592500 one day!

Revision history for this message
In , Kai Engert (kaie) wrote :

Comment on attachment 9106096
1576364-dispatch-ldap-on-sts-3.patch

Review of attachment 9106096:
-----------------------------------------------------------------

Just minor comments, but please (at least) move the call to ldap_set_option, r=kaie with that.
Postponing the additional work into the separate bugs you filed seems ok.

::: ldap/xpcom/src/nsLDAPConnection.cpp
@@ +518,5 @@
> //
> mConnectionHandle =
> ldap_init(mResolvedIP.get(),
> mPort == -1 ? (mSSL ? LDAPS_PORT : LDAP_PORT) : mPort);
> + ldap_set_option(mConnectionHandle, LDAP_OPT_ASYNC_CONNECT, LDAP_OPT_ON);

should be moved into the else branch below (don't execute if handle is null)

::: ldap/xpcom/src/nsLDAPOperation.cpp
@@ +317,5 @@
> + LDAP *ld = LDAPHandle();
> + int32_t msgID = ldap_simple_bind(ld, mBindName.get(), mPasswd.get());
> +
> + if (msgID == -1) {
> + NotifyLDAPError();

Maybe keep TranslateLDAPErrorToNSError(), and pass it to NotifyLDAPError() (multiple places), so it will be available when working on bug 1592449.

@@ +506,5 @@
> + }
> +
> + SetID(msgID);
> + // Register the operation to pick up responses.
> + Conn()->AddPendingOperation(msgID, mOp);

Could Conn() ever be null?

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Any news here? I'd like to ship this in the next beta asap. so we can get it into TB 68 soon.

Revision history for this message
In , Kai Engert (kaie) wrote :

(In reply to Kai Engert (:kaie:) from comment #55)
> Maybe keep TranslateLDAPErrorToNSError(), and pass it to NotifyLDAPError()
> (multiple places), so it will be available when working on bug 1592449.

Looks like my suggestion is unnecessary, because NotifyLDAPError() has access to lderrno which can be used to obtain the failure.

> > + Conn()->AddPendingOperation(msgID, mOp);
>
> Could Conn() ever be null?

The previous code didn't check for null at this place, so we can skip that for now.

I'll make the single change that I consider necessary, and attach an updated revision, so Jörg can proceed.

Revision history for this message
In , Kai Engert (kaie) wrote :

Created attachment 9106832
1576364-dispatch-ldap-on-sts-3-b.patch

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Thanks, Kai!

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Sorry, this doesn't work for my in SSL mode. No Alvarez found. This doesn't look good either:

[15920, Socket Thread] ###!!! ASSERTION: nsLDAPSSLConnect(): called for blocking connection: 'options & LDAP_X_EXTIOF_OPT_NONBLOCKING', file c:/mozilla-source/comm-central/comm/ldap/xpcom/src/nsLDAPSecurityGlue.cpp, line 108

Revision history for this message
In , Kai Engert (kaie) wrote :

Jörg, you tested on Windows, correct?
I've only tested on Linux, which worked for me.

Could this be a platform specific race? I suggest we make a try build, so we have something for reliably testing across platforms.

Revision history for this message
In , Kai Engert (kaie) wrote :

Jörg, also, did you consider bug 1593347 while testing? You must select the LDAP directory in the address book when testing with c-c.

Revision history for this message
In , Kai Engert (kaie) wrote :
Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Since bug 1593347 is fixed, I tested LDAP directory search, "All Addressbooks" search and address auto-complete. Nothing worked on Windows with SSL enabled.

Revision history for this message
In , Kai Engert (kaie) wrote :

Ok. I tested with a fresh local build on Linux, and it works for me, all scenarios. I'll test Windows once the above try build is ready.

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Nice try run ;-) - Of course I used a local debug build. Your 32bit Windows build on Windows is ready, but it doesn't work either. Port 636, right, the one I get when switching on SSL?

Revision history for this message
In , Kai Engert (kaie) wrote :

Yes, everything you said is correct.
I confirm, both win32 and win64 builds don't work. Nothing in error console.
OSX works, Linux works.
I'll start a local Windows debug build. If nobody is quicker than me, I could try to debug it tomorrow.

Revision history for this message
In , Benc-q (benc-q) wrote :

(In reply to Jorg K (GMT+2) from comment #60)
> Sorry, this doesn't work for my in SSL mode. No Alvarez found. This doesn't look good either:
>
> [15920, Socket Thread] ###!!! ASSERTION: nsLDAPSSLConnect(): called for blocking connection: 'options & LDAP_X_EXTIOF_OPT_NONBLOCKING', file c:/mozilla-source/comm-central/comm/ldap/xpcom/src/nsLDAPSecurityGlue.cpp, line 108

This assert means that libldap isn't asking the socket layer for a non-blocking socket... Which will cause _everything_ to screw up.
Maybe libldap does something different on windows. Looking into it.

Revision history for this message
In , Benc-q (benc-q) wrote :

Ahh yeah, this'll probably do it:

https://searchfox.org/comm-central/source/ldap/c-sdk/include/portable.h#164

```
/*
 * Async IO. Use a non blocking implementation of connect() and
 * dns functions
 */
#if !defined(LDAP_ASYNC_IO)
# if !defined(_WINDOWS) && !defined(macintosh)
# define LDAP_ASYNC_IO
# endif /* _WINDOWS */
#endif

```
It looks like the default is to not compile in async IO support on windows (And mac. But I'm not sure that `macintosh` is actually `#defined`...)

So, hacking in a `#define LDAP_ASYNC_IO` might get it working.

If `LDAP_ASYNC_IO` is undefined then `ldap_set_option(...LDAP_OPT_ASYNC_CONNECT...)` becomes a no-op. And likely some other async support
https://searchfox.org/comm-central/search?q=LDAP_ASYNC_IO&path=

I _think_ libldap should support async IO just fine on Windows. We don't really use the built-in libldap socket stuff anyway - we replace all the IO with an unholy combination of `libprldap` and [`nsLDAPSecurityGlue.cpp`](https://searchfox.org/comm-central/source/ldap/xpcom/src/nsLDAPSecurityGlue.cpp). But our replacement `connect` routines still want libldap to pass in `LDAP_X_EXTIOF_OPT_NONBLOCKING` flag so they know to go async.

Revision history for this message
In , Kai Engert (kaie) wrote :

Looks like ```macintosh``` was used for old Mac OS 9, which explains why it worked in my OSX testing.
(according to https://sourceforge.net/p/predef/wiki/OperatingSystems/ )

I can try to remove ```if !defined(_WINDOWS) && !defined(macintosh)``` and build on Windows.

Revision history for this message
In , Kai Engert (kaie) wrote :

Created attachment 9107132
1576364-enable-async-windows.patch

I confirm this change fixes it on Windows.

We might want to report this change to upstream per https://wiki.mozilla.org/LDAP_C_SDK

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Comment on attachment 9107132
1576364-enable-async-windows.patch

Thanks, WFM.

Revision history for this message
In , Pulsebot (pulsebot) wrote :

Pushed by <email address hidden>:
https://hg.mozilla.org/comm-central/rev/a8daa92bf143
Dispatch LDAP operations on socket thread. r=kaie
https://hg.mozilla.org/comm-central/rev/339135a7c2a3
Enable LDAP async IO on Windows. r=jorgk DONTBUILD

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :
Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Created attachment 9108662
ldap.patch - ESR68

Hi Ben,
your patch doesn't apply to comm-esr68, but this one applies **if** we also uplift
https://hg.mozilla.org/comm-central/rev/63b40a60574ad6bb461b26fc35ca88b64c53e9e4
https://hg.mozilla.org/comm-central/rev/da918af75788db82c7aa0efa6eefba046445acfb
(the former has one hunk in ldap/xpcom/src/nsLDAPURL.cpp which doesn't apply).

So the choices are: Rebase heavily or uplift these two. What do you think?

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Comment on attachment 9106832
1576364-dispatch-ldap-on-sts-3-b.patch

Sorry, doesn't apply, huge clashes with bug 1562157 in nsLDAPOperation.cpp.

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Comment on attachment 9108662
ldap.patch - ESR68

Sorry, the original patch does apply if I uplift some cosmetic tweaks first and of course bug 1562157.

Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :
Revision history for this message
In , Jorgk-bmo (jorgk-bmo) wrote :

Working in my TB 68 build, Leslie is there.

Olivier Tilloy (osomon)
summary: - Typo in thunderbird code prevents central configuration
+ [upstream] Typo in thunderbird code prevents central configuration
Changed in thunderbird (Ubuntu):
importance: Undecided → High
tags: added: regression-update
Changed in thunderbird:
importance: Unknown → Medium
status: Unknown → New
Revision history for this message
Launchpad Janitor (janitor) wrote : Re: [upstream] Typo in thunderbird code prevents central configuration

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in thunderbird (Ubuntu):
status: New → Confirmed
Revision history for this message
Ingar Smedstad (ingsme) wrote :

Any ETA for this? We use this to configure thunderbird for more than 100 users…

Changed in thunderbird (Ubuntu):
assignee: nobody → Olivier Tilloy (osomon)
tags: added: rls-bb-incoming
Revision history for this message
Olivier Tilloy (osomon) wrote :

The typo is in the error message only, it's not what's causing the error itself. This is more likely https://bugzilla.mozilla.org/show_bug.cgi?id=1576364 (LDAP broken in thunderbird 68 with SSL), which appears to be fixed by https://hg.mozilla.org/releases/comm-esr68/rev/53d07d9d7f746ebcdc46e98d8acc5ce3cf261d95, which is available in 68.3.1, which I'm working on bringing to bionic.

summary: - [upstream] Typo in thunderbird code prevents central configuration
+ [upstream] ReferenceError: processLDAPValues is not defined
Changed in thunderbird:
importance: Medium → Unknown
status: New → Unknown
Changed in thunderbird:
importance: Unknown → Medium
status: Unknown → Fix Released
Revision history for this message
Ingar Smedstad (ingsme) wrote :

Ah. I see. That makes sense. I'll confess I did not study the code and the fix since I thought I had found the culprit.

Anyway, someone is on the case and a solution is forthcoming. Can't ask for more then that.

Thank you.

Olivier Tilloy (osomon)
Changed in thunderbird (Ubuntu Bionic):
assignee: nobody → Olivier Tilloy (osomon)
importance: Undecided → High
status: New → Confirmed
tags: removed: rls-bb-incoming
Revision history for this message
Olivier Tilloy (osomon) wrote :

This should now be fixed, as bionic and eoan have thunderbird 68.4.1, while focal has 68.5.0.
Feel free to re-open with a comment detailing the version you're running if this is still a problem.

Changed in thunderbird (Ubuntu Bionic):
status: Confirmed → Triaged
status: Triaged → Fix Released
Changed in thunderbird (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
In , Mozilla-kaply (mozilla-kaply) wrote :

This broke LDAP autoconfig.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.