Steel Bank Common Lisp

socket-connect is not thread safe

Reported by Attila Lendvai on 2010-01-10
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
SBCL
Medium
Unassigned

Bug Description

as reported by Jaap de Heer on sbcl-devel: http://thread.gmane.org/gmane.lisp.steel-bank.devel/14396

somtimes from socket-connect:

Socket error in "socket": EPROTONOSUPPORT (Protocol not supported)
  [Condition of type SB-BSD-SOCKETS:PROTOCOL-NOT-SUPPORTED-ERROR]

reproducible test-case:

(dotimes (i 2)
 (sb-thread:make-thread
  (lambda ()
    (dotimes (j 100000)
      (let ((socket (make-instance 'sb-bsd-sockets:inet-socket :type :datagram :protocol :udp)))
        (sb-bsd-sockets:socket-connect socket #(10 0 0 137) 1234)
        (sb-bsd-sockets:socket-close socket))))))

Within a few thousand socket-connects, one of the threads will
throw the above error.
The same happens with TCP, and I *think* also with one thread
doing UDP connects to host X and another thread doing TCP
connects to host Y.
With only one thread, the error doesn't occur.

the problem (analysis by John Fremlin):

The shared-initialize calls getprotobyname which is not thread safe.

There is little need to call this function at runtime anyway as these
proto numbers are quite stable, but getprotobyname_r is a safe
alternative (on GNU/Linux).

You can see the issue clearly like this

(let ((p (sb-bsd-sockets:get-protocol-by-name "udp")))
    (dotimes (i 2)
      (sb-thread:make-thread
       (lambda ()
  (dotimes (j 100000) (let ((q (sb-bsd-sockets:get-protocol-by-name "udp")))
          (assert (= p q))))))))

Vsevolod Dyomkin (vseloved) wrote :

I am encountering the similar problem on the last version of sbcl (1.0.34).
I have a crawler module, that uses drakma and also socket connections to Redis.
After some time of the crawler operation it hangs and this happens in the call to SB-BSD-SOCKETS:GET-PROTOCOL-BY-NAME. And the hang is indefinite. But if I do (setf (symbol-function 'SB-BSD-SOCKETS:GET-PROTOCOL-BY-NAME) (constantly 6)) the problem immediately goes away. Apparently, the problem appears due to drakma's heavy reuse of sockets.

I will try to build SBCL with getprotobyname_r instead of getprotobyname and see the results of that (and follow-up).

Changed in sbcl:
status: New → Confirmed
importance: Undecided → Medium
tags: added: contribs threads
Cyrus Harmon (ch-launchpad) wrote :

Huh. Maybe this explains why I see this error on my website and it crashes every few weeks or so.

Changes to contrib/sb-bsd-sockets/constants.lisp and contrib/sb-bsd-sockets/inet.lisp accomplish this. I'm attaching a couple of diffs which appear to do the trick, although there might be some problems on different platforms (Solaris uses a 4 arg getprotobyname_r function).

In addition, get-protocol-by-name has been modified to return multiple values as outlined in the "brownie points comment". The protocol number is still the first value, and the second and third values are the canonical name and aliases respectively, though it appears that RSPF is the only protocol in my /etc/protocols file that specifies multiple aliases. I do hope the added information can be useful.

constants is changed to add the getprotobyname_r function.

If SB-THREAD is not in features, the old get-proto-by-name (apart from the new return value) is used.

Works on Linux AFAICT.

Martin Cracauer (cracauer) wrote :

Committed suggested patches to SBCL 1.0.40.7 after testing.

Thanks everybody!

Changed in sbcl:
status: Confirmed → Fix Committed
Cyrus Harmon (ch-launchpad) wrote :

This breaks x86-64/darwin (and presumably x86/darwin, and ppc/darwin once nyef finishes the threads port), and other OSes with threads but without getprotobyname_r.

James Y Knight (foom) wrote :

Threads cannot really be said to be supported on a platform without this functionality or some replacement thereof, however... Obviously the change could just be conditioned on #-darwin, but that won't actually *fix* the problem on darwin w/threads.

Do you have any idea if there's a replacement function under a different name for Darwin?

Or, for a totally different way to fix things, it might be best to just add the set of IPPROTO_* constants to the constants list (sb-grovel already seems to do the appropriate #ifdef stuff to only import the name if it exists on the system's header file), and then just include directly in SBCL a keyword->int translation list for those constants. (I'd suggest simply exporting the constants, and making users do "sb-bsd-sockets:+ipproto-tcp+", but for backwards compatibility)...

 status confirmed
 done

Reverting this as 1.0.40.8 until a replacement patch that doesn't break
Darwin is developed.

Christophe

Martin Cracauer <email address hidden> writes:

> Committed suggested patches to SBCL 1.0.40.7 after testing.
>
> Thanks everybody!
>
> ** Changed in: sbcl
> Status: Confirmed => Fix Committed

Changed in sbcl:
status: Fix Committed → Confirmed
Changed in sbcl:
assignee: nobody → Nikodemus Siivola (nikodemus)
status: Confirmed → In Progress
Nikodemus Siivola (nikodemus) wrote :

In SBCL 1.0.42.14.

Changed in sbcl:
assignee: Nikodemus Siivola (nikodemus) → nobody
status: In Progress → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers