socket-receive doesn't check for errors

Bug #1426667 reported by Frank James on 2015-02-28
This bug affects 1 person
Affects Status Importance Assigned to Milestone

Bug Description

A bug in sb-bsd-sockets:socket-receive was introduced sometime around september 2014. The new code assumes the return value of recvfrom is always the length of the received buffer, but recvfrom can also return -1 on error. This was detected by the older code, but this check no longer seems to be done. I just called socket-receive and got a buffer length of 4294967295 i.e. -1 cast as unsigned int, the buffer itself seemed to be filled with garbage from the heap (contents of malloc'ed buffer).

You can repeat by sending a UDP datagram to a local port not listening for UDP traffic then waiting for a reply with socket-receive. On windows this returns immediately with the above behaviour (I can't comment on other platforms).

I'm using sbcl 1.2.1 on Windows, but looking at the relevant code on github suggests the bug should be on all platforms.

Frank James (frank-a-james-0) wrote :

I've done a little bit more investigating and It looks like the -1 check actually is getting performed. The cause of the bug is actually more simple: recvfrom returns a (32-bit) signed int, but the recvfrom ffi definition has a return type of ssize_t, I assume that's a 64bit int on x86_64 systems. This would cause the -1 error status to get marshalled into Lisp incorrectly.

Stas Boukarev (stassats) wrote :

recvfrom returns ssize_t here.

Frank James (frank-a-james-0) wrote :

On windows recvfrom returns a 32 bit signed int, what is ssize_t on x64 builds? This is a real bug, I can provide example code for you if required. I don't think it's a lot of work to fix.

Stas Boukarev (stassats) wrote :

Do provide.

Frank James (frank-a-james-0) wrote :

Sure thing. We need to have a way of ensuring the socket-receive always fails. We can exploit the behaviour described here:

This allows us to define a procedure which should always signal an error from recvfrom.

(defun should-fail (port)
  (let ((socket (make-instance 'sb-bsd-sockets:inet-socket
          :type :datagram
          :protocol :udp)))
    (sb-bsd-sockets:socket-bind socket #(0 0 0 0) 0)
    (sb-bsd-sockets:socket-connect socket #(127 0 0 1) port)
    (sb-bsd-sockets:socket-send socket (make-array 16 :element-type '(unsigned-byte 8)) 16)
    (let ((buffer (make-array 16 :element-type '(unsigned-byte 8))))
      (sb-bsd-sockets:socket-receive socket buffer 16
         :element-type '(unsigned-byte 8))))
      (sb-bsd-sockets:socket-close socket))))

I assume noone is listening for UDP on port 111 on your machine, if so choose a different port.

Using sbcl 1.2.7 (x86) I get correct behaviour:

CL-USER> (should-fail 111)

Socket error in "recvfrom": 10054 (An existing connection was forcibly closed by the remote host.)
   [Condition of type SB-BSD-SOCKETS:SOCKET-ERROR]

 0: [RETRY] Retry SLIME REPL evaluation request.
 1: [*ABORT] Return to SLIME's top level.
 2: [ABORT] Abort thread (#<THREAD "new-repl-thread" RUNNING {26FBAB31}>)

Using sbcl 1.2.7 (x64) I get incorrect behaviour:

CL-USER> (should-fail 111)
#(2 0 0 111 127 0 0 1 0 0 0 0 0 0 0 0)
#(127 0 0 1)

Stas Boukarev (stassats) wrote :

In 2d8f5ea1d00e7c6b23aad2b44b1485cd8d5983e0

Changed in sbcl:
status: New → Fix Committed
Cyrus Harmon (ch-launchpad) wrote :

Running this test on x86-64 on linux 3.19 (and other roughly similarly-aged kernels) hangs. portmapper is listening on port 111. Perhaps that's the source of the problem. In any event, this breaks the build for me. Is the right thing to choose a different port and hope we get lucky, automagically determine the right port to use, or, back this test out?


Frank James (frank-a-james-0) wrote :

I don't know if this test would even succeed on Linux, does Linux signal an ECONNRESET in the same way Windows does? I used the codes above because I'm writing an NFS and was encountering this bug (that's why I used port 111).

Stas Boukarev (stassats) wrote :

Cyrus, does it work with a different port for you?

Cyrus Harmon (ch-launchpad) wrote :

I'm not sure I understand this well enough to say if it works, but if I use port 11111, it doesn't hang, and it doesn't fail, so, I guess so.

Stas Boukarev (stassats) wrote :

I'll just remove the test.

Changed in sbcl:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers