GC-unsafe SB-ALIEN string deporting

Bug #308949 reported by Nikodemus Siivola on 2008-12-17
4
Affects Status Importance Assigned to Milestone
SBCL
Medium
Unassigned

Bug Description

Translating a Lisp string to an alien string by taking a SAP to it
as done by the :DEPORT-GEN methods for C-STRING and UTF8-STRING
is not safe, since the Lisp string can move. For example the
following code will fail quickly on both cheneygc and pre-0.9.8.19
GENCGC:

  (setf (bytes-consed-between-gcs) 4096)
  (define-alien-routine "strcmp" int (s1 c-string) (s2 c-string))

  (loop
    (let ((string "hello, world"))
       (assert (zerop (strcmp string string)))))

(This will appear to work on post-0.9.8.19 GENCGC, since
the GC no longer zeroes memory immediately after releasing
it after a minor GC. Either enabling the READ_PROTECT_FREE_PAGES
#define in gencgc.c or modifying the example so that a major
GC will occasionally be triggered would unmask the bug.)

On cheneygc the only solution would seem to be allocating some alien
memory, copying the data over, and arranging that it's freed once we
return. For GENCGC we could instead try to arrange that the string
from which the SAP is taken is always pinned.

For some more details see comments for (define-alien-type-method
(c-string :deport-gen) ...) in host-c-call.lisp.

New test-case better at provoking this:

(declaim (inline strcmp))
(define-alien-routine "strcmp" int
  (s1 (c-string :external-format :utf-8))
  (s2 (c-string :external-format :utf-8)))

(defvar *run* t)

(defun test ()
  (loop while *run*
        do (let ((string #.(coerce "hello, world" 'base-string)))
             (assert (zerop (strcmp string string))))))

(let ((test-thread (sb-thread:make-thread
                    (lambda ()
                      (loop while *run*
                            do (let ((string #.(coerce "hello, world" 'base-string)))
                                 (assert (zerop (strcmp string string))))))
                    :name "test"))
      (control-thread (sb-thread:make-thread
                       (lambda ()
                         (let* ((start (get-internal-real-time))
                                (stop (+ start (* 60 internal-time-units-per-second))))
                           (loop while (< (get-internal-real-time) stop)
                                 do (gc :full t)))
                         (setf *run* nil))
                       :name "control")))
  (sb-thread:join-thread control-thread)
  (sb-thread:join-thread test-thread))

Even if the test-case seems to run nicely, interrupting it is likely to result
in a memory fault, and trying to recompile test test-case ends up in LDB.

Changed in sbcl:
importance: Undecided → Medium
status: New → Confirmed
importance: Medium → High
description: updated
Nikodemus Siivola (nikodemus) wrote :

Marking as NEW again while waiting for a chance to replicate this.

Changed in sbcl:
importance: High → Undecided
status: Confirmed → New
description: updated
Changed in sbcl:
importance: Undecided → High
status: New → Confirmed

Juho Snellman committed a fix for this on x86oid (gencgc) systems as 1.0.3.11.

I have tried the "new test case", with random interruptions, on both an x86-linux and an x86-64-linux build post-1.0.27, and it was perfectly reliable, suggesting to me that the test case is more of a threading stress test than anything else.

On unicode builds, we will usually have to allocate a fresh copy of the string anyway just for external-format handling. I was unable to determine if this is non-heap space or not through a quick browse of the source, but it could be a usable approach.

Nikodemus Siivola (nikodemus) wrote :

I can still replicate this on current HEAD on Darwin/x86. Dropping to Medium, and changed tags to reflect that.

tags: added: os-darwin
removed: gc-cheney gc-gencgc
Changed in sbcl:
importance: High → Medium
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers