Problems with dynamic-extent closures

Bug #681092 reported by Stas Boukarev on 2010-11-24
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Alastair Bridgewater

Bug Description

(defun foo ()
  (declare (optimize speed))
  (let ((c 0))
    (flet ((bar () c))
      (declare (dynamic-extent #'bar))
      (do () ((list) (bar))
        (setf c 10)
        (return (bar))))))

(foo) returns -312207914 instead of 10

jamieforth (jamieforth) wrote :

In (foo) returns 10
In (foo) returns FOO

Thank you!

What's happening is that the function is being inlined completely rather than producing a local call or explicit closure, thus not introducing a new stack frame (and thus a complete closure environment) but a different physenv is still involved, leading the IR2 conversion of the REF node to believe that an ANCESTOR-FRAME-REF is required, and for some reason it picks up the /calling/ frame to read from.

Changed in sbcl:
importance: Undecided → Critical
status: New → In Progress
assignee: nobody → Alastair Bridgewater (alastair-bridgewater)
Paul Khuong (pvk) wrote :

Seems to me the issue is that bar is in tail position and a local call. This results in the parent frame actually being bar's grand-parent's frame.

Paul, you are quite right, the problem is tail-calls, in this case tail-local-calls.

Attached is a first stab at a fix for this. It makes the specific test case in this bug function as expected, but is very likely to leave further problems in its wake. The first problem is that there is no guarantee that the new frame won't overwrite the stack slot containing the closure value. The second problem is that a tail-full-call that takes a d-x closure as an argument will also overwrite the stack slot. The only plausible solution to the second problem is to disallow tail-calls when there is a "live" d-x closure.

It occurs to me that the second problem (tail-call with a d-x closure as a parameter) is a non-issue because it's predicated on the existence of a stack-allocated closure object, the cleanup for which would prevent any call being in the tail position.

A bare-minimum KLUDGE for the first problem (lifetime for closure value cell TNs post-tail-call) could be to add all of the "implicit" value cells to the environment-live list for the callee function during ir2tran, though that would depend on the order of conversion of nested tail-calls, which isn't something that I'm comfortable relying on long-term.

So, here's part two: Because the tail-call involves switching PHYSENVs, in order to make absolutely sure the storage for a closed-over variable doesn't get reallocated, make the variable TN a component-live TN instead of a physenv-live TN. A massive KLUDGE, but easier than figuring out how to make the TNs live over the tail-set of the home lambda of the variable.


Changed in sbcl:
status: In Progress → Fix Committed
Nikodemus Siivola (nikodemus) wrote :

In 1.0.45 release.

Changed in sbcl:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers