Problems with dynamic-extent closures

Bug #681092 reported by Stas Boukarev
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
SBCL
Fix Released
Critical
Alastair Bridgewater

Bug Description

(defun foo ()
  (declare (optimize speed))
  (let ((c 0))
    (flet ((bar () c))
      (declare (dynamic-extent #'bar))
      (do () ((list) (bar))
        (setf c 10)
        (return (bar))))))

(foo) returns -312207914 instead of 10

Revision history for this message
jamieforth (jamieforth) wrote :

In 1.0.41.56 (foo) returns 10
In 1.0.44.32 (foo) returns FOO

Revision history for this message
Alastair Bridgewater (alastair-bridgewater) wrote :

Thank you!

What's happening is that the function is being inlined completely rather than producing a local call or explicit closure, thus not introducing a new stack frame (and thus a complete closure environment) but a different physenv is still involved, leading the IR2 conversion of the REF node to believe that an ANCESTOR-FRAME-REF is required, and for some reason it picks up the /calling/ frame to read from.

Changed in sbcl:
importance: Undecided → Critical
status: New → In Progress
assignee: nobody → Alastair Bridgewater (alastair-bridgewater)
Revision history for this message
Paul Khuong (pvk) wrote :

Seems to me the issue is that bar is in tail position and a local call. This results in the parent frame actually being bar's grand-parent's frame.

Revision history for this message
Alastair Bridgewater (alastair-bridgewater) wrote :

Paul, you are quite right, the problem is tail-calls, in this case tail-local-calls.

Attached is a first stab at a fix for this. It makes the specific test case in this bug function as expected, but is very likely to leave further problems in its wake. The first problem is that there is no guarantee that the new frame won't overwrite the stack slot containing the closure value. The second problem is that a tail-full-call that takes a d-x closure as an argument will also overwrite the stack slot. The only plausible solution to the second problem is to disallow tail-calls when there is a "live" d-x closure.

Revision history for this message
Alastair Bridgewater (alastair-bridgewater) wrote :

It occurs to me that the second problem (tail-call with a d-x closure as a parameter) is a non-issue because it's predicated on the existence of a stack-allocated closure object, the cleanup for which would prevent any call being in the tail position.

A bare-minimum KLUDGE for the first problem (lifetime for closure value cell TNs post-tail-call) could be to add all of the "implicit" value cells to the environment-live list for the callee function during ir2tran, though that would depend on the order of conversion of nested tail-calls, which isn't something that I'm comfortable relying on long-term.

Revision history for this message
Alastair Bridgewater (alastair-bridgewater) wrote :

So, here's part two: Because the tail-call involves switching PHYSENVs, in order to make absolutely sure the storage for a closed-over variable doesn't get reallocated, make the variable TN a component-live TN instead of a physenv-live TN. A massive KLUDGE, but easier than figuring out how to make the TNs live over the tail-set of the home lambda of the variable.

Revision history for this message
Alastair Bridgewater (alastair-bridgewater) wrote :

In 1.0.44.34.

Changed in sbcl:
status: In Progress → Fix Committed
Revision history for this message
Nikodemus Siivola (nikodemus) wrote :

In 1.0.45 release.

Changed in sbcl:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.