wanted: by-value structures in SB-ALIEN

Bug #313202 reported by Nikodemus Siivola
20
This bug affects 2 people
Affects Status Importance Assigned to Milestone
SBCL
Confirmed
Wishlist
Unassigned

Bug Description

Passing and returning structures by value is not supported by SB-ALIEN.

As of 07.06.2019, there's a bounty on this feature: https://www.bountysource.com/issues/75202399-wanted-by-value-structures-in-sb-alien

Tags: alien feature
Changed in sbcl:
importance: Undecided → Wishlist
status: New → Confirmed
Stas Boukarev (stassats)
tags: added: alien
removed: sb-alien
Revision history for this message
Ekaterina Vaartis (vaartis) wrote :
Revision history for this message
Drew Crampsie (drewc) wrote :

Now? 02 January 2009 is not "now", it's over 10 years ago. Why do you point attention to it now? Has any progress been made over the last 10 years 5 months and 5 days?

If not, does inflation matter? $83.58 would be what my local bank says.

Also, "bunty" is probably not what you meant. :)

Revision history for this message
Stas Boukarev (stassats) wrote :

What are you talking about?

Revision history for this message
Ekaterina Vaartis (vaartis) wrote :

The bounty has been created just today. I posted this bounty here for anyone who would find this here and would want to claim the bounty.

Revision history for this message
Drew Crampsie (drewc) wrote :

That's so very confusing, as the bounty page says :

"wanted: by-value structures in SB-ALIEN
 02 January 2009 Posted by Nikodemus Siivola
Passing and returning structures by value is not supported by SB-ALIEN."

Can you please update it to say that you are offering the money, and this is not 2009 but 2019?

Can you see where people may get confused?

Revision history for this message
Ekaterina Vaartis (vaartis) wrote :

Bountysource is a site where you offer money for solving bugs, this would be the only purpose for the bug to be there. The page there reflects the state of the bug on launchpad, i cannot alter neither the bug description, nor the date, as far as i know. The fact remains that the bounty is unclaimed and the feature is not implemented, the date does not mean much in the context, only the fact that it hasn't been implemented for ten years.

Revision history for this message
Ekaterina Vaartis (vaartis) wrote :

Actually, i found a way to edit the bug description, so i will add the bounty note to the description.

description: updated
Revision history for this message
Drew Crampsie (drewc) wrote :

Thank you for explaining that to me. Hopefully your editing will make people less confused about who is offering what and when. Many blessings.

Revision history for this message
g_o (go0) wrote :

Hey, already started working on this; I got the return value partially working. It seems like a lot of other code on sb-alien is guess-work, should i also guess optimiztions or should i require some flag when compiling?

Thanks in advance

Revision history for this message
g_o (go0) wrote :

Update here: I'm almost finished, problem is I need to call a function i made that fixes something on alien-funcall. I made prints but somehow they won't show when i run a test lisp program? i might need some help

Revision history for this message
Ekaterina Vaartis (vaartis) wrote :

Did you try printing to a different stream with (print obj :output-stream stream)? Maybe stdout isn't set up there or something. Also, does anything that might show if it works work in there? Something like raising a condition?

Revision history for this message
g_o (go0) wrote :

I tried printing to different stream (stderr) to no avail; not sure what you mean by "raising a condition" but i can't think of a simple condition to conclude - i practically have 1 input in my control - the args, tried to do length mismatch but it's caught beforehand; the only hypothesis i have is that it might be optimized by a transform from aliencomp or something.. though i really am not knowledgeable in sbcl's optimizations to be honest.

Revision history for this message
g_o (go0) wrote :

so somehow the print shows on some contrib tests ... [visible confusion]

Revision history for this message
g_o (go0) wrote :

Update: somehow using defvar to extern-alien fixes the problem. I'll try to get rid of the cause root by the time I send the patch

Revision history for this message
g_o (go0) wrote :

Semester started so progress will be a bit slow, though it really is almost done

g_o (go0)
Changed in sbcl:
assignee: nobody → g_o (go0)
g_o (go0)
Changed in sbcl:
assignee: g_o (go0) → nobody
Revision history for this message
g_o (go0) wrote :

so i guess i'll just update that i got back to the issue and it partially works. what i have left is to deal with default alignment (not packing/inner padding), i.e: initial offset of the struct when passed. can anyone give me some pointers as to how to guess that offset on the sbcl side?

Revision history for this message
g_o (go0) wrote :

nevermind it seems like the optimization of passing args by registers is a little different for structs; and so i don't think it was really an offset problem. i'll try to fix it this week, hopefully it goes well and i'll finally have a patch

Revision history for this message
g_o (go0) wrote :

alright, so it works but now i get back to the problem i discussed with @vaartis where if i use dynamic binding (i.e: defvar/defparameter) for my alien function my changes show up, but if i use lexical (i.e: let) - as if i never changed anything (e.g: if i inserted a print it wouldn't show). changes to aliencomp though still show. i'm still pretty lost here - is this a host/target compilation thing?

Revision history for this message
Ekaterina Vaartis (vaartis) wrote :

Could you perhaps upload your version somewhere with a simple test?

Revision history for this message
g_o (go0) wrote :

@vaartis i can send you my patch but the problem is that it's incredibly important to fix it because define-alien-routine is using lexical variables so if you want to both pass and return struct you'd have to make an almost clone of it on the user side which is really clunky..

Revision history for this message
g_o (go0) wrote :

here's what i have so far.

Revision history for this message
g_o (go0) wrote :

these are external tests i have but didn't integrate yet

Revision history for this message
g_o (go0) wrote :

anyway what i sent should suffice for pass and return, if you want to do both i can send an example file of how it'd be done under this patch from user standpoint. it's not a good solution though, so i really prefer an explanation as to why alien-funcall problematic for lexical vars and if it's a separate issue - open another bug.

to be clear: it doesn't matter *how* i change alien-funcall as far as i could tell, it just uses whatever i had previously. this last edge case allegedly can be solved simply by changing a let to a defvar in define-alien-routine; but even if we settle on this horrible solution - it crashes compilation itself. i really am not sure what i can do here

Revision history for this message
g_o (go0) wrote :

alright i think i fixed most of the problems

g_o (go0)
Changed in sbcl:
assignee: nobody → g_o (go0)
Revision history for this message
g_o (go0) wrote :

umm @vaartis mind giving some sort of feedback on the latest patch i sent?

Revision history for this message
Ekaterina Vaartis (vaartis) wrote :

Sorry for not responding before. I'll try your patch tomorrow with the use case I had in mind initially and get back to you.

Revision history for this message
Ekaterina Vaartis (vaartis) wrote :

When building latest SBCL (d0243a9f9961f0afdc09b555821b88edb2488be9 on github) with your patch applied I get the following signal (which pauses compilation):

[314/314] src/code/last-file

; file: /home/vaartis/Code/sbcl/src/code/target-alieneval.lisp
; in: DEFUN SB-ALIEN:ALIEN-FUNCALL
; (T (ERROR "~S is not an alien function." SB-ALIEN:ALIEN))
;
; caught STYLE-WARNING:
; undefined function: COMMON-LISP:T
;
; compilation unit finished
; Undefined function:
; T
; caught 1 STYLE-WARNING condition
; printed 4 notes
While evaluating the form starting at line 23, column 0
  of #P"/home/vaartis/Code/sbcl/make-host-2.lisp":

However, afterwards the rest builds fine.

When trying the code you have and following the way you call the function in the tests, it seems like alien-funcall'ing the same externed function only works once, because afterwards the externed function value is modified to have an additional struct parameter (i'm trying a functon that returns a struct and has no arguments).

The tests do pass, but when I tried writing a function of my own and calling it, the data in the struct was all messed up. Not sure if i'm doing something wrong. The test lisp file is attached to this post. I'll also attach a C file to the following post since it seems like only one attachment is allowed.

Revision history for this message
Ekaterina Vaartis (vaartis) wrote :
Revision history for this message
g_o (go0) wrote :

about the compilation pause - i did not experience it so that's weird, i'll see if i can replicate that.
about multiple calls - it should inject arguments to a copy but i probably forgot to change it in the one of the steps, i'll see what's up
about your own test - my first impression is that it's an edge case i didn't notice. i think i know what it is and it shouldn't take too long to fix

stay tuned i guess

Revision history for this message
g_o (go0) wrote :

i think this should do the job

Revision history for this message
g_o (go0) wrote :

cleaned up a bit and changed something tiny. i think it's done

Revision history for this message
Stas Boukarev (stassats) wrote :

(defun alien-void-unparsed-type-p (type)
  (string-equal "(VALUES)" (princ-to-string type)))

doesn't look good...

Revision history for this message
Ekaterina Vaartis (vaartis) wrote :

With the latest patch, the first value in the passed truct did work as expected. However, in the test file I attached before there were two values and the second one (y) is now always zero when passed into C. In fact, every second field is equal to zero, and the next field after them has their value. Seems like an issue with offsets, perhaps. When I added a third field, it had the value that the third field was supposed to have. This is all despite unsigned int and unsigned-int (in lisp) both being 32bits. Attached is a modified test lisp file. I will also attach the modifed C test file in the next post.

Another thing i've noticed is when I try to create a boolean field in lisp, i always get

debugger invoked on a TYPE-ERROR in thread
#<THREAD "main thread" RUNNING {1004AE0993}>:
  The value
    T
  is not of type
    CHARACTER
  when binding CHAR

when declaring it

Revision history for this message
Ekaterina Vaartis (vaartis) wrote :
Revision history for this message
Ekaterina Vaartis (vaartis) wrote :

> third field was supposed to have
* the second field

Revision history for this message
g_o (go0) wrote :

@stassats oops forgot about that "^^, that hack should be easily replaceable though
@vaartis about the boolean - i'm not responsible for the typing system. i think i saw some
         sbcl kludges that said that alien system doesn't really support less than byte types
         (in case you're hoping for bitfield); i'm pretty sure i tested char,
         try using it instead. in any case i think it might be out of the scope of the issue.
         (i made no real distinction between types on my end anyways).
         hmm about 32-bit types on 64 bit machine i'm not sure if there's an elegant way to solve
         this; see, i let sbcl's existing argument packer deal with padding and offsetting.
         i prefer being verbose as a workaround using prefered offsetting/alignment attributes
         on gcc's end or on the lisp alien type end.

         tbh at this point i had my fun with this issue and while it's nice to have i
         don't care all that much about the beer money. so if anyone wanna improve this thing or try
         something different he/she is very welcome, hopefully we'll share both.
         i'm just really out of time for this issue, as another semester is coming my way
         and with things being crazy as they are right now.

Charles (karlosz)
Changed in sbcl:
assignee: g_o (go0) → nobody
Revision history for this message
Rongcui Dong (rongcuid) wrote :

Hello,

I would like to have a stab at this bug. I've spent a good part of last week to go through SBCL's source code, and I have 75% confidence that I can at least add support for struct passing/returning by argument for the few platforms I have experience on: ARM64/Darwin, X86-64 Windows/Unix, and Risc-V. However, since this will be a fairly large investment of time, I would like to break my patching into multiple milestones, so I can get some feedback from the devs, and patch up on my knowledge of the internals of SBCL (which I only used for a while).

I will try to make each patch I submit here stable enough to go into a release, gradually fixing problems and adding supports of platforms. My goal is that even if I cannot follow through the very end of this bug fix, each patch should leave the code base in a good state for other devs to pick up on my progress. I will also leave extensive documentation to all my changes.

## The patch

Attached in this comment is the first milestone. It's a very simple BREAKING CHANGE:

- Any attempt to define a foreign routine that passes or returns struct by value raises an error.

I consider this a necessary change that should be merged as soon as possible, even though it doesn't fix this bug. As of current release, when you define a function that passes or returns structs by value, SBCL generates assembly that can potentially corrupt the stack silently. This happen due to the following implementation:

- SB-ALIEN handles struct values by the RECORD alien type class
- The RECORD class inherits from this hierarchy: RECORD -> MEM-BLOCK -> ALIEN-VALUE -> SYSTEM-AREA-POINTER
- All platforms define how to pass a SYSTEM-AREA-POINTER as an argument
- No platform define how to pass a RECORD
- Therefore, RECORD is passed as a SYSTEM-AREA-POINTER

This behavior is almost always incorrect. It clobbers registers and stack by misaligning arguments and return values, which is potentially very dangerous. This patch throws an error to prevent this from happening. While I consider this the correct thing to do, please have a review and see if it breaks anything.

I did not add a test for this patch, as this milestone is only temporary, until I add concrete implementation to platforms.

## Some more proposals

These are some proposals which I have in mind, but since they will touch code that don't directly correlate to this bug, I think I should ask for permission to change anything:

1. The ARM64 c-call implementation is structured far differently from the ARM documentation, AAPCS64. This makes it pretty difficult to verify against the official documentation. May I refactor this part to more closely reflect the official docs?
2. I don't think RECORD should inherit from MEMORY-BLOCK. Actually, this class hierarchy is pretty problematic for the :ARG-TN and :RESULT-TN methods because there are so many platform-specific behavior. Perhaps they should be split into another class, or just not use OOP all together?
  - Plus it took me about five days to figure out which method was called due to the explicit OOP system...

And that's all for the first patch.

Thank you,
Rongcui Dong

Revision history for this message
Rongcui Dong (rongcuid) wrote (last edit ):

I am currently struggling with structures that can return two TNs in the :ARG-TN method. This can happen on ARM64 when a structure of size between 8 and 16 bytes is passed by value: it can take up two registers, or two slots on the stack.

However, currently each argument cannot seem to return multiple TNs. Would this mean I need to also change aliencomp.lisp's alien funcall optimizer? Or am I missing something obvious, such as an existing way to assign mutliple TNs to the same argument? Any guidance would help greatly.

EDIT:

I figured it out. I've implemented passing small structs by value on ARM64, and I'll put up a patch once I have full support of ARM64.

Revision history for this message
Christophe Rhodes (csr21-cantab) wrote (last edit ):

Thanks. I'd like to take the milestone 1 patch, but I think I'd also like the test infrastructure around it if we can. What's the minimal possible working example that we can imagine? Something like:

struct foo {
  int a;
  int b;
};

struct foo bar(struct foo x) {
  x.a ^= x.b;
  x.b ^= x.a;
  x.a ^= x.b;
  return x;
}

and then

(define-alien-type foo (struct foo (int a) (int b)))

(with-alien ((foo foo))
  (setf (slot foo 'a) 1 (slot foo 'b) 2)
  (let ((rfoo (alien-funcall (extern-alien "bar" (function foo foo)) foo)))
    (list (slot foo 'a) (slot foo 'b) (slot rfoo 'a) (slot rfoo 'b))))

expecting (1 2 2 1), corresponding to the C code

int main() {
  struct foo foo;
  foo.a = 1;
  foo.b = 2;
  struct foo rfoo = bar(foo);
  printf("%d %d %d %d", foo.a, foo.b, rfoo.a, rfoo.b);
}

If I've understood all that correctly, could we have a test with something like this code that is marked as :fails-on :sbcl, as well as the patch in comment #41? (If I've not understood correctly, please help! :-)

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I have added preliminary pass-by-value struct for ARM64. Currently I've only tested passing struct as the first argument. There also seems to be some inefficient stack allocation, and the code can use a lot of refactoring.

This patch adds:

* Passing a struct aligned to 8-bytes by value as first argument
** Works for small structs (copied to registers) and large structs (copied to stack, then pass pointer)
* Test cases for the above

Major changes:

* In `aliencomp.lisp`, `%alien-funcall`, I added one additional return value, `preprocess-tns`, which contain functions which are called _before_ any arguments are copied. This is necessary due to ARM64 calling convention's Stage B, where arguments may be copied to memory and replaced by a pointer.

This change will not affect existing backends, but it is still in a common code path. If there's a better option, let me know!

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I cannot figure out how to make test cases for milestone 1. Neither `:fails-on` nor `assert-error` can catch the error, for some reason. I am guessing that's because it's at macro expansion time? I don't know.

Take this as reference:

(define-alien-type nil (struct tiny-align-8 (m0 (integer 64))))
(with-test (:name :struct-by-value-tiny-align-8-args :fails-on :sbcl)
  (assert-error (define-alien-routine tiny-align-8-get-m0 (integer 64) (m (struct tiny-align-8))))
  (assert-error (define-alien-routine tiny-align-8-mutate void (m (struct tiny-align-8))))
  (assert-error (define-alien-routine tiny-align-8-return (struct tiny-align-8))))

None of these errors are caught.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I cannot find any way to catch the error, so I don't know how to write a test case for it.

To try out the error without merging from the later patch, just try this in REPL (after patching milestone 1):

(define-alien-routine "foo" (struct foo))

This will throw an error that somehow cannot be caught by any handler. Maybe I just missed something.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

Attached is the Milestone 1 patch with test cases. I reused tests from my feature branch, but simply asserted that defining routines will always fail.

However:

* Currently the test fixture isn't great at adding support on `:fails-on platform`, because I just asserted error
** I'll probably make the `(eval '(define...` changes in the development branch as well
* Error thrown is a SIMPLE-ERROR. I looked at CL's standard error types and didn't find a suitable one. Any suggestions?

Also, is `format-patch` the recommended way to submit patches? I grouped all of my commits into a single file, but not sure if I should do it.

The patch should be mergeable if there there are no problem.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

Attached is my first full implementation of struct pass by value on ARM64 backend. I have implemented everything in the ABI for passing arguments, except for scalable types, vector types, half float, quad floats, and quad integers, which are not supported by SBCL. However, the test framework has not been updated, and I will only consider this milestone "done" when the tests cover all of AAPCS64's cases.

Returning by value is not implemented yet.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

@csr21-cantab Sorry I missed the last sentence in your comment. The patch in #47 includes a few tests for Milestone 1.

Revision history for this message
Christophe Rhodes (csr21-cantab) wrote :

Thanks.

What if we were to change the :arg-tn and :result-tn methods to error at runtime? I think it would be more in SBCL's normal style to do something like:

(define-alien-type-method (record :arg-tn) (type state)
  (declare (ignore type state))
  (warn "Passing structs by value is unsupported on this platform.")
  `(error "Passing structs by value is unsupported on this platform."))

instead? This should make it easier to test, too.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

How does this work? What you show returns a quoted error, but it gets returned into `make-call-out-tns`, which in turn goes into %alien-funcall, in the backend. I don't know if it's possible to make it runtime error

Revision history for this message
Rongcui Dong (rongcuid) wrote (last edit ):

@csr21-cantab I think it would be difficult to make something as suggested in #50, based on my understanding of the backend.

Currently, the error is generated when defining an alien routine, which is called in the following order:

- define-alien-routine
- make-call-out-tns
- :ARG-TN/:RETURN-TN -> error

However, to cause an error at runtime instead, this error must be emitted to the backend. So, it has to go through two stages.

When defining:

- define-alien-routine
- make-call-out-tns
- :ARG-TN/:RETURN-TN, somehow returning the error, unevaluated
- make-call-out-tns returning this error

When running:
- %alien-funcall catching this error and throw it
- Or generate some sort of VOPs that throw an error

The main problem is that make-call-out-tns and :arg-tn/:return-tn are architecture-specific. If I define a blanket implementation, I have no control over how each architecture uses these return values, so it would mean touching code of every architecture as well as the backend.

Currently in my tests, I am wrapping all define-alien-routine calls in EVAL to delay their execution, so they can definitely be tested.

Revision history for this message
Christophe Rhodes (csr21-cantab) wrote :

Yeah, I agree, it's difficult. (I think we would have to do something weird like make a special kind of TN that errored if used. Almost certainly not worth it).

I've merged the patch from #47 (milestone 1). Thank you. format-patch and all-in-one-file is nice and easy for me to merge with just `git am`, so that works fine.

Revision history for this message
Stas Boukarev (stassats) wrote :
Download full text (3.2 KiB)

The windows build fails with:

  Passing structs by value is unsupported on this platform.

Backtrace for: #<HOST-SB-THREAD:THREAD "main thread" RUNNING {10012E0613}>
0: (HOST-SB-DEBUG::DEBUGGER-DISABLED-HOOK #<SIMPLE-ERROR "Passing structs by value is unsupported on this platform." {100578E8B3}> #<unused argument> :QUIT T)
1: (HOST-SB-DEBUG::RUN-HOOK HOST-SB-EXT:*INVOKE-DEBUGGER-HOOK* #<SIMPLE-ERROR "Passing structs by value is unsupported on this platform." {100578E8B3}>)
2: (INVOKE-DEBUGGER #<SIMPLE-ERROR "Passing structs by value is unsupported on this platform." {100578E8B3}>)
3: (ERROR "Passing structs by value is unsupported on this platform.")
4: (SB-ALIEN::RECORD-ARG-TN-METHOD #<unused argument> #<unused argument>)
5: (SB-C:MAKE-CALL-OUT-TNS #<ALIEN-TYPE (FUNCTION (BOOLEAN 32) (C-STRING :EXTERNAL-FORMAT :UCS-2) (SIGNED 32) (:STRUCT NIL) &REST)>)
6: (SB-C::%ALIEN-FUNCALL-IR2-CONVERT-OPTIMIZER #<SB-C::COMBINATION :FUN #<SB-C::REF :LEAF #<SB-C::GLOBAL-VAR :%SOURCE-NAME SB-C:%ALIEN-FUNCALL :TYPE #1=#<SB-KERNEL:FUN-TYPE (FUNCTION (# SB-ALIEN-INTERNALS:ALIEN-TYPE &REST T) *)> :DEFINED-TYPE #1# :WHERE-FROM :DECLARED :KIND :GLOBAL-FUNCTION {10056B3F03}> {10056B4023}> :ARGS (#<SB-C::REF :LEAF #<SB-KERNEL:CONSTANT :VALUE "GetFileAttributesExW" {10053B2CD3}> {10056B4323}> #<SB-C::REF :LEAF #<SB-KERNEL:CONSTANT :VALUE #<ALIEN-TYPE (FUNCTION # # # # &REST)> {10054367A3}> {10056B4563}> (#<SB-C::COMBINATION :FUN #<SB-C::REF :LEAF # {10056FD8A3}> :ARGS (#) {10056FD923}> #<SB-C::COMBINATION :FUN #<SB-C::REF :LEAF # {1005700483}> :ARGS (#) {1005700503}> #<SB-C::COMBINATION :FUN #<SB-C::REF :LEAF # {1005702C23}> :ARGS (#) {1005702CA3}>) #<SB-C::REF :%SOURCE-NAME SB-ALIEN::VALUE :LEAF #<SB-KERNEL:CONSTANT :VALUE 0 {10053B34F3}> {1005716473}> #<SB-C::REF :%SOURCE-NAME SB-C::SAP :LEAF #<SB-C::LAMBDA-VAR :%SOURCE-NAME #:VAR :TYPE #<SB-KERNEL:BUILT-IN-CLASSOID SYSTEM-AREA-POINTER (sealed)> {10053AC7F3}> {1005718D93}>) {10056B40A3}> #<SB-C::IR2-BLOCK {1005780F83}>)
7: (SB-C::IR2-CONVERT-BLOCK #<SB-C::CBLOCK 7 :START c1 {1005716323}>)
8: (SB-C:IR2-CONVERT #<SB-C:COMPONENT :NAME "DEFUN" {100543A3B3}>)
9: (SB-C::%COMPILE-COMPONENT #<SB-C:COMPONENT :NAME "DEFUN" {100543A3B3}>)
10: (SB-C::COMPILE-COMPONENT #<SB-C:COMPONENT :NAME "DEFUN" {100543A3B3}>)
11: (SB-C::COMPILE-TOPLEVEL (#<SB-C::CLAMBDA :%SOURCE-NAME SB-C::.ANONYMOUS. :%DEBUG-NAME "top level form" :KIND (TOPLEVEL) :TYPE #<SB-KERNEL:BUILT-IN-CLASSOID FUNCTION (read-only)> :WHERE-FROM :DEFINED :VARS NIL {10053A9513}>) NIL)
12: (SB-C::CONVERT-AND-MAYBE-COMPILE (SB-IMPL::%DEFUN (QUOTE NATIVE-FILE-WRITE-DATE) (NAMED-LAMBDA NATIVE-FILE-WRITE-DATE (NATIVE-NAMESTRING) (DECLARE (SB-C::TOP-LEVEL-FORM)) "Return file write date, represented as CL universal time." (BLOCK NATIVE-FILE-WRITE-DATE (WITH-ALIEN (#) (SYSCALL # # NATIVE-NAMESTRING 0 FILE-ATTRIBUTES))))) (#1=(SB-IMPL::%DEFUN (QUOTE NATIVE-FILE-WRITE-DATE) (NAMED-LAMBDA NATIVE-FILE-WRITE-DATE (NATIVE-NAMESTRING) (DECLARE (SB-C::TOP-LEVEL-FORM)) "Return file write date, represented as CL universal time." (BLOCK NATIVE-FILE-WRITE-DATE (WITH-ALIEN # #)))) (PROGN (EVAL-WHEN (:COMPILE-TOPLEVEL) (SB-C:%COMPILER-DEFUN (QUOTE NATIVE-FILE-WRITE-DATE) T NIL NIL))...

Read more...

Revision history for this message
Christophe Rhodes (csr21-cantab) wrote : Re: [Bug 313202] Re: wanted: by-value structures in SB-ALIEN

Stas Boukarev <email address hidden> writes:

> The windows build fails with:
>
> Passing structs by value is unsupported on this platform.
>
> Backtrace for: #<HOST-SB-THREAD:THREAD "main thread" RUNNING {10012E0613}>
> 0: (HOST-SB-DEBUG::DEBUGGER-DISABLED-HOOK #<SIMPLE-ERROR "Passing structs by value is unsupported on this platform." {100578E8B3}> #<unused argument> :QUIT T)
> 1: (HOST-SB-DEBUG::RUN-HOOK HOST-SB-EXT:*INVOKE-DEBUGGER-HOOK* #<SIMPLE-ERROR "Passing structs by value is unsupported on this platform." {100578E8B3}>)
> 2: (INVOKE-DEBUGGER #<SIMPLE-ERROR "Passing structs by value is unsupported on this platform." {100578E8B3}>)
> 3: (ERROR "Passing structs by value is unsupported on this platform.")
> 4: (SB-ALIEN::RECORD-ARG-TN-METHOD #<unused argument> #<unused argument>)
> 5: (SB-C:MAKE-CALL-OUT-TNS #<ALIEN-TYPE (FUNCTION (BOOLEAN 32) (C-STRING :EXTERNAL-FORMAT :UCS-2) (SIGNED 32) (:STRUCT NIL) &REST)>)
> 6: (SB-C::%ALIEN-FUNCALL-IR2-CONVERT-OPTIMIZER #<SB-C::COMBINATION
> :FUN #<SB-C::REF :LEAF #<SB-C::GLOBAL-VAR :%SOURCE-NAME
> SB-C:%ALIEN-FUNCALL :TYPE #1=#<SB-KERNEL:FUN-TYPE (FUNCTION (#
> SB-ALIEN-INTERNALS:ALIEN-TYPE &REST T) *)> :DEFINED-TYPE #1#
> :WHERE-FROM :DECLARED :KIND :GLOBAL-FUNCTION {10056B3F03}>
> {10056B4023}> :ARGS (#<SB-C::REF :LEAF #<SB-KERNEL:CONSTANT :VALUE
> "GetFileAttributesExW" {10053B2CD3}> {10056B4323}> #<SB-C::REF :LEAF

Thanks. Looks like some arguments should have been declared as pointers
to structs but aren't. I'll fix the compilation and verify that it's OK
with github's CI.

Christophe

Revision history for this message
Stas Boukarev (stassats) wrote :

Will that unearth problems elsewhere?

Revision history for this message
Christophe Rhodes (csr21-cantab) wrote :

> Will that unearth problems elsewhere?

Might there be code out there that mistakenly declares interfaces like

  (define-alien-function foo boolean (struct bar))

where they meant

  (define-alien-function foo boolean (* (struct bar)))

and no-one noticed because sbcl does the right/wrong thing? Maybe. There was only a small amount of it in SBCL's code itself, for what it's worth: the diff that at least builds all cores is at <https://github.com/csrhodes/sbcl/tree/fix-win32> and we'll find out shortly whether it passes tests or runs at all.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

In current Stable, (struct foo) is synonymous to (* (struct foo)) when passing arguments due to struct (or record) inheriting methods from SAP. On some platforms (like windows X86, I believe), this could work due to their ABI. However, this "working" is just accidental, and on many other platforms it will not work.

I am not familiar with Win32 API, but the last argument is supposed to be a struct pointer: https://learn.microsoft.com/en-us/previous-versions/windows/embedded/ms890909(v=msdn.10). So, the win32 test case is wrong.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

How is it possible that the expected error is not thrown:
https://github.com/sbcl/sbcl/actions/runs/9161867983/job/25187659464

::: UNEXPECTED-FAILURE :STRUCT-BY-VALUE-TINY-ALIGN-8-ARGS due to SIMPLE-ERROR:
        "(EVAL
          '(DEFINE-ALIEN-ROUTINE TINY-ALIGN-8-GET-M0 (INTEGER 64)
                                 (M
                                  (STRUCT
                                   TINY-ALIGN-8)))) returned (TINY-ALIGN-8-GET-M0) without signalling ERROR"
::: Running :STRUCT-BY-VALUE-SMALL-ALIGN-8-ARGS

Is it the --without-sb-eval flag? How to get around this?

Revision history for this message
Christophe Rhodes (csr21-cantab) wrote :

It's just the fact that it's being interpreted, not compiled. You can probably reproduce locally with

cd tests/
sh ./run-tests.sh --evaluator-mode interpret alien-struct-by-value.impure.lisp

Presumably this is because the function that is generated is not compiled and hence not macroexpanded.

We could probably force the issue by attempting to *call* the alien-routine as well as define it. Testing a fix now.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

%ALIEN-FUNCALL is defined under a `defoptimizer` block. I am not completely sure what it does, but does that contribute to the difference in interpreted vs compiled mode? So, in compiled mode, MAKE-OUT-CALL-TNS is called at compile time while in interpret mode it's called at run time?

Revision history for this message
Christophe Rhodes (csr21-cantab) wrote :

Basically, yes: the only guarantee about make-call-out-tns is that it must be called by the time the alien function itself is called.

(define-alien-routine foo int) approximately macroexpands into

  (defun foo ()
    (with-alien ((...))
      (values (alien-funcall ...))))

If you are in compiled mode, then this function body gets compiled, including macroexpansion all the optimizations that happen at compile-time for things like alien-funcall.

If you are in interpret mode, then this function body is simply saved, and only when (foo) is called does any macroexpansion happen. So at the point of (eval '(define-alien-routine foo int)), no alien compilation is happening at all: only when we later call (foo) does anything happen.

I adjusted the test to do both definition and call, for now.

Revision history for this message
Rongcui Dong (rongcuid) wrote (last edit ):

I am almost done with the ARM64 support, but I encountered a very strange crash, and I think I need some help here. Attached is the current progress. You can run the alien-struct-by-value.impure.lisp test on an ARM64 machine to see what happens, as well as the disassembly.

The crash happens when a large struct is passed by value and not as the first argument. A memory fault is generated at -0x7, in a lambda returned from the preprocessing step:

- PRE-PADDING-AND-EXTENSION emits a closure, PP-G
- In MAKE-CALL-OUT-TNS, this PP-G is called with the current frame pointer offset, emitting another closure, PP
- In %ALIEN-FUNCALL, PP is called to emit assembly code required for copying the struct to stack

However, the generated assembly contains something very strange:

; 1E0: BF2A00B9 STR WZR, [THREAD, #40] ; pseudo-atomic-bits
; 1E4: BE2E40B9 LDR WLR, [THREAD, #44] ; pseudo-atomic-bits
; 1E8: 5E0000B4 CBZ LR, L2
; 1EC: 200120D4 BRK #9 ; Pending interrupt trap
; 1F0: L2: 43915FF8 LDR NL3, [R0, #-7]
; 1F4: E00303AA MOV NL0, NL3

See 1F0 here. This caused the memory fault. I don't understand why R0 is here, or what it represents, because this assembly seems to happen before my copying code, or MOVE-STRUCT-TO-STACK. Any idea what's going on? This crash doesn't happen if the struct is passed as the first argument.

Revision history for this message
Stas Boukarev (stassats) wrote (last edit ):

I have no idea what's wrong with the code above. But just style notes.
You can use LDP instead of two LDR.
You don't need (load-store-offset 8), 8 is an immediate.
Don't put non-ascii stuff like ® into the code.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I'll try to use LDP once I deal with the problem. I also don't see any ® so I am not sure what you are referring to? I know there's @ which is an address offset. I can clean up load-store-offset for anything that are immediates, though.

I triaged the problematic code to be generated from my vop MOVE-TO-STACK, but I am still figuring out why it generates wrong code only when the struct is not the first argument. Somehow, when struct is the first arg, it uses R1 instead of R0 for the crashing instruction, which doesn't result in a crash.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

The offending code is actually generated even if I don't define anything in the body of the vop:

(define-vop (move-to-stack)
  (:args (from-ptr :scs (sap-reg)) ;; NL2
         (fp :scs (any-reg))) ;; NL1
  (:info size fpoff)
  (:temporary (:scs (non-descriptor-reg)) temp) ;; NL0
  (:generator 0))

Revision history for this message
Stas Boukarev (stassats) wrote :

I don't know how to link to ®, so just grep for it?

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I figured out the bug. It was my mistake because I pushed some list in the wrong order.

I found the ®, it's in the comments. I've replaced it with (R).

Revision history for this message
Rongcui Dong (rongcuid) wrote :

This is the Milestone 2 patch. I think this patch can be reviewed and prepare to merge.

This patch includes:

- Passing struct by value on ARM64, including Darwin
- Test cases for struct passing

What it currently lacks:

- Returning struct by value
- Bulletproof handling of alignment-adjusted types

Other notes:

- I highly doubt varargs is implemented correctly for ARM64. I didn't implement it, and I don't see any code handling them. In addition, there is no test case on variadic foreign functions that I know of.

Revision history for this message
Stas Boukarev (stassats) wrote :

Varargs work on arm64, so I doubt your doubts.

Revision history for this message
Stas Boukarev (stassats) wrote :

And they don't need a test because they are used throughout the runtime and contribs.

Revision history for this message
Rongcui Dong (rongcuid) wrote (last edit ):

Interesting. I'll take your word, then. I removed the comments about them.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I have two failed tests. I will fix them

Revision history for this message
Rongcui Dong (rongcuid) wrote :

This patch passes all tests, on my computer at least.

Revision history for this message
Charles (karlosz) wrote :

In aliencomp.lisp, you added reader conditionals for arm (not arm64!) guarding preprocess-tns. Is that intentional?

Revision history for this message
Rongcui Dong (rongcuid) wrote :

In aliencomp.lisp, the arm conditionals were there originally. Since it flips the argument order, and thus tns, and I am adding a preprocess list, I need to flip the preprocess list as well.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

This is just a side note thing. I find that the %ALIEN-FUNCALL routine quite difficult to go through, because although it's a common path for all the architectures, it contains lots of architecture-specific code that are interleaved together. Are there plans to refactor it a little? While I will try not to touch existing code for as long as possible, I am seeing that the backend is starting to accumulate all the different special cases from each added architecture.

For instance, currently ARG-TNS can be a function, a TN, or a list, depending on the arch and type. However, for a "general" implementation, a function is strictly more powerful than a list or a name. Perhaps keeping only the TN for convenience and the function for advance emitting would make it easier to port to other architectures. Similar can be said for RESULT-TNS, where there are many architecture-specific handling. One potential way to unify these different routes would be to accept lambdas for different stages, like preprocessing arguments hook (which is used in my milestone 2 patch), a pre-argument-copy hook (which is actually needed for milestone 3 for AAPCS64 struct returning), argument passing hook (which already exists, by returning a (lambda (value node block nsp))), and return value hook.

Of course, a refactor like this will affect code in every architecture, so I am not rushing to do it. However, I do think it will be a good thing to do at some point.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

Return by value proves to be much more complicated than passing by value. I will stop working on that for now and move to X86-64 struct pass by value.

The main difficulty is because a return struct could have unlimited extent. SBCL, as far as I know, doesn't have garbage-collected alien objects/system memory, so I am unsure how to deal with the return semantics. Should a returned value stay on (lisp) stack and go out of scope on return? Or should it be inside system area and require freeing? Neither seem very safe to me.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I removed some leftover debug comments.

Revision history for this message
Charles (karlosz) wrote :

Having the returned value go on the Lisp stack makes the most sense to me: This matches how strict by values are treated in C, and we don’t try to protect against use-after-free for heap allocated points passed to Lisp either. Users could safely use the return value by binding it to an alien declared via with-alien which copies the return value slots into the alien with a lexically delimited scope.

Also +1 for refactoring the alien code. You may want to do that before porting to x86-64 to save some work: ideally every supported architecture should support the same alien primitives to avoid platform feature conditional explosion.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

Ok. How do I test whether my refactors break other architectures? I don't have every machine on the planet. Can I make a PR or something to use the CI?

There's also something I cannot figure out, and earlier an email sent to the sbcl-devel mailing list somehow got sent back because it couldn't be delivered, so I have to ask here.

SBCL represents alien structs as a SAP. Therefore, when I return a struct by value, I have to store the returned struct somewhere that can be pointed to by the SAP. However, I don't really know where. Currently, I've tried pushing the value to control stack, but apparently the wrong value was read back to lisp. Perhaps control stack is not the right place for this? From CMUCL documentation, it's used for garbage collected lisp objects.

I think number stack is the right place to push, but then I don't know how to push it without clobbering the caller's stack frame. I don't think I can just write something to NSP and increment it, because then when the caller returns its frame pointer would be reset incorrectly.

If I need to make the caller allocate stack space, where should I look for? Or alternatively, is CSP the right thing to use, or is there already a way to return something from stack?

Revision history for this message
Stas Boukarev (stassats) wrote :

Can we not refactor anything? I equate refactoring with "let's break code without any benefits".

Revision history for this message
Rongcui Dong (rongcuid) wrote :

Fix stage C for Linux ARM64

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I disagree with that view. If refactor breaks things, it means that tests are not good enough. Easing maintenance is a very good benefit: I cite https://www.sbcl.org/history.html for the fact that SBCL is "distinguished from CMUCL by a greater emphasis on maintainability". Of course, doing a refactor like this will require extensive testing and is probably a large task. I hope Github CI would allow such tests to be done.

If you really don't want that, I can skip it, but I personally don't think that's a good idea. The original backend code has a lot of architecture specific code mixed into the common path, and it's quite a lot of work to reverse engineer their mini-DSLs embedded as lists. You can have a look at my draft PR for a comparison. I've only done it for ARM64 and X86 with an exact translation, and I am using the PR only for the CI.

I am going to pause until we reach an agreement of what to do, but the ARM64 patches are complete and can be tested. The patches I sent do not include the refactor.

Revision history for this message
Charles (karlosz) wrote :

I agree, it’s absurd to ask for not refactoring anything when it’s directly beneficial for adding a large feature.

The number stack should be used for allocating space for struct return values. You can’t have random numbers on the descriptor stack, especially for precise platforms like arm, where the GC could treat them as Lisp pointers.

It’s okay to use the number stack writing to the stack and incrementing the NSP. The NSP gets reset correctly to the NFP. Just don’t change the NFP and things should work.

Revision history for this message
Stas Boukarev (stassats) wrote :

Of course you would disagree, because your refactorings is how I came to that conclusion.

Revision history for this message
Christophe Rhodes (csr21-cantab) wrote :

> If refactor breaks things, it means that tests are not good enough.

OK. I'm perfectly happy to go on record to say that our tests are not good enough, in general, but certainly in the specific area of %alien-funcall.

That doesn't mean that refactorings / reworkings / rearrangements / rearchitecting can't happen, but it does raise the burden on the refactorer, and they carry with them an implicit promise to deal with any subsequently-discovered breakage further down the line.

As for how to test: the cfarm compile farm project offers ssh access to hosts on a variety of architectures and operating systems; it should be possible to build and test things there. Github CI is inadequate for this, as it does not cover most supported architectures or most supported operating systems.

I'd strongly advise first auditing the existing tests (and possibly the implicit tests from e.g. loading and testing systems in quicklisp) to get a sense for what documented functionality is tested at all, what is not, and how big a job it is to write tests to cover a reasonable amount of the difference.

Revision history for this message
Charles (karlosz) wrote :

Re: testing and CI: there's a cross-build runner you (Rongcui) can use which tests if you break building for other architectures (the parts which only exercise the cross-compiler). You can also push to a local fork if you want github CI test runners to work. Unfortunately, this doesn't actually run the tests on architecures beside x86-64 and arm64. You would have to do that yourself. I can recommend requesting access for the gcc farm machines which are available to free software porters if you want machines; they have machines for every platform available. It's annoying for sure, but I think that's the best you can do to make sure you don't break code on other machines. Of course, the less tested platforms naturally bitrot anyway.

Stas, I appreciate your judgements on my own work, but I don't know how valuable that is to the general discussion. To say "don't refactor" to someone who's clearly spent a lot of work in the code and gotten a big and involved feature working on one platform who has insights on what exactly is making it difficult to write the code to port to other architectures doesn't make much sense in this case. I don't know if I could convince you, maybe this needs to come from someone else.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

Would you mind elaborating on my refactors? I am open to API change if you think it's not a good encapsulation. The PR is a draft PR, and is meant to break. I have only two archs done so far, and I am merely using the CI for machines I don't have.

Revision history for this message
Stas Boukarev (stassats) wrote :

It's not a judgement. Writing code without bugs is nigh impossible. So refactoring introduces bugs but provides no benefits of faster/smaller code.

Revision history for this message
Stas Boukarev (stassats) wrote :

>Would you mind elaborating on my refactors?

Why was multiple values changed to a list?

>and I am merely using the CI for machines I don't have.

The CI can be run in a personal fork, which I do all the time. Pull requests seem to eat up Cirrus CI compute credits. I'll have to investigate how to disable that.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I am not going to touch anything outside ALIEN-FUNCALL, so hopefully I don't need to audit everything else... Also, I will make test cases for anything I add as thoroughly as possible.

If I were to continue, I will still start with Github CI, as it's the simplest thing to use. I'll take the advice of machine farms.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

> Why was multiple values changed to a list?

This is because every architecture has a different set of requirements. For instance, only ARM64 currently has a preprocessing step. Only X86 needs to set/reset FPU state. I am using destructuring-bind with keywords to make intention clear. Changing back to multiple-values-bind will not affect functionality, but will mean that someone needs to keep track of a large number of positional arguments.

I have no problem with using values. I start with readable code when refactoring, and should any performance issue arise, I can optimize.

> Pull requests seem to eat up Cirrus CI compute credits.

I see. I can move to personal fork for testing.

Revision history for this message
Stas Boukarev (stassats) wrote :

Multiple values can be used with keyword arguments too.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

> Multiple values can be used with keyword arguments too.

Can they? I don't see it mentioned in CLHS

Revision history for this message
Stas Boukarev (stassats) wrote :

(multiple-value-call (lambda (&key a) a) (values :a 10))

Revision history for this message
Rongcui Dong (rongcuid) wrote :

> (multiple-value-call (lambda (&key a) a) (values :a 10))

Ok, I see. That's an easy enough change.

So, do I continue with the refactor?

Revision history for this message
Rongcui Dong (rongcuid) wrote :

@karlosz

> The number stack should be used for allocating space for struct return values

Does this actually work? I tried to push the return value by pushing to number stack and decrementing NSP in ALIEN-FUNCALL. It works when I have a C function of form `struct foo return_foo(void);`, but whenever I add an argument SBCL will crash due to memory fault. The PC pointer looks nothing like the disassembly suggests, so I assume that the crash happened after the alien function returns.

Basically, is this correct?

* Lisp function
* Calls alien routine
** Call C function
** Push result to number stack
* ???

That is, if I push the number stack _inside_ the alien function, is that still valid after the function returns?

Revision history for this message
Charles (karlosz) wrote :

No, I don't think allocating space on the number stack only inside ALIEN-FUNCALL can work. There needs to be coordination with the caller so that the values can be copied before return.

I think you have to have the caller allocate the space for the return-type as well, which unfortunately means you have to restrict alien-funcall for struct return types to only work when the alien function type is known. This is OK because the majority of alien-funcall uses fall under this case. (For the case where you don't know the type in advance, alien-funcall checks the return type and if it's a struct, it barfs).

Look at the transforms for alien-funcall in aliencomp.lisp. You can analyze the struct return type there and allocate the proper amount of space on the number stack there, which is going to be the real backing sap. The sap to that space can then be passed for alien-funcall to copy the struct it gets from C into the backing storage.

Does that plan make sense?

Revision history for this message
Yukari Hafner (shinmera) wrote :

Hey, really excited to finally see this make some headway!

I wanted to talk about an additional use-case that I need, and which not even cffi-libffi can help me with: structs-by-value in callback arguments and returns.

I've so far gotten by without needing this, but I recently started on the Framebuffers [1] project, which I would like to have a backend to Apple's Cocoa library. Apple looooves callbacks, so you have to implement a bunch of those to change and implement certain behaviours. Some of those callbacks look like this:

    struct rect update_rect(struct rect input)

Which in CFFI would be defined something like this:

    (cffi:defcallback update-rect (:struct rect) ((input (:struct rect))) ...)

So I need to be able to return a struct to C somehow.

For the struct-returns of a call, CFFI usually gives you a "translated" version of the struct in the form of a plist. That way lifetime questions are avoided, since it's just a regular lisp object. I'm not sure if sb-alien should take the same approach though.

[1] https://shirakumo.github.io/framebuffers/

Revision history for this message
Rongcui Dong (rongcuid) wrote :

There are two transforms:

deftransform alien-funcall ((function &rest args)
                             ((alien (* t)) &rest t) *)

deftransform alien-funcall ((function &rest args) * * :node node)

Which one should I use?

Revision history for this message
Charles (karlosz) wrote :

You should use the second one, that tells what the function type is at compile time so you can decide how much space to allocate on the number stack at compile time in the caller. This space can then be passed via a SAP to %alien-funcall which does the copying into that region when it gets the returned struct from C. This is how you extend the lifetime of the stack memory to the duration of the caller frame.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

Also, the crash was caused by misaligned stack. My original implementation which pushed NSP from %ALIEN-FUNCALL "worked", but is that just accidental?

Revision history for this message
Charles (karlosz) wrote :

I'd have to check again whether on arm64 (or the other platforms where the Lisp stack is not the number stack), function calls tear down the number stack frame. It might be the case that on arm64, its OK. But on x86-64, the C stack is the Lisp stack, so when %ALIEN-FUNCALL returns that memory definitely no longer protected by the stack pointer, so it would be dangerous.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I see. I am still reading the transform function and trying to figure out how to reserve stack space there. Do you have some quick pointers/examples for doing so?

For callbacks, I may look into them later. I would guess that they are a bit easier, but I'll of course need to figure out how to copy struct data from C to Lisp.

Revision history for this message
Charles (karlosz) wrote :

Look at the %alien-funcall ir2-convert method to get a feeling of how the number stack is manipulated by %alien-funcall itself. You would have to lift up some machinery from there to get it to work in the caller. You can see that the number stack space is reset there once %alien-funcall returns, so probably how you have it working is only accidental.

For an initial prototype, you can reserve the space in the caller with (%primitive alloc-alien-stack-space ...). Look at the transform for MAKE-LOCAL-ALIEN. Hope this helps.

Revision history for this message
Charles (karlosz) wrote :

MAKE-LOCAL-ALIEN is how WITH-ALIEN works, by the way. (which reserves number stack space in the current frame)

Revision history for this message
Rongcui Dong (rongcuid) wrote :

> You can see that the number stack space is reset there once %alien-funcall returns, so probably how you have it working is only accidental.

Actually, I am aware that %alien-funcall resets NSP (if you are referring to dealloc-number-stack-space). Currently my structs are pushed to NSP _after_ this, so the decremented NSP stays to the end of %alien-funcall. You can have a look here: https://github.com/rongcuid/sbcl/blob/0fa3218340820ecffe9786cd1bfc5ca22caf3dca/src/compiler/aliencomp.lisp#L756. It's a funcall, but the lambda being called decrements NSP and copies the struct.

I will see what make-local-alien does.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

What's the difference between SB-C::VOP and SB-C::%PRIMITIVE? I was doing exactly what make-local-alien does, except with VOP. I also see a FIXME saying that %PRIMITIVE means the same thing as VOP?

Revision history for this message
Charles (karlosz) wrote :

I think what you're doing at that line looks fine then, actually. As long as the NSP protects it, the memory is valid. It will get cleaned up when the Lisp function returns (see e.g. compiler/arm64/call.lisp, where the nsp is cleaned up in the call VOPs). So the user just has to make sure to use or copy the alien struct before the Lisp frame returns.

Revision history for this message
Charles (karlosz) wrote :

VOP and %PRIMITIVE work at different levels. %PRIMITIVE is for normal CL code that is written, while VOP is used as part of codegen methods (hence it takes IR2 structures as part of its arguments, which %PRIMITIVE doesn't do; those are supplied by the codegen phase of the compiler)

Revision history for this message
Rongcui Dong (rongcuid) wrote :

> It will get cleaned up when the Lisp function returns (see e.g. compiler/arm64/call.lisp, where the nsp is cleaned up in the call VOPs).

That's the part I am not 100% certain about: does this happen at the end of an alien call as well? Or is %alien-funcall all there is? Or, are alien functions wrapped inside a lisp function call which I am not seeing in %alien-funcall routine?

Revision history for this message
Charles (karlosz) wrote :

The primitive %ALIEN-FUNCALL is not wrapped in its own Lisp function when the function type is known; this is what the transform does (it changes a call to ALIEN-FUNCALL directly into that sequence of steps in the caller via %ALIEN-FUNCALL, which is compiled as a primitive, not as a function call). However, when the function type is not known, the primitive is wrapped by the Lisp function ALIEN-FUNCALL. That's why I mentioned that it's probably only possible to do struct returns when the function type is known to the compiler.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I see. But doesn't the compiler always know the function types? DEFINE-ALIEN-ROUTINE has to provide all type information.

Revision history for this message
Charles (karlosz) wrote :

No. You could do something like (defun f (x) (alien-funcall x ...)). So yes, the compiler knows what the types are for a given alien function, but it doesn't necessarily know what alien function is being alien-funcalled. So if you did (f <alien-function-that-returns-a-struct>), it couldn't really work because F doesn't know how much space to allocate for the struct that's returned. But if you had

(defun bar (x)
  (alien-funcall (extern-alien "returns_struct" ...) x))

that gets rewritten by the compiler since it knows the alien function being invoked and hence th return type (and so it can rewrite alien-funcall to the stuff in the transform with %alien-funcall; now it works to allocate space in the caller). Does that make sense?

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I see. Thank you very much.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

This patch adds the ability to return structure by value on ARM64.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

Updated doc strings for return by value

Revision history for this message
Charles (karlosz) wrote :

Could you update the manual as well (if you haven't already?) It still says struct pass by value is unsupported.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

Maybe more can be explained, but I'll just add a few sentence.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

Summer semester starts so my work on this will be slow. My next target is alien callbacks on ARM64 to complete full support for ARM64. Then, I will move on to X86-64.

Revision history for this message
Rongcui Dong (rongcuid) wrote :

I no longer have time to work on this issue, so I will stop development for now. I will still provide any help if existing patches need to be merged.

To post a comment you must log in.