1.5.9 build error: "/tmp/guix-build-sbcl-1.5.9.drv-0/sbcl-1.5.9/obj/from-xc/src/assembly/" does not exist

Bug #1855272 reported by Pierre Neidhardt on 2019-12-05
This bug affects 1 person
Affects: SBCL; Importance: Undecided; Assigned to: Unassigned

Bug Description

I'm trying to package SBCL 1.5.9 for Guix. Previous versions built without any problem so far, but this one fails with:

```
beginning GENESIS, creating core "output/cold-sbcl.core"
obj/from-xc/src/assembly/master.assem-obj

*** - OPEN: Directory
      #P"/tmp/guix-build-sbcl-1.5.9.drv-0/sbcl-1.5.9/obj/from-xc/src/assembly/"
      does not exist
The following restarts are available:
SKIP :R1 skip (GENESIS OBJECT-FILE-NAMES # ...)
RETRY :R2 retry (GENESIS OBJECT-FILE-NAMES # ...)
STOP :R3 stop loading file /tmp/guix-build-sbcl-1.5.9.drv-0/sbcl-1.5.9/make-genesis-2.lisp
ABORT-BUILD :R4 Abort building SBCL.
ABORT :R5 Abort main loop
Bye.
//testing for consistency of first and second GENESIS passes
diff: output/genesis-2: No such file or directory
error: header files do not match between first and second GENESIS
```

I build SBCL with CLISP 2.49.
I've also tried ECL 16.1.3, which returns a similar error.

Any clue what's wrong?

Douglas Katzman (dougk) wrote :

Does the directory exist, or is the message correct that it doesn't?
Please attach a log of everything that happened up to that point.

Pierre Neidhardt (ambrevar) wrote :

Find the full log attached.

Pierre Neidhardt (ambrevar) wrote :

Forgot to answer your question: /tmp/guix-build-sbcl-1.5.9.drv-0/sbcl-1.5.9/obj/from-xc/src/ exists, but the `assembly` subdirectory does not.

Douglas Katzman (dougk) wrote :

It crashed in make-host-2 while compiling src/compiler/policy with:
*** - 1+: NIL is not a number

So anything else that happened afterwards is irrelevant.
My recent observation is that CLISP is no longer a reliable build host.
We have a bunch of workarounds for known failures, but this is one I haven't seen.
It's remotely possible that a bug has snuck into our source code and that another bug is obscuring it. But if we assume that's not the case, can you get a backtrace in CLISP at the point of the crash?

And you said you get an error with ECL. Is it the identical error? Do you have a log of that?

Pierre Neidhardt (ambrevar) wrote :

I just tried building the Guix package of 1.5.9 with SBCL 1.5.8: it works!

Pierre Neidhardt (ambrevar) wrote :

Bootstrappability of compilers is important (see the Thompson attack), and so far SBCL has been buildable with ECL and CLISP, which are the only bootstrappable free software Common Lisp compilers around (besides ABCL, maybe?). In particular, CCL and CMUCL are not bootstrappable. It'd be nice if we could keep SBCL bootstrappable from ECL or CLISP, but in the worst case I suppose we can require SBCL 1.5.8 for bootstrapping. The documentation would need to be updated then.

Pierre Neidhardt (ambrevar) wrote :

Find the ECL build attached.

Pierre Neidhardt (ambrevar) wrote :

Regarding the CLISP backtrace: Can you tell me how to do this?

Douglas Katzman (dougk) wrote :

I believe you're not building SBCL at the latest revision.
A bug introduced into the code on Nov 12th could cause this problem.
It was fixed in git revision c9bca547087892d9bbb7671e86952fc3d468ba57.
Try building at or after that revision.

Pierre Neidhardt (ambrevar) wrote :

The commit you mentioned seems to fix it.
However, I now get another error with CLISP:

```
Initial page table:
Gen Boxed Code Raw LgBox LgCode LgRaw Pin Alloc Waste Trig WP GCs Mem-age
 6 776 0 0 0 0 0 0 25427200 768 2000000 776 0 0.0000
           Total bytes allocated = 25427200
           Dynamic-space-size bytes = 1073741824
fatal error encountered in SBCL pid 727(tid 0x7ffff7c71740):
internal error too early in init, can't recover

Error opening /dev/tty: No such device or address
```

See attachment.

Douglas Katzman (dougk) wrote :

My personal experience is that CLISP miscompiles SBCL, and I think this is the experience of other SBCL developers as well.
On the other hand, there was a point in time at which I thought CLISP was reliable enough to use.
(I don't know what happened. Did our code get more complicated to the point that CLISP miscompiles it, which then leads to it miscompiling itself? That seems to be the only explanation.)

So there are (at least) two ways to go about finding the problems:
(1) byte-for-byte compare the .fasl files resulting from compiling SBCL under SBCL vs compiling SBCL under CLISP. This should narrow down discrepancies to a file or bunch of files.
(2) start from first principles of debugging - lots of stepping and backtracing and printfs
Unfortunately I don't have the time or need to do either of the above.
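Approach (1) can be sketched as a small comparison script. This is a hypothetical helper, not part of SBCL's build system. It assumes you have copied aside the cross-compiled output directory (`obj/from-xc/`, going by the paths in the logs above) from one SBCL-hosted build and one CLISP-hosted build; the directory arguments are placeholders.

```python
import filecmp
import sys
from pathlib import Path


def differing_files(tree_a, tree_b):
    """Compare two build-output trees file by file.

    Returns (differ, only_a, only_b): relative paths whose bytes differ,
    paths present only in tree_a, and paths present only in tree_b.
    """
    a, b = Path(tree_a), Path(tree_b)
    rel_a = {p.relative_to(a) for p in a.rglob("*") if p.is_file()}
    rel_b = {p.relative_to(b) for p in b.rglob("*") if p.is_file()}
    differ = []
    for rel in sorted(rel_a & rel_b):
        # shallow=False forces a content comparison instead of trusting os.stat()
        if not filecmp.cmp(a / rel, b / rel, shallow=False):
            differ.append(rel.as_posix())
    only_a = sorted(r.as_posix() for r in rel_a - rel_b)
    only_b = sorted(r.as_posix() for r in rel_b - rel_a)
    return differ, only_a, only_b


def main(argv):
    if len(argv) != 3:
        print(f"usage: {argv[0]} SBCL_HOSTED_DIR CLISP_HOSTED_DIR", file=sys.stderr)
        return 2
    differ, only_a, only_b = differing_files(argv[1], argv[2])
    for rel in differ:
        print(f"DIFFERS         {rel}")
    for rel in only_a:
        print(f"ONLY-IN-FIRST   {rel}")
    for rel in only_b:
        print(f"ONLY-IN-SECOND  {rel}")
    # Non-zero exit status when any discrepancy was found
    return 1 if (differ or only_a or only_b) else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Saved as, say, `compare_trees.py` (a hypothetical name), it would be run as `python3 compare_trees.py sbcl-hosted/obj/from-xc clisp-hosted/obj/from-xc`. Listing only the differing files keeps the output small, so the narrowed-down set can then be inspected more closely.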

If nothing else, please attach a backtrace by entering 'b' at the ldb> prompt.

Douglas Katzman <email address hidden> writes:

> (I don't know what happened - did our code get more complicated to the
> point that CLISP miscompiles it which then leads to it miscompiling
> itself? That seems to be the only explanation)

There are still some relatively easy ways to introduce unportable
behaviour, unfortunately. One that I've seen in the past relates to
uses of LOOP that aren't quite nailed down enough. (Not saying that
this is such an example.)

> So there are (at least) two ways to go about finding the problems-
> (1) byte-for-byte compare the .fasl files resulting from compiling
> SBCL under SBCL vs compiling SBCL under CLISP. This should narrow down
> discrepancies to a file or bunch of files.

For the record: I did this this morning, and CLISP and SBCL outputs are
byte-for-byte identical for all cross-compiled files except
src/assembly/master.assem-obj.

Investigation is ongoing!

Pierre Neidhardt (ambrevar) wrote :

Thanks a lot!
I believe this is an important issue; hopefully things will work out!
I'm not in the best position to help, unfortunately, so I wish you the best of luck!
Cheers!

Douglas Katzman (dougk) wrote :

I just did a build with CLISP after some portability fixes, and it worked, so try it now.

Pierre Neidhardt (ambrevar) wrote :

Thanks!
I just tried commit 4c798e6eb72ce1cd8c070de593173538f61f2504, and it failed as above:

```
//entering make-target-2.sh
//doing warm init - compilation phase
This is SBCL 1.5.9, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
Initial page table:
Gen Boxed Code Raw LgBox LgCode LgRaw Pin Alloc Waste Trig WP GCs Mem-age
 6 776 0 0 0 0 0 0 25427200 768 2000000 776 0 0.0000
           Total bytes allocated = 25427200
           Dynamic-space-size bytes = 1073741824
fatal error encountered in SBCL pid 727(tid 0x7ffff7c71740):
internal error too early in init, can't recover

Error opening /dev/tty: No such device or address
Internal error #21 "Unreachable code reached" at 0x522f5bdb
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>
```

Build log attached.

Pierre Neidhardt (ambrevar) wrote :

Was something involving /dev/tty added since 1.5.8? Note that Guix packages are built in containers with limited access to the file system.

Stas Boukarev (stassats) wrote :

If you tried 4c798e6eb72ce1cd8c070de593173538f61f2504, then why does it say "This is SBCL 1.5.9"?

Pierre Neidhardt (ambrevar) wrote :

Because I had to generate a version file with some version in it, and I chose "1.5.9". Without it, the build fails, because Guix discards the .git directory when fetching the source.

Douglas Katzman (dougk) wrote :

I used "CLISP 2.49 (2010-07-07)" for my successful build.

I tried updating to "CLISP 2.49.93+ (2018-02-18)" (which I built from source) and the build of SBCL did not finish at all.

It would perhaps be helpful if you could provide a backtrace. The "unreachable code reached" error is exactly what I thought I just fixed.

Douglas Katzman (dougk) on 2020-01-03
Changed in sbcl:
status: New → Fix Committed
Pierre Neidhardt (ambrevar) wrote :

Guix builds CLISP from tag clisp-2.49.92-2018-02-18.
I can try building 2.49 from 2010-07-07.
Sorry, I have little time to investigate further :/

Pierre Neidhardt (ambrevar) wrote :

Great news: The build of 2.0.0 succeeded with CLISP 2.49.92-2018-02-18!
Problem solved!
Thanks for the hard work!

Stas Boukarev (stassats) on 2020-01-03
Changed in sbcl:
status: Fix Committed → Fix Released