Fails to build on riscv64

Bug #1934555 reported by Sébastien Villemot
Affects: SBCL
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi,

I’m trying to build SBCL 2.1.6 on riscv64 (for bootstrapping the official Debian package).

I’m cross-compiling it from an x86-64 machine, following the recipe in make.sh. Since I don’t have access to real RISC-V hardware, I’m using a QEMU virtual machine.

The process crashes in make-target-2.sh, with the following message:

//entering make-target-2.sh
//doing warm init - compilation phase
This is SBCL 2.1.6, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
Initial page table:
Gen Boxed Code Raw LgBox LgCode LgRaw Pin Alloc Waste Trig WP GCs Mem-age
 6 6240 4381 0 0 0 0 0 43476496 27120 2000000 10621 0 0.0000
           Total bytes allocated = 43476496
           Dynamic-space-size bytes = 1073741824
COLD-INIT... (Length(TLFs)= 19456)
"obj/from-xc/src/code/early-source-location.lisp-obj"
"obj/from-xc/src/code/show.lisp-obj"
[…]
"obj/from-xc/src/compiler/riscv/arith.lisp-obj"
"obj/from-xc/src/compiler/riscv/pred.lisp-obj"
"obj/from-xc/src/compiler/float-tran.lisp-obj" Argh! corrupted error depth, halting
fatal error encountered in SBCL pid 3340:
%PRIMITIVE HALT called; the party is over.

Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb> backtrace
Backtrace:
   0: [I*]0x3fc59b8390 pc=0x4f28e1d8 {0x4f28e100+00d8} SB-KERNEL::INFINITE-ERROR-PROTECTOR
   1: 0x3fc59b8350 pc=0x4f28e5a0 {0x4f28e350+0250} ERROR
   2: 0x3fc59b8290 pc=0x4f2aa800 {0x4f2aa430+03d0} SB-KERNEL::WITH-SIMPLE-CONDITION-RESTARTS
   3: 0x3fc59b8140 pc=0x4f2ab2b0 {0x4f2aabc0+06f0} SB-KERNEL::ASSERT-ERROR
   4: 0x3fc59b8100 pc=0x50b61010 {0x50b60e50+01c0} (SB-C::TOP-LEVEL-FORM ())
   5: 0x3fc59b8000 pc=0x5177f450 {0x5177d820+1c30} SB-KERNEL::!COLD-INIT
Note: [I] = interrupted, [*] = no LRA
ldb>

Note that this is similar to the failure already reported at:
https://sourceforge.net/p/sbcl/mailman/sbcl-bugs/thread/mvmmu2k3yj2.fsf%40suse.de/#msg37092980

If you need to set up a virtual machine for debugging this, note that there are ready-to-use Debian QEMU images for riscv64 (along with other archs) available at:
https://people.debian.org/~gio/dqib/

Thanks,

Douglas Katzman (dougk) wrote :

I don't know that any of the devs are paying attention to the riscv code.
Since you have by this bug report expressed interest in it, would you be able to identify the first breaking change? riscv was definitely working as of:
commit dc3d0fe45c5c331ab4030465fda72d1f0d8cd8b9
Author: Douglas Katzman <email address hidden>
Date: Thu Dec 3 01:04:24 2020 -0500
    riscv: implement layout-id fixups

I did a 'git bisect start' to see how many revisions are entailed since then:
  Bisecting: 406 revisions left to test after this (roughly 9 steps)
So it's not too onerous for someone who cares to do it, but is for someone who doesn't.
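
For anyone attempting the bisection, the session could look roughly like the sketch below. The bad endpoint (the sbcl-2.1.6 tag) and the build_and_test.sh script are assumptions, not part of this report; the script is a hypothetical wrapper that exits 0 when the riscv64 build gets past cold-init and non-zero otherwise. The small helper only approximates the step count by repeated halving; git's own estimate may be computed differently.

```shell
#!/bin/sh
# Known-good commit named in this comment.
GOOD=dc3d0fe45c5c331ab4030465fda72d1f0d8cd8b9
# Assumption: the failing release is reachable under this tag name.
BAD=sbcl-2.1.6

# Drive the bisection automatically. build_and_test.sh is hypothetical:
# it should exit 0 for a working build and 1 for a broken one
# (exit 125 tells git bisect to skip a revision that cannot be tested).
run_bisect() {
    git bisect start &&
    git bisect bad "$BAD" &&
    git bisect good "$GOOD" &&
    git bisect run ./build_and_test.sh
    git bisect reset
}

# Rough step count: every test halves the remaining range.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 0 ]; do
        n=$((n / 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 406   # prints 9
```

run_bisect is deliberately not called here; it is meant to be run from inside an SBCL checkout once the test script exists.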

Sébastien Villemot (sebastien-villemot) wrote :

Unfortunately I also get a crash in make-target-2.sh with commit dc3d0fe45c5c331ab4030465fda72d1f0d8cd8b9, though slightly different:

Initial page table:
Gen Boxed Code Raw LgBox LgCode LgRaw Pin Alloc Waste Trig WP GCs Mem-age
 6 5814 3875 0 0 0 0 0 39658560 27584 2000000 9689 0 0.0000
           Total bytes allocated = 39658560
           Dynamic-space-size bytes = 1073741824
COLD-INIT... (Length(TLFs)= 9600)
Argh! corrupted error depth, halting
fatal error encountered in SBCL pid 4332:
%PRIMITIVE HALT called; the party is over.

Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb> backtrace
Backtrace:
   0: SB-KERNEL::INFINITE-ERROR-PROTECTOR 0x4f33618f [interrupted] fp = 0x3ff59df390 <no LRA> pc_ofs = 0xd8
   1: ERROR 0x4f33632f fp = 0x3ff59df350 LRA = 0x4f33657f pc_ofs = 0x1c0
   2: SB-KERNEL::WITH-SIMPLE-CONDITION-RESTARTS 0x4f323b1f fp = 0x3ff59df290 LRA = 0x4f323eef pc_ofs = 0x330
   3: SB-KERNEL::ASSERT-ERROR 0x4f3241bf fp = 0x3ff59df140 LRA = 0x4f3248af pc_ofs = 0x5a0
   4: (SB-C::TOP-LEVEL-FORM ()) 0x509ece9f fp = 0x3ff59df100 LRA = 0x509ed05f pc_ofs = 0x130
   5: SB-KERNEL::!COLD-INIT 0x513ff58f fp = 0x3ff59df000 LRA = 0x5140148f pc_ofs = 0x1b10
ldb>

Heinrich Schuchardt (xypron) wrote :

Is this problem specific to Ubuntu? Shouldn't it rather be addressed upstream?

The problem is reproducible using Ubuntu Impish on a HiFive Unmatched board.

Best regards

Heinrich

Changed in sbcl:
status: New → Confirmed
Douglas Katzman (dougk) wrote :

Digging into this a bit, I found that riscv has a bug in EQL on (complex single-float).
* (eql #c(1s0 1s0) #c(1s0 1s0)) => NIL

Why it matters: there are assertions that compile-time foldable calls such as (COMPLEX 1s0 1s0) are converted to a literal value, as if written using #C reader syntax. One such assertion occurs in cold-init, where it fails too early for error recovery. This is the exact failure that Andreas Schwab reported. And, as also stated, the failure goes away if :sb-show is enabled.
So apparently when I had a passing build at the revision I identified in comment #1, I must have had :sb-show enabled, as is typical of my workflow for assumed-buggy builds.
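
A quick way to check a given build for the EQL problem might look like the following sketch. It assumes a working sbcl binary for the target is on PATH (for example one built with :sb-show enabled); the function name check_complex_eql is made up for illustration. The second form only mirrors the idea behind the assertion described above and is not the actual code from cold-init.

```shell
#!/bin/sh
# Hypothetical helper: evaluates the two comparisons and prints the results.
# A correct EQL should print T for both; the comment above reports NIL
# for the first form on riscv.
check_complex_eql() {
    if ! command -v sbcl >/dev/null 2>&1; then
        echo "sbcl not found; skipping check" >&2
        return 2
    fi
    sbcl --noinform --non-interactive \
         --eval '(print (eql #c(1s0 1s0) #c(1s0 1s0)))' \
         --eval '(print (eql (complex 1s0 1s0) #c(1s0 1s0)))'
}

check_complex_eql
```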

All told, I think this would have to be kicked back to the person who is most knowledgeable about the riscv port because I for one don't really care to fix the EQL function. But I can remove the assertion about EQL working, and see how far it gets.
