CORRUPTION WARNING in SBCL X86 TO ARM64 cross compile

Bug #2024003 reported by Robert Palm
Affects  Status        Importance  Assigned to  Milestone
SBCL     Fix Released  Undecided   Unassigned

Bug Description

Hi,

I am trying to cross compile sbcl from x86 to arm64 and receive this error.

Any ideas?

Thank you.

//entering make-target-2.sh
//doing warm init - compilation phase
This is SBCL 2.3.5.117-850a8f314-WIP, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
Initial page table:
        Immobile Object Counts
 Gen layout fdefn symbol code Boxed Cons Raw Code SmMix Mixed LgRaw LgCode LgMix Waste% Alloc Trig Dirty GCs Mem-age
  6 0 0 0 0 0 71 0 207 0 293 0 0 0 0.5 37228304 2000000 207 0 0.0000
Tot 0 0 0 0 0 71 0 207 0 293 0 0 0 0.5 37228304 [3.5% of 1073741824 max]
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb> CORRUPTION WARNING in SBCL pid 87552 pthread 0x4c101c290:
Memory fault at 0x257448 (pc=0x25731c)
The integrity of this image is possibly compromised.
Exiting.
Error opening /dev/tty: Device not configured

Douglas Katzman (dougk) wrote :

What's the host SBCL version and feature set?
And I notice it says "-WIP" - what are the diffs you've applied?
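
Both answers can be collected with one command, roughly as sketched below. It assumes the host Lisp is on PATH as `sbcl`. The `SBCL` override variable and the function name `host_sbcl_info` are made up for this note; `--noinform`, `--non-interactive` and `--eval` are regular SBCL command-line options.

```shell
# Hypothetical helper: print the host Lisp's version and *features*.
# Set SBCL to point at a different binary if the host Lisp is not "sbcl".
host_sbcl_info() {
    "${SBCL:-sbcl}" --noinform --non-interactive \
        --eval '(format t "~a ~a~%~s~%" (lisp-implementation-type) (lisp-implementation-version) *features*)'
}
```

Running `host_sbcl_info` on the build host and pasting the output into the report covers the version and the feature list; any local diffs still have to be described separately.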

Robert Palm (r-p-x) wrote :

Hi, thanks.

I tried SBCL 2.3.3 and SBCL 2.3.5.117-850a8f314-WIP (which I built successfully natively on my x86 machine).

Only thing I changed in the cross compile script was omitting the --touch in a tar command as it was not supported.

Douglas Katzman (dougk) wrote :

I'm not sure whether you meant x86-64 or x86(-32) as the host. I tried the following reproducer:

(1) build 32-bit x86 at 2.3.3
This is SBCL 2.3.3, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* *features*
(:X86 :GENCGC :ANSI-CL :COMMON-LISP :ELF :IEEE-FLOATING-POINT :LINUX
 :LITTLE-ENDIAN :PACKAGE-LOCAL-NICKNAMES :SB-LDB :SB-PACKAGE-LOCKS :SB-THREAD
 :SB-UNICODE :SBCL :UNIX)

(2) run cross-make to gcc114.fsffrance which is an arm64 machine. It completed normally.
...
+ ssh gcc114.fsffrance.org cd devel/sbcl ';' sh make-target-2.sh '&&' sh make-target-contrib.sh sb-posix sb-bsd-sockets
//entering make-target-2.sh
make-target-2.sh: 2: set: can't access tty; job control turned off
//doing warm init - compilation phase
This is SBCL 2.3.5.117-850a8f314, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
Initial page table:
        Immobile Object Counts
 Gen layout fdefn symbol code Boxed Cons Raw Code SmMix Mixed LgRaw LgCode LgMix Waste% Alloc Trig Dirty GCs Mem-age
  6 0 0 0 0 0 70 0 207 0 293 0 0 0 0.4 37216976 2000000 207 0 0.0000
Tot 0 0 0 0 0 70 0 207 0 293 0 0 0 0.4 37216976 [3.5% of 1073741824 max]
COLD-INIT... Checking symbol printer: T
(Length(TLFs)= 4328)
"obj/from-xc/src/code/show.lisp-obj"

What OS doesn't have the "--touch" option on tar - is it macOS ?

Robert Palm (r-p-x) wrote :

Oh, yes I am very sorry. To be precise I mean AMD64.

I use FreeBSD 13.2 (amd64) as a host and OpenBSD (arm64) as target system.

I think --mount is not supported, and -m didn't work either. A comment in the script says the option may have to be omitted on SunOS, so I omitted it.

https://man.freebsd.org/cgi/man.cgi?tar(1)

I think that in the section where it runs through all those 302 steps (or however many there are), a bit more output popped up than in my native build. I think it was related to some number issues. I can attach a log if this has something to do with this bug.

Thank you for looking into it!

Robert Palm (r-p-x) wrote :

I am running a command like

./cross-make.sh sync <email address hidden> /usr/local/sbcl SBCL_ARCH=arm64

with latest cloned git repository on both sides at identical place.

Robert Palm (r-p-x) wrote :

I meant --touch not --mount.

Douglas Katzman (dougk) wrote :

sounds like we have an OpenBSD compatibility problem. I can't help you there.

Robert Palm (r-p-x) wrote :

Ok, so you say it is because of this option?

Douglas Katzman (dougk) wrote :

considering that I routinely build on macOS, and I just tested on Linux, an educated guess would be that yes it's related to OpenBSD

Robert Palm (r-p-x) wrote :

Okay, interesting. I saw someone else was trying to build on OpenBSD arm64, but natively.

https://bugs.launchpad.net/sbcl/+bug/2009585

Maybe I can ask him or someone else from OpenBSD folks.

Many thanks for your quick feedback!

Robert Palm (r-p-x) wrote :

Sorry, me again. Wonder about a couple of things.

With a freshly cloned git repo, running ./cross-make.sh gets stuck with:

//entering make-host-1.sh
make-host-1.sh: 24: .: cannot open output/build-config: No such file

Is it mandatory to do a ./make.sh before a ./cross-make.sh ?

When I try that, there already seems to be a file named output (not a directory).

mkdir: cannot create directory 'output': File exists

Okay, I can delete that. After that, ./make.sh works.

I could generally start a ./cross-make.sh on my BSD machine.

But, as I didn't have luck with my BSD host I tried to use a different host:

Linux ws-1 5.10.0-21-arm64 #1 SMP Debian 5.10.162-1 (2023-01-21) aarch64 GNU/Linux with

SBCL 2.1.1.debian

Now, when I try a ./make.sh I get this new error:

//guessing default target CPU architecture from host architecture
//setting up CPU-architecture-dependent information
sbcl_arch="arm64"
//initializing /usr/local/sbcl/local-target-features.lisp-expr
//setting up OS-dependent information
make: Entering directory '/usr/local/sbcl/tools-for-build'
cc -I../src/runtime -std=gnu99 determine-endianness.c -ldl -Wl,-no-as-needed -o determine-endianness
make: cc: No such file or directory
make: *** [<builtin>: determine-endianness] Error 127
make: Leaving directory '/usr/local/sbcl/tools-for-build'
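
The three stumbling blocks in this comment (a stray `output` file, a missing `output/build-config`, and no `cc` on PATH) can be checked before starting a build. The following is only a sketch: `preflight` is a hypothetical helper that is not part of the SBCL tree, and it tests for nothing beyond the symptoms reported here. Whether cross-make.sh is expected to create `output/build-config` on its own is not answered in this thread.

```shell
# Hypothetical pre-flight check for the problems seen in this comment.
# Prints one line per problem found in the source tree $1 (default: .)
# and returns non-zero if at least one problem was found.
preflight() {
    tree=${1:-.}
    problems=0
    # mkdir fails with "File exists" when output is a plain file.
    if [ -e "$tree/output" ] && [ ! -d "$tree/output" ]; then
        echo "output exists but is not a directory"
        problems=$((problems + 1))
    fi
    # make-host-1.sh sources this file and stops when it cannot open it.
    if [ ! -r "$tree/output/build-config" ]; then
        echo "output/build-config is missing"
        problems=$((problems + 1))
    fi
    # make's built-in rule invoked "cc" in the log above.
    if ! command -v cc >/dev/null 2>&1; then
        echo "no cc found on PATH"
        problems=$((problems + 1))
    fi
    [ "$problems" -eq 0 ]
}
```

For example, `preflight /usr/local/sbcl || echo "fix the above first"` before calling ./make.sh or ./cross-make.sh.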

Robert Palm (r-p-x) wrote :

Sorry for my last point, I needed clang :/

Robert Palm (r-p-x) wrote :

I think you are right. I could cross compile from freebsd (amd64) to debian (arm64) successfully (even without --touch option of tar).

Robert Palm (r-p-x) wrote :

Copied in 3 patches I found in the OpenBSD Port.

Now it is a slightly different error message.

ldb> fatal error encountered in SBCL pid 40193 pthread 0x457a61290:
maximum interrupt nesting depth (8) exceeded

I think something similar already occurred in the past:

https://github.com/darlinghq/darling/issues/930
https://sourceforge.net/p/sbcl/mailman/message/18414750/

How can I increase the depth?

Is /src/runtime/interrupt.h the place to start?

----------------

Don't try to guess (wrong) with clang. Just assume we have pie

Index: src/runtime/Config.generic-openbsd
--- src/runtime/Config.generic-openbsd.orig
+++ src/runtime/Config.generic-openbsd
@@ -17,11 +17,7 @@ CFLAGS += -pthread
 OS_LIBS += -pthread
 endif

-ifeq ($(DISABLE_PIE),yes)
-ifneq ($(shell $(CC) -dM -E - < /dev/null 2>/dev/null | grep -e '__clang__'),)
 CFLAGS += -fno-pie
 LINKFLAGS += -nopie
 LDFLAGS += -nopie
 __LDFLAGS__ += -nopie
-endif
-endif

----------------

Index: src/runtime/GNUmakefile
--- src/runtime/GNUmakefile.orig
+++ src/runtime/GNUmakefile
@@ -33,7 +33,7 @@ __LDFLAGS__ =

 include ../../output/prefix.def

-CFLAGS += -g -Wall -Wundef -Wsign-compare -Wpointer-arith -O3
+CFLAGS += -Wall -Wundef -Wsign-compare -Wpointer-arith
 ASFLAGS += $(CFLAGS)
 CPPFLAGS += -I.

----------------

ffsl is non-standard, but both gcc and clang have it as builtin...
clang only has it as builtin

Index: src/runtime/gc-common.c
--- src/runtime/gc-common.c.orig
+++ src/runtime/gc-common.c
@@ -58,6 +58,8 @@
 #define LONG_FLOAT_SIZE 3
 #endif

+#define ffsl __builtin_ffsl
+
 os_vm_size_t dynamic_space_size = DEFAULT_DYNAMIC_SPACE_SIZE;
 os_vm_size_t thread_control_stack_size = DEFAULT_CONTROL_STACK_SIZE;

----------------

Douglas Katzman (dougk) wrote :

Do not increase the interrupt nesting level. It should not get any nested interrupts. (Problem reports from 2008 are not really relevant)
Also I don't see any call to ffsl() but there is a call to ffs(). Where are you seeing ffsl used?

Robert Palm (r-p-x) wrote :

No, I just applied those as they were.

Asked OpenBSD devs: https://marc.info/?t=168726110500001&r=1&w=2

Pretty stuck.

Any ideas ?

Stas Boukarev (stassats) wrote :

I was able to run on openbsd 7.3 after applying the attached patch, maybe it's related to the problem?

Robert Palm (r-p-x) wrote :

I cannot express how thankful I am.

I tried several things in these config files, but I do not know enough to have found this.

How did you manage to find out? How did you debug this?

Many thanks!!!!

---------------------------
ws-4# uname -a
OpenBSD ws-4.my.domain 7.3 GENERIC.MP#2164 arm64
ws-4# sbcl
This is SBCL 2.3.5.138-8c5f7ec57-WIP, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* (machine-instance)
"ws-4.my.domain"
* (machine-type)
"ARM64"
* (machine-version)
"ARM Neoverse N1 r3p1"
* (software-type)
"OpenBSD"
* (software-version)
"7.3"
-----------------------------

Here is my git diff:

-----------------------------
diff --git a/cross-make.sh b/cross-make.sh
index 370c9a12e..ba57df215 100755
--- a/cross-make.sh
+++ b/cross-make.sh
@@ -53,7 +53,7 @@ mv build-id.inc output
 sh make-host-1.sh
 # workaround small amounts of clock skew by using --touch on the extraction
 # You'll probably have to remove the --touch when building for SunOS
-tar cf - src/runtime/genesis | ssh $ssh_port_opt $host cd $root \; tar xf - --touch
+tar cf - src/runtime/genesis | ssh $ssh_port_opt $host cd $root \; tar xf -

 # make-target-1 and copy back the artifacts
 ssh $ssh_port_opt $host cd $root \; $ENV sh make-target-1.sh
diff --git a/src/runtime/Config.arm64-openbsd b/src/runtime/Config.arm64-openbsd
index 929d8a627..9a91ca0cf 100644
--- a/src/runtime/Config.arm64-openbsd
+++ b/src/runtime/Config.arm64-openbsd
@@ -12,4 +12,4 @@
 include Config.arm64-bsd
 include Config.generic-openbsd

-LINKFLAGS += -Wl,--export-dynamic
+LINKFLAGS += -Wl,--export-dynamic -Wl,--no-execute-only
-----------------------------

I would say now it is possible to extend the OpenBSD port to the ARM64 arch.

There are some more (existing) patches in the OpenBSD port but maybe those are outdated?

One thing I still wonder about is the --touch option. Is it really needed ? And I don't understand why the equivalent (?) -m Option doesn't work.

Super cool :-)
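
On the `--touch` question: GNU tar documents `-m` and `--touch` as two spellings of the same option, but other tar implementations may accept only one of them or neither, so it is safer to probe than to assume. The sketch below tries each candidate on a throwaway archive and prints the first one the local tar accepts. `pick_tar_touch_flag` is a hypothetical helper, not something cross-make.sh provides.

```shell
# Hypothetical helper: print the first "do not restore mtime" flag that
# the local tar accepts on extraction, or an empty line if neither works.
pick_tar_touch_flag() {
    probe_dir=$(mktemp -d) || return 1
    mkdir "$probe_dir/in" "$probe_dir/out"
    : > "$probe_dir/in/probe"
    # Build a one-file archive to extract with each candidate flag.
    if ! (cd "$probe_dir/in" && tar cf ../probe.tar probe) 2>/dev/null; then
        rm -rf "$probe_dir"
        return 1
    fi
    chosen=""
    for flag in --touch -m; do
        # Same argument order as the cross-make.sh line: flag after "xf <archive>".
        if (cd "$probe_dir/out" && tar xf ../probe.tar "$flag") >/dev/null 2>&1; then
            chosen=$flag
            break
        fi
    done
    rm -rf "$probe_dir"
    printf '%s\n' "$chosen"
}
```

Note that in cross-make.sh the extraction runs on the target over ssh, so a probe like this would have to be executed there rather than on the build host.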

Stas Boukarev (stassats) wrote :

> How did you manage to find out? How did you debug this?

I had a previously working build which stopped working after upgrading to 7.3. The debugger pointed to an instruction which was loading from a pc-relative address. After being confused for a bit, I found this in https://www.openbsd.org/73.html:

>Some architectures now have non-readable code ("xonly"), both from the perspective of userland reading its own memory, or the kernel trying to read memory in a system call. Many sloppy practices in userland code had to be repaired to allow this. The linker (ld.lld(1) or ld.bfd(1)) option --execute-only is enabled by default. In order of development: arm64, riscv64, hppa, amd64, powerpc64, powerpc (G5 only), octeon, and sparc64 (sun4u only; unfinished).

Sebastien Marie (semarie) wrote :

arm64 (and some x86_64, if I remember correctly) has execute-only enforced in hardware. That means the memory holding the executable code isn't readable.

it was enforced in 7.3.

Regarding sbcl, a thread on ports@ mailing-list mentioned it: https://marc.info/?l=openbsd-ports&m=167483553608781&w=2.

Passing --no-execute-only to the linker is the simpler fix. The other (better) way would be to make sbcl not need to read the code segment.

Stas Boukarev (stassats) wrote :

It wouldn't be better as the C part is largely irrelevant, as it's pretty small compared to the Lisp part (which even needs the code to be writable).

And it's openbsd's own assembler which puts the constant in

ldr x, =constant

into execute-only space.
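
To see whether a given binary was linked execute-only, one can look at its program headers: an execute-only text segment shows up as a LOAD entry whose flags are just `E`, without `R`. The sketch below assumes `readelf -lW` style output on stdin (one line per segment, flags in the columns between MemSiz and Align). `has_xonly_segment` is a hypothetical name, and the availability of `readelf` on the target is an assumption.

```shell
# Hypothetical helper: read "readelf -lW" style program headers on stdin
# and succeed if some LOAD segment is executable but not readable.
has_xonly_segment() {
    awk '
        $1 == "LOAD" {
            # Fields 2-6 are Offset, VirtAddr, PhysAddr, FileSiz, MemSiz;
            # the last field is Align; everything in between is the flags.
            flags = ""
            for (i = 7; i < NF; i++) flags = flags $i
            if (flags == "E") found = 1
        }
        END { exit found ? 0 : 1 }
    '
}
```

A possible use, assuming the runtime was built at src/runtime/sbcl: `readelf -lW src/runtime/sbcl | has_xonly_segment && echo "text segment is execute-only"`. With `-Wl,--no-execute-only` in LINKFLAGS the check would be expected to fail, since the text segment is then readable again.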

Changed in sbcl:
status: New → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released