compiling SBCL (2.2.0-48-g6d4619e8e) using ECL (21.2.1) on Termux

Bug #1956852 reported by alexis rivera
This bug affects 2 people
Affects    Status    Importance    Assigned to    Milestone
SBCL       New       Undecided     Unassigned

Bug Description

I have been trying to compile SBCL version 2.2.0-48-g6d4619e8e on Termux using ECL version 21.2.1.
It seems the compilation fails on src/compiler/generic/vm-fndb.lisp with the following error:

Condition of type: SIMPLE-ERROR
bad modulus specified for MOD type specifier: 0
Available restarts:

1. (RECOMPILE) Recompile
2. (ABORT-BUILD) Abort building SBCL.
3. (RESTART-TOPLEVEL) Go back to Top-Level REPL.

Broken at #:LAMBDA26. In: #<process TOP-LEVEL 0x7f7ecf8f80>.
>>
//entering make-target-1.sh
//building runtime system and symbol table file
make: Entering directory '/data/data/com.termux/files/home/sbcl/src/runtime'
GNUmakefile:41: genesis/Makefile.features: No such file or directory
make: *** No rule to make target 'genesis/Makefile.features'. Stop.
make: Leaving directory '/data/data/com.termux/files/home/sbcl/src/runtime'

The output of uname -a is
Linux localhost 4.9.186-21635681 #1 SMP PREEMPT Wed Jul 28 15:37:01 KST 2021 aarch64 Android

I would appreciate any guidance on debugging this error, unless it is a known issue that ECL cannot compile SBCL. I'm attaching a file, dump.txt, that contains the full output of the compilation session.

Thanks,
Alexis

PS. One of the things that I'll try to do is to compile SBCL with ECL under Windows to see if I get the same error.

Tags: ecl termux
alexis rivera (riveraah) wrote :
alexis rivera (riveraah)
tags: added: termux
tags: added: ecl
alexis rivera (riveraah) wrote :

Compiling SBCL with ECL on MinGW failed too :( It is documented in Bug #1956876.

Stas Boukarev (stassats) wrote :

ECL 16.1.3 builds SBCL fine. So I'm not convinced SBCL is to blame.

Douglas Katzman (dougk) wrote :

Our change https://sourceforge.net/p/sbcl/sbcl/ci/74605eae024986746ac98284323775ec9cec93ea says that ECL workarounds are not necessary, but I think they certainly were for 16.1.3, so I'm not sure how "fine" is defined.
Case in point: My ECL installation is 16.1.3 and I can no longer build with it because of what appears to be a scoping problem with macrolet (identical to ABCL bug, as it happens).

Our code in src/compiler/macros is:
(defun event-statistics (&optional (min-count 1) (stream *standard-output*))
  (collect ((info))
    (maphash (lambda (k v)
               (declare (ignore k))
               (when (>= (event-info-count v) min-count)
                 (info v)))
             *event-info*)

but ECL says:
;;; Compiling (DEFUN EVENT-STATISTICS ...).
;;; Error:
;;; in file macros.lisp, position 35397
;;; at (DEFUN EVENT-STATISTICS ...)
;;; * The macro form (INFO V) was not expanded successfully.
;;; Error detected:
;;; Too few arguments supplied to a macro or a destructuring-bind form:
;;; (INFO V)

because it uses the 3-arg compiler-macro. I tested renaming the local macro, and it got past that point, but then failed in a new place.
So if we expect some version of ECL to work, we probably need to revert the aforementioned change, unless Charles can tell us the exact ECL version and its build options to use.

Stas Boukarev (stassats) wrote :

It may have been not exactly 16.1.3, but some git version that still says "16.1.3".

alexis rivera (riveraah) wrote :

I wasn't able to compile SBCL with ECL 16.1.3 either :_( Attached is the output of that compilation process.

Was it a patched version of 16.1.3?
Is this something that I need to bring to the attention of the ECL developers?

alexis rivera (riveraah) wrote :

Going back to the compilation logs for ECL 21.2.1: if I am reading the log correctly, the error is in the file src/compiler/generic/vm-fndb.lisp, in one of the mod function calls, which is being called with incorrect inputs. There are only a few calls to the mod function; any suggestions on how to narrow down the list? Or am I way off in my interpretation of the logs?

1.
(defknown layout-eq ((or instance function) t (mod 16)) boolean (flushable))

2.
(defknown allocate-vector (#+ubsan boolean
                           word index
                           ;; The number of words is later converted
                           ;; to bytes, make sure it fits.
                           (and index
                                (mod #.(- (expt 2
                                                (- sb-vm:n-word-bits
                                                   sb-vm:word-shift
                                                   ;; all the allocation routines expect a signed word
                                                   1))
                                          ;; The size is double-word aligned, which is done by adding
                                          ;; (1- (/ sb-vm:n-word-bits 2)) and then masking.
                                          ;; Make sure addition doesn't overflow.
                                          3))))
    (simple-array * (*))
    (flushable movable))

3.
(defknown make-array-header ((unsigned-byte 8) (mod #.array-rank-limit)) array
  (flushable movable))

4.
(defknown (%add-with-carry %subtract-with-borrow)
          (bignum-element-type bignum-element-type (mod 2))
  (values bignum-element-type (mod 2))
  (foldable flushable movable always-translatable))

5.
(defknown (%ashl %ashr %digit-logical-shift-right)
          (bignum-element-type (mod #.sb-vm:n-word-bits)) bignum-element-type
  (foldable flushable movable always-translatable))

Douglas Katzman (dougk) wrote :

All those things pointed out in comment #7 are macros that take type specifiers. They're correct.

Douglas Katzman (dougk) wrote :

16.1.3 should work after https://sourceforge.net/p/sbcl/sbcl/ci/75919da1.
I'd be interested to know if later versions of ECL do too. If they don't, it's rather indicative of regressions in ECL.

alexis rivera (riveraah) wrote :

OK. You are suggesting that I apply the code changes in comment #9 to my source tree and compile again with 16.1.3. Did I understand correctly?

In comment #7, my reasoning was that the ECL compiler incorrectly evaluated one of the arguments to the mod macro (function?), which results in the error printed in the logs. My question was whether there is a way to determine which of the macros ECL failed on. Does the call to mod fail with that error when one of its arguments is zero? Again, is this valid reasoning, or is this a case where a compiler bug that occurred in a previous step is manifesting in this file? Thanks.

alexis rivera (riveraah) wrote :

Another thing I found is that Termux uses clang instead of gcc.

I merged the changes suggested in comment #10. The compilation went a lot further with ECL 16.1.3. But it failed with two errors:
ld.lld: error: undefined symbol: getdtablesize
>>> referenced by run-program.c:106
>>> run-program.o:(closefrom_fallback)
>>> referenced by run-program.c:106
>>> run-program.o:(closefds_from)

ld.lld: error: undefined symbol: current_thread
>>> referenced by arm64-assem.S:147
>>> arm64-assem.o:(call_into_lisp)
>>> referenced by arm64-assem.S:148
>>> arm64-assem.o:(call_into_lisp)
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)

It seems that getdtablesize is deprecated, so they don't include it. I went into run-program.c and forced the call to
 maxfd = sysconf(_SC_OPEN_MAX)-1;

instead, which got rid of one error. However, I don't know how to work around the missing current_thread symbol.

Attached are the logs.

alexis rivera (riveraah) wrote :

Compiling with ECL 21.2.1 and the fixes from comment #10 results in the same error as in the bug description (bad modulus specified for MOD).

Douglas Katzman (dougk) wrote :

Try passing '--without-gcc-tls' to make-config; it might fix the 'current_thread' problem.

alexis rivera (riveraah) wrote :

Compiling with --without-gcc-tls got me 99% there. I had to update grovel-headers.c to include <termios.h> instead of <sys/termios.h> in order to finally build. I saw there is a LISP_FEATURE_ANDROID that would have enabled that path, but I don't know what else it would have enabled.

The one thing that failed was the sb-posix tests :(

Attached is the compilation log.

I will try to compile the patched code in MINGW 32-bit to see if that addresses the build failure I saw there.

alexis rivera (riveraah) wrote :

According to https://github.com/termux/termux-packages/issues/4283
there seems to be a __TERMUX__ definition in this environment. I'll test whether it can be used to include <termios.h>.
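
A minimal sketch of what such a conditional include could look like, offered as an illustration rather than the actual grovel-headers.c code. __ANDROID__ is predefined by Android toolchains; __TERMUX__ is taken from the linked issue and is an assumption here, and read_terminal_attrs is a hypothetical helper added only so the header is exercised.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: either macro marks a Bionic-based environment where the
 * header lives at the POSIX location <termios.h>. */
#if defined(__ANDROID__) || defined(__TERMUX__)
#include <termios.h>
#else
#include <sys/termios.h>
#endif

/* Hypothetical helper: read the terminal attributes of fd into *out.
 * Returns 0 on success and -1 on failure, like tcgetattr() itself. */
int read_terminal_attrs(int fd, struct termios *out)
{
    assert(out != NULL);
    return tcgetattr(fd, out);
}
```

Since POSIX specifies <termios.h> as the header name, including it unconditionally may be a simpler option than testing for a platform macro.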

alexis rivera (riveraah) wrote :

The ECL developers say that their develop branch compiles SBCL because it fixes some compiler errors. I'm going to download their develop branch and verify the compilation without pulling the patches in comment #10 first.

alexis rivera (riveraah) wrote :

I was able to compile SBCL with the ECL develop branch without the patches on comment #10!
The only modification was to run-program.c, so that it calls sysconf instead of getdtablesize when LISP_FEATURE_ANDROID is defined, since getdtablesize is deprecated on that platform. I would recommend making this change.
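
For readers unfamiliar with this fallback, here is a minimal standalone sketch of the same idea. It is not the SBCL code: the names highest_possible_fd and close_fds_from and the fallback value 256 are made up for illustration, and only the use of sysconf(_SC_OPEN_MAX) comes from the change described above.

```c
#include <assert.h>
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper, not SBCL code: the highest file descriptor number
 * a closefrom()-style fallback would need to visit. It relies only on
 * POSIX sysconf(), so it does not need getdtablesize(). */
int highest_possible_fd(void)
{
    long limit = sysconf(_SC_OPEN_MAX);

    /* sysconf() returns -1 when the limit is indeterminate. The value 256
     * is an arbitrary fallback chosen for this sketch. */
    if (limit < 1)
        limit = 256;
    if (limit > INT_MAX)
        limit = INT_MAX;
    return (int)limit - 1;
}

/* Close every descriptor from lowfd upward. close() on a descriptor that
 * is not open fails with EBADF, which is ignored on purpose. */
void close_fds_from(int lowfd)
{
    int maxfd = highest_possible_fd();

    assert(lowfd >= 0);
    for (int fd = lowfd; fd <= maxfd; fd++)
        close(fd);
}
```

Because sysconf is specified by POSIX, this form should not need a platform-specific branch, although the loop can be slow when the descriptor limit is very large.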

Attached is the diff.

I compiled the code with the following settings:

sh make.sh --xc-host='path-to-dev-ecl/ecl --c-stack 16777217 --norc' --prefix=path-to-sbcl-install --with-android --without-gcc-tls

I don't think I need to specify the stack size. I can verify if necessary.

Only the sb-posix package failed.

alexis rivera (riveraah) wrote :

The version of SBCL compiled was sbcl-2.2.0-106-g900d8b685.

alexis rivera (riveraah) wrote :

Is the sb-posix package maintained by SBCL or by a different developer? I want to figure out whether the failure to build sb-posix is due to a limitation in Termux or to something missing in the code.

Douglas Katzman (dougk) wrote :

Contribs are maintained by SBCL.

I have to say that I'm having trouble following this. In comment 17 you said that something works without patches from comment #10. But comment 10 doesn't seem to have patches, though it refers to "code changes in comment #9". But comment 9 refers to a git checkin which everyone should be using at this point, as it reverts the removal of ECL compatibility. Then you provided I think a short diff to 'run-program' in comment 17 which is fine; I can commit that.
But the posix problem isn't a bug - it's just "permission denied" calling opendir on "/".
We can comment out the test for #+android. Will that be adequate?

Douglas Katzman (dougk) wrote :

I don't see anything in your build features that suggests that it's anything other than Linux (which is what I guess Termux is trying to do - make it look like Linux).

If you could provide a single patch that does all of the following, it would be appreciated:

* cause 'make-config' to autodetect termux and/or android, and insert something into the target features
* add some #+/- around the failing POSIX tests (OPENDIR.1, READDIR.1) and either comment out or fix MKSTEMP.NULL-TERMINATE to use a tempdir that exists
* fix run-program and/or grovel-headers (at some point I thought you said that grovel-headers wasn't finding termios)

We can check that in as "Get SBCL working under Termux" rather than having piecemeal changes that don't all work.

alexis rivera (riveraah) wrote :

I'm sorry about the confusion. Part of it was that we were troubleshooting with different versions of ECL (21.2.1 and 16.1.3) and SBCL (from git vs git + https://sourceforge.net/p/sbcl/sbcl/ci/75919da1).

To summarize, I was able to compile SBCL sbcl-2.2.0-106-g900d8b685 with ECL from their development branch using the following options:

sh make.sh --xc-host='path-to-dev-ecl/ecl --c-stack 16777217 --norc' --prefix=path-to-sbcl-install --with-android --without-gcc-tls

and this change to run-program.c (the only code change)

diff --git a/src/runtime/run-program.c b/src/runtime/run-program.c
index 929ac9914..fc48ebc40 100644
--- a/src/runtime/run-program.c
+++ b/src/runtime/run-program.c
@@ -100,7 +100,7 @@ void closefrom_fallback(int lowfd)
 {
     int fd, maxfd;

-#ifdef SVR4
+#if defined(SVR4) || defined(LISP_FEATURE_ANDROID)
     maxfd = sysconf(_SC_OPEN_MAX)-1;
 #else
     maxfd = getdtablesize()-1;

The only tests that failed were those related to sb-posix.

Next, I can try your suggestions about adding #+/- around the failing POSIX tests.
Hope this clears the picture of the state of things.

alexis rivera (riveraah) wrote :

I'm working on modifying the unit tests that failed. My approach to get the tests to work in general was to replace the hard-coded "/" and "/tmp" with *current-directory*. By default, Termux doesn't give permissions to "/". In addition, "/tmp" doesn't exist.

The tests OPENDIR.1 and READDIR.1 worked when I replaced the "/" directory with *current-directory*. Is it OK to make this change? It should work in general, so I wouldn't have to add conditional compilation statements.
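
The difference between the two directories can be checked outside of SBCL with a small C probe. This is only a sketch under the assumption that the failing tests ultimately call the C library's opendir(); probe_directory is a hypothetical helper name, not part of sb-posix.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: returns 0 if path can be opened as a directory,
 * otherwise the errno value reported by opendir(). */
int probe_directory(const char *path)
{
    DIR *dir;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return errno;   /* e.g. EACCES (no permission) or ENOENT (missing) */
    closedir(dir);
    return 0;
}
```

Going by the behaviour reported in this thread, probe_directory("/") would be expected to return EACCES under Termux, while probe_directory(".") should return 0 whenever the current directory is readable.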

I tried something similar for MKSTEMP.NULL-TERMINATE by changing
(default (make-pathname :directory '(:absolute "tmp")))
to
(default (make-pathname :directory '(:relative "tmp")))

and the last line of the test from
  t "/tmp/mkstemp-1")
to
  t "tmp/mkstemp-1")

But that didn't work. I'm trying to get this one to work.

This test seems to be creating a temporary file of the form
#P"/tmp/mkstemp-1.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxXXXXXX"

Is the last line of code of the test the expected value? Shouldn't the expected value be of that form? i.e.
  t "tmp/mkstemp-1.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxXXXXXX")
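
For reference, here is how the underlying C call treats its template. This is a sketch assuming the test exercises POSIX mkstemp(); it says nothing about what the Lisp test itself compares against. The helper name make_temp_in_cwd and the template string are invented for illustration.

```c
/* Request the POSIX declarations (mkstemp) under strict compiler modes. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create and immediately remove a temporary file in
 * the current directory. On success the generated name is left in buf. */
int make_temp_in_cwd(char *buf, size_t len)
{
    static const char pattern[] = "mkstemp-sketch.XXXXXX";
    int fd;

    assert(buf != NULL);
    if (len < sizeof pattern)
        return -1;
    memcpy(buf, pattern, sizeof pattern);   /* copies the trailing NUL too */

    fd = mkstemp(buf);   /* rewrites the six trailing X characters in place */
    if (fd < 0)
        return -1;
    close(fd);
    unlink(buf);
    return 0;
}
```

mkstemp() only replaces the final six X characters, so everything before them in the template, including a relative directory prefix, is kept as written, and that directory must already exist.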

The reason I'm hesitant to add conditional compilation statements is that Termux is a hybrid platform: almost like Linux, but not quite. I don't know which conditions identify Termux exclusively. #-android and #-arm?

I would appreciate your suggestions. (I'm new to Lisp; this is a good learning experience.)

alexis rivera (riveraah) wrote :

Changes to make ecl_dev compile SBCL on Termux:
posix-tests.lisp
  opendir.1 and readdir.1 open the *current-directory* instead of "/"
  mkstemp.null-terminate creates the file under *test-directory* instead of /tmp.
     (Please review; I don't know whether the modifications go against the intentions of the test. I'm not sure whether the 64 limit applied to the filename or the full pathname.)

run-program.c
   closefrom_fallback(int) preprocessor test on LISP_FEATURE_ANDROID to use sysconf instead of getdtablesize

Stas Boukarev (stassats) wrote :

The patch doesn't seem to be related to ECL, though?

alexis rivera (riveraah) wrote :

These were errors that occurred when I compiled SBCL with ECL on Termux. Once the ECL compiler bugs were fixed on their dev branch, these were the remaining issues that prevented me from building SBCL.

Under Termux, the posix-tests would have failed the same way if SBCL had been compiled with itself, because Termux's directory structure and permissions differ from those of Linux. Termux also deprecated the function getdtablesize.

David Pflug (dpflug) wrote :

If I might suggest: Failing POSIX tests are a correct result for Termux. It's not POSIX-compliant. / and /tmp (and access to them) are mandated by the standard: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap10.html

alexis rivera (riveraah) wrote :

I'm continuing to try to compile on Termux. I haven't been successful with the latest version that I've tried so far (2.2.10.128-c6465a4de). As also noted at https://bugs.launchpad.net/sbcl/+bug/1972063,
the contrib packages fail to compile with errors like:

//entering make-target-contrib.sh
make: Entering directory '/data/data/com.termux/files/home/sbcl-master/sbcl/contrib'
make MODULE_REQUIRES="" -C sb-posix
make[1]: Entering directory '/data/data/com.termux/files/home/sbcl-master/sbcl/contrib/sb-posix'
../..//src/runtime/sbcl --noinform --core ../..//output/sbcl.core --lose-on-corruption --disable-debugger --no-sysinit --no-userinit --load ../make-contrib.lisp "sb-posix" </dev/null
; Note: Building "sb-posix"
While evaluating the form starting at line 140, column 0
  of #P"/data/data/com.termux/files/home/sbcl-master/sbcl/contrib/sb-posix/../make-contrib.lisp":
Unhandled SIMPLE-ERROR in thread #<SB-THREAD:THREAD "main thread" RUNNING
                                    {1001FE0003}>:
  C compilation failed

Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {1001FE0003}>
0: (SB-DEBUG::DEBUGGER-DISABLED-HOOK #<SIMPLE-ERROR "C compilation failed" {10034E34B3}> #<unused argument> :QUIT T)
1: (SB-DEBUG::RUN-HOOK *INVOKE-DEBUGGER-HOOK* #<SIMPLE-ERROR "C compilation failed" {10034E34B3}>)
2: (INVOKE-DEBUGGER #<SIMPLE-ERROR "C compilation failed" {10034E34B3}>)
3: (ERROR "C compilation failed")
4: (RUN-DEFS-TO-LISP (("constants" . :SB-POSIX)) "../../obj/from-self/contrib/sb-posix/generated-constants.lisp")
5: (PERFORM (DEFSYSTEM "sb-posix" :DEFSYSTEM-DEPENDS-ON ("sb-grovel") :COMPONENTS ((:FILE "defpackage") (:FILE "strtod" :DEPENDS-ON ("defpackage")) (:FILE "designator" :DEPENDS-ON ("defpackage")) (:FILE "macros" :DEPENDS-ON ("designator")) (:SB-GROVEL-CONSTANTS-FILE "constants" :PACKAGE :SB-POSIX :DEPENDS-ON ("defpackage")) (:FILE "interface" :DEPENDS-ON ("constants" "macros" "designator")))))
6: ((LAMBDA NIL :IN "/data/data/com.termux/files/home/sbcl-master/sbcl/contrib/make-contrib.lisp"))
7: (SB-INT:SIMPLE-EVAL-IN-LEXENV (LET ((FORM (WITH-OPEN-FILE (F #) (LET # # #)))) (LET ((EVAL (GETF FORM :EVAL))) (WHEN EVAL (EVAL EVAL))) (LET ((BINDINGS (GETF FORM :BIND)) (*COMPILE-VERBOSE* NIL)) (PROGV (MAPCAR (QUOTE FIRST) BINDINGS) (MAPCAR (QUOTE SECOND) BINDINGS) (PERFORM FORM)))) #<NULL-LEXENV>)
8: (EVAL-TLF (LET ((FORM (WITH-OPEN-FILE (F #) (LET # # #)))) (LET ((EVAL (GETF FORM :EVAL))) (WHEN EVAL (EVAL EVAL))) (LET ((BINDINGS (GETF FORM :BIND)) (*COMPILE-VERBOSE* NIL)) (PROGV (MAPCAR (QUOTE FIRST) BINDINGS) (MAPCAR (QUOTE SECOND) BINDINGS) (PERFORM FORM)))) 10 NIL)
9: ((LABELS SB-FASL::EVAL-FORM :IN SB-INT:LOAD-AS-SOURCE) (LET ((FORM (WITH-OPEN-FILE (F #) (LET # # #)))) (LET ((EVAL (GETF FORM :EVAL))) (WHEN EVAL (EVAL EVAL))) (LET ((BINDINGS (GETF FORM :BIND)) (*COMPILE-VERBOSE* NIL)) (PROGV (MAPCAR (QUOTE FIRST) BINDINGS) (MAPCAR (QUOTE SECOND) BINDINGS) (PERFORM FORM)))) 10)
10: ((LAMBDA (SB-KERNEL:FORM &KEY :CURRENT-INDEX &ALLOW-OTHER-KEYS) :IN SB-INT:LOAD-AS-SOURCE) (LET ((FORM (WITH-OPEN-FILE (F #) (LET # # #)))) (LET ((EVAL (GETF FORM :EVAL))) (WHEN EVAL (EVAL EVAL))) (LET ((BINDINGS (GETF FORM :BIND)) (*COMPILE-...


alexis rivera (riveraah) wrote :

Log of compilation error. ECL compiler vs SBCL 2.2.10.128-c6465a4de

alexis rivera (riveraah) wrote :

Patches added to the version 2.2.10.128-c6465a4de

The build command is sh make.sh --xc-host='path-to-ecl --norc' --prefix=output-dir --with-termux --fancy --without-gcc-tls
