clisp crashed in impish compiling xindy

Bug #1931531 reported by Christian Ehrhardt 
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
clisp (Ubuntu)   Fix Released   Undecided    Unassigned
xindy (Ubuntu)   Invalid        Undecided    Unassigned

Bug Description

clisp segfaults when compiling xindy in impish

for i in src tex2xindy user-commands; do /usr/bin/make -C $i all || exit 1; done
make[1]: Entering directory '/<<PKGBUILDDIR>>/src'
sed 's|@MODULEDIR[@]|/usr/lib/xindy/modules|g' <./defaults.xdy.in >defaults.xdy
/usr/bin/clisp -q -E iso-8859-1 -c base.lsp -o base.fas
;; Compiling file /<<PKGBUILDDIR>>/src/base.lsp ...
;; Wrote file /<<PKGBUILDDIR>>/src/base.fas
0 errors, 0 warnings
/usr/bin/clisp -q -E iso-8859-1 -c ordrules.lsp -o ordrules.fas
;; Compiling file /<<PKGBUILDDIR>>/src/ordrules.lsp ...
*** - handle_fault error2 ! address = 0x45b926e6958 not in [0x50000020000,0x500000803c8) !
SIGSEGV cannot be cured. Fault address = 0x45b926e6958.
GC count: 4
Space collected by GC: 3451320
Run time: 0 84663
Real time: 0 472415
GC time: 0 18809
Permanently allocated: 165256 bytes.
Currently in use: 4447024 bytes.
Free space: 52452 bytes.
make[1]: *** [Makefile:507: ordrules.fas] Segmentation fault (core dumped)

Chances are this is related to another case: the clisp no-change rebuild is FTBFS on ppc64/s390x, which (by accident?) are exactly the platforms where this crashes.
Maybe a library transitioned and no test hit this, but now that everything is in impish it indirectly breaks the xindy builds. Just a theory for now.

Christian Ehrhardt  (paelzer) wrote :

Attaching a full backtrace of the fail.

tags: added: update-excuse
Christian Ehrhardt  (paelzer) wrote :

Clisp error:
/home/ubuntu/clisp-2.49.20180218+really2.49.92/src/lispbibl.d:6673:6: error: #error oint_addr_mask does not cover CODE_ADDRESS_RANGE !!
 6673 | #error oint_addr_mask does not cover CODE_ADDRESS_RANGE !!
      |

Christian Ehrhardt  (paelzer) wrote :

Note:
On Debian s390x the xindy build doesn't even start, as it has missing
build-dependencies.

Christian Ehrhardt  (paelzer) wrote :

The code for the error in clisp is:

/* Verify the oint_addr_shift value w.r.t. the autoconfigured CODE_ADDRESS_RANGE
   and MALLOC_ADDRESS_RANGE values. */
#if !defined(WIDE_SOFT)
  /* The CODE_ADDRESS_RANGE needs to be checked because we store code
     pointers in Lisp objects (cf. macro ThePseudofun).
     In case of TYPECODES, the typecode() of such pointers must be machine_type,
     otherwise gc_mark() gets confused and crashes. */
  #if (CODE_ADDRESS_RANGE >> addr_shift) & ~(oint_addr_mask >> oint_addr_shift)
    #error oint_addr_mask does not cover CODE_ADDRESS_RANGE !!
  #endif

Values are:
CODE_ADDRESS_RANGE 0x000002AA09000000UL
addr_shift 0
oint_addr_mask (((2UL<<((64)-1))-1) & ~(0x01FC00000000UL | (1UL<<63)))
oint_addr_shift 0

So we can ignore the shifts, and this comes down to
  CODE_ADDRESS_RANGE & ~(oint_addr_mask)
  0x000002AA09000000UL & ~((((2UL<<((64)-1))-1) & ~(0x01FC00000000UL | (1UL<<63))))
= 0x000002AA09000000UL & ~(0xFFFFFFFFFFFFFFFF & 0x7FFFFE03FFFFFFFF)
= 0x000002AA09000000UL & 0x800001FC00000000
= 0x000000A800000000

The result is non-zero, so the #error fires.
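
The arithmetic above can be double-checked mechanically. This is a minimal Python sketch, not clisp code: the constant names mirror the macros quoted in this comment, the helper `uncovered_bits` is a hypothetical name, and the 64-bit `unsigned long` wrap-around is emulated with an explicit mask.

```python
# Re-run the preprocessor check from lispbibl.d with the values quoted above.
# Python ints are unbounded, so 64-bit unsigned arithmetic is emulated by
# masking every intermediate result with U64.
U64 = (1 << 64) - 1

CODE_ADDRESS_RANGE = 0x000002AA09000000
ADDR_SHIFT = 0
OINT_ADDR_SHIFT = 0

# (((2UL<<((64)-1))-1) & ~(0x01FC00000000UL | (1UL<<63)))
ALL_ONES = (((2 << 63) & U64) - 1) & U64          # 2UL<<63 wraps to 0, minus 1
OINT_ADDR_MASK = ALL_ONES & (~(0x01FC00000000 | (1 << 63)) & U64)


def uncovered_bits(code_range, addr_shift, mask, mask_shift):
    """Return the bits of code_range that do not fit into the address mask."""
    return (code_range >> addr_shift) & ~(mask >> mask_shift) & U64


leftover = uncovered_bits(
    CODE_ADDRESS_RANGE, ADDR_SHIFT, OINT_ADDR_MASK, OINT_ADDR_SHIFT
)
print(f"oint_addr_mask = 0x{OINT_ADDR_MASK:016X}")
print(f"leftover bits  = 0x{leftover:016X}")
print("#error would fire" if leftover else "check passes")
```

Any non-zero leftover means the code address uses bits that the mask reserves for something else, which is the condition the `#if` tests.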

I need a known-good case to compare against; the abstraction layers in the includes are too complex for someone who doesn't know the code to just read what should happen.

Christian Ehrhardt  (paelzer) wrote :

I've built the same in Hirsute, just as broken :-/
Last successful build was in Focal.
Note that Focal has the same clisp version, and xindy differs by only two
Debian packaging revisions.
So I'll start at Focal trying to find a working environment.

build impish version of clisp in impish - fail
build impish version of xindy in impish - fail
build impish version of clisp in hirsute - fail
build impish version of xindy in hirsute - fail
build impish version of clisp in focal - fail
build impish version of xindy in focal - fail

Grml, the last published good build is the one in focal, carried forward since then.
But it was actually built in eoan. So we'd want to start with disco/eoan.

In Bionic, the oldest release still available, there is no clisp package yet.
So xindy can't be built there, and building clisp there fails the same way.

build impish version of clisp in bionic - fail
build impish version of xindy in bionic - n/a

BTW, this means both packages as-is are FTBFS in Focal through Impish (F-I)
and therefore non-serviceable at the moment :-/

It almost seems that there was a magic time around eoan when the stars aligned:
everything built back then, and since then it won't build anymore.

Right now I'm unsure what to do next about this ...

Christian Ehrhardt  (paelzer) wrote :

Good build log:
https://launchpadlibrarian.net/440395396/buildlog_ubuntu-eoan-s390x.clisp_1%3A2.49.20180218+really2.49.92-3build3_BUILDING.txt.gz

Happened mid Eoan on 2019-09-05 so final Eoan might be just as bad as focal.

Build xindy/clisp in Disco - same fail
Build xindy/clisp in Eoan - same fail

Christian Ehrhardt  (paelzer) wrote :

As a last resort I have built the latest clisp from upstream,
without packaging magic, just the way upstream intends.

But the last upstream release is more than a decade old (2010-07-07),
so maybe the time has come to ignore it.
The latest release is still the one at
 https://ftp.gnu.org/pub/gnu/clisp/release/latest/
We are on beta releases, but even those are from 2018 and old by now:
 https://gitlab.com/gnu-clisp/clisp/-/tags

All fail the same way on all releases.

Christian Ehrhardt  (paelzer) wrote :

I'll time-box my activity for now. Since it works on other platforms,
a removal might be too hard.

But if you happen to find other related issues, please add them here so that
removal can be considered once they pile up.

For the sake of not fully giving up I've reported it upstream at:
https://sourceforge.net/p/clisp/mailman/clisp-devel/thread/CAATJJ0KdgVUA6kb_QQVBVgFcKuyeCF_9Z4NcmVokfydhhYx3%2BQ%40mail.gmail.com/#msg37300059

Steve Langasek (vorlon) wrote :

Looking into the build failure of clisp on Ubuntu ppc64el, it looks like a difference in behavior of KASLR. Output from the failure log in the bug shows:

checking for the code address range... 0x000002AA39000000

Successive invocations on Ubuntu with a focal kernel show:

checking for the code address range... 0x000003E0F6000000
checking for the code address range... 0x00000654DF000000

And successive invocations on Debian show:

checking for the code address range... 0x0000000113000000

And the last successful build in Ubuntu had:

checking for the code address range... 0x0000000110000000

This is with focal 5.4.0 kernel.

And the build failure is also reproducible in a sid chroot on top of a focal kernel.
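
To see how these reported ranges relate to the compile-time check, here is a rough Python sketch. It assumes the `oint_addr_mask` value quoted earlier in this bug also applies here, which may not hold for every architecture; `FORBIDDEN_BITS` and `fits_mask` are hypothetical names used only for illustration.

```python
# Complement of the oint_addr_mask quoted earlier in this bug, i.e. the bits
# a code address must NOT use. Assumption: the same mask applies to every
# address listed here; the real configured mask may differ per architecture.
FORBIDDEN_BITS = 0x800001FC00000000

# "checking for the code address range..." values reported in this comment.
observed = {
    "failure log in the bug": 0x000002AA39000000,
    "Ubuntu, focal kernel, run 1": 0x000003E0F6000000,
    "Ubuntu, focal kernel, run 2": 0x00000654DF000000,
    "Debian": 0x0000000113000000,
    "last successful Ubuntu build": 0x0000000110000000,
}


def fits_mask(address):
    """True if the address uses none of the forbidden bits."""
    return (address & FORBIDDEN_BITS) == 0


results = {label: fits_mask(addr) for label, addr in observed.items()}
for label, ok in results.items():
    verdict = "fits" if ok else "does NOT fit"
    print(f"0x{observed[label]:016X}  {verdict:12}  ({label})")
```

Under that assumption, only the two low ranges (Debian and the last successful Ubuntu build) stay clear of the forbidden bits.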

Steve Langasek (vorlon) wrote :

Correction, this doesn't seem to be a change in the KASLR behavior, so much as in the base addresses used by the kernel for mappings.

I'm not sure how to figure out a "correct" address mask to use for this on ppc64el.

Michael Hudson-Doyle (mwhudson) wrote :

So I think what broke this was this change in the kernel: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=47ebb09d54856500c5a5e14824781902b3bb738e which I'm pretty sure got backported everywhere. But this is only the base address for dynamic executables. I wonder if it will build if we disable PIE by default...
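
Whether a given binary counts as a "dynamic executable" in this sense is visible in its ELF header: `e_type` is `ET_DYN` for PIE executables (and shared objects) and `ET_EXEC` for fixed-address ones. A minimal Python sketch for checking that follows; the helper names `elf_type` and `describe` are hypothetical, and the path in the usage note is only an example.

```python
import struct

ET_EXEC = 2  # fixed-address executable (non-PIE)
ET_DYN = 3   # shared object or position-independent executable (PIE)


def elf_type(header):
    """Return e_type from the first 18 (or more) bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] is byte 5: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type


def describe(path):
    """Read the ELF header of the file at path and name its type."""
    with open(path, "rb") as f:
        kind = elf_type(f.read(18))
    names = {ET_EXEC: "ET_EXEC (non-PIE)", ET_DYN: "ET_DYN (PIE or shared object)"}
    return names.get(kind, f"other ({kind})")
```

For example, `print(describe("/usr/bin/clisp"))` would report how that binary was linked on the machine at hand.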

Michael Hudson-Doyle (mwhudson) wrote :

If I disable PIE for ppc64el it builds on my canonistack instance, but all architectures fail to build in my PPA for what seem to be unrelated reasons :(

Michael Hudson-Doyle (mwhudson) wrote :

Ah, so the unrelated reasons would be the glibc snapshot I have in that PPA. clisp with PIE disabled for s390x and ppc64el built fine in a different PPA, and xindy then built with the resulting clisp too, so I'll upload that. It would be better to fix this properly by choosing better addresses for clisp, but for now it will do.

Changed in xindy (Ubuntu):
status: New → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package clisp - 1:2.49.20180218+really2.49.92-3ubuntu1

---------------
clisp (1:2.49.20180218+really2.49.92-3ubuntu1) impish; urgency=medium

  * Disable PIE when building for ppc64el and s390x as PIE executables get
    loaded at addresses that do not work with clisp's GC. (LP: #1931531)

 -- Michael Hudson-Doyle <email address hidden> Tue, 13 Jul 2021 10:59:10 +1200

Changed in clisp (Ubuntu):
status: New → Fix Released