fftw3 ftbfs in eoan (armhf only)

Bug #1843733 reported by Matthias Klose
This bug affects 1 person
Affects          Status        Importance  Assigned to      Milestone
FFTW             New           Unknown
fftw3 (Debian)   Fix Released  Unknown
fftw3 (Ubuntu)   Fix Released  High        Daniel van Vugt
  Eoan           Won't Fix     High        Daniel van Vugt

Bug Description

fftw3 ftbfs in eoan (armhf only).

https://launchpadlibrarian.net/441265031/buildlog_ubuntu-eoan-armhf.fftw3_3.3.8-2_BUILDING.txt.gz

( cd tests ; /usr/bin/make smallcheck )
make[1]: Entering directory '/<<PKGBUILDDIR>>/tests'
perl -w ./check.pl -r -c=1 -v `pwd`/bench
Executing "/<<PKGBUILDDIR>>/tests/bench --verbose=1 --verify 'ok7e10x12bv29' --verify 'ik7e10x12bv29' --verify '//obr2304' --verify '//ibr2304' --verify '//ofr2304' --verify '//ifr2304' --verify 'obr2304' --verify 'ibr2304' --verify 'ofr2304' --verify 'ifr2304' --verify '//obc2304' --verify '//ibc2304' --verify '//ofc2304' --verify '//ifc2304' --verify 'obc2304' --verify 'ibc2304' --verify 'ofc2304' --verify 'ifc2304'"
ok7e10x12bv29 1.71318e-07 1.44516e-06 2.00887e-07
ik7e10x12bv29 1.66737e-07 1.44901e-06 1.76717e-07
Segmentation fault (core dumped)
FAILED /<<PKGBUILDDIR>>/tests/bench: --verify 'ok7e10x12bv29' --verify 'ik7e10x12bv29' --verify '//obr2304' --verify '//ibr2304' --verify '//ofr2304' --verify '//ifr2304' --verify 'obr2304' --verify 'ibr2304' --verify 'ofr2304' --verify 'ifr2304' --verify '//obc2304' --verify '//ibc2304' --verify '//ofc2304' --verify '//ifc2304' --verify 'obc2304' --verify 'ibc2304' --verify 'ofc2304' --verify 'ifc2304'
make[1]: *** [Makefile:723: smallcheck] Error 1
make[1]: Leaving directory '/<<PKGBUILDDIR>>/tests'

Tags: ftbfs
Matthias Klose (doko)
Changed in fftw3 (Ubuntu):
status: New → Confirmed
importance: Undecided → High
tags: added: ftbfs rls-ee-incoming
Will Cooke (willcooke)
Changed in fftw3 (Ubuntu):
assignee: nobody → Daniel van Vugt (vanvugt)
tags: removed: rls-ee-incoming
Changed in fftw3 (Ubuntu Eoan):
status: Confirmed → In Progress
Daniel van Vugt (vanvugt) wrote :

So far I can't reproduce any crash in the tests on armhf (or aarch64). But I haven't tried everything yet...

In the meantime, can anyone reproduce the failure and produce a fresh log? If so, can you verify for me which binary is crashing?

Changed in fftw3 (Ubuntu Eoan):
status: In Progress → Incomplete
Matthias Klose (doko) wrote :
Changed in fftw3 (Ubuntu Eoan):
status: Incomplete → Confirmed
Sebastien Bacher (seb128) wrote :

Trying to debug a bit on a canonistack instance:

==5906== Command: ./bench --verify ofr108*118
==5906==
==5906== Invalid write of size 8
==5906== at 0x49174A8: vst1_f32 (arm_neon.h:10876)
==5906== by 0x49174A8: ST (simd-neon.h:119)
==5906== by 0x49174A8: n1fv_9 (n1fv_9.c:230)
==5906== by 0x4884F4F: dobatch (direct.c:51)
==5906== by 0x48850B5: apply_buf (direct.c:87)
==5906== by 0x48869B1: fftwf_dft_solve (solve.c:29)
==5906== by 0x4882407: measure (timer.c:136)
==5906== by 0x4882407: fftwf_measure_execution_time (timer.c:159)
==5906== by 0x4880725: evaluate_plan (planner.c:460)
==5906== by 0x48808B9: search0 (planner.c:529)
==5906== by 0x48809B1: search (planner.c:600)
==5906== by 0x48809B1: mkplan (planner.c:711)
==5906== by 0x4880D2D: fftwf_mkplan_d (planner.c:970)
==5906== by 0x48B5A0F: mkplan (ct-hc2c.c:198)
==5906== by 0x48807E3: invoke_solver (planner.c:486)
==5906== by 0x48807E3: search0 (planner.c:529)
==5906== by 0x48809B1: search (planner.c:600)
==5906== by 0x48809B1: mkplan (planner.c:711)
==5906== Address 0x96bafa8 is not stack'd, malloc'd or (recently) free'd
==5906==
==5906==
==5906== Process terminating with default action of signal 11 (SIGSEGV)
==5906== Access not within mapped region at address 0x96BAFA8
==5906== at 0x49174A8: vst1_f32 (arm_neon.h:10876)
==5906== by 0x49174A8: ST (simd-neon.h:119)
==5906== by 0x49174A8: n1fv_9 (n1fv_9.c:230)
==5906== by 0x4884F4F: dobatch (direct.c:51)
==5906== by 0x48850B5: apply_buf (direct.c:87)
==5906== by 0x48869B1: fftwf_dft_solve (solve.c:29)
==5906== by 0x4882407: measure (timer.c:136)
==5906== by 0x4882407: fftwf_measure_execution_time (timer.c:159)
==5906== by 0x4880725: evaluate_plan (planner.c:460)
==5906== by 0x48808B9: search0 (planner.c:529)
==5906== by 0x48809B1: search (planner.c:600)
==5906== by 0x48809B1: mkplan (planner.c:711)
==5906== by 0x4880D2D: fftwf_mkplan_d (planner.c:970)
==5906== by 0x48B5A0F: mkplan (ct-hc2c.c:198)
==5906== by 0x48807E3: invoke_solver (planner.c:486)
==5906== by 0x48807E3: search0 (planner.c:529)
==5906== by 0x48809B1: search (planner.c:600)
==5906== by 0x48809B1: mkplan (planner.c:711)

Sebastien Bacher (seb128) wrote :

Reported upstream at https://github.com/FFTW/fftw3/issues/182. With some luck they will have an idea about the bug.

Daniel van Vugt (vanvugt) wrote :

When I build the native armhf tarball on armhf and then run the same command I find many different bugs:

$ valgrind ./bench --verify ofr108*118
...
==27624== ERROR SUMMARY: 8257294 errors from 859 contexts (suppressed: 0 from 0)

This is why I was reluctant to use Valgrind: more often than not a project will be full of errors other than the one you're looking for, and Valgrind will report them all.

I'm not yet sure the error in comment #3 is the error we are looking for in this bug. Although if it's the only error you get from a deb build then it does seem more likely. It will take me another day or so to confirm.
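One way to check whether the error from comment #3 is among the many contexts Valgrind reports is to write the report to a file and count the lines that mention the suspect frame. This is only a rough sketch: `count_frames` and the `vg.log` file name are hypothetical, and the Valgrind invocation is shown as a comment because it needs the built `bench` binary.

```sh
# Hypothetical helper: count the lines of a Valgrind log that mention a frame.
# $1: function name to look for (for example n1fv_9)
# $2: path to a log written with Valgrind's --log-file option
count_frames() {
    # -F treats the name as a fixed string, -c prints the number of matching lines
    grep -c -F -- "$1" "$2"
}

# Assumed usage on the armhf machine (not run here):
#   valgrind --log-file=vg.log ./bench --verify 'ofr108*118'
#   count_frames n1fv_9 vg.log
```

A count of zero would suggest the native tarball build is hitting different errors from the deb build. A non-zero count only shows that the frame appears somewhere in the log, so the matching stacks still need to be read by hand.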

Daniel van Vugt (vanvugt) wrote :

Also, without Valgrind the above command does not crash at all on armhf for me. So I'm now working on the theory that this bug is a consequence of the deb's custom build flags, which include Neon, and that supports the idea in comment #3.

Changed in fftw3:
status: Unknown → New
Changed in fftw3 (Ubuntu Eoan):
status: Confirmed → In Progress
Daniel van Vugt (vanvugt) wrote :

There's an additional bug that's slowing me down right now:

If you run 'bench' under gdb, or just compile the package using gcc-8, then 'bench' spins forever at 100% CPU and never completes any tests.

I will try to avoid having to fix that bug here...

Daniel van Vugt (vanvugt) wrote :
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package fftw3 - 3.3.8-2ubuntu1

---------------
fftw3 (3.3.8-2ubuntu1) focal; urgency=medium

  * debian/rules: Disable Neon for armhf single precision builds to avoid
    crashes. It's unclear if this is an fftw3 internal bug in its code
    generator (which fftw3 does by hand) or related to gcc changes in recent
    years. But certainly disabling Neon avoids the crashes. (LP: #1843733)

 -- Daniel van Vugt <email address hidden> Sat, 30 Nov 2019 23:40:05 +0100
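The changelog does not show the actual debian/rules change, so the following is only a sketch of the idea: pick the configure flags for the single precision build by architecture and leave Neon out on armhf. `--enable-single` and `--enable-neon` are FFTW configure options; the function name `single_precision_flags` and the choice to keep Neon on arm64 are assumptions made for illustration.

```sh
# Hypothetical helper, not the packaged debian/rules logic.
# Prints configure flags for a single precision FFTW build on the given arch.
single_precision_flags() {
    arch=$1
    flags="--enable-single"
    case "$arch" in
        armhf)
            # Neon is left out here, since this is where the crash was seen.
            ;;
        arm64)
            # Assumption for illustration only: keep Neon on arm64.
            flags="$flags --enable-neon"
            ;;
    esac
    printf '%s\n' "$flags"
}

single_precision_flags armhf
single_precision_flags arm64
```

In a real package the architecture would more likely come from `dpkg-architecture -qDEB_HOST_ARCH` than from a hard-coded argument.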

Changed in fftw3 (Ubuntu):
status: In Progress → Fix Released
Daniel van Vugt (vanvugt) wrote :

We probably don't need to fix eoan too (?)

Changed in fftw3 (Ubuntu Eoan):
status: In Progress → Won't Fix
Changed in fftw3 (Debian):
status: Unknown → Confirmed
Changed in fftw3 (Debian):
status: Confirmed → Fix Released