Raspbian version of libllvm7 miscompiles floating point code

Bug #1836203 reported by Matthijs Kooijman
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Raspbian
Fix Released
Undecided
Unassigned

Bug Description

I ran into a problem compiling the typenums rust package (things like `the name N1000 is defined multiple times` for anyone googling). I traced this down to a problem with the LLVM optimizer. However, replacing the `libllvm7` package with the version from Debian Buster armhf makes things work as expected, which makes me report this bug here.

Comparing the debian and Raspbian version of llvm-toolchain-7, it seems pretty much the only relevant change is using armv6 rather than arm7. This suggests that either something in the way this change is implemented by Raspbian is causing this problem, or upstream has a bug in their armv6 code generation that does not occur for armv7. I've also tried the below testcase with llvm-9, which does work. Looking at changelog.Debian the Raspbian-specific patches are unchanged so that suggests this might be an upstream problem in LLVM that was since fixed.

However, since Buster's rust compiler (and probably others) is linked against llvm-7, it might still be relevant to fix this.

To reproduce, here's the testcase. This is just a function that calculates the 2-log of 1024 (by calculating the natural log of 1024 and 2 and dividing them). I've also added functions that individually calculate these logs, so you can see that it's not the division itself triggering the problem.

$ cat log.ir
declare double @llvm.log.f64(double %Val)

define double @foo() {
start:
  %0 = call double @llvm.log.f64(double 1.024000e+03) #8
  %1 = call double @llvm.log.f64(double 2.000000e+00) #8
  %2 = fdiv double %0, %1
  ret double %2
}

define double @bar() {
start:
  %0 = call double @llvm.log.f64(double 1.024000e+03) #8
  ret double %0
}

define double @baz() {
start:
  %0 = call double @llvm.log.f64(double 2.000000e+00) #8
  ret double %0
}

With libllvm7 from Debian, this works as expected. The optimizer replaces the log calls with constants:

$ opt-7 log.ir -S -instcombine
; ModuleID = 'log.ir'
source_filename = "log.ir"

; Function Attrs: nounwind readnone speculatable
declare double @llvm.log.f64(double) #0

define double @foo() {
start:
  ret double 1.000000e+01
}

define double @bar() {
start:
  ret double 0x401BB9D3BEB8C86B
}

define double @baz() {
start:
  ret double 0x3FE62E42FEFA39EF
}

attributes #0 = { nounwind readnone speculatable }

With libllvm7 from Raspbian, this breaks. Every log call is replaced by a NaN value:

$ opt-7 log.ir -S -instcombine
; ModuleID = 'log.ir'
source_filename = "log.ir"

; Function Attrs: nounwind readnone speculatable
declare double @llvm.log.f64(double) #0

define double @foo() {
start:
  ret double 0x7FF8000000000000
}

In the original

define double @bar() {
start:
  ret double 0x7FF8000000000000
}

define double @baz() {
start:
  ret double 0x7FF8000000000000
}

attributes #0 = { nounwind readnone speculatable }

In the original `rustc -C print-after-all` output of the original testcase, I also noticed that the EarlyCSE transformation evaluates log(1024) incorrectly, but I could not find which opt option to use to reproduce this. I suspect this has the same underlying problem, though I'm not sure.

*** IR Dump After SROA ***
; Function Attrs: inlinehint nounwind nonlazybind readnone uwtable
define internal fastcc double @"_ZN3std3f6421_$LT$impl$u20$f64$GT$3log17hca0d05b8ada2f919E"() unnamed_addr #1 personality i32 (i32, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  %0 = call double @llvm.log.f64(double 1.024000e+03) #8
  %1 = call double @llvm.log.f64(double 2.000000e+00) #8
  %2 = fdiv double %0, %1
  ret double %2
}
*** IR Dump After Early CSE w/ MemorySSA ***
; Function Attrs: inlinehint nounwind nonlazybind readnone uwtable
define internal fastcc double @"_ZN3std3f6421_$LT$impl$u20$f64$GT$3log17hca0d05b8ada2f919E"() unnamed_addr #1 personality i32 (i32, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  %0 = call double @llvm.log.f64(double 2.000000e+00) #8
  %1 = fdiv double 0x747BB550747BA440, %0
  ret double %1
}

For completeness, here is (as somewhat stripped down version) of the original rust code that showed the problem:

$ cat main.rs
fn main() {
    let highest: u64 = 1024;

    let first2: u32 = (highest as f64).log(2.0) as u32 + 1;
    let first10: u32 = (highest as f64).log(10.0) as u32 + 1;
    let uints2 = (0..(highest + 1))
        .chain((first2..64).map(|i| 2u64.pow(i)))
        .chain((first10..20).map(|i| 10u64.pow(i)));

    // Expected to print 1 to 1024, followed by powers of two starting at 2048 and powers of ten
    // starting at 10000.
    for u in uints2 {
        print!("{}, ", u);
    }
    println!();
}
$ rustc -O main.rs
$ ./main

And here are some versions of installed packages:
ii libstd-rust-dev:armhf 1.34.2+dfsg1-1+rpi1 armhf Rust standard libraries - development files
ii rustc 1.34.2+dfsg1-1+rpi1 armhf Rust systems programming language
ii libstd-rust-1.34:armhf 1.34.2+dfsg1-1+rpi1 armhf Rust standard libraries
ii libllvm7:armhf 1:7.0.1-8+rpi1 armhf Modular compiler and toolchain technologies, runtime library
ii llvm-7 1:7.0.1-8+rpi1 armhf Modular compiler and toolchain technologie

Tags: buster llvm
Revision history for this message
peter green (plugwash) wrote :

I have run into another issue which I think may be related.

It seems that when building for armv6k llvm 7 defaults to assuming no FPU is available. Unfortunately it seems that rather than doing the sensible thing when an impossible combination of fpu and abi is specified and screaming and dying it silently compiles the code with the soft float ABI, but tags it with the hard float ABI.

Debian/Raspbian builds it's final llvm binaries using llvm (after building first stage binaries using g++), so it seems like there is probably an ABI breakage between the llvm libraries and the rest of the system, I can easily see how this could lead to mis-compilation.

The fix for that would seem to be to change the default FPU settings for armv6k builds, unfortunately I am having some difficulties there (which may well turn out to just be build hardware being flaky).

Revision history for this message
peter green (plugwash) wrote :

Specifically in this case I suspect the failure is happening when llvm calls into the standard C library to calculate the logarithm.

Revision history for this message
peter green (plugwash) wrote :

I think the following command should find most files on a system that were miscompiled in this way.

for FILE in `find /usr/lib/ -name '*.so'` ; do objdump -T $FILE | grep __aeabi_dadd | awk -v prefix="$FILE" '{print prefix $0}' ; done

Revision history for this message
peter green (plugwash) wrote :

to clarify my previous comment I mean files that were miscompiled directly due to llvm assuming there was no FPU and screwing up the ABI, not files that were miscompiled as a result of llvm being miscompiled and therefore buggy like the example at the beginning of this bug.

Revision history for this message
Matthijs Kooijman (matthijskooijman) wrote :

I just ran that command on a Raspbian Buster system, and the only hit is libLLVM-7 (apart from some errors about unrecognized file formats for .so files containing LD scripts). Interstingly enough other libLLVM versions do *not* show up (I have 6.0, 7 and 9 installed).

/usr/lib/arm-linux-gnueabihf/libLLVM-7.so00000000 DF *UND* 00000000 GCC_3.5 __aeabi_dadd

I think I understand what you're saying, but it's been too long since I worked with LLVM for me to really see a path forward here. If you need me to test anything else, let me know :-)

Revision history for this message
peter green (plugwash) wrote :

Yes, the "assuming the target has no FPU and then failing to follow the ABI" issue does seem to be specific to llvm 7. llvm 6 doesn't suffer from it and I have a llvm 8 build locally (which I may or may not upload) which also doesn't suffer from it.

rust seems to be overriding the defaults from llvm, so it's only subject to secondary miscompilations, not the primary one.

I'm hoping I can fix this issue llvm 7, failing that it would be possible to flip the default llvm to a different version, but i'm not sure if the risks of that justify the benefits.

Revision history for this message
peter green (plugwash) wrote :

Cross-linking the (closed as invalid) rustc bug for reference.

https://github.com/rust-lang/rust/issues/60019

Revision history for this message
peter green (plugwash) wrote :

Update.

The difficulties I mentioned earlier, indeed appeared to be buildbox flakiness, I was able to successfully get past them building on a different box.

I have confirmed that my change fixes both the primary and secondary miscompilations. I just need to get a clean build before I can upload, unfortunately building llvm-toolchain-7 takes ages, so I won't be able to do that tonight.

Revision history for this message
Matthijs Kooijman (matthijskooijman) wrote :

Nice! I'm in no particular hurry (my current workaround of using libllvm7 from Debian seems to hold up well) and won't have time for any testing anyway until monday anyway.

Pander (pander)
tags: added: buster llvm
Revision history for this message
peter green (plugwash) wrote :

This should now be fixed.

Changed in raspbian:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.