gcc 8 miscompiles scipy/optimize/minpack/qrsolv.f

Bug #1811798 reported by Graham Inggs on 2019-01-15
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
gcc
Fix Released
Medium
gcc-8 (Ubuntu)
Undecided
Unassigned
imexam (Ubuntu)
Undecided
Unassigned
python-scipy (Debian)
Fix Released
Unknown
python-scipy (Ubuntu)
Undecided
Unassigned

Bug Description

Originally reported in https://bugs.debian.org/906198

The following simple test fails on s390x:

---------------------------8<------------------------------
import numpy as np
from scipy.optimize import leastsq

y = np.array([0., 1., 1., 2., 1., 1., 0.])
x = np.arange(len(y))

def func(par):
    return par[2] * np.exp(-(x - par[0])**2/par[1]) - y

print(leastsq(func, [0,1,0]))
---------------------------8<------------------------------

The expected result is

(array([3. , 4.42280548, 1.67210345]), 1)

while on S390x, one gets

(array([0. , 1. , 0.34027645]), 3)

Further information from https://bugs.debian.org/915738

The bug seems to be a miscompilation of scipy/optimize/minpack/qrsolv.f
with -funroll-loops. Removing -funroll-loops is enough to get the
leastsq testcase to pass.

Changed in python-scipy (Debian):
status: Unknown → New

seen when build scipy on s390x with the current GCC 8 branch. According to the Debian report [1] this is tracked down to miscompilation of one Fortran file with -funroll-loops. The Ubuntu report [2] has a "standalone" python test case.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=915738
[2] https://bugs.launchpad.net/ubuntu/+source/python-scipy/+bug/1811798

affects: scipy → gcc
Changed in gcc:
importance: Unknown → Medium
status: Unknown → New
In , Rguenth (rguenth) wrote :

I guess we need something better than a python testcase.

Yeah, like that problematic source, all gfortran options used to compile that, any needed modules too + stubbed whatever it calls and whatever is needed in MAIN__ or main to reproduce, ideally with minimal dependencies.
If you know the exact problematic routine, see in the debugger how many times it is called and in which invocation of the routine it misbehaves and try to capture on which arguments it is called. If you don't know the exact problematic routine, one can e.g. play with assembly bisection between -funroll-loops and no -funroll-loops, rename .L* labels in one of them so that it can be merged by hand more easily. I think gfortran doesn't have optimize (0) attribute yet.

In , Krebbel (krebbel) wrote :

I've tried building scipy 1.1.0 from github on a Fedora installation. The build already uses -funroll-loops. But I couldn't reproduce the problem with the resulting binary.

gcc version 8.0.1 20180324

Aurelien already tracked it down to a miscompilation of
scipy/optimize/minpack/qrsolv.f

This source file appears to contain just a single function (qrsolv) which is not too big. I think I can work with that after being able to reproduce the problem.

As Jakub mentioned the exact compiler cmdline would be good.

Changed in gcc:
status: New → Incomplete
In , Krebbel (krebbel) wrote :

I'm able to reproduce the problem now and will try to have a look.

Changed in gcc:
status: Incomplete → In Progress
In , Krebbel (krebbel) wrote :

Created attachment 45586
qrsolv-reduc.f the miscompiled fortran file autoreduced from scipy

In , Krebbel (krebbel) wrote :

Created attachment 45587
A C wrapper to call the qrsolv function in the fortran snippet

In , Krebbel (krebbel) wrote :

gfortran -O3 -march=zEC12 -funroll-loops -fpie qrsolv-reduc.f -c
gcc qrsolv-caller.c -c
gcc qrsolv-caller.o qrsolv-reduc.o -o t

r265191
./t
1.359429

r265193
./t
0.000000

In , Krebbel (krebbel) wrote :
Download full text (6.7 KiB)

The r265193 patch was found via reghunt. However, it just reveals an underlying issue.

The problem can also be seen with mainline.

The miscompile happens in the following loop:
      do 110 j = 1, n
         if (sdiag(j) .eq. zero .and. nsing .eq. n) nsing = j - 1
         if (nsing .lt. n) wa(j) = zero
  110 continue

The problem appears to be rather related to ifcvt. ifcvt generates a load on condition for the sdiag(j) .eq. zero comparison by inserting insns: 2480, 2481, 2482:

265.ce2

(insn 915 918 916 88 (set (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ])
        (mem:DF (reg/v/f:DI 239 [ sdiag ]) [2 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B]+0 S8 A64])) "min.qrsolv.f":51 1289 {*movdf_64dfp}
     (nil))
(insn 916 915 2480 88 (set (reg:CCZ 33 %cc)
        (compare:CCZ (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ])
            (const_double:DF 0.0 [0x0.0p+0]))) "min.qrsolv.f":51 1255 {*cmpdf_ccs}
     (expr_list:REG_DEAD (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ])
        (nil)))
(insn 2480 916 2481 88 (set (reg:SI 733)
        (const_int 0 [0])) 1274 {*movsi_zarch}
     (nil))
(insn 2481 2480 2482 88 (set (reg:CCZ 33 %cc)
        (compare:CCZ (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ])
            (const_double:DF 0.0 [0x0.0p+0]))) 1255 {*cmpdf_ccs}
     (nil))
(insn 2482 2481 927 88 (set (reg/v:SI 109 [ nsing ])
        (if_then_else:SI (ne (reg:CCZ 33 %cc)
                (const_int 0 [0]))
            (reg/v:SI 109 [ nsing ])
            (reg:SI 733))) 1676 {*movsicc}
     (nil))
(note 927 2482 928 88 NOTE_INSN_DELETED)
(jump_insn 928 927 932 88 (parallel [
            (set (pc)
                (if_then_else (le (reg:SI 320 [ _444 ])
                        (reg/v:SI 109 [ nsing ]))
                    (label_ref:DI 943)
                    (pc)))
            (clobber (reg:CC 33 %cc))
        ]) "min.qrsolv.f":52 1260 {*cmp_and_br_signed_si}
     (expr_list:REG_UNUSED (reg:CC 33 %cc)
        (int_list:REG_BR_PROB 536870916 (nil)))
 -> 943)

In the backend we have that interesting splitter which triggers for the old and now obsolete compare in insn 916

(define_split
  [(set (match_operand 0 "cc_reg_operand")
 (compare (match_operand:FP 1 "register_operand")
   (match_operand:FP 2 "const0_operand")))]
  "TARGET_HARD_FLOAT && REG_P (operands[1]) && dead_or_set_p (insn, operands[1])"
  [(parallel
    [(set (match_dup 0) (match_dup 3))
     (clobber (match_dup 1))])]
 {
   /* s390_match_ccmode requires the compare to have the same CC mode
      as the CC destination register. */
   operands[3] = gen_rtx_COMPARE (GET_MODE (operands[0]),
      operands[1], operands[2]);
 })

268.split1 insn 916 -> insn 2677
The REG_DEAD note becomes a clobber due to that

(insn 915 918 2677 105 (set (reg:DF 590 [ MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B] ])
        (mem:DF (reg/v/f:DI 239 [ sdiag ]) [2 MEM[base: sdiag_143(D), index: ivtmp.67_240, offset: 0B]+0 S8 A64])) "min.qrsolv.f":51 1289 {*movdf_64dfp}
     (nil))
(insn 2677 915 2480 105 (parallel [
            (set (reg:CCZ 33 %cc)
                (compa...

Read more...

Created attachment 45588
experimental patch

That patch appears to fix the problem for me.

I admit I have just a vague recollection of this, but I thought since df has been added, usually if a pass wants REG_DEAD notes, it needs to df_note_add_problem () and df_analyze should rebuild the REG_DEAD/REG_UNUSED notes.

So, to me this looks like a backend bug, using dead_or_set_p in a splitter when the split passes don't really compute the note problem. Seems s390 is the only backend that does this, other backends use dead_or_set_p either only in peephole2s (which is fine, peephole2 pass starts with
  df_set_flags (DF_LR_RUN_DCE);
  df_note_add_problem ();
  df_analyze ();
and even when many targets don't use df_or_set_p, they do use peep2_dead*), or (cris) in delayed branch scheduling (I believe that doesn't guarantee that either). Can't what you are doing in the splitters be done in define_peephole2 instead?

Segher on IRC says that removing REG_DEAD notes that aren't valid is the right thing, so paging others what they think.

> Segher on IRC says that removing REG_DEAD notes that aren't valid is the
> right thing, so paging others what they think.

Definitely not, passes are not required to maintain REG_DEAD/REG_UNUSED notes, it's the exclusive job of DF and we're better not open this can of worms.

(In reply to Jakub Jelinek from comment #11)
> ... Can't what you are doing in the splitters be done in
> define_peephole2 instead?

Not that easy unfortunately. peephole2 will run after reload. So the FP constant ok 0.0 will already be reloaded into a register first or pushed into literal pool. The point of doing the transformation is to avoid this.

(In reply to Andreas Krebbel from comment #14)
> (In reply to Jakub Jelinek from comment #11)
> > ... Can't what you are doing in the splitters be done in
> > define_peephole2 instead?
>
> Not that easy unfortunately. peephole2 will run after reload. So the FP
> constant ok 0.0 will already be reloaded into a register first or pushed
> into literal pool. The point of doing the transformation is to avoid this.

But it is still invalid. If those splitters are essentially to you and worth slowing the compiler on s390*, then you could e.g. add a custom target pass right before split1, where you'd just df_note_addr_problem (); df_analyze (); and thus ensured that during the splitting you could use the REG_DEAD/REG_UNUSED notes safely. In the splitters you'd likely need to use current_pass to verify you are in the pass for which you've done this.

I'll commit a patch which just removes the splitter for now. I'll try to come up with a nicer testcase.

(In reply to Andreas Krebbel from comment #16)
> I'll commit a patch which just removes the splitter for now. I'll try to
> come up with a nicer testcase.

All 3 s390 splitters that do this?

Author: krebbel
Date: Tue Feb 5 17:14:11 2019
New Revision: 268550

URL: https://gcc.gnu.org/viewcvs?rev=268550&root=gcc&view=rev
Log:
S/390: Remove load and test fp splitter

gcc/ChangeLog:

2019-02-05 Andreas Krebbel <email address hidden>

 PR target/88856
 * config/s390/s390.md: Remove load and test FP splitter.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/s390/s390.md

Author: krebbel
Date: Tue Feb 5 17:17:00 2019
New Revision: 268551

URL: https://gcc.gnu.org/viewcvs?rev=268551&root=gcc&view=rev
Log:
S/390: Remove load and test fp splitter

gcc/ChangeLog:

2019-02-05 Andreas Krebbel <email address hidden>

 Backport from mainline
 2019-02-05 Andreas Krebbel <email address hidden>

 PR target/88856
 * config/s390/s390.md: Remove load and test FP splitter.

Modified:
    branches/gcc-8-branch/gcc/ChangeLog
    branches/gcc-8-branch/gcc/config/s390/s390.md

Author: krebbel
Date: Tue Feb 5 17:19:26 2019
New Revision: 268552

URL: https://gcc.gnu.org/viewcvs?rev=268552&root=gcc&view=rev
Log:
S/390: Remove load and test fp splitter

gcc/ChangeLog:

2019-02-05 Andreas Krebbel <email address hidden>

 Backport from mainline
 2019-02-05 Andreas Krebbel <email address hidden>

 PR target/88856
 * config/s390/s390.md: Remove load and test FP splitter.

Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/s390/s390.md

(In reply to Jakub Jelinek from comment #17)
> (In reply to Andreas Krebbel from comment #16)
> > I'll commit a patch which just removes the splitter for now. I'll try to
> > come up with a nicer testcase.
>
> All 3 s390 splitters that do this?

I've only removed the load and test splitter for now. The other two are only used for access register setters. There is only that one user in Glibc and we have it that way since the very beginning. I will revisit these for GCC >9 but would rather leave them in for now.

What we could do there is remove the first of those two splitters, remove the
&& !dead_or_set_p (insn, operands[1])
test from the second, and add peephole2 that would transform
   (set (access part 1) (subreg:SI (match_dup 1) low))
   (set (match_dup 1) (rotate:DI (match_dup 1) (const_int 32)))
   (set (access part 2) (subreg:SI (match_dup 1) low))
with a lshiftrt instead of rotate if reg 1 is dead after the third insn (assuming rotate is more expensive as right shift, if it is the same expensive/same size, then having the two splitters makes no sense). The last rotate should have been removed by DCE already if it was truly dead (though, of course, if it for some reason isn't yet, you could have another peephole2 for that too).

That said, the regression is fixed now.

Changed in gcc:
status: In Progress → Fix Released
Changed in python-scipy (Debian):
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.