lerc: FTBFS with gcc-14

Bug #2078812 reported by Alessandro Astone
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
gcc
In Progress
Medium
gcc-14 (Ubuntu)
In Progress
Undecided
Alessandro Astone
lerc (Ubuntu)
Triaged
Undecided
Alessandro Astone

Bug Description

From the test rebuild, lerc failed to rebuild on ppc64el: https://launchpadlibrarian.net/745205771/buildlog_ubuntu-oracular-ppc64el.lerc_4.0.0+ds-4ubuntu2_BUILDING.txt.gz

gcc-14 has introduced a regression which causes it to crash when compiling valid code: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116460

Revision history for this message
In , Alessandro Astone (aleasto) wrote :

When recompiling https://github.com/esri/lerc on Ubuntu 24.10 with gcc-14 we hit this LTO error:

during GIMPLE pass: forwprop
Lerc.cpp: In function 'EncodeInternal.constprop':
Lerc.cpp:606:9: internal compiler error: in build2, at tree.cc:5097
  606 | ErrCode Lerc::EncodeInternal(const T* pData, int version, int nDepth, int nCols, int nRows, int nBands,
      | ^
0x1074e99f internal_error(char const*, ...)
 ???:0
0x1074ead7 fancy_abort(char const*, int, char const*)
 ???:0
0x10887107 build2(tree_code, tree_node*, tree_node*, tree_node*)
 ???:0
0x10c7a91f gimple_assign_rhs_to_tree(gimple*)
 ???:0
0x10b97bd3 insert_debug_temp_for_var_def(gimple_stmt_iterator*, tree_node*)
 ???:0
0x1098a083 insert_debug_temps_for_defs(gimple_stmt_iterator*)
 ???:0
0x1098979f gsi_remove(gimple_stmt_iterator*, bool)
 ???:0
0x10a793eb simple_dce_from_worklist(bitmap_head*, bitmap_head*)
 ???:0

The command-line options required to trigger the error is: `-g -O3 -flto`. All three must be enabled.

Steps to reproduce:
```
wget -qO- https://github.com/Esri/lerc/archive/refs/tags/v4.0.0.tar.gz | tar xvz
cd lerc-4.0.0/src/LercLib
g++ -shared -fPIC -g -O3 -flto *.cpp Lerc1Decode/*.cpp
```

I could reproduce with the gcc-14 packages from Ubuntu 24.10 and Fedora 41.
I couldn't reproduce on any architecture but ppc64le.
It can be reproduced both on a ppc64le machine (qemu-user is good enough), or by cross-compiling to ppc64le.

Attaching intermediate files.

Revision history for this message
In , Alessandro Astone (aleasto) wrote :

Cannot attach intermediate files because the size is too large.
Sorry that I don't have a minimal reproducer :/

Revision history for this message
In , Pinskia (pinskia) wrote :

Can you try adding -fchecking and seeing if that fails differently?

Revision history for this message
In , Alessandro Astone (aleasto) wrote :

(In reply to Andrew Pinski from comment #2)
> Can you try adding -fchecking and seeing if that fails differently?

No, it fails in the same way.

BTW i realize i didn't specify this other than mentioning 'lto': it fails at linking time.

Revision history for this message
In , Pinskia (pinskia) wrote :

(In reply to Alessandro Astone from comment #1)
> Cannot attach intermediate files because the size is too large.
> Sorry that I don't have a minimal reproducer :/

Is there a place where you could upload the intermediate files to?

Revision history for this message
In , Alessandro Astone (aleasto) wrote :

Here's the preprocessed files: https://aleasto.fedorapeople.org/ii.tar.gz

Let me know if you'd like other intermediate outputs.

Revision history for this message
In , Pinskia (pinskia) wrote :

So the good news is I am able to reproduce it on the trunk (on cfarm29 which is powerpc64le):
```
pinskia@cfarm29:~/src/t$ ~/ugcc/bin/g++ -shared -fPIC -g -O3 -flto *.ii
lto-wrapper: warning: using serial compilation of 11 LTRANS jobs
lto-wrapper: note: see the ‘-flto’ option documentation for more information
during GIMPLE pass: forwprop
../../../../tmp/lerc/src/LercLib/Lerc.cpp: In function ‘EncodeInternal.constprop’:
../../../../tmp/lerc/src/LercLib/Lerc.cpp:606:9: internal compiler error: in build2, at tree.cc:5103
0x120f7dff internal_error(char const*, ...)
        /home/pinskia/src/gcc/gcc/diagnostic-global-context.cc:491
0x102ebbdf fancy_abort(char const*, int, char const*)
        /home/pinskia/src/gcc/gcc/diagnostic.cc:1772
0x110963bb build2(tree_code, tree_node*, tree_node*, tree_node*)
        /home/pinskia/src/gcc/gcc/tree.cc:5103
0x103ef03b gimple_assign_rhs_to_tree(gimple*)
        /home/pinskia/src/gcc/gcc/cfgexpand.cc:116
0x10f69763 insert_debug_temp_for_var_def(gimple_stmt_iterator*, tree_node*)
        /home/pinskia/src/gcc/gcc/tree-ssa.cc:408
0x10f69c67 insert_debug_temps_for_defs(gimple_stmt_iterator*)
        /home/pinskia/src/gcc/gcc/tree-ssa.cc:506
0x1069cd6b gsi_remove(gimple_stmt_iterator*, bool)
        /home/pinskia/src/gcc/gcc/gimple-iterator.cc:568
0x10db4d63 simple_dce_from_worklist(bitmap_head*, bitmap_head*)
        /home/pinskia/src/gcc/gcc/tree-ssa-dce.cc:2263
0x10dd68ff execute
        /home/pinskia/src/gcc/gcc/tree-ssa-forwprop.cc:4154
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
lto-wrapper: fatal error: /home/pinskia/ugcc/bin/g++ returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

```

Revision history for this message
In , Pinskia (pinskia) wrote :

Created attachment 58988
non-reduced preprocessed source

This is just the needed 2 preprocessed source to reproduce the failure:
pinskia@cfarm29:~/src/t$ /home/pinskia/ugcc/bin/g++ -Wfatal-errors -shared -fPIC -g -O3 -flto a-Lerc.ii a-Lerc2.ii
lto-wrapper: warning: using serial compilation of 9 LTRANS jobs
lto-wrapper: note: see the ‘-flto’ option documentation for more information
during GIMPLE pass: forwprop
../../../../tmp/lerc/src/LercLib/Lerc.cpp: In function ‘EncodeInternal.constprop’:
../../../../tmp/lerc/src/LercLib/Lerc.cpp:606:9: internal compiler error: in build2, at tree.cc:5103

Revision history for this message
In , Pinskia (pinskia) wrote :

./a.ltrans6.ltrans.212t.forwprop4

Removing dead stmt noDataCandVec$_M_start_888 = PHI <_1783(176), _577(186)>
...
Removing dead stmt:_598 = _888 + 16;

So it looks like we remove the statement defining _888 and then removing the use.
The removal of _888 happens directly from forwprop while _598 definition removal comes from simple_dce_from_worklist .

The ICE happens because the ssa name _888 has already been freed so the type is null (and not in this case a pointer) since this was originally a pointer plus.

Trying to reduce this further.

Revision history for this message
In , Pinskia (pinskia) wrote :

Created attachment 58990
single file testcase

Reduced down to single file testcase; `-g -O3` is enough, no LTO needed now.

Revision history for this message
In , Pinskia (pinskia) wrote :

Created attachment 58992
much more reduced testcase

Now this testcase also ICEs on x86_64-linux-gnu with `-O3 -g`.

Revision history for this message
In , Pinskia (pinskia) wrote :

Created attachment 58993
Further reduced

This is as far as reduced as I am going to get it tonight.

This ICEs at `-O3 -g` on both powerpc* and x86_64/i?86 and aarch64.

Revision history for this message
In , Pinskia (pinskia) wrote :

Confirmed.

Revision history for this message
In , Pinskia (pinskia) wrote :

(In reply to Andrew Pinski from comment #8)
> ./a.ltrans6.ltrans.212t.forwprop4
>
> Removing dead stmt noDataCandVec$_M_start_888 = PHI <_1783(176), _577(186)>
> ...
> Removing dead stmt:_598 = _888 + 16;
>
> So it looks like we remove the statement defining _888 and then removing the
> use.
> The removal of _888 happens directly from forwprop while _598 definition
> removal comes from simple_dce_from_worklist .
>
> The ICE happens because the ssa name _888 has already been freed so the type
> is null (and not in this case a pointer) since this was originally a pointer
> plus.
>
> Trying to reduce this further.

_888 definition is from a BB which is going to be removed so we should not need to mark its uses as being needed for dce worklist. But I am not sure how to detect that case.

Revision history for this message
In , Rguenth (rguenth) wrote :
Download full text (5.9 KiB)

(In reply to Andrew Pinski from comment #13)
> (In reply to Andrew Pinski from comment #8)
> > ./a.ltrans6.ltrans.212t.forwprop4
> >
> > Removing dead stmt noDataCandVec$_M_start_888 = PHI <_1783(176), _577(186)>
> > ...
> > Removing dead stmt:_598 = _888 + 16;
> >
> > So it looks like we remove the statement defining _888 and then removing the
> > use.
> > The removal of _888 happens directly from forwprop while _598 definition
> > removal comes from simple_dce_from_worklist .
> >
> > The ICE happens because the ssa name _888 has already been freed so the type
> > is null (and not in this case a pointer) since this was originally a pointer
> > plus.
> >
> > Trying to reduce this further.
>
> _888 definition is from a BB which is going to be removed so we should not
> need to mark its uses as being needed for dce worklist. But I am not sure
> how to detect that case.

forwprop shouldn't remove _888 if there's a use left. When adding
simple_dce_from_worklist, did you remove some manual stmt removal
(adding to to_remove)? Having both is a bit ugly (see also
remove_prop_source_from_use), but the sets need to be separate to
avoid interactions like this.

Ah, we have

          /* Substitute from our lattice. We need to do so only once. */
          bool substituted_p = false;
          use_operand_p usep;
          ssa_op_iter iter;
          FOR_EACH_SSA_USE_OPERAND (usep, stmt, iter, SSA_OP_USE)
            {
              tree use = USE_FROM_PTR (usep);
              tree val = fwprop_ssa_val (use);
              if (val && val != use)
                {
                  if (!is_gimple_debug (stmt))
                    bitmap_set_bit (simple_dce_worklist, SSA_NAME_VERSION (use));

I think that's problematic iff 'use' as definition is already in to_remove.

Note to_remove also removes in "proper" order for debug stmt generation
while simple_dce_from_worklist doesn't iff the DCEd bits are not
independent.

OK, so the issue is that when we add

noDataCandVec$_M_start_3 = PHI <_158(9), _51(12)>

to to_remove the uses are

noDataCandVec$_M_start_3 : -->5 uses.
# DEBUG __i$_M_current => noDataCandVec$_M_start_3 + 1
_56 = noDataCandVec$_M_finish_75 - noDataCandVec$_M_start_3;
ivtmp.163_15 = (unsigned long) noDataCandVec$_M_start_3;
_108 = noDataCandVec$_M_start_3 + 15;
# DEBUG __first$_M_current => noDataCandVec$_M_start_3
_57 = noDataCandVec$_M_start_3 + 16;
# DEBUG __first => noDataCandVec$_M_start_3
if (noDataCandVec$_M_start_3 != noDataCandVec$_M_finish_75)
# DEBUG __first => noDataCandVec$_M_start_3
# DEBUG __first => noDataCandVec$_M_start_3
# DEBUG noDataCandVec$_M_start => noDataCandVec$_M_start_3

and _3 is in BB 13 while the use in _57 = noDataCandVec$_M_start_3 + 16 is
in BB 15 and we are not processing this block as it is compute dead.
So the uses in dead stmts are not replaced which is what confuses
simple_dce_from_worklist.

Note it's _57 that ends up on the worklist somehow even though it's def is in
a dead BB(!?). That's from

              auto_vec<tree, 8> uses;
              FOR_EACH_SSA_USE_OPERAND (usep, stmt, iter, SSA_OP_USE)
                if (uses.space (1))
                  uses.quick_push (USE_FROM_PTR (usep))...

Read more...

Revision history for this message
In , Rguenth (rguenth) wrote :

Ugh - simplify_gimple_switch_label_vec :/

Revision history for this message
In , Pinskia (pinskia) wrote :

(In reply to Richard Biener from comment #14)
> (In reply to Andrew Pinski from comment #13)
> > (In reply to Andrew Pinski from comment #8)
> > > ./a.ltrans6.ltrans.212t.forwprop4
> > >
> > > Removing dead stmt noDataCandVec$_M_start_888 = PHI <_1783(176), _577(186)>
> > > ...
> > > Removing dead stmt:_598 = _888 + 16;
> > >
> > > So it looks like we remove the statement defining _888 and then removing the
> > > use.
> > > The removal of _888 happens directly from forwprop while _598 definition
> > > removal comes from simple_dce_from_worklist .
> > >
> > > The ICE happens because the ssa name _888 has already been freed so the type
> > > is null (and not in this case a pointer) since this was originally a pointer
> > > plus.
> > >
> > > Trying to reduce this further.
> >
> > _888 definition is from a BB which is going to be removed so we should not
> > need to mark its uses as being needed for dce worklist. But I am not sure
> > how to detect that case.
>
> forwprop shouldn't remove _888 if there's a use left. When adding
> simple_dce_from_worklist, did you remove some manual stmt removal
> (adding to to_remove)? Having both is a bit ugly (see also
> remove_prop_source_from_use), but the sets need to be separate to
> avoid interactions like this.

Richard, you are the one who added simple_dce_from_worklist to forwprop in the end; I had tried originally by not do the manual one but ran into regressions so I didn't submit it.

Revision history for this message
In , Rguenth (rguenth) wrote :

(In reply to Andrew Pinski from comment #16)
> (In reply to Richard Biener from comment #14)
> > (In reply to Andrew Pinski from comment #13)
> > > (In reply to Andrew Pinski from comment #8)
> > > > ./a.ltrans6.ltrans.212t.forwprop4
> > > >
> > > > Removing dead stmt noDataCandVec$_M_start_888 = PHI <_1783(176), _577(186)>
> > > > ...
> > > > Removing dead stmt:_598 = _888 + 16;
> > > >
> > > > So it looks like we remove the statement defining _888 and then removing the
> > > > use.
> > > > The removal of _888 happens directly from forwprop while _598 definition
> > > > removal comes from simple_dce_from_worklist .
> > > >
> > > > The ICE happens because the ssa name _888 has already been freed so the type
> > > > is null (and not in this case a pointer) since this was originally a pointer
> > > > plus.
> > > >
> > > > Trying to reduce this further.
> > >
> > > _888 definition is from a BB which is going to be removed so we should not
> > > need to mark its uses as being needed for dce worklist. But I am not sure
> > > how to detect that case.
> >
> > forwprop shouldn't remove _888 if there's a use left. When adding
> > simple_dce_from_worklist, did you remove some manual stmt removal
> > (adding to to_remove)? Having both is a bit ugly (see also
> > remove_prop_source_from_use), but the sets need to be separate to
> > avoid interactions like this.
>
> Richard, you are the one who added simple_dce_from_worklist to forwprop in
> the end; I had tried originally by not do the manual one but ran into
> regressions so I didn't submit it.

Did I ... ah you only effectively amended it (r15-2981-g3ae8794665ee7c)
and fixed it (r15-2691-g33baa20c5cdcf5).

So I'll take this bug and will try to unsort the mess ...

Revision history for this message
In , Rguenth (rguenth) wrote :

Note the real issue is that debug temp insertion cannot handle the situation
at all where we remove stmts in the wrong order, thus removing a stmt
which has uses that are released.

Revision history for this message
In , Rguenth (rguenth) wrote :

Note we do have ad-hoc code in insert_debug_temp_for_var_def dealing with such
situations, but it doesn't trigger here because there is dominator info
available used as indicator that SSA form is up-to-date:

      if (!dom_info_available_p (CDI_DOMINATORS))
        {
          struct walk_stmt_info wi;

          memset (&wi, 0, sizeof (wi));

          /* When removing blocks without following reverse dominance
             order, we may sometimes encounter SSA_NAMEs that have
             already been released, referenced in other SSA_DEFs that
             we're about to release. Consider:

             <bb X>:
             v_1 = foo;

             <bb Y>:
             w_2 = v_1 + bar;
             # DEBUG w => w_2

             If we deleted BB X first, propagating the value of w_2
             won't do us any good. It's too late to recover their
             original definition of v_1: when it was deleted, it was
             only referenced in other DEFs, it couldn't possibly know
             it should have been retained, and propagating every
             single DEF just in case it might have to be propagated
             into a DEBUG STMT would probably be too wasteful.

             When dominator information is not readily available, we
             check for and accept some loss of debug information. But
             if it is available, there's no excuse for us to remove
             blocks in the wrong order, so we don't even check for
             dead SSA NAMEs. SSA verification shall catch any
             errors. */
          if ((!gsi && !gimple_bb (def_stmt))
              || walk_gimple_op (def_stmt, find_released_ssa_name, &wi))
            no_value = true;

I'm quite sure we're getting this wrong more often, but only few build
workers ICE on error_mark TREE_TYPE.

Revision history for this message
In , Rguenth (rguenth) wrote :

Unfortunately the following doesn't reproduce the issue.

#include <vector>
#include <algorithm>

void g();

void f(int nBands, double maxZErr) {
  for (int iBand = 0; iBand < nBands; iBand++)
   {
    g();
    std::vector<signed char> noDataCandVec;
    std::vector<signed char> distCandVec = {0, 1, 10, 100, 5, 6};
    for (signed char dist : distCandVec)
      noDataCandVec.push_back(1);
    std::sort(noDataCandVec.begin(), noDataCandVec.end(),
              std::greater<double>());
  }
}

I'll add the preprocessed source.

Revision history for this message
In , Cvs-commit (cvs-commit) wrote :

The master branch has been updated by Richard Biener <email address hidden>:

https://gcc.gnu.org/g:0ceeb9926d69dbb382622a8eae9eef7ed8ac3e97

commit r15-3191-g0ceeb9926d69dbb382622a8eae9eef7ed8ac3e97
Author: Richard Biener <email address hidden>
Date: Mon Aug 26 10:01:44 2024 +0200

    tree-optimization/116460 - improve forwprop compile-time

    The following improves forwprop block reachability which I noticed
    when debugging PR116460 and what is also noted in the comment. It
    avoids processing blocks in natural loops determined unreachable,
    thereby making the issue in PR116460 latent.

            PR tree-optimization/116460
            * tree-ssa-forwprop.cc (pass_forwprop::execute): Do not
            process blocks in unreachable natural loops.

Revision history for this message
In , Pinskia (pinskia) wrote :

(In reply to Richard Biener from comment #20)
> Unfortunately the following doesn't reproduce the issue.
>
> #include <vector>
> #include <algorithm>
>
> void g();
>
> void f(int nBands, double maxZErr) {
> for (int iBand = 0; iBand < nBands; iBand++)
> {
> g();
> std::vector<signed char> noDataCandVec;
> std::vector<signed char> distCandVec = {0, 1, 10, 100, 5, 6};
> for (signed char dist : distCandVec)
> noDataCandVec.push_back(1);
> std::sort(noDataCandVec.begin(), noDataCandVec.end(),
> std::greater<double>());
> }
> }
>
> I'll add the preprocessed source.

Yes it depends on some definitions not to be inlined, even the original code had a check on typeinfo which dependedent on the `operator==` not being inlined but that was able to change to the simple call of `g()` with no definition. I didn't try to figure out which functions needed to be inlined or not when I was reducing the code. This is also why it originally needed LTO to hit the ICE because the usage across different TUs caused slightly different inlining decisions and then as definitions were able to be removed; you could hit it without LTO.

Revision history for this message
In , Cvs-commit (cvs-commit) wrote :

The master branch has been updated by Richard Biener <email address hidden>:

https://gcc.gnu.org/g:172637cf0d9b7b2798f83b9c5f9598b449675cb0

commit r15-3210-g172637cf0d9b7b2798f83b9c5f9598b449675cb0
Author: Richard Biener <email address hidden>
Date: Mon Aug 26 13:50:00 2024 +0200

    tree-optimization/116460 - ICE with DCE in forwprop

    The following avoids removing stmts with defs that might still have
    uses in the IL before calling simple_dce_from_worklist which might
    remove those as that will wreck debug stmt generation. Instead first
    perform use-based DCE and then remove stmts which may have uses in
    code that CFG cleanup will remove. This requires tracking stmts
    in to_remove by their SSA def so we can check whether it was removed
    before without running into the issue that PHIs can be ggc_free()d
    upon removal. So this adds to_remove_defs in addition to to_remove
    which has to stay to track GIMPLE_NOPs we want to elide.

            PR tree-optimization/116460
            * tree-ssa-forwprop.cc (pass_forwprop::execute): First do
            simple_dce_from_worklist and then remove stmts in to_remove.
            Track defs to be removed in to_remove_defs.

            * g++.dg/torture/pr116460.C: New testcase.

Revision history for this message
In , Rguenth (rguenth) wrote :

Fixed on trunk sofar.

Changed in lerc (Ubuntu):
assignee: nobody → Alessandro Astone (aleasto)
Changed in gcc-14 (Ubuntu):
assignee: nobody → Alessandro Astone (aleasto)
Changed in lerc (Ubuntu):
status: New → Triaged
Changed in gcc-14 (Ubuntu):
status: New → In Progress
Revision history for this message
In , Alessandro Astone (aleasto) wrote :

Thanks!

What's the process for getting this to the stable branch?

Revision history for this message
Alessandro Astone (aleasto) wrote :

Matthias, I've backported the upstream fix here if you want to include it:
https://salsa.debian.org/aleasto/gcc/-/commit/3416b33ed98e749068f57d5fe805f2e1e85be1f3

Revision history for this message
In , Rguenther-suse (rguenther-suse) wrote :

On Tue, 3 Sep 2024, ales.astone at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116460
>
> --- Comment #25 from Alessandro Astone <ales.astone at gmail dot com> ---
> Thanks!
>
> What's the process for getting this to the stable branch?

It will be picked up after some soaking and when time permits.

Changed in gcc:
importance: Unknown → Medium
status: Unknown → In Progress
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.