LibreOffice needs to be rebuild with gcc#58800 fixed.

Bug #1261669 reported by Björn Michaelsen
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
LibreOffice
Won't Fix
Medium
gcc-defaults
Fix Released
High
gcc-defaults (Ubuntu)
Fix Released
Undecided
Unassigned
libreoffice (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

Upstream bug report:
"When I use median function in a spreadsheet it crashes randomly when I type a value into the cell which is in the range belongs to the function. Sometimes it crashes sometimes not. It is unable to reproduce the error. But the error happens only when I use median. If I delete median funciton, no error."

caused by http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58800

Operating System: Ubuntu 13.10
Version: 4.1.2.3 release

Revision history for this message
In , Andy Lutomirski (luto-mit) wrote :

This program segfaults on gcc 4.8.1 from Ubuntu 13.10. I'm building a copy of the 4.8 branch to see if I can reproduce it there.

This is on x86_64. I used a gross hack to specify the input so I don't have to think about precision issues. The numbers are all well-behaved values near 0.9.

I'll attach preprocessed source. The preprocessed source crashes even if built with gcc 4.7.something.

#include <algorithm>
#include <stdint.h>
//#include <iostream>

double to_double(uint64_t x)
{
        union {double d; uint64_t x;} u;
        u.x = x;
        return u.d;
}

int main()
{
        std::vector<double> v = {
                to_double(4606672070890243784),
                to_double(4606672025854247510),
                to_double(4606671800674266141),
                to_double(4606671575494284772),
                to_double(4606672115926240057),
                to_double(4606672160962236330),
                to_double(4606672070890243784),
        };

// for (auto i : v)
// std::cout << i << std::endl;

        std::nth_element(v.begin(), v.begin() + 3, v.end());
        return 0;
}

Revision history for this message
In , Andy Lutomirski (luto-mit) wrote :

Created attachment 31047
preprocessed source

This was generated with g++ -std=gnu++11 -E sort.cc on Ubuntu 13.10.

Revision history for this message
In , Paolo-carlini (paolo-carlini) wrote :

Please figure out a testcase involving plain integers.

Revision history for this message
In , Paolo-carlini (paolo-carlini) wrote :

Unfortunately the issue seems real. I can reproduce it with:

        std::vector<uint64_t> v = {
                4606672070890243784,
                4606672025854247510,
                4606671800674266141,
                4606671575494284772,
                4606672115926240057,
                4606672160962236330,
                4606672070890243784,
        };

Chris?

Revision history for this message
In , Paolo-carlini (paolo-carlini) wrote :

I can also reproduce with something this simple:

        std::vector<int> v = {
                207089,
                202585,
                180067,
                157549,
                211592,
                216096,
                207089
        };

-D_GLIBCXX_DEBUG reveals that we are dereferencing a past-the-end iterator. We badly need a quick fix. Say 2-3 day max or we have to revert the fix for 58437.

Revision history for this message
In , Paolo-carlini (paolo-carlini) wrote :

Don't tell me is just this:

Index: include/bits/stl_algo.h
===================================================================
--- include/bits/stl_algo.h (revision 203835)
+++ include/bits/stl_algo.h (working copy)
@@ -1917,7 +1917,7 @@
     _RandomAccessIterator __last, _Compare __comp)
     {
       _RandomAccessIterator __mid = __first + (__last - __first) / 2;
- std::__move_median_to_first(__first, __first + 1, __mid, (__last - 2),
+ std::__move_median_to_first(__first, __first + 1, __mid, (__last - 1),
       __comp);
       return std::__unguarded_partition(__first + 1, __last, __first, __comp);
     }

??

Andy, while waiting for Chris' feedback, you may want to test the above on your real applications. In 4.8.x some nth_algorithm details are different (the above is versus mainline), but you can quickly find a couple of (__last - 2) which I'm asking you to change to (__last - 1)

Revision history for this message
In , Andy Lutomirski (luto-mit) wrote :

I changed two instances of __last - 2 to __last - 1. I'm running against r203836 on gcc-4_8-branch.

The thing that caught it was an experiment, not part of my test suite, so this is a little tricky. It already survived longer than the unpatched version did. The crash wasn't the same segfault, though -- it was a much later crash due to memory corruption. I'll let it run for a while under valgrind to see what, if anything, pops up.

Revision history for this message
In , Andy Lutomirski (luto-mit) wrote :

FWIW, replacing that list with a random permutation seems to crash about 5% of the time.

Revision history for this message
In , Paolo-carlini (paolo-carlini) wrote :

Thanks Andy. We really want to fix this as soon as possible. In the meanwhile in my tests the tweak passed the testsuite, without regressing on the performance of std::sort (libstdc++/58437). Thus apparently we are on the right track.

Revision history for this message
In , Andy Lutomirski (luto-mit) wrote :

This code has survived my smoke test so far, which is a good sign. If it was causing some kind of bad problem, I'd expect something to have exploded by now.

Revision history for this message
In , Glisse-6 (glisse-6) wrote :

{0, 1, 2, 0} should be enough.

__unguarded_partition only stops increasing __first when *__first is not smaller than the pivot. If all elements are smaller than the pivot, boom. And when we have fewer than 5 elements (the check in __introselect is only for > 3), the pivot selection code doesn't guarantee that (some of the 3 pointers used to compute the median are the same, and with Paolo's change they become distinct).

Revision history for this message
In , Paolo-carlini (paolo-carlini) wrote :

Hi Marc. I'm coming to the conclusion that the '- 2' in the patch sent by Chris was in any case a typo. Now, sorry I didn't investigate the issue further, I'm not sure to understand if you think changing it to '- 1' is enough or not: it certainly passes or my tests so far ({0, 1, 2, 0} included)

Revision history for this message
In , Christopher Jefferson (azumanga) wrote :

Yes, I didn't trace all the members which call __unguarded_partition_pivot.

The better (in my opinion) fix is to change __introselect from:

__last - __first > 3

to:

__last - __first > int(_S_threshold)

Like the other call the __unguarded_partition_pivot.

I am currently testing that change.

The 'last - 2' was on purpose (although, it is causing the problem!), as 'last - 1' is the the last element of the array which we want to avoid (think of it as 'last - 1 - 1', to go with 'first + 1'. Obviously we couldn't de-ref 'last - 1'.

Revision history for this message
In , Paolo-carlini (paolo-carlini) wrote :

Ok, note that I was reviewing the description you originally sent to the mailing list and it never talked about the - 2...

Let's figure out something safe, please.

Revision history for this message
In , Glisse-6 (glisse-6) wrote :

(In reply to Chris Jefferson from comment #12)
> The better (in my opinion) fix is to change __introselect from:
>
> __last - __first > 3
>
> to:
>
> __last - __first > int(_S_threshold)
>
> Like the other call the __unguarded_partition_pivot.

Note that the optimal threshold is probably not the same for sort and for select (which drops to full sorting below that threshold!).

Revision history for this message
In , Christopher Jefferson (azumanga) wrote :

In my last comment, "Obviously we couldn't de-ref 'last - 1'" should have been "Obviously we couldn't de-ref 'last'"

Revision history for this message
In , Christopher Jefferson (azumanga) wrote :

Created attachment 31049
random test

In order to catch any other issues, and stop this kind of thing happening again, I want to quickly and massively expand the algorithms testsuite.

The attached test tests a whole set of random nth_element cases (it would catch the problem we are currently having).

The problem is on my computer, unoptimised, this test takes 30 seconds, and I want to add a similar test to all the sorting related algorithms. Is there a standard way of submitting tests that take a lot of time and memory?

Revision history for this message
In , Paolo-carlini (paolo-carlini) wrote :

At the moment there isn't, but some tests self adjust to run less on simulators. For the time being I think we could certainly add a few (not 10 or 20) of the tests tou are talking about, but remember all with the bits for the simulators. We could also easily add some tests to the performance testsuite (there are no limits there and isn't run by default) which, outside the timing loop also check that the computation is correct (well, we should probably allways do that)

Revision history for this message
In , Christopher Jefferson (azumanga) wrote :

(In reply to Paolo Carlini from comment #13)
> Ok, note that I was reviewing the description you originally sent to the
> mailing list and it never talked about the - 2...
>
> Let's figure out something safe, please.

I miswrote in my final paragraph:

...first+1, mid, last-1 (there is no reason in this bug to consider
last-1 instead of last, but I thought it wouldn't do any harm, and
might avoid other issues of accidentally choosing a bad pivot.

Should have said:

...first+1, mid, last-2 (there is no reason in this bug to consider *last-2* instead of *last-1*, but I thought it wouldn't do any harm, and might avoid other issues of accidentally choosing a bad pivot.

So, the original code chose last-1, I moved that to last-2, not checking all the places that called the code enough (although, I could also have broken it with the first+1 if the ranges had been small enough).

Changing 'last-2' to 'last-1' might well be the easiest, smallest fix, if it does not effect the performance testsuite. We can then, once we have a much more comphrehensive testsuite, make another pass at performance improvements and tweaks.

Revision history for this message
In , Christopher Jefferson (azumanga) wrote :

Created attachment 31051
nth_element patch

Here is a patch. The actual patch is just changing '__last - 2' to '__last - 1' (***NOTE*** When back-applying this patch, before the recent merge of algorithms, there will be two copies of __last - 2 to change).

The larger part is a patch just for this bug, and a general big improvement in the test procedure, to sanity check no other related methods have any issues (they do not). I intend to continue to add new testsuite functions in the future, but this will do to start.

Revision history for this message
In , Paolo-carlini (paolo-carlini) wrote :

Chris, <random> isn't unconditionally available (requires _GLIBCXX_USE_C99_STDINT_TR1 defined). Please massage the new tests (and the new header) to just do nothing when <random> isn't available and send the result separately to the library mailing list. Let's first commit everywhere fix + minimal testcase.

Revision history for this message
In , Paolo-carlini (paolo-carlini) wrote :

I think you can probably simply add at the beginning of the new tests:

// { dg-require-cstdint "" }

and then the new include doesn't require any change. But please double check, we don't want new Bugzillas: say, damage check_v3_target_cstdint to artificially return false on your machine and double check that the new testcases are simply skipped.

Revision history for this message
In , Paolo-k (paolo-k) wrote :

Author: paolo
Date: Sun Oct 20 09:07:36 2013
New Revision: 203872

URL: http://gcc.gnu.org/viewcvs?rev=203872&root=gcc&view=rev
Log:
2013-10-20 Chris Jefferson <email address hidden>
     Paolo Carlini <email address hidden>

 PR libstdc++/58800
 * include/bits/stl_algo.h (__unguarded_partition_pivot): Change
 __last - 2 to __last - 1.
 * testsuite/25_algorithms/nth_element/58800.cc: New

Added:
    trunk/libstdc++-v3/testsuite/25_algorithms/nth_element/58800.cc
Modified:
    trunk/libstdc++-v3/ChangeLog
    trunk/libstdc++-v3/include/bits/stl_algo.h

Revision history for this message
In , Paolo-k (paolo-k) wrote :

Author: paolo
Date: Sun Oct 20 09:08:12 2013
New Revision: 203873

URL: http://gcc.gnu.org/viewcvs?rev=203873&root=gcc&view=rev
Log:
2013-10-20 Chris Jefferson <email address hidden>
     Paolo Carlini <email address hidden>

 PR libstdc++/58800
 * include/bits/stl_algo.h (__unguarded_partition_pivot): Change
 __last - 2 to __last - 1.
 * testsuite/25_algorithms/nth_element/58800.cc: New

Added:
    branches/gcc-4_8-branch/libstdc++-v3/testsuite/25_algorithms/nth_element/58800.cc
Modified:
    branches/gcc-4_8-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_8-branch/libstdc++-v3/include/bits/stl_algo.h

Revision history for this message
In , Paolo-k (paolo-k) wrote :

Author: paolo
Date: Sun Oct 20 09:08:26 2013
New Revision: 203874

URL: http://gcc.gnu.org/viewcvs?rev=203874&root=gcc&view=rev
Log:
2013-10-20 Chris Jefferson <email address hidden>
     Paolo Carlini <email address hidden>

 PR libstdc++/58800
 * include/bits/stl_algo.h (__unguarded_partition_pivot): Change
 __last - 2 to __last - 1.
 * testsuite/25_algorithms/nth_element/58800.cc: New

Added:
    branches/gcc-4_7-branch/libstdc++-v3/testsuite/25_algorithms/nth_element/58800.cc
Modified:
    branches/gcc-4_7-branch/libstdc++-v3/ChangeLog
    branches/gcc-4_7-branch/libstdc++-v3/include/bits/stl_algo.h

Revision history for this message
In , Paolo-carlini (paolo-carlini) wrote :

Fixed for 4.7.4/4.8.3/4.9.0.

Revision history for this message
In , Glisse-6 (glisse-6) wrote :

*** Bug 59116 has been marked as a duplicate of this bug. ***

Revision history for this message
In , VChris (vchris1975) wrote :

Problem description:

When I use median function in a spreadsheet it crashes randomly when I type a value into the cell which is in the range belongs to the function. Sometimes it crashes sometimes not. It is unable to reproduce the error. But the error happens only when I use median. If I delete median funciton, no error.

Steps to reproduce:
1. ....
2. ....
3. ....

Current behavior:

Expected behavior:

Operating System: Ubuntu
Version: 4.1.2.3 release

Revision history for this message
In , Eike Rathke (erack) wrote :

Could we have a detailed description how to reproduce this please?

Revision history for this message
In , Caolanm (caolanm) wrote :

insert

23
8
21
17
27
26

=MEDIAN(<the above range>)

Revision history for this message
In , Caolanm (caolanm) wrote :
Revision history for this message
In , Caolanm (caolanm) wrote :

caolanm->varadi.krisztian.info: Where did the version of LibreOffice you use come from. Was it bundled with Ubuntu ?

caolanm->bjoern: What compiler version do you compile LibreOffice with. If this build is the supplied-with-Ubuntu one then it looks like the compiler needs to be fixed.

Revision history for this message
In , VChris (vchris1975) wrote :

Yes, it is bundled with Ubuntu 13.10.
How can I fix this bug?

-------------------------
Sent from iPhone5 16Gb mobile device
-------------------------

2013.12.13. dátummal, 14:31 időpontban <email address hidden> írta:

> Caolán McNamara changed bug 72201
> What Removed Added
> Status ASSIGNED NEW
> Comment # 4 on bug 72201 from Caolán McNamara
> caolanm->varadi.krisztian.info: Where did the version of LibreOffice you use
> come from. Was it bundled with Ubuntu ?
>
> caolanm->bjoern: What compiler version do you compile LibreOffice with. If this
> build is the supplied-with-Ubuntu one then it looks like the compiler needs to
> be fixed.
> You are receiving this mail because:
> You reported the bug.

Revision history for this message
In , Björn Michaelsen (bjoern-michaelsen) wrote :

Downstreaming bug to Ubuntu as: https://bugs.launchpad.net/gcc-defaults/+bug/1261669 and closing as NOTOURBUG here.

@Caolan: Thanks for the hint.

Changed in libreoffice (Ubuntu):
status: New → Incomplete
Revision history for this message
Björn Michaelsen (bjoern-michaelsen) wrote :

@gcc guy: Do we have a fix for this in saucy (released or planned)?

Changed in df-libreoffice:
importance: Unknown → Medium
status: Unknown → Won't Fix
Changed in gcc-defaults:
importance: Unknown → High
status: Unknown → Fix Released
Revision history for this message
In , Rguenth (rguenth) wrote :

*** Bug 59557 has been marked as a duplicate of this bug. ***

Revision history for this message
Matthias Klose (doko) wrote :

fixed in trusty, not fixing in saucy

Changed in gcc-defaults (Ubuntu):
status: New → Fix Released
Changed in libreoffice (Ubuntu):
status: Incomplete → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.