mountall fails, broken (powerpc?) gcc?

Bug #432222 reported by David Hedberg
32
This bug affects 5 people
Affects Status Importance Assigned to Milestone
gcc-4.4 (Ubuntu)
Fix Released
Undecided
Unassigned
mountall (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

Binary package hint: mountall

My system (ibook running karmic) is not booting since the "general breakage" in the last couple of days. Installing the latest upgrades does not help.

Edit: This seems to be due to a gcc bug, see comment #8

The boot seems to stop at mountall, whose output contains (perhaps unrelated) some garbled paths that I haven't been able to track down the source of. The "garbled pattern" is different between the boots, so it's obviously reading from some uninitialized memory somewhere.

I attach a log of the output of mountall (obtained by inserting "bash" into /etc/init/mountall.conf and running mountall --debug &> .. as described in another bug), and my fstab.

I just refuse to believe that I have to actually reinstall.. ;-)

Installed versions of upstart and mountall:
ii mountall 0.1.6 filesystem mounting tool
ii upstart 0.6.3-3 event-based init daemon

kernel is:
Linux ibook 2.6.31-10-powerpc #34-Ubuntu Tue Sep 15 23:53:36 UTC 2009 ppc GNU/Linux

fstab and mountall log attached.

Revision history for this message
David Hedberg (david-hedberg) wrote :
Revision history for this message
David Hedberg (david-hedberg) wrote :
description: updated
description: updated
Revision history for this message
David Hedberg (david-hedberg) wrote :

Ok, I have it running.

As I'm not quite sure what the C standard says here I don't know if I just replaced one kludge for another or if there's an obscure compiler issue involved on ppc, but with the attached patch, my ubuntu starts once again for the first time in a couple of days.

(And there's probably no reason to make the variable static either)

Revision history for this message
David Hedberg (david-hedberg) wrote :

And now diffed in the correct order.. ;-)

Revision history for this message
David Hedberg (david-hedberg) wrote :

Sorry for all the spam, but I have confirmed that this approach does not work unless the variables are made static. I'll also attach a cleaner patch while I'm at it.

Revision history for this message
3vi1 (launchpad-net-eternaldusk) wrote :

I see this exact same issue after updating my PS3 today... Guess I'm going to have to learn how to cross-compile mountall with your patch to get it back running.

Revision history for this message
Norberto Bensa (nbensa) wrote :

Hello,

my box booted once in ten tries (amd64). It didn't show garbled paths, but it didn't like my xfs partitions. Sometimes it was /home, sometimes /var.

Your patch worked in the first try (I'll update tomorrow if it works in every boot) but being able to reboot is a step in the right direction.

I still get var mounted on /dev/.var in mount/df (mtab has wrong info) but that's for sure another bug.

BTW: __mountall needs to be static because of the *_hook_walk() funtions inside *_hook()
Maybe it is a gcc 4.4.1 issue because it doesn't make sense to me the need for static char __mountall[].

Revision history for this message
David Hedberg (david-hedberg) wrote :

Yes, maybe it is a compiler issue. I tried to write a small test case that isolates the problem, and I think that I have succeeded. It needs to be (like mountall) compiled with both -fPIE and -Os to actually show the problem, so my guess is that something important is being optimized away.

This is the output from the "problematic system":
david@ibook:~$ uname -a
Linux ibook 2.6.31-10-powerpc #34-Ubuntu Tue Sep 15 23:53:36 UTC 2009 ppc GNU/Linux
david@ibook:~$ gcc --version | head -n1
gcc (Ubuntu 4.4.1-4ubuntu1) 4.4.1
david@ibook:~$ gcc -fPIE -Os test.c -o gcc-test && ./gcc-test
a in extern_func: 123456789
a in intern_func: -1077763264

This is the output from an x86 system running debian, showing the "expected" output:
david@fww:~$ uname -a
Linux fww 2.6.26-2-686 #1 SMP Fri Aug 14 01:27:18 UTC 2009 i686 GNU/Linux
david@fww:~$ gcc --version | head -n 1
gcc (Debian 4.3.2-1.1) 4.3.2
david@fww:~$ gcc -fPIE -Os test.c -o gcc-test && ./gcc-test
a in extern_func: 123456789
a in intern_func: 123456789

As you can see the compilers aren't exactly comparable, but nevertheless it seems to point at a real problem with gcc.

Test app is attached. As said, this seems to happen only if the flags -fPIE and -Os are specified.

summary: - boot fails, mountall likely culprit
+ mountall fails, broken (powerpc?) gcc?
Revision history for this message
David Hedberg (david-hedberg) wrote :

As this really seems to be a problem with gcc, I'm "reassigning" this.

affects: mountall (Ubuntu) → gcc-4.4 (Ubuntu)
description: updated
Revision history for this message
Matthias Klose (doko) wrote :

> As this really seems to be a problem with gcc, I'm "reassigning" this.

and keep the broken mountall binary? If standard optimisation options (-O2), these should be used.

Revision history for this message
Colin Watson (cjwatson) wrote :

Don't use variable names starting with "__" - those are reserved for the language implementation, per the C standard.

Revision history for this message
David Hedberg (david-hedberg) wrote :

Thanks, I failed to figure out how to assign the bug to several packages. :-)

Yes, I just confirmed that compiling mountall with -O2 fixes the issue here, or I should say, produces a working binary. So that would be a solution I guess, if it could be incorporated into the official package. The underlying issue still seems to be gcc though.

Regarding the patch, "to my defense" I wasn't really expecting it to be applied "officially" anyway as it doesn't really seem like a proper fix to me.

Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Right, it's a gcc bug.

As I'm sure you're away, the PowerPC port is not a first-class port at this point

Changed in mountall (Ubuntu):
status: New → Won't Fix
Revision history for this message
Matthias Klose (doko) wrote :

Scott, maybe if you use the default optimization settings, but if you do choose to use anything else than -g -O2, please care about the reports.

Changed in mountall (Ubuntu):
status: Won't Fix → Confirmed
Revision history for this message
Scott James Remnant (Canonical) (canonical-scott) wrote :

Why? It's still clearly a gcc bug, no?

Changed in mountall (Ubuntu):
status: Confirmed → Won't Fix
Matthias Klose (doko)
Changed in mountall (Ubuntu):
status: Won't Fix → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mountall - 0.1.7

---------------
mountall (0.1.7) karmic; urgency=low

  * Build with -O2 on powerpc to work around wrong-code generation with -Os.
    LP: #432222.

 -- Matthias Klose <email address hidden> Tue, 22 Sep 2009 00:31:52 +0200

Changed in mountall (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Benjamin Herrenschmidt (benh-kernel) wrote :

Hrm, somebody is taking chances here.

Nested functions are evil and forbidden by ISO C standard.

I don't know what kind of black magic gcc is supposed to use to be able to find the local variables of the declaration scope when the nested function is called via a function pointer from outside. I suppose it will have to create some kind of trampoline on the stack, which means you lose non-exec stack capability for your program.

It may still be a bug that it doesn't work on ppc with some optimisations, but it's really not proper C in the first place.

Revision history for this message
Matthias Klose (doko) wrote :

please reopen with a testcase if this turns out to be a problem in GCC

Changed in gcc-4.4 (Ubuntu):
status: New → Invalid
Revision history for this message
David Hedberg (david-hedberg) wrote :

Is the test case attached in comment #8 insufficient?

According to the gcc documentation ( in particular, http://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html ), the code should be valid as long as the call to the nested function is executed before the outer function exits. I take it this functionality isn't actually used much, but to me anyway it seems like a bug..

(I should also confirm that mountall 0.1.7 fixes the issue for me, thanks.)

Revision history for this message
Benjamin Herrenschmidt (benh-kernel) wrote : Re: [Bug 432222] Re: mountall fails, broken (powerpc?) gcc?

On Thu, 2009-09-24 at 00:13 +0000, davidh wrote:
> Is the test case attached in comment #8 insufficient?
>
> According to the gcc documentation ( in particular,
> http://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html ), the code
> should be valid as long as the call to the nested function is executed
> before the outer function exits. I take it this functionality isn't
> actually used much, but to me anyway it seems like a bug..
>
> (I should also confirm that mountall 0.1.7 fixes the issue for me,
> thanks.)

Well, it did look like a gcc bug to me indeed.

However, the test case here doesn't seem to break on fedora gcc 4.4.0 or
jaunty 4.3.3

I'm still putting back that karmic machine into shape and will then be
able to check what's up with the karmic gcc version.

Cheers,
Ben.

Revision history for this message
Alan Modra (amodra) wrote :

I also couldn't reproduce the problem here with any of the compilers I have lying around here. davidh, can you attach the .o for your testcase to this bug report?

Revision history for this message
David Hedberg (david-hedberg) wrote :

.o as requested, produced by
gcc -c -fPIE -Os test.c

Linking and running:
gcc test.o -o gcc-test && ./gcc-test
a in extern_func: 123456789
a in intern_func: -1074204592

Revision history for this message
David Hedberg (david-hedberg) wrote :

A typical run of a slightly modified test case ("extended" printf) gives:
a (@0xbfc22160) in extern_func: 123456789
a (@0xbfc22150) in intern_func: -1077796448

where the address is off by the same nicely even amount every run.

Revision history for this message
Alan Modra (amodra) wrote :

Huh. You have what looks like ppc64 code in there.

  4c: 80 02 8f f8 lwz r0,-28680(r2)
  50: 90 01 00 3c stw r0,60(r1)

Where did that come from? It seems you have _FORTIFY_SOURCE defined for you too, somehow. Maybe that is pulling in a bad printf define? Please attach the .i generated by "gcc -E -fPIE -Os test.c"

Revision history for this message
David Hedberg (david-hedberg) wrote :

Attached file generated by "gcc -E -fPIE -Os test.c > test.i"

Whatever the problem is, it should at the very least be shared by the system building the powerpc packages. Either that, or my test case is broken. :)

Revision history for this message
Kees Cook (kees) wrote :

On Thu, Sep 24, 2009 at 03:32:43AM -0000, Alan Modra wrote:
> Huh. You have what looks like ppc64 code in there.
>
> 4c: 80 02 8f f8 lwz r0,-28680(r2)
> 50: 90 01 00 3c stw r0,60(r1)
>
> Where did that come from? It seems you have _FORTIFY_SOURCE defined for
> you too, somehow. Maybe that is pulling in a bad printf define? Please
> attach the .i generated by "gcc -E -fPIE -Os test.c"

-D_FORTIFY_SOURCE=2 is a default in Ubuntu's compiler. It can be turned
off with -U_FORTIFY_SOURCE in the build if you want to see if that could be
causing the glitch.

https://wiki.ubuntu.com/CompilerFlags

--
Kees Cook
Ubuntu Security Team

Matthias Klose (doko)
Changed in gcc-4.4 (Ubuntu):
status: Invalid → New
Revision history for this message
Alan Modra (amodra) wrote :

Found the problem. It's nothing to do with _FORTIFY_SOURCE, and those intructions indexing off r2 are to do with -fstack-check so no problem there either.

Fixed as follows. Incidentally this bug was triggered by fixing the obvious bug in no_global_regs_above

Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c (revision 152105)
+++ gcc/config/rs6000/rs6000.c (working copy)
@@ -16148,7 +16148,7 @@ rs6000_emit_prologue (void)
   int using_store_multiple;
   int using_static_chain_p = (cfun->static_chain_decl != NULL_TREE
                               && df_regs_ever_live_p (STATIC_CHAIN_REGNUM)
- && !call_used_regs[STATIC_CHAIN_REGNUM]);
+ && call_used_regs[STATIC_CHAIN_REGNUM]);
   HOST_WIDE_INT sp_offset = 0;

   if (TARGET_FIX_AND_CONTINUE)

Revision history for this message
Matthias Klose (doko) wrote :

running a test build

Changed in gcc-4.4 (Ubuntu):
status: New → In Progress
Revision history for this message
Matthias Klose (doko) wrote :

progressions in the testsuite (all for -m32):

-FAIL: gcc.c-torture/execute/nestfunc-3.c execution, -Os
-FAIL: gfortran.dg/char_cshift_1.f90 -Os execution test
-FAIL: gfortran.dg/char_cshift_2.f90 -Os execution test
-FAIL: gfortran.dg/char_eoshift_1.f90 -Os execution test
-FAIL: gfortran.dg/char_eoshift_2.f90 -Os execution test
-FAIL: gfortran.dg/char_eoshift_3.f90 -Os execution test
-FAIL: gfortran.dg/char_eoshift_4.f90 -Os execution test
-FAIL: gfortran.dg/default_initialization_3.f90 -Os execution test
-FAIL: gfortran.dg/namelist_28.f90 -Os execution test
-FAIL: gfortran.dg/parent_result_ref_2.f90 -Os execution test
-FAIL: gfortran.dg/ret_pointer_2.f90 -Os execution test

regressions (for -m32):

+FAIL: gcc.dg/torture/calleesave-sse.c -Os execution test

No changes for -m64.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gcc-4.4 - 4.4.1-4ubuntu4

---------------
gcc-4.4 (4.4.1-4ubuntu4) karmic; urgency=low

  * Disable the build of neon optimized runtime libs on armel.
  * libjava: Use atomic builtins For Linux ARM/EABI, backported from the
    trunk.
  * Proposed patch to fix wrong-code on powerpc (Alan Modra). LP: #432222.
  * Update to SVN 20090925 from the gcc-4_4-branch (r152174).
    - Fixes PR target/40473.

 -- Matthias Klose <email address hidden> Thu, 24 Sep 2009 14:16:41 +0200

Changed in gcc-4.4 (Ubuntu):
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.