firefox-3.1 crashed with SIGSEGV in memalign()

Bug #319480 reported by Lorenco Trichardt
138
This bug affects 8 people
Affects Status Importance Assigned to Milestone
Mozilla Firefox
Invalid
Medium
xulrunner-1.9.1 (Ubuntu)
Fix Released
Medium
Unassigned
Jaunty
Fix Released
High
Alexander Sack

Bug Description

Binary package hint: firefox-3.1

1. start browser
2. run from a separate terminal: firefox-3.1

actual result: remote client process segfaults on exit
expected result: remote client process should exit cleanly.

Stacktrace:
#0 0xb7ccf3a1 in ptmalloc_init () from /lib/tls/i686/cmov/libc.so.6
#1 0xb7cd2cc3 in memalign_hook_ini () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7cd2ae2 in memalign () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7ff0d72 in tls_get_addr_tail () from /lib/ld-linux.so.2
#4 0xb7ff121b in ___tls_get_addr_internal () from /lib/ld-linux.so.2
#5 0xb5de1962 in ?? () from /lib/libselinux.so.1
#6 0xb5ddab7e in ?? () from /lib/libselinux.so.1
#7 0xb5dd31d8 in ?? () from /lib/libselinux.so.1
#8 0xb5de3340 in _fini () from /lib/libselinux.so.1
#9 0xb7feea43 in _dl_fini () from /lib/ld-linux.so.2
#10 0xb7c8db89 in exit () from /lib/tls/i686/cmov/libc.so.6
#11 0xb7c7577d in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#12 0x08049611 in ?? ()

Revision history for this message
In , Reed Loden (reed) wrote :

Better debugging symbols would be nice, but I'll toss this to jemalloc to see if they have any ideas.

Revision history for this message
In , Fta+bugzilla (fta+bugzilla) wrote :

I have most possible debugging symbols installed on my Ubuntu/Jaunty box, including libc, firefox but apparently, i still miss some. /me looking.

Revision history for this message
In , Fta+bugzilla (fta+bugzilla) wrote :

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7c906d0 (LWP 1554)]
0xb7d0b3a1 in ptmalloc_init () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt
#0 0xb7d0b3a1 in ptmalloc_init () from /lib/tls/i686/cmov/libc.so.6
#1 0xb7d0ecc3 in memalign_hook_ini () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d0eae2 in memalign () from /lib/tls/i686/cmov/libc.so.6
#3 0xb8031d82 in tls_get_addr_tail () from /lib/ld-linux.so.2
#4 0xb803222b in ___tls_get_addr_internal () from /lib/ld-linux.so.2
#5 0xb5dee962 in fini_context_translations () at setrans_client.c:211
#6 0xb5de7b7e in fini_lib () at init.c:127
#7 0xb5de01d8 in __do_global_dtors_aux () from /lib/libselinux.so.1
#8 0xb5df0340 in _fini () from /lib/libselinux.so.1
#9 0xb802fa53 in _dl_fini () from /lib/ld-linux.so.2
#10 0xb7cc9b89 in exit () from /lib/tls/i686/cmov/libc.so.6
#11 0xb7cb177d in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#12 0x080496f1 in ?? ()
(gdb)

Revision history for this message
In , Jason Evans (jasone) wrote :

For some reason the TLS code is calling into glibc's malloc implementation (via memalign()). Since ptmalloc_init() is being called as a result, we can tell that glibc's internal malloc data structures have never been initialized. I don't know why this is causing a crash, but it isn't even touching jemalloc. Maybe there's some strange interaction with SELinux...

Revision history for this message
Lorenco Trichardt (trichalo) wrote :

Binary package hint: firefox-3.1

Opened the browser .....

ProblemType: Crash
Architecture: i386
CrashCounter: 1
DistroRelease: Ubuntu 9.04
ExecutablePath: /usr/lib/firefox-3.1b2/firefox-3.1
NonfreeKernelModules: ath_hal
Package: firefox-3.1 3.1~b2+build1+nobinonly-0ubuntu1
ProcCmdline: /usr/lib/firefox-3.1b2/firefox-3.1
ProcEnviron:
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
Signal: 11
SourcePackage: firefox-3.1
StacktraceTop:
 ?? () from /lib/tls/i686/cmov/libc.so.6
 ?? () from /lib/tls/i686/cmov/libc.so.6
 memalign () from /lib/tls/i686/cmov/libc.so.6
 ?? () from /lib/ld-linux.so.2
 ___tls_get_addr () from /lib/ld-linux.so.2
Title: firefox-3.1 crashed with SIGSEGV in memalign()
Uname: Linux 2.6.28-4-generic i686
UserGroups: adm admin audio cdrom dialout fax lpadmin netdev plugdev sambashare scanner video

Revision history for this message
Lorenco Trichardt (trichalo) wrote :
Revision history for this message
Apport retracing service (apport) wrote : Symbolic stack trace

StacktraceTop:ptmalloc_init () from /lib/tls/i686/cmov/libc.so.6
memalign_hook_ini () from /lib/tls/i686/cmov/libc.so.6
memalign () from /lib/tls/i686/cmov/libc.so.6
tls_get_addr_tail () from /lib/ld-linux.so.2
___tls_get_addr_internal () from /lib/ld-linux.so.2

Revision history for this message
Apport retracing service (apport) wrote : Symbolic threaded stack trace
Changed in firefox-3.1:
importance: Undecided → Medium
Revision history for this message
Alexander Sack (asac) wrote :

I think this is due to jemalloc doing crazy stuff. I can reproduce. Lets check whether that happens with upstream builds too.

Changed in firefox-3.1:
status: New → Triaged
description: updated
description: updated
Revision history for this message
In , Alexander Sack (asac) wrote :
Changed in firefox:
status: Unknown → Confirmed
Revision history for this message
sfn (safin-poczta) wrote :

Mar 2 18:48:58 sfn-esprimo kernel: [10899.323888] firefox-3.1[10789]: segfault at 7fd2b2b7d07d ip 00007fd2b4204f68 sp 00007fffbd5c7aa0 error 4 in libc-2.9.so[7fd2b418c000+168000]

Alexander Sack (asac)
affects: firefox-3.1 (Ubuntu) → firefox-3.5 (Ubuntu)
Revision history for this message
In , Alexander Sack (asac) wrote :

Created attachment 383038
leaking env in nsAppRunner.cpp

leaking env in nsAppRunner.cpp seems to fix this.

Revision history for this message
In , Alexander Sack (asac) wrote :

Created attachment 383040
 leaking env in nsAppRunner.cpp (non garbage patch)

previous wasn't a legal diff

Revision history for this message
In , Timeless-bemail (timeless-bemail) wrote :

so um, are you saying it isn't safe to call PR_SetEnv w/ a null pointer?

Revision history for this message
In , Wan-Teh Chang (wtc-google) wrote :

I think it is the strdup call rather than the null pointer
check that fixes the crash. The string you pass to PR_SetEnv,
which simply passes it to putenv, needs to stay around after
PR_SetEnv returns because the string "shall become part of the
environment":
http://www.opengroup.org/onlinepubs/009695399/functions/putenv.html

The putenv man page also says:
  A potential error is to call putenv() with an automatic
  variable as the argument, then return from the calling
  function while string is still part of the environment.

Revision history for this message
In , Timeless-bemail (timeless-bemail) wrote :

Comment on attachment 383040
 leaking env in nsAppRunner.cpp (non garbage patch)

ok, here's a free r-,
   // We intentionally leak |expr| here since it is required by PR_SetEnv.
the code is already trying to leak, but it's trying to leak an nspr pointer. If you're going to do this, just fix SaveWordToEnv to do:

SaveWordToEnv(const char *name, const nsACString & word)
{
  char *expr = PR_smprintf("%s=%s", name, PromiseFlatCString(word).get());
  if (!expr)
      return;

  /* this copy is intentionally leaked per setenv requirements */
  PR_SetEnv (strdup (env));

  PR_smprintf_free(env);
}

Revision history for this message
In , Timeless-bemail (timeless-bemail) wrote :

and note that there shouldn't be a space between foo and () if foo is a function.

Revision history for this message
In , Alexander Sack (asac) wrote :

(In reply to comment #10)
> (From update of attachment 383040 [details])
> ok, here's a free r-,
> // We intentionally leak |expr| here since it is required by PR_SetEnv.
> the code is already trying to leak, but it's trying to leak an nspr pointer. If
> you're going to do this, just fix SaveWordToEnv to do:
>

Yeah. SaveWordToEnv is already properly leaked. But not all places are using that func to set their env. This patch ensures that also the direct invocations to PR_SetEnv are getting leaked.

Revision history for this message
In , Timeless-bemail (timeless-bemail) wrote :

well, please find them and identify the right one.

Revision history for this message
In , Karlt (karlt) wrote :

nsAppRunner.cpp is linked in libxul.so so I wonder whether it has been unloaded at this stage so that it's static strings inserted into the environment are no longer valid pointers.

Making sure the static strings are copied to the heap would fix this.
Setting environ to NULL on return might also be a hacky way to fix this.

The stack here highlights another issue too though:

readelf --reloc /lib32/ld-linux.so.2 includes this

Relocation section '.rel.plt' at offset 0x8d4 contains 5 entries:
 Offset Info Type Sym.Value Sym. Name
0001afd0 00000b07 R_386_JUMP_SLOT 00014b40 __libc_memalign
0001afd4 00000407 R_386_JUMP_SLOT 00014c60 malloc
0001afd8 00001007 R_386_JUMP_SLOT 00014d30 calloc
0001afdc 00000807 R_386_JUMP_SLOT 00014c90 realloc
0001afe0 00000507 R_386_JUMP_SLOT 00014ae0 free

I don't know why only memalign calls use the libc-specific symbol name but dl-tls.c looks like it calls free on pointers from __libc_malloc.
That doesn't seem to explain the crash in ptmalloc_init but is probably something that should be fixed.

Attachment 386469 would resolve the inconsistency here by making __libc_memalign use jemalloc's memalign. That would also avoid the call to memalign_hook_ini, avoiding this crash.

Revision history for this message
Alexander Sack (asac) wrote :

fixed in 3.5 final upload:
Fix crash on shutdown by leaking env between firefox and xulrunner - add debian/patches/bz473629_lp319480_leak_setenv_apprunner.patch

Changed in firefox-3.5 (Ubuntu):
status: Triaged → Fix Released
Revision history for this message
Alexander Sack (asac) wrote :

please pocket-copy binaries from https://edge.launchpad.net/~ubuntu-mozilla-security/+archive/ppa for this SRU

xulrunner-1.9.1 - 1.9.1+nobinonly-0ubuntu0.9.04.1

affects: firefox-3.5 (Ubuntu Jaunty) → xulrunner-1.9.1 (Ubuntu Jaunty)
Changed in xulrunner-1.9.1 (Ubuntu Jaunty):
assignee: nobody → Alexander Sack (asac)
importance: Undecided → High
status: New → In Progress
Revision history for this message
Martin Pitt (pitti) wrote :

Accepted into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

Changed in xulrunner-1.9.1 (Ubuntu Jaunty):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Nizar Kerkeni (nizarus) wrote :

I can't reproduce this bug with package into jaunty-proposed (64 bits version)

Martin Pitt (pitti)
tags: added: verification-done
removed: verification-needed
Revision history for this message
Alexander Sack (asac) wrote :

update now available in jaunty-security and jaunty-updates ... closing bug manually because i failed to properly mark it in the changelog.

Changed in xulrunner-1.9.1 (Ubuntu Jaunty):
status: Fix Committed → Fix Released
Revision history for this message
In , Alexander Sack (asac) wrote :

(In reply to comment #14)
> nsAppRunner.cpp is linked in libxul.so so I wonder whether it has been unloaded
> at this stage so that it's static strings inserted into the environment are no
> longer valid pointers.
>

right. This matches my evaluation. hence almost all PR_SetEnv need to be leaked in this file which is why i used this macro. We can move it further down or go through the list and explicitly strcpy all but one PR_Set env things.

Revision history for this message
In , Alexander Sack (asac) wrote :

(In reply to comment #14)
>
> Attachment 386469 [details] would resolve the inconsistency here by making
> __libc_memalign use jemalloc's memalign. That would also avoid the call to
> memalign_hook_ini, avoiding this crash.

I can check your patch as well. however, we also had other problems because of environ getting trashed after unloading libxul.so (namely, the libatk bridge made things crash when it tried to access environ in exit_hook); so leaking is probably still needed.

Revision history for this message
Alexander Sack (asac) wrote :

here the jaunty changelog:
xulrunner-1.9.1 (1.9.1+nobinonly-0ubuntu0.9.04.1) jaunty-security; urgency=low

  revert changes not suitable for stable branch
  + revert moving section and priority in debian/control

  [ Fabien Tassin ]
  * New upstream release taken from tag FIREFOX_3_5_RELEASE
  * Add libxul-dev and libmozjs-dev to Conflicts of the main
    xulrunner-1.9.1 package
    - update debian/control
  * Change the id of the langpacks to 1.9.1
    - update debian/translation-support/install.rdf.in

xulrunner-1.9.1 (1.9.1~rc2+nobinonly-0ubuntu1) karmic; urgency=low

  * New upstream release taken from tag FIREFOX_3_5rc2_RELEASE
  * Bump requirement for system nspr to >= 4.8 (bmo 492464), for sqlite
    to >= 3.6.10 (bmo 478297) and for cairo to >= 1.6.0 (bmo 428563)
    - update debian/rules
  * Bump Standards-Version to 3.8.1
    - update debian/control
  * Make the -dbg package strictly depend on the main package, and move it
    to the debug section, and assign its priority to extra
    - update debian/control
  * Fix crash on shutdown by leaking env between firefox and xulrunner
    - add debian/patches/bz473629_lp319480_leak_setenv_apprunner.patch
    - update debian/patches/series
  * Update diverged patches:
    - update debian/patches/bzXXX_fix_test_suite_bashisms.patch
    - update debian/patches/bzXXX_urlclassifier_prefs_in_toolkit.patch

 -- Alexander Sack < <email address hidden>> Wed, 01 Jul 2009 01:09:12 +0200

Revision history for this message
In , Karlt (karlt) wrote :

(In reply to comment #16)
> however, we also had other problems because of environ getting trashed after
> unloading libxul.so (namely, the libatk bridge made things crash when it
> tried to access environ in exit_hook); so leaking is probably still needed.

Yes, sounds like leaking would still be needed.
Here too:

http://hg.mozilla.org/mozilla-central/file/0471fa063e43/accessible/src/atk/nsAppRootAccessible.cpp#l567

I'm not a toolkit peer, but I think it would be better to change only the
instances that are not already leaked to use a function name that is not
PR_SetEnv. SaveWordToEnv() is probably fine for this.

Maybe the attempts to clear environment values could check that they are set
before needing to leak:

http://hg.mozilla.org/mozilla-central/file/f46e6aee1335/toolkit/xre/nsAppRunner.cpp#l3303

Revision history for this message
In , Karlt (karlt) wrote :

Just my notes on various environment APIs:

glibc's setenv (which copies) will reuse strings (from previous setenv calls)
if the value matches an old value for a variable. I don't know why it never
modifies or frees old strings. Perhaps this is to make the result of getenv
persistant but that is not guaranteed anyway because of putenv.

POSIX Programmer's Manual page for putenv says "The space used by string is no
longer used once a new string which defines name is passed to putenv()."

PR_SetEnv() docs say:
"The caller must ensure that the string passed to PR_SetEnv() is persistent."

nsIEnvironment deletes strings passed to PR_SetEnv once the value has been
replaced with another.
http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsIEnvironment.idl

IEEE Std 1003.1, 2003 Edition POSIX does not have const in the prototype
definition for putenv, but I don't know why this is because there doesn't seem
to be any reason why putenv should modify the string:

int putenv(char *string);

/home/karl/moz/dev/nsprpub/pr/src/misc/prenv.c: In function 'PR_SetEnv':
/home/karl/moz/dev/nsprpub/pr/src/misc/prenv.c:96: warning: passing argument 1 of 'putenv' discards qualifiers from pointer target type

Revision history for this message
In , Karlt (karlt) wrote :

(In reply to comment #17)
> (In reply to comment #16)
> > however, we also had other problems because of environ getting trashed after
> > unloading libxul.so (namely, the libatk bridge made things crash when it
> > tried to access environ in exit_hook); so leaking is probably still needed.
>
> Yes, sounds like leaking would still be needed.

Actually, on further thought, the atk crash is probably caused by something else.
If dependent libraries were unloaded before atexit hooks run then I expect bigger problems.

It's probably worth checking our theory here that libxul has been unloaded even for the crash in comment 3. "info shared" at the time of the crash will show the loaded shared libraries.

Revision history for this message
In , Timeless-bemail (timeless-bemail) wrote :

atk is generally broken, there are bugs, you can find them.

putenv isn't const because on some platforms you can do this:

char x[]="test=hello cruel world";
putenv(x);
char *evil = getenv("test");
evil[11]='d';
evil[13]='o';
evil[14]='l';

Revision history for this message
In , Mh+mozilla (mh+mozilla) wrote :

*** This bug has been marked as a duplicate of bug 555894 ***

Changed in firefox:
status: Confirmed → Invalid
Changed in firefox:
importance: Unknown → Medium
status: Invalid → Unknown
Changed in firefox:
status: Unknown → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.