Comment 0 for bug 663294

Revision history for this message
Chris Coulson (chrisccoulson) wrote : Firefox built with gcc-4.5 is a non-starter on i386

Binary package hint: gcc-4.5

I noticed this first last week when I tried to do a profile-guided build of Firefox with gcc-4.5 in maverick, and the build was hanging in my PPA. However, I've now recreated this locally on a normal build with gcc-4.5 (in maverick and natty) - on i386 only.

It is hanging here:

#0 0xffffe405 in __kernel_vsyscall ()
#1 0xf7fb1169 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2 0xf7fac5cb in _L_lock_748 () from /lib/libpthread.so.0
#3 0xf7fac3f1 in __pthread_mutex_lock (mutex=0xf3a4fab0) at pthread_mutex_lock.c:61
#4 0xf7ff67b3 in malloc_mutex_lock (size=<value optimized out>) at /home/chrisccoulson/src/firefox-4.0/firefox-4.0-4.0~b8~hg20101012r55334+nobinonly/build-tree/mozilla/memory/jemalloc/jemalloc.c:1417
#5 arena_malloc_small (size=<value optimized out>) at /home/chrisccoulson/src/firefox-4.0/firefox-4.0-4.0~b8~hg20101012r55334+nobinonly/build-tree/mozilla/memory/jemalloc/jemalloc.c:3766
#6 arena_malloc (size=<value optimized out>) at /home/chrisccoulson/src/firefox-4.0/firefox-4.0-4.0~b8~hg20101012r55334+nobinonly/build-tree/mozilla/memory/jemalloc/jemalloc.c:3845
#7 imalloc (size=<value optimized out>) at /home/chrisccoulson/src/firefox-4.0/firefox-4.0-4.0~b8~hg20101012r55334+nobinonly/build-tree/mozilla/memory/jemalloc/jemalloc.c:3857
#8 malloc (size=<value optimized out>) at /home/chrisccoulson/src/firefox-4.0/firefox-4.0-4.0~b8~hg20101012r55334+nobinonly/build-tree/mozilla/memory/jemalloc/jemalloc.c:5873
#9 0xf40bf11f in __fopen_internal (filename=0xf3b49522 "/proc/filesystems", mode=0xf3b49417 "r", is32=0) at iofopen.c:76
#10 0xf40c17bc in _IO_fopen64 (filename=0xf3b49522 "/proc/filesystems", mode=0xf3b49417 "r") at iofopen64.c:39
#11 0xf3b3f3ca in ?? () from /lib/libselinux.so.1
#12 0xf3b48f0d in ?? () from /lib/libselinux.so.1
#13 0xf3b37158 in _init () from /lib/libselinux.so.1
#14 0xf7fdf70c in call_init (l=<value optimized out>, argc=<value optimized out>, argv=0xffffd5e4, env=0xffffd5ec) at dl-init.c:70
#15 0xf7fdf829 in _dl_init (main_map=0xf7fee918, argc=<value optimized out>, argv=<value optimized out>, env=0xffffd5ec) at dl-init.c:134
#16 0xf7fd188f in _dl_start_user () from /lib/ld-linux.so.2

...which is happening even before entering main()

If I break in pthread_mutex_lock, and look at the contents of the mutex, I see:

$61 = {__data = {__lock = -207291728, __count = 4087676744, __owner = -207291728, __kind = 0, __nusers = 4294960128, {__spins = 36645888, __list = {__next = 0x22f2c00}}},
  __size = "\260\372\244\363H\377\244\363\260\372\244\363\000\000\000\000\000\344\377\377\000,/\002", __align = -207291728}

...which is clearly full of garbage. Initially, I thought that perhaps the mutex wasn't being initialized - but it is. The interesting bit is happening here, in imalloc():

static inline void *
imalloc(size_t size)
{

 assert(size != 0);

 if (size <= arena_maxclass)
  return (arena_malloc(choose_arena(), size, false));
 else
  return (huge_malloc(size, false));

Note that in this case, size < arena_maxclass.

choose_arena() is an inline that returns arenas[0], which is a pointer to a struct arena_t (note that firefox is using only a single arena). If I break inside choose_arena(), and inspect some values, I see:

(gdb) print arenas[0]
$3 = (arena_t *) 0xf394e040
(gdb) print &arenas[0]
$4 = (arena_t **) 0xf394e000
(gdb) print arenas[0]->lock
$5 = {__data = {__lock = 0, __count = 0, __owner = 0, __kind = 3, __nusers = 0, {__spins = 0, __list = {__next = 0x0}}}, __size = '\000' <repeats 12 times>, "\003\000\000\000\000\000\000\000\000\000\000",
  __align = 0}

The mutex looks correctly initialized here, and is unlocked.

If I step through a few more instructions in to arena_malloc(), then the value of arena (which was returned by choose_arena), is something else entirely:

(gdb) print arena
$6 = (arena_t *) 0xf3a4fab0

...this should be the same as arenas[0], which is what the C code suggests was returned by choose_arena and passed to arena_malloc

Also:

(gdb) print arena->lock
$8 = {__data = {__lock = -207291728, __count = 4087676744, __owner = -207291728, __kind = 0, __nusers = 4294960128, {__spins = 36645888, __list = {__next = 0x22f2c00}}},
  __size = "\260\372\244\363H\377\244\363\260\372\244\363\000\000\000\000\000\344\377\377\000,/\002", __align = -207291728}

So, what is happening here? If I look at the registers, I see that eax and edi both contain this bogus pointer;

(gdb) info registers
eax 0xf3a4fab0 -207291728
ecx 0x3 3
edx 0x0 0
ebx 0xf7ffeebc -134222148
esp 0xffffd3d4 0xffffd3d4
ebp 0xffffd42c 0xffffd42c
esi 0x160 352
edi 0xf3a4fab0 -207291728
eip 0xf7ff673f 0xf7ff673f <malloc+159>
eflags 0x282 [ SF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99

Here is the interesting part of malloc disassembled:

   0xf7ff66a0 <+0>: push %ebp
   0xf7ff66a1 <+1>: mov %esp,%ebp
   0xf7ff66a3 <+3>: sub $0x58,%esp
   0xf7ff66a6 <+6>: mov %ebx,-0xc(%ebp)
   0xf7ff66a9 <+9>: call 0xf7ff0917 <__i686.get_pc_thunk.bx>
   0xf7ff66ae <+14>: add $0x880e,%ebx
   0xf7ff66b4 <+20>: mov %esi,-0x8(%ebp)
   0xf7ff66b7 <+23>: mov 0x8(%ebp),%esi
   0xf7ff66ba <+26>: mov %edi,-0x4(%ebp)
   0xf7ff66bd <+29>: cmpb $0x0,0x151c(%ebx)
   0xf7ff66c4 <+36>: je 0xf7ff66f8 <malloc+88>
   0xf7ff66c6 <+38>: test %esi,%esi
   0xf7ff66c8 <+40>: mov $0x1,%eax
   0xf7ff66cd <+45>: cmove %eax,%esi
   0xf7ff66d0 <+48>: cmp 0x480(%ebx),%esi
   0xf7ff66d6 <+54>: jbe 0xf7ff6720 <malloc+128>
   0xf7ff66d8 <+56>: mov %esi,%eax
   0xf7ff66da <+58>: call 0xf7ff5980 <huge_malloc>
   0xf7ff66df <+63>: mov %eax,%esi
   0xf7ff66e1 <+65>: test %esi,%esi
   0xf7ff66e3 <+67>: je 0xf7ff6701 <malloc+97>
   0xf7ff66e5 <+69>: mov %esi,%eax
   0xf7ff66e7 <+71>: mov -0xc(%ebp),%ebx
   0xf7ff66ea <+74>: mov -0x8(%ebp),%esi
   0xf7ff66ed <+77>: mov -0x4(%ebp),%edi
   0xf7ff66f0 <+80>: mov %ebp,%esp
   0xf7ff66f2 <+82>: pop %ebp
   0xf7ff66f3 <+83>: ret
   0xf7ff66f4 <+84>: lea 0x0(%esi,%eiz,1),%esi
   0xf7ff66f8 <+88>: call 0xf7ff45c0 <malloc_init_hard>
   0xf7ff66fd <+93>: test %al,%al
   0xf7ff66ff <+95>: je 0xf7ff66c6 <malloc+38>
   0xf7ff6701 <+97>: call 0xf7ff0530 <__errno_location@plt>
   0xf7ff6706 <+102>: xor %esi,%esi
   0xf7ff6708 <+104>: movl $0xc,(%eax)
   0xf7ff670e <+110>: mov %esi,%eax
   0xf7ff6710 <+112>: mov -0xc(%ebp),%ebx
   0xf7ff6713 <+115>: mov -0x8(%ebp),%esi
   0xf7ff6716 <+118>: mov -0x4(%ebp),%edi
   0xf7ff6719 <+121>: mov %ebp,%esp
   0xf7ff671b <+123>: pop %ebp
   0xf7ff671c <+124>: ret
   0xf7ff671d <+125>: lea 0x0(%esi),%esi
   0xf7ff6720 <+128>: mov %gs:0x0,%eax
   0xf7ff6726 <+134>: nop
   0xf7ff6727 <+135>: lea 0x0(%esi,%eiz,1),%esi
   0xf7ff672b <+139>: lea 0x0,%edx
   0xf7ff6731 <+145>: mov %edx,-0x1c(%ebp)
   0xf7ff6734 <+148>: mov (%edx,%eax,1),%edi
   0xf7ff6737 <+151>: test %edi,%edi
   0xf7ff6739 <+153>: je 0xf7ff6960 <malloc+704>
   0xf7ff673f <+159>: cmp 0x14f8(%ebx),%esi
   0xf7ff6745 <+165>: ja 0xf7ff68a0 <malloc+512>
   0xf7ff674b <+171>: cmp 0x14f0(%ebx),%esi
   0xf7ff6751 <+177>: jae 0xf7ff6858 <malloc+440>
   0xf7ff6757 <+183>: sub $0x1,%esi
   0xf7ff675a <+186>: mov %esi,%edx
   0xf7ff675c <+188>: shr %edx
   0xf7ff675e <+190>: or %esi,%edx
   0xf7ff6760 <+192>: mov %edx,%eax
   0xf7ff6762 <+194>: shr $0x2,%eax
   0xf7ff6765 <+197>: or %edx,%eax
   0xf7ff6767 <+199>: mov %eax,%edx
   0xf7ff6769 <+201>: shr $0x4,%edx
   0xf7ff676c <+204>: or %eax,%edx
   0xf7ff676e <+206>: mov %edx,%eax
   0xf7ff6770 <+208>: shr $0x8,%eax
   0xf7ff6773 <+211>: or %edx,%eax
   0xf7ff6775 <+213>: mov $0xffffffff,%edx
   0xf7ff677a <+218>: mov %eax,%esi
   0xf7ff677c <+220>: shr $0x10,%esi
   0xf7ff677f <+223>: or %eax,%esi
   0xf7ff6781 <+225>: add $0x1,%esi
   0xf7ff6784 <+228>: mov %esi,%eax
   0xf7ff6786 <+230>: shr $0x2,%eax
   0xf7ff6789 <+233>: bsf %eax,%eax
   0xf7ff678c <+236>: cmove %edx,%eax
   0xf7ff678f <+239>: cmp $0x1,%esi
   0xf7ff6792 <+242>: lea 0x9(%eax,%eax,8),%eax
   0xf7ff6796 <+246>: lea 0x94(%edi,%eax,8),%eax
   0xf7ff679d <+253>: mov %eax,-0x1c(%ebp)
   0xf7ff67a0 <+256>: mov $0x2,%eax
   0xf7ff67a5 <+261>: cmova %esi,%eax
   0xf7ff67a8 <+264>: mov %eax,-0x20(%ebp)
   0xf7ff67ab <+267>: mov %edi,(%esp)
   0xf7ff67ae <+270>: call 0xf7ff0620 <pthread_mutex_lock@plt>

So, the contents of edi (which seems to contain our pointer to the bogus arena_t) is what is passed to pthread_mutex_lock (which makes sense - the lock is the first entry in struct arena_t).

From 0xf7ff6720 <+128> is happening inside choose_arena(). I'm not sure what the following instruction sequence is doing, but it's loading the wrong value in to edi, which is then passed to put on the stack and passed to pthread_mutex_lock:

   0xf7ff6720 <+128>: mov %gs:0x0,%eax
   0xf7ff672b <+139>: lea 0x0,%edx
   0xf7ff6734 <+148>: mov (%edx,%eax,1),%edi

I would have thought that this sequence would be loading the contents of 0xf394e000 in to the edi register, which is where the valid pointer is currently stored

I'm not sure what the problem is, but I can't see anything obviously wrong with jemalloc, and this worked just fine in gcc-4.4