NS_InvokeByIndex in xptcinvoke_arm.cpp is not Thumb-2 safe for Lucid
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Mozilla Firefox |
Fix Released
|
Medium
|
|||
xulrunner-1.9.1 (Ubuntu) |
Fix Released
|
High
|
Alexander Sack | ||
Lucid |
Fix Released
|
High
|
Alexander Sack |
Bug Description
Binary package hint: xulrunner-1.9.1
Two issues:
1) Either this file should be built individually as ARM code using -marm, or the code needs to be reviewed and modified to be Thumb-2 safe.
Specifically, calling a function using:
mov lr, pc
mov pc, ip
...will not interwork properly between ARM and Thumb state, depending on the function called.
Instead, something like the following is needed:
#ifdef __thumb__
blx ip
0:
#else
mov lr, pc
mov pc, ip
#endif
I'll try and propose a suitable patch.
I'm currently scanning to see whether there is other problematic code in this package.
Patch posted below.
2) The kernel user helpers are not currently used for implementing atomic operations. (This is the same backend used by the GCC __sync_*() atomic intrinsics.)
There is a configure option --with-arm-kuser to enable this, but it's not currently bassed by the build rules. It's been supported since ff 3.5 / linux 2.6.17 (if I remember correctly)
summary: |
- NS_InvokeByIndex in xptcinvoke_arm.cpp is not Thumb-2 safe + NS_InvokeByIndex in xptcinvoke_arm.cpp is not Thumb-2 safe for Lucid |
Dave Martin (dave-martin-arm) wrote : | #1 |
Dave Martin (dave-martin-arm) wrote : | #2 |
Note: the above patch is not upstream yet, but should get upstreamed in the meantime.
It would be good to pull this into Ubuntu now though, if possible.
description: | updated |
Dave Martin (dave-martin-arm) wrote : | #3 |
- gdb transcript demonstrating the workings of the Thumb-2 patch Edit (25.3 KiB, text/plain)
Got a build and tested it against lucid firefox-3.5.
I didn't appear to get any problems (I'm reporting this bug using the Thumb-2 xulrunner library).
Attached is a gdb session transcript, where I set a breakpoint on the NS_InvokeByIndex implementation: the code is definitely Thumb-2, as can be observed by the instruction offsets not divisible by 4, and the mixture of instruction sizes (among other things).
I step some calls and returns through NS_InvokeByIndex; it seems to be doing the right thing.
Changed in xulrunner-1.9.1 (Ubuntu): | |
assignee: | nobody → Alexander Sack (asac) |
In Mozilla Bugzilla #532198, Reed Loden (reed) wrote : | #4 |
Created an attachment (id=415476)
patch - v1
Upstreaming a patch from Kevin Welton <email address hidden> to fix NS_InvokeByIndex() to use BLX for method invocations when compiling for Thumb. This comes from http://
In Mozilla Bugzilla #532198, Bugmail-lassey (bugmail-lassey) wrote : | #5 |
looks like that's patching a patch that was already backed out. http://
We can use the #ifdef _thumb_ for future work though.
In Mozilla Bugzilla #532198, Reed Loden (reed) wrote : | #6 |
(In reply to comment #1)
> looks like that's patching a patch that was already backed out.
> http://
That's the backout commit...
In Mozilla Bugzilla #532198, Reed Loden (reed) wrote : | #7 |
This patch applies fine to trunk with a few minor offsets.
In Mozilla Bugzilla #532198, Vladimir Vukicevic (vvuk) wrote : | #8 |
There's no reason to not use BLX all the time, no? I thought that's what the updated xptcinvoke patch did.
description: | updated |
In Mozilla Bugzilla #532198, Timeless-bemail (timeless-bemail) wrote : | #9 |
(From update of attachment 415476)
for lazy people, that's bug 530087
>+ "blx ip \n\t" /* call it... */
> "mov lr, pc \n\t" /* call it... */
comments should match the conventions of our file, that means the exact same whitespace, no compression.
>+ "blx ip \n\t" /* copy args to the stack like the
>+ * compiler would */
i think:
/* .... */
/* .... */
>+#ifdef __thumb__
>+ "blx ip \n\t" /* call method */
>+ "mov lr, pc \n\t" /* call method */
Changed in firefox: | |
status: | Unknown → Confirmed |
Changed in xulrunner-1.9.1 (Ubuntu): | |
status: | New → Triaged |
importance: | Undecided → High |
Changed in xulrunner-1.9.1 (Ubuntu Lucid): | |
milestone: | none → lucid-alpha-2 |
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #10 |
You can (and should) use BLX all the time in this case, but it won't then work on ARMv4. The JIT doesn't work on ARMv4 anyway, and I don't think we care about such old machines.
Having said that, some distributions might care about ARMv4, and they might want support, even if it's slow. (Don't ask me why!) A solution here would be to use the old-style "MOV pc, ip" on ARMv4 (using a compile-time switch of some kind) and BLX everywhere else. We would also have to disable the JIT for ARMv4, but ARMv4 distributions must do that anyway because the JIT hasn't had ARMv4 support for a long time.
NOTE: Using "MOV pc, ip" will break the return stack prediction on newer cores, so even if you don't want to interwork, use BLX <reg>. The performance difference should be significant.
Launchpad Janitor (janitor) wrote : | #11 |
This bug was fixed in the package xulrunner-1.9.1 - 1.9.1.7+
---------------
xulrunner-1.9.1 (1.9.1.
* New upstream release v1.9.1.7 (FIREFOX_
- see USN-878-1
[ Micah Gersten <email address hidden> ]
* Drop patch after upstream landing of (bmo: 521780) aka
extension upgrade with a moved location breaks extension manager
- drop debian/
- update debian/
* Add patch to make Thumb-2 Safe for Lucid (LP: #488354)
- add debian/
- update debian/
[ Dmitrijs Ledkovs <email address hidden> ]
* Added dh_xulrunner from Debian (LP: #498973)
- add debian/
- add debian/
- add debian/
- update debian/rules
- update debian/
-- Micah Gersten <email address hidden> Tue, 05 Jan 2010 17:50:47 -0600
Changed in xulrunner-1.9.1 (Ubuntu Lucid): | |
status: | Triaged → Fix Released |
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #12 |
Created an attachment (id=428899)
Use BLX instead of "MOV pc" function calls.
The new patch uses BLX all the time, and thus imposes an ARMv5T baseline requirement for XPCOM. (We previously had this requirement for the JIT anyway, but that could be trivially disabled for crazy people who wanted to compile for ARMv4.)
I've (barely) tested this with the ARM build, but couldn't get GCC 4.3.3 to produce a Thumb-2 build. (The error looked like a compiler error in an unrelated module.) I could massage it into building just the bits we care about with -mthumb, but it takes a long time to build anyway, let alone with me hacking the build system.
In any case, I've delayed enough for such a trivial patch, so here it is. Whilst it's trivial, it wants pushing to the try server or something like that before it's actually pushed.
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #13 |
(In reply to comment #6)
> Having said that, some distributions might care about ARMv4, and they might
> want support, even if it's slow. (Don't ask me why!)
Here is why: openmoko is armv4. Debian supports openmoko.
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #14 |
(In reply to comment #8)
> (In reply to comment #6)
> Here is why: openmoko is armv4. Debian supports openmoko.
Hmm. The Neo FreeRunner uses an ARM920T, which is an ARMv4T device. However, it sits at 400MHz and has just 128MB of RAM. I admit that this is the closest ARMv4(T) device I've seen to being a realistic platform, but is it really worth going out of our way to support? We would have to include #ifdef blocks in our code to support ARMv4T, and these blocks will need testing.
In Mozilla Bugzilla #532198, Vladimir Vukicevic (vvuk) wrote : | #15 |
Yeah, it's highly unlikely that we'll support any ARMv4 platforms going forward. (Even ARMv5 is borderline, but there's little cost in supporting it if we want to support ARMv6 anyway.)
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #16 |
Created an attachment (id=433119)
(Almost) C-only version
How about "simply" replacing the assembly with (almost) C code only ? The compiler will take care of using the proper opcode for the target architecture. It will also create more efficient code and also inline the invoke_* functions.
I get a 25% speed improvement on the TestXPTCInvoke speed test with an ARMv4T target. (note it ends up using bl on armv4t, and blx when using e.g. an armv5t target)
Original assembly:
direct took 3.78 seconds
invoke took 50.84 seconds
So, invoke overhead was ~ 47.06 seconds (~ 93%)
With C, without inlining:
direct took 3.62 seconds
invoke took 38.90 seconds
So, invoke overhead was ~ 35.28 seconds (~ 91%)
With inlined invoke functions:
direct took 3.62 seconds
invoke took 37.53 seconds
So, invoke overhead was ~ 33.91 seconds (~ 90%)
Note I'm not entirely sure the attached patch is totally safe for any compiler, but it works properly when built with gcc 4.4. If this is not the case, I'm pretty sure this can be made safer.
Also note invoke_count_words could also be removed, to improve speed even more, as per bug 530087.
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #17 |
(In reply to comment #11)
> I get a 25% speed improvement on the TestXPTCInvoke speed test with an ARMv4T
> target. (note it ends up using bl on armv4t, and blx when using e.g. an armv5t
> target)
(Note this was compiled for ARMv4T, but run on an ARMv5T)
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #18 |
That's a neat solution. The only worry I have is the fiddling of the stack pointer within C code; that is likely to break even in the same compiler with unrelated code changes causing it to prod the stack immediately before calling "func".
Actually, I'd be interested in seeing the assembly for that, as I'd like to see what it's doing differently.
(In reply to comment #11)
> Note I'm not entirely sure the attached patch is totally safe for any compiler,
> but it works properly when built with gcc 4.4. If this is not the case, I'm
> pretty sure this can be made safer.
That doesn't matter; the original assembly was extremely compiler-dependent!
> Also note invoke_count_words could also be removed, to improve speed even more,
> as per bug 530087.
Not without sacrificing memory: invoke_
The best solution I have is to reserve arg_count*2 words on the stack and waste stack space if we have 32-bit arguments. That will allow us to stick to one pass, skipping invoke_
(In reply to comment #12)
> (In reply to comment #11)
> > I get a 25% speed improvement on the TestXPTCInvoke speed test with an ARMv4T
> > target. (note it ends up using bl on armv4t, and blx when using e.g. an armv5t
> > target)
>
> (Note this was compiled for ARMv4T, but run on an ARMv5T)
"bl" and "blx" will exhibit equivalent performance; the only difference is when interworking is involved. Return stack prediction is used for both (but maybe not on an ARMv5T device).
There are two issues here. I think you know these anyway, but for the sake of documentation & clarity in the bug:
* Correctness on a system with mixed ARM & Thumb code: "blx" _must_ be used here.
* Performance:
- ARMv4T interworking branches aren't return-
- ARMv4 long-range branches involve doing "mov pc, <dst_reg>". That's what the old code used to do. ("bl" can't branch to an address in a register and is restricted to PC±32MB.)
----
I was planning to work on this (and 530087) further over the next few days because it's been in my "to do" list for ages, but you're welcome to take it if you want to.
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #19 |
(In reply to comment #13)
> That's a neat solution. The only worry I have is the fiddling of the stack
> pointer within C code; that is likely to break even in the same compiler with
> unrelated code changes causing it to prod the stack immediately before calling
> "func".
I think it should be safe, actually, because the compiler knows the stack pointer is being fiddled with by the assembly, and avoids using it. Without forcing the stack pointer with the assembly, the compiled code modifies sp to read the first 3 arguments. (iirc, once with add and once with ldmia)
While this would actually do the same thing, it wouldn't have the same guarantees.
> Actually, I'd be interested in seeing the assembly for that, as I'd like to see
> what it's doing differently.
Will attach.
> There are two issues here. I think you know these anyway, but for the sake of
> documentation & clarity in the bug:
> * Correctness on a system with mixed ARM & Thumb code: "blx" _must_ be used
> here.
> * Performance:
> - ARMv4T interworking branches aren't return-
> and this costs a huge amount of performance. (I've seen figures ranging from 5%
> to 50%, but I don't know what would happen in this particular case.)
> - ARMv4 long-range branches involve doing "mov pc, <dst_reg>". That's what
> the old code used to do. ("bl" can't branch to an address in a register and is
> restricted to PC±32MB.)
The undeniable advantage of the C code is that the compiler will just use the right construct depending on the target architecture. No need to #ifdef __thumb__ or whatever.
> I was planning to work on this (and 530087) further over the next few days
> because it's been in my "to do" list for ages, but you're welcome to take it if
> you want to.
Since I've gone this far already, I think I'll just take it ;)
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #20 |
(In reply to comment #14)
> The undeniable advantage of the C code is that the compiler will just use the
> right construct depending on the target architecture. No need to #ifdef
> __thumb__ or whatever.
Oh, certainly. C code is always preferable to assembly for that very reason, unless you need to do something that C code can't express efficiently (though that isn't the case here).
I'm still not entirely convinced about the SP-fiddling though. It seems unsafe to me. For instance, how do you know that the stack_space array is at the bottom of your frame? C doesn't guarantee anything there (even though you declare it last). Yes, it probably works, but it feels unsafe. That's why the original was in assembly, so it was clear what was happening.
In the case of writing to SP in the asm block: Yes, the compiler knows you've clobbered it, but C mandates that you have a stack and I'd be surprised if the compiler copes well with you doing that. If it isn't using a frame pointer, for example, it won't know how to wind back the stack pointer. Perhaps writing to SP in this way forces it to use a frame pointer, but none of that behaviour is documented (as far as I know) so it's likely to break between even minor compiler revisions.
I like the C implementation, but it does make me feel nervous :-)
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #21 |
(In reply to comment #15)
> (In reply to comment #14)
> > The undeniable advantage of the C code is that the compiler will just use the
> > right construct depending on the target architecture. No need to #ifdef
> > __thumb__ or whatever.
>
> Oh, certainly. C code is always preferable to assembly for that very reason,
> unless you need to do something that C code can't express efficiently (though
> that isn't the case here).
>
> I'm still not entirely convinced about the SP-fiddling though. It seems unsafe
> to me. For instance, how do you know that the stack_space array is at the
> bottom of your frame?
Because it is the only thing that is allocated on stack in the function. But we can probably use __builtin_alloca instead if you're uncomfortable with that, it generates the same code.
> C doesn't guarantee anything there (even though you
> declare it last). Yes, it probably works, but it feels unsafe. That's why the
> original was in assembly, so it was clear what was happening.
>
> In the case of writing to SP in the asm block: Yes, the compiler knows you've
> clobbered it, but C mandates that you have a stack and I'd be surprised if the
> compiler copes well with you doing that. If it isn't using a frame pointer, for
> example, it won't know how to wind back the stack pointer. Perhaps writing to
> SP in this way forces it to use a frame pointer, but none of that behaviour is
> documented (as far as I know) so it's likely to break between even minor
> compiler revisions.
I don't know if -fomit-
Anyways, I can think of several other approaches using a bit more assembly, but still leaving most of the work to the C code, I'll check what kind of code get generated with that.
Here is the assembly i get with the already attached C code (forcing -fno-inline), with an armv4t target:
stmfd sp!, {r4, r5, r6, r7, r8, r9, fp, lr}
mov r5, r0
add fp, sp, #28
mov r6, r1
mov r0, r2
mov r1, r3
mov r7, r2
mov r8, r3
bl _ZL18invoke_
mov r3, r0, asl #2
add r3, r3, #18
bic r3, r3, #7
sub sp, sp, r3
mov r1, r7
mov r2, r8
add r0, sp, #4
bl _ZL20invoke_
mov r4, sp
add r3, sp, #16
#APP
@ 200 "xptcinvoke_
mov sp, r3
@ 0 "" 2
mov r0, r5
ldr r3, [r4, #12]
ldr ip, [r5, #0]
ldmib r4, {r1, r2} @ phole ldm
ldr ip, [ip, r6, asl #2]
mov lr, pc
bx ip
sub sp, fp, #28
ldmfd sp!, {r4, r5, r6, r7, r8, r9, fp, lr}
bx lr
Here it uses bl for the function calls and bx for the method invoke.
With a armv5t ta...
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #22 |
> Because it is the only thing that is allocated on stack in the function. But we
> can probably use __builtin_alloca instead if you're uncomfortable with that, it
> generates the same code.
Not necessarily: For one thing, the "register" qualfier is just a hint, but
there may also be compiler temporaries that go on the stack.
Admittedly, for a function of this size and complexity, it's a fairly safe
assumption, but I think perhaps it needs a comment warning others from adding
code without considering the consequences, as it would be dreadfully easy to
break.
Is there any way we can ASSERT this? I don't think there's an easy way, but I
suppose in the assembly snippet we could assert that sp == stack_space before
assigning it:
#ifdef DEBUG
__asm__ __volatile__ (
"cmp sp, %1"
"moveq %0, #1"
"movne %0, #0"
"add sp, #(4*4)"
: "=r" (stack_ok)
: "r" (stack_space)
: "sp"
);
#else
__asm__ __volatile__ (
"add sp, #(4*4)" /* sp is at stack_space[0], so advance it to [4].
:
:
: "sp"
);
#endif
> I don't know if -fomit-
It sounds like touching the SP does force frame_needed to 1, so it's probably
safe.
> With a armv5t target, the end changes to the following:
> ldmib r4, {r1, r2} @ phole ldm
> mov lr, pc
> ldr pc, [ip, r6, asl #2]
> sub sp, fp, #28
> ldmfd sp!, {r4, r5, r6, r7, r8, r9, fp, pc}
Ouch. I don't know why the compiler is emiting "ldr pc, [...]". It'd be
preferable to load into a temporary (such as "ip") and then blx to that. Hmm.
What do you get if you specify "-mcpu=cortex-a8"? It's possible that the
compiler is assuming that it doesn't matter for ARMv5T devices (and it's
probably right).
(In any case, this is a compiler-related issue that will occur in all the
compiled code.)
> Without the assembly, this is how the first three 32-bits words are read from
> the stack:
> ldr r3, [sp, #12]
> ldr ip, [r5, #0]
> ldmib sp, {r1, r2} @ phole ldm
>
> In the end, sp would point on the third 32-bits word, instead of the fourth.
Nope, sp isn't updated there. It's a strange way to load the values as a single
ldm would suffice, but I suppose it would work and wouldn't affect performance
much.
----
I just have one final concern, and then I'll leave you alone :-) We need to
guarantee that the stack space is 64-bit aligned (for EABI compliance), but I'm
not sure how you do that. Indeed, you removed some alignment code from
invoke_count_words. Have you confirmed that this is actually correct? Any issues
would only show up with 64-bit arguments (and only then if they don't fall
across an alignment boundary).
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #23 |
I missed the fairly obvious call to "ASSERT(stack_ok)" in my DEBUG example :-)
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #24 |
(In reply to comment #17)
> I just have one final concern, and then I'll leave you alone :-) We need to
> guarantee that the stack space is 64-bit aligned (for EABI compliance), but I'm
> not sure how you do that. Indeed, you removed some alignment code from
> invoke_count_words. Have you confirmed that this is actually correct?
Stack allocation already guarantees the stack pointer is going to be aligned. That's why the allocation on stack is space + 1, and the pointer passed to invoke_
Anyways, I am currently testing a rather different approach (combined with the removal of invoke_count_words, because it makes things easier) that doesn't do too much magic with the stack.
I should be done (including full xpcshell-tests run) by tonight or tomorrow.
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #25 |
Right-o. This will also resolve bug 530087 :-)
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #26 |
I have a incomplete patch for the moment but the results are already good:
(somehow the machine I'm using got faster since yesterday, so I reiterated all the benchmarks for comparison)
original code:
direct took 2.20 seconds
invoke took 29.81 seconds
So, invoke overhead was ~ 27.61 seconds (~ 93%)
with attachment 433119:
direct took 2.20 seconds
invoke took 22.66 seconds
So, invoke overhead was ~ 20.45 seconds (~ 90%)
with the WIP patch:
direct took 2.22 seconds
invoke took 16.94 seconds
So, invoke overhead was ~ 14.72 seconds (~ 87%)
We're looking at almost 2x improvement, though the speed test is actually biased, as it only tries one type of calls, that doesn't even need to use the stack.
The current status of the WIP is that it is clean of any assembly and works properly, except that it lacks proper comments, and more importantly, breaks the non EABI case (which I unfortunately can't test, but I can at least try to make it theoretically work).
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #27 |
Some values for a modified speed test, using 10000000 iterations (10 times less) over AddMixedFloats instead of AddTwoInts:
original code:
direct took 10.72 seconds
invoke took 17.41 seconds
So, invoke overhead was ~ 6.69 seconds (~ 38%)
with attachment 433119:
direct took 10.79 seconds
invoke took 16.69 seconds
So, invoke overhead was ~ 5.90 seconds (~ 35%)
with the WIP patch:
direct took 10.78 seconds
invoke took 14.58 seconds
So, invoke overhead was ~ 3.80 seconds (~ 26%)
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #28 |
I wouldn't worry about supporting non-EABI in Linux. It might be nice to have a conditional #error in case someone tries it. GCC still supports non-EABI for now, at least, though I don't know of anyone using it.
A 2* improvement is fantastic! I'm guessing that's because you only make one pass of the argument list; I saw similar effects on the WinCE version with that modification (which was really simple for WinCE's calling convention).
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #29 |
Created an attachment (id=433581)
C-only version
The patch is unfortunately not easily readable. It's probably better to apply it to see the resulting file. Please tell me if the comments are not clear enough.
It is now free of assembly, and I think it is now perfectly safe.
FYI, the main reason I went with the copy_double_word function is that somehow, gcc would generate 2 almost identical assembly snippets for the double word copies when I just inlined the code like it is in the original code: 1 for nsXPTType::T_I64 and nsXPTType::T_U64 and another one for nsXPTType:
And when I say almost identical, it's almost pathetic that it can't factor:
add r0, r0, #7 add r0, r0, #7
bic r7, r0, #7 bic r7, r0, #7
cmp r7, r5 cmp r5, r7
moveq r7, lr moveq r7, lr
sub r9, r3, #16 sub r9, r3, #16
ldmia r9, {r8-r9} ldmia r9, {r8-r9}
add r0, r7, #4 add r0, r7, #4
stmia r7, {r8-r9} stmia r7, {r8-r9}
b .L5 b .L5
As a nice side effect, it made it cleaner to keep non-eabi support, so I just implemented it.
During the development of this patch, I hit some other pathetic cases of stupid gcc optimizations that I'll blog about.
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #30 |
Mike - I will apply the patch and test on an N900 and N810. Are there tests apps you have been using to get your measurements?
I was going to test using some talos suites too.
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #31 |
When building for N900:
c++ -o xptcinvoke_arm.o -c -fvisibility=hidden -DMOZILLA_
/home/mfinkle/
/home/mfinkle/
/home/mfinkle/
/home/mfinkle/
make[8]: *** [xptcinvoke_arm.o] Error 1
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #32 |
Created an attachment (id=433641)
Additional patch
Mark, can you try squashing this patch ?
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #33 |
(In reply to comment #25)
> Mike - I will apply the patch and test on an N900 and N810. Are there tests
> apps you have been using to get your measurements?
The measurements in e.g. comment 11 come from TestXPTCInvoke in xpcom/reflect/
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #34 |
(In reply to comment #27)
> Created an attachment (id=433641) [details]
> Additional patch
>
> Mark, can you try squashing this patch ?
With this patch applied on the first patch, I get a successful build.
However, when attempting to launch Firefox Mobile on my N900, I get a Segmentation fault.
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #35 |
(In reply to comment #29)
> However, when attempting to launch Firefox Mobile on my N900, I get a
> Segmentation fault.
Can you try building the TestXPTCInvoke program in xpcom/reflect/
dist/bin/
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #36 |
Also, is the N900 toolchain building EABI binaries ?
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #37 |
(In reply to comment #30)
> Can you try building the TestXPTCInvoke program in
> xpcom/reflect/
> and then run it with:
> dist/bin/
Completes the whole thing. Even with DoSpeedTests uncommented.
> Also, is the N900 toolchain building EABI binaries ?
I think so. The arch in the packaged filenames says "guneabi"
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #38 |
I have not been able to install gdb on my N900, but I have seen that the segfault is happening in invoke_
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #39 |
(From update of attachment 433581)
>+ current = (PRUint32 *)(((PRUint32)
'~7' is clearer to me than '0xfffffff8'. I suppose that's subjective,
though.
>+ *((PRUint64*) current) = *dw;
I don't think that's valid in the non-EABI case where 'current' isn't
8-byte aligned. It will probably work anyway, but I think C assumes that
64-bit values have 64-bit alignment; LDRD and STRD had this restriction
on older architectures. I think the non-EABI and EABI cases should be
entirely contained in the #ifdef to avoid these issues.
>+ /* Wrap when reaching the end of the stack buffer */
>+ if (d == end) d = stk;
It might be wise to add a debug assertion here to check that 'd' is
within 'stk' and '&stk[end]'. That will catch some logic errors that can
be hard to spot, and easy to introduce in future patches.
> * The routine will issue calls to count the number of words
> * required for argument passing and to copy the arguments to
> * the stack.
We don't do this any more. Well, not explicitly anyway. Can you clobber
this bit of the comment please?
----
I like the look of this patch (with the 'Additional patch' on top), and it appears correct so I've marked it 'r+'. However, the N900 crash obviously needs to diagnosed & fixed before this is pushed.
Thanks,
Jacob
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #40 |
(In reply to comment #34)
> '~7' is clearer to me than '0xfffffff8'. I suppose that's subjective,
> though.
I prefer, too, but I just left what already was there.
> >+ *((PRUint64*) current) = *dw;
>
> I don't think that's valid in the non-EABI case where 'current' isn't
> 8-byte aligned. It will probably work anyway, but I think C assumes that
> 64-bit values have 64-bit alignment; LDRD and STRD had this restriction
> on older architectures. I think the non-EABI and EABI cases should be
> entirely contained in the #ifdef to avoid these issues.
I don't think there is a 64-bit alignment requirement on architectures that matter. FWIW, in Debian we target armv4t, and upstream Mozilla codebase already targets at least armv5.
> >+ /* Wrap when reaching the end of the stack buffer */
> >+ if (d == end) d = stk;
>
> It might be wise to add a debug assertion here to check that 'd' is
> within 'stk' and '&stk[end]'. That will catch some logic errors that can
> be hard to spot, and easy to introduce in future patches.
Fair enough.
> I like the look of this patch (with the 'Additional patch' on top), and it
> appears correct so I've marked it 'r+'. However, the N900 crash obviously needs
> to diagnosed & fixed before this is pushed.
Totally agreed. I'd like some more feedback from Mark if possible.
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #41 |
Any update, Mark ?
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #42 |
We are looking into this. Brian Crowder was able to get a stack from a build:
#0 0xbe9ff4a8 in ?? ()
#1 0x43162a2c in invoke_
#2 0x42693488 in XPCWrappedNativ
at /home/mfinkle/
#3 0x4269c2c0 in XPCWrappedNativ
#4 0x4269993c in XPC_WN_GetterSetter (cx=0x41c05800, obj=<value optimized out>, argc=0, argv=0x40455608, vp=0xbe9ff708)
at /home/mfinkle/
#5 0x406d7c0c in js_Invoke (cx=0x41c05800, argc=0, vp=0x40455600, flags=<value optimized out>)
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #43 |
I have been testing on mozilla-192. I will start testing on mozilla-central as well and report.
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #44 |
The xptcinvoke code is the same on 1.9.2 and trunk. It would be interesting to have the values for d, stk and paramCount on your stacktrace, but that would either need to break on the invoke_
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #45 |
I made a non-optimized build from mozilla-central. The build works fine on my N900.
I am now going to make a non-optimized build from mozilla-192 and try it.
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #46 |
Non optimized build from mozilla-192 segfaults on startup. Here is the stack:
Starting program: /home/opt/
Program received signal SIGSEGV, Segmentation fault.
0xbe9efb70 in ?? ()
0xbe9efb70: andeq r0, r0, r0
(gdb) bt
#0 0xbe9efb70 in ?? ()
#1 0x43a98b30 in invoke_
at /home/mfinkle/
#2 0x426db3f8 in XPCWrappedNativ
at /home/mfinkle/
#3 0x426ebdf4 in XPCWrappedNativ
at /home/mfinkle/
#4 0x426e5fa0 in XPC_WN_GetterSetter (cx=0x40446200, obj=0x421d2200, argc=0, argv=0x404b05a0, vp=0xbe9eff3c)
at /home/mfinkle/
#5 0x4076948c in js_Invoke (cx=0x40446200, argc=0, vp=0x404b0598, flags=2)
at /home/mfinkle/
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #47 |
I tried stepping through invoke_
(gdb) step 1
80 in /home/mfinkle/
(gdb) step 1
88 in /home/mfinkle/
(gdb) step 1
86 in /home/mfinkle/
(gdb) step 1
87 in /home/mfinkle/
(gdb) step 1
92 in /home/mfinkle/
(gdb) step 1
nsXPTCVariant:
110 ../../.
in ../../.
(gdb) step 1
invoke_
at /home/mfinkle/
91 /home/mfinkle/
in /home/mfinkle/
(gdb) step 1
92 in /home/mfinkle/
(gdb) step 1
94 in /home/mfinkle/
(gdb) step 1
92 in /home/mfinkle/
(gdb) step 1
88 in /home/mfinkle/
(gdb) step 1
Program received signal SIGSEGV, Segmentation fault.
0xbe809b70 in ?? ()
0xbe809b70: andeq r0, r0, r0
Looks like the if(s->IsPtrData()) test is true and we attempt to continue. I should have been checking locals after each step. More coming.
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #48 |
Mike - One thing I overlooked: This appears to be broken only on mozilla-192, but not on mozilla-central. So it should be OK to checkin the patches you have and we can continue to work on getting this to mozilla-192.
Jacob & Brian - Do you guys agree?
In Mozilla Bugzilla #532198, Crowderbt (crowderbt) wrote : | #49 |
I would personally prefer to understand the 192 broken-ness before landing it anywhere.
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #50 |
(In reply to comment #44)
> I would personally prefer to understand the 192 broken-ness before landing it
> anywhere.
Especially considering that really, there should be no difference between both.
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #51 |
(In reply to comment #44)
> I would personally prefer to understand the 192 broken-ness before landing it
> anywhere.
I certainly agree! The brokenness is probably still there in mozilla-central, but just not exercised. It might be fine, but until we understand what's happening, I don't trust it.
In Mozilla Bugzilla #532198, Vladimir Vukicevic (vvuk) wrote : | #52 |
mfinkle, is this crashing on any/the first xptcall invocation? Or just a specific one? Stepping through it an instruction at a time and printing registers would be helpful; a gdb command that does 'si ; x/i $pc ; info reg' would probably be helpful, instead of 'step 1' above.
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #53 |
(In reply to comment #47)
> mfinkle, is this crashing on any/the first xptcall invocation? Or just a
> specific one?
Happens on the first invocation
> Stepping through it an instruction at a time and printing
> registers would be helpful; a gdb command that does 'si ; x/i $pc ; info reg'
> would probably be helpful, instead of 'step 1' above.
I am collecting the data now. Will attach a dump of the session.
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #54 |
Created an attachment (id=438574)
gdb session from N900
Here is the gdb session leading up to the crash - using the "si ; x/i $pc ; info reg" commands
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #55 |
Created an attachment (id=438688)
Patch v3
D'oh, the problem was so obvious and so problematic that I wonder how it could work without crashes on my builds, or with Mark's m-c builds O_o
The problem was that the stack buffer was too small in the paramCount == 1 case, and invoke_
Mark, can you check this patch ? I'll also try it on my end.
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #56 |
Created an attachment (id=438697)
Patch v3.1
This version actually builds. I'm still waiting for the build to fully finish to actually test.
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #57 |
Created an attachment (id=438700)
Patch v3.2
With a brown paper bag fix, this one seems to work properly. More testing undergoing.
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #58 |
(From update of attachment 438700)
I verified this one works for me. I think it will work for Mark, too.
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #59 |
(From update of attachment 438700)
This patch works for me! Nice job Mike.
Running Fennec on my N900. It launches fine and I am browsing pages and running sunspider.
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #60 |
I tested Ts on my own Fennec build with and without the patch:
(averaged over 10 runs)
Ts without: 5912ms
Ts with: 5787ms
Nice 14% improvement
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #61 |
Problem in Google Spreadsheet formula. Correct percentage: 2%
Not as good, but still positive!
In Mozilla Bugzilla #532198, Jacob Bramley (jacob-bramley) wrote : | #62 |
(From update of attachment 438700)
Looks good to me. Nice job!
In Mozilla Bugzilla #532198, Vladimir Vukicevic (vvuk) wrote : | #63 |
(From update of attachment 438700)
Looks good to me! Note that I'm not an xpcom peer -- bsmedberg is though, so asking him just to confirm this patch. Code looks fine to me, and is much cleaner. Thanks!
In Mozilla Bugzilla #532198, Benjamin Smedberg (Mozilla) [:bs] (benjamin-smedbergs) wrote : | #64 |
(From update of attachment 438700)
I'll delegate to people who know!
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #65 |
In Mozilla Bugzilla #532198, Mh+mozilla (mh+mozilla) wrote : | #66 |
*** Bug 530087 has been marked as a duplicate of this bug. ***
Changed in firefox: | |
status: | Confirmed → Fix Released |
Changed in firefox: | |
milestone: | none → 3.7 |
Dave Martin (dave-martin-arm) wrote : | #67 |
Note that the upstream fix is a cleanup which replaces the assembler code with C(++) veneers. This is nice for the future, but the alternate fix already present in firefox 3.6 for lucid should be adequate for now.
Alexander Sack (asac) wrote : Re: [Bug 488354] Re: NS_InvokeByIndex in xptcinvoke_arm.cpp is not Thumb-2 safe for Lucid | #68 |
On Wed, Apr 21, 2010 at 09:13:09AM -0000, Dave Martin wrote:
> Note that the upstream fix is a cleanup which replaces the assembler
> code with C(++) veneers. This is nice for the future, but the alternate
> fix already present in firefox 3.6 for lucid should be adequate for now.
Thanks for your evaluation. I agree that we shouldnt re-backport this.
- Alexander
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #69 |
(From update of attachment 438700)
Code is baking on trunk and would be helpful on 192 for Fennec 1.1
In Mozilla Bugzilla #532198, Mark-finkle (mark-finkle) wrote : | #70 |
(From update of attachment 438700)
dropping for fennec 1.1
Changed in firefox: | |
importance: | Unknown → Medium |
In Mozilla Bugzilla #532198, Timeless-bemail (timeless-bemail) wrote : | #71 |
*** Bug 532194 has been marked as a duplicate of this bug. ***
Looks like a patch was already done for the ARM Linux Internet Platform (see http:// linux.onarm. com/bugzilla/ show_bug? id=36)
patch: http:// linux.onarm. com/bugzilla/ attachment. cgi?id= 5
I'm currently trying to do a build with this patch. /bugs.launchpad .net/ubuntu/ +bug/488302 will get fixed.
Generally, I'm also assuming that https:/