[needs-packaging] libjpeg-turbo

Bug #612377 reported by Yves Glodt
208
This bug affects 37 people
Affects Status Importance Assigned to Milestone
Mozilla Firefox
Fix Released
Medium
Debian
Fix Released
Unknown
Ubuntu
Fix Released
Wishlist
Unassigned
Nominated for Maverick by Peyotll

Bug Description

libjpeg-turbo seems to greatly improve speed compared to libjpeg.

Read more here:
http://libjpeg-turbo.virtualgl.org/About/Performance
http://fedoraproject.org/wiki/Features/libjpeg-turbo

Maybe it would be good to package it, or at least offer it in a ppa.

Revision history for this message
In , Atzaus (atzaus) wrote :

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.4) Gecko/20100611 Firefox/3.6.4
Build Identifier:

To quote the URL
"libjpeg-turbo is a high-speed version of libjpeg for x86 and x86-64 processors which uses SIMD instructions (MMX, SSE2, etc.) to accelerate baseline JPEG compression and decompression. libjpeg-turbo is generally 2-4x as fast as the unmodified version of libjpeg, all else being equal.

libjpeg-turbo was originally based on libjpeg/SIMD by Miyasaka Masaru, but the TigerVNC and VirtualGL projects made numerous enhancements to the codec, including improved support for Mac OS X, 64-bit support, support for 32-bit and big endian pixel formats (RGBA, ABGR, etc.), accelerated Huffman encoding/decoding, and various bug fixes. The goal was to produce a fully open source codec that could replace the partially closed source TurboJPEG/IPP codec used by VirtualGL and TurboVNC. libjpeg-turbo generally achieves 80-120% of the performance of TurboJPEG/IPP. It is faster in some areas but slower in others."

Fedora already working on incorporating it in their distribution.
http://fedoraproject.org/wiki/Features/libjpeg-turbo

License is wxWindows license for some parts and LGPL for the rest.

This would close a lot of bugs regarding libjpeg/SSE Optimization/64-bit Porting

Reproducible: Always

Revision history for this message
In , Chris Double (doublec) wrote :

You'll probably need to consult with <email address hidden> about the license. See guidelines here: http://www.mozilla.org/MPL/license-policy.html

When looking at third party libraries for the video decoding and YCbCr conversion LGPL was an issue.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

Using libjpeg-simd is probably an easier option here than libjpeg-turbo license wise.

Revision history for this message
In , Atzaus (atzaus) wrote :

According to
http://libjpeg-turbo.svn.sourceforge.net/viewvc/libjpeg-turbo/trunk/README-turbo.txt

"Some of the optimizations to the Huffman encoder/decoder were borrowed from
VirtualGL, and thus the libjpeg-turbo distribution, as a whole, falls under the
wxWindows Library Licence, Version 3.1. A copy of this license can be found in
this directory under LICENSE.txt. The wxWindows Library License is based on
the LGPL but includes provisions which allow the Library to be statically
linked into proprietary libraries and applications without requiring the
resulting binaries to be distributed under the terms of the LGPL.

The rest of the source code, apart from these modifications, falls under a less
restrictive, BSD-style license (see README.)"

So it seems that wxWindows license is a more liberal version of LGPL and the rest is BSD (I was mistaken in the initial posting)

Revision history for this message
In , Atzaus (atzaus) wrote :

(In reply to comment #2)
> Using libjpeg-simd is probably an easier option here than libjpeg-turbo license
> wise.

Sorry for the bug spam

Regarding libjpeg-simd

From the Google translation of http://cetus.sakura.ne.jp/softlab/jpeg-x86simd/jpegsimd.html

"In addition, AMD64 (EM64T/Intel64) in 64bit mode environment (currently) not supported."

"AMD64 compatible version is under construction...(October 26, 2006: The production process is currently stopped. Seems slow release schedule. Sorry.)"

Jpeg-simd would have to be ported to 64-bit which from what I have read is a cumbersome task.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #4)
> Jpeg-simd would have to be ported to 64-bit which from what I have read is a
> cumbersome task.

From the diffs between the 64-bit version and the 32-bit version it seems that the process would be mostly mechanical. In fact, it should be possible to use the same source for 64-bit as 32-bit like libvpx does. Further, the 64-bit versions in libjpeg-turbo should be under the same license as jpeg-simd so we should be able to just use them directly.

Revision history for this message
In , Dcommander (dcommander) wrote :

Hi, all. I'm the maintainer and one of the creators of libjpeg-turbo, so I thought I'd throw in some comments. For starters, libjpeg-turbo is not licensed under pure LGPL. It uses the wxWindows Library License, which eliminates some of the viral provisions of the LGPL. My understanding of the differences between the two is spelled out here:

http://www.virtualgl.org/About/License

I have confirmed with the wxWindows developers who created the license that this matches their understanding as well. Essentially, it is our understanding that wxWindows nullifies Sections 6-7 of LGPL v2.1, which means you can statically link the Library with any application without requiring the application as a whole to fall under the provisions of the LGPL. You are still bound by other sections of the LGPL, which means you would still be required to distribute source for the Library along with the application.

That being said, however, I can understand the desire for a fully unencumbered version of the source. The only part of the code which is licensed under wxWindows is the Huffman encoding/decoding optimizations (portions of jchuff.c and jdhuff.c.) These were developed while I was a Sun employee, so unfortunately I don't have the right to dual license them, or else I certainly would do so.

What I could do, however, is develop a mechanism whereby you could easily build the code without the Huffman optimizations ('configure --without-huffman-opt' or something like that.) This would simply involve building libjpeg-turbo with the "stock" versions of jchuff.c and jdhuff.c from libjpeg-6b. I could even go one step farther and make it so that the wxWindows-licensed files are not included in the distribution tarball if you 'configure --without-huffman-opt' and then 'make dist'.

This would slow down the overall performance by 20-25%, but it will still be miles faster than regular libjpeg on x86/x86-64 systems.

Revision history for this message
In , Gervase Markham (gerv-mozilla) wrote :

What DRC suggests seems like the smart thing to do, to begin with. Let's take the parts which are unambiguously BSD and still provide a significant win.

We currently have a policy of only taking code which is compatible with all three of our licenses. We'd have to do some analysis on the wxWindows licence to see whether it fits our policy. (Please, people, stop rolling your own licences! :-) If someone could open another bug for that, that would be great.

But if it's not compatible in that way, I think it's highly unlikely we'd want to introduce a new licence into the Mozilla mix just for a 25% win in JPEG decoding (particularly if we'd already recently had a big win over what we'd been using for years).

Gerv

Revision history for this message
In , Dcommander (dcommander) wrote :

wxWindows is actually a very mature license at this point-- it just isn't well known. It was created to solve a specific problem -- allowing static linking of an open source library with proprietary code without forcing the proprietary application to fall under the terms of the LGPL, but also guaranteeing the open nature of the code (which BSD-style licenses do not do.) OpenSceneGraph uses it, and that's where I learned about it originally.

Let me know if there's anything I can do to facilitate this on my end. If you're just going to merge the code into your build tree, then you can just as easily replace jchuff.c and jdhuff.c with their original versions when you do that, but I'm still willing to add a configure switch to libjpeg-turbo as well.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #8)
> Let me know if there's anything I can do to facilitate this on my end. If
> you're just going to merge the code into your build tree, then you can just as
> easily replace jchuff.c and jdhuff.c with their original versions when you do
> that, but I'm still willing to add a configure switch to libjpeg-turbo as well.

This seems like it might be best option.

Revision history for this message
In , Pavlov (pavlov) wrote :

Has any work been done on ARM or other platforms?

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #10)
> Has any work been done on ARM or other platforms?

One thing I've been meaning to do is switch mobile to use the IFAST dct instead of the ISLOW dct. Android uses IFAST and in fact have an ARM implementation of the IFAST dct. Unfortunately, it is licensed with the apache license. Perhaps we can convince Google to relicense it?

Revision history for this message
In , M-kato (m-kato) wrote :

(In reply to comment #10)
> Has any work been done on ARM or other platforms?

We should write NEON SIMD code if we change to libjpeg-turbo.

(In reply to comment #11)
> (In reply to comment #10)
> > Has any work been done on ARM or other platforms?
>
> One thing I've been meaning to do is switch mobile to use the IFAST dct instead
> of the ISLOW dct. Android uses IFAST and in fact have an ARM implementation of
> the IFAST dct. Unfortunately, it is licensed with the apache license. Perhaps
> we can convince Google to relicense it?

Past year, Michael Moy tried to change to IFAST on some platforms by bug 416157. But many test cases became failed. If changing to IFAST, we have to modify many test cases, too.

Revision history for this message
In , Gervase Markham (gerv-mozilla) wrote :

DRC: the problem is not just incompatibility, but simplicity. A library under the straight LGPL would be compatible with the MPL, the LGPL and the GPL (our three licences) but would increase the licensing complexity of our codebase, something we have historically tried to avoid.

Gerv

Revision history for this message
In , Dcommander (dcommander) wrote :

So, just so I understand, inbound code to the Mozilla project has to either be licensed under the tri-license or something less restrictive that can be re-licensed under the tri-license? So if it were possible to re-license the offending code in libjpeg-turbo under your tri-license, then that would solve the problem?

Revision history for this message
In , Roc-ocallahan (roc-ocallahan) wrote :

Our goal is that people should be able to redistribute all the code that goes into our products under the terms of any one of the MPL, LGPL, or GPL. So:
-- code that is licensed with a license compatible with all of those (e.g., BSD) is fine
-- code that is licensed with multiple license options that cover the three above licenses is also fine (e.g., dual licensed MPL/LGPL, since the LGPL option is compatible with the GPL as well)

Note that we don't relicense other people's code. E.g., when we import BSD-licensed code into our tree, it keeps the original license.

Revision history for this message
In , Dcommander (dcommander) wrote :

Unless I'm missing something fundamental, I think wxWindows-licensed code could be legally redistributed under all three licenses. However, I fully understand the desire not to introduce new licenses into your code tree.

One question-- do you currently copy the libjpeg code into the Mozilla source or do you build it separately?

Revision history for this message
In , Reed Loden (reed) wrote :

(In reply to comment #16)
> One question-- do you currently copy the libjpeg code into the Mozilla source
> or do you build it separately?

It's in our tree, but it's not by any means the original libjpeg, as it's been patched to pieces.

http://mxr.mozilla.org/mozilla-central/source/jpeg/

Revision history for this message
In , Dcommander (dcommander) wrote :

OK, then it seems as if merging in the "turbo" features (minus the Huffman coding) into your existing code tree would be the way to go.

Revision history for this message
In , M-kato (m-kato) wrote :

(In reply to comment #13)
> DRC: the problem is not just incompatibility, but simplicity. A library under
> the straight LGPL would be compatible with the MPL, the LGPL and the GPL (our
> three licences) but would increase the licensing complexity of our codebase,
> something we have historically tried to avoid.

Original license of SIMD code is http://cetus.sakura.ne.jp/softlab/jpeg-x86simd/jpegsimd.html#conditions

In Japanese, "この SIMD 拡張版 IJG JPEG software の使用条件については、オリジナル版の IJG JPEG software の使用条件が適用されます。"

In English by Google Translate, "This SIMD extension for IJG JPEG software terms of use, the original version of the IJG JPEG software is subject to terms of use."

So, original author who is MIYASAKA-san says that this code is same license as libjpeg. I believe that license is no problem if we port MIYASAKA-san's code to our tree, not using libjpeg-turbo.

Gerv, how about?

Revision history for this message
In , Dcommander (dcommander) wrote :

Why does the idea of using the old libjpeg/SIMD code keep coming up in this discussion? This has already been covered multiple times-- libjpeg/SIMD does not have 64-bit support or many of the other bug fixes we made to libjpeg-turbo.

libjpeg-turbo also bears the same license as libjpeg, if you don't use our accelerated version of jchuff.c and jdhuff.c.

In short-- merge in libjpeg-turbo, minus jchuff.c and jdhuff.c, and there is no licensing problem.

Revision history for this message
In , Vladimir Vukicevic (vvuk) wrote :

There seems to be some confusion around what should happen here... the right thing, I think, is:

1. Look at our in-tree libjpeg and how it differs from stock libjpeg.

2. Based on that, either merge the delta from libjpeg -> libjpeg-turbo into our tree, OR merge delta of stock libjpeg and in-tree libjpeg into libjpeg-turbo, and then pull that in.

(minus j[cd]huff.c, of course).

I don't really know how extensive the changes in #1 are, and I don't know what, if any, conflicts there would be between the changes for -turbo and our tree version of libjpeg. But I think it's worth it to find out; the perf wins wouldn't hurt as we try to do decode on draw, and it sounds like there are some pretty significant ones here.

Revision history for this message
In , Vladimir Vukicevic (vvuk) wrote :

Forgot to ask -- who wants to make a patch? :-)

Revision history for this message
In , Dcommander (dcommander) wrote :

After a cursory look at the code, it seems like libjpeg-turbo may want to borrow Mozilla's enhancements, where applicable. In particular, the use of static tables in choice areas like RGB->YUV conversion is likely to improve our performance as well.

Also, there is some of the code which is going to conflict, because your version already has some SIMD acceleration. I think that most of it could be replaced with ours, but the areas in which you use SSE2 would probably need to be benchmarked against our version to see which is faster (integer IDCT, for instance.)

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #23)
> After a cursory look at the code, it seems like libjpeg-turbo may want to
> borrow Mozilla's enhancements, where applicable. In particular, the use of
> static tables in choice areas like RGB->YUV conversion is likely to improve our
> performance as well.
>
> Also, there is some of the code which is going to conflict, because your
> version already has some SIMD acceleration. I think that most of it could be
> replaced with ours, but the areas in which you use SSE2 would probably need to
> be benchmarked against our version to see which is faster (integer IDCT, for
> instance.)

Our integer SSE2 idct is buggy (bug 477728), so while it would be nice to see how the performance compares, I think I would still switch to the libjpeg-turbo one even if ours is faster.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

(In reply to comment #22)
> Forgot to ask -- who wants to make a patch? :-)

I'll try.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

(In reply to comment #6)
> What I could do, however, is develop a mechanism whereby you could easily build
> the code without the Huffman optimizations ('configure --without-huffman-opt'
> or something like that.)

My guess is that we'll not want to include j{c,d}huff.c in our source tree at all, if they're encumbered with an incompatible license. Is this right?

Revision history for this message
In , Gervase Markham (gerv-mozilla) wrote :

jlebar: that seems wisest to me.

Gerv

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

To the build team: Are we willing to add a dependency on nasm so we can get this library? If I understand correctly, we'd need nasm on both Windows and *nix.

Revision history for this message
In , Roc-ocallahan (roc-ocallahan) wrote :

Can you get it building with YASM? YASM is already a build requirement everywhere except Win32. It claims to support nasm-compatible syntax.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

I think yasm will work.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

I have something which is building and running (with yasm) on x64. I'd love to run some performance tests so we can see whether this is worth investing in further.

I figure I should try and run tprender tests, as in

  dist/bin/firefox -tprender -tp manifest -tpformat text

but this runs the test and produces no output.

Can anyone suggest a way to make this work or an alternative benchmarking methodology?

Revision history for this message
In , Ryanvm (ryanvm) wrote :

Created attachment 458811
JPEG loading test

I used this to test some of mmoy's JPEG patches back in the day. It's got a bit of scatter, but I found that it was reasonably consistent if I did 10-20 runs (unfortunately, manually on each one). The images you see referenced in the file were just a bunch of ~3MB pictures from a digital camera.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #31)
> I have something which is building and running (with yasm) on x64. I'd love to
> run some performance tests so we can see whether this is worth investing in
> further.
>
> I figure I should try and run tprender tests, as in
>
> dist/bin/firefox -tprender -tp manifest -tpformat text

I don't know what tprender does so I'm not sure it's useful. If you post a patch I can get some tp4 (page load benchmark) numbers to compare against our current baseline.

Revision history for this message
In , Vladimir Vukicevic (vvuk) wrote :

Trender isn't going to help because that's just repaint speed -- so will be after jpeg decode has already been done. I think you'll have a hard time coming up with an in-browser benchmark, though decoding a number of (very) large images is probably the best you can do..

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Okay. I filed adding timers for the image decoder as bug 580518.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Some preliminary performance results, on Linux x64. We're comparing unvectorized libjpeg to libjpeg-turbo's vectorized 64-bit code. In brief, it appears that the part of decoding that was vectorized runs 2.3x as fast as the unvectorized part.

These benchmarks were run on a quad-core Xeon W3520 (2.67GHz) running Ubuntu 10.04, kernel 2.6.32-23-generic.

Methodology:

$ perf record -ig dist/bin/firefox -tp manifest
    (manifest contains the path to a local copy of [1], which weighs in at 3,888 × 2,592 pixels, 2.95 MB)
$ perf report -g graph

If you're going to try this yourself, note that you need to save the result of perf report before rebuilding. Your generated perf.data will be invalid (and unreadable) as soon as you modify the binaries.

perf only tells us how long we spend in a function relative to all the other functions in the program. To work around this, I added a spin loop to the startup sequence. The loop always does the same amount of work, so we can compare the fraction of our time spent spinning to the fraction of time spent in the JPEG decoder to determine the speedup.

Original:

    11.93% jpeg_idct_islow
     8.94% SpinBriefly()
     8.12% decode_mcu
     7.59% qcms_transform_data_rgb_out_lut_sse2
     5.71% ycc_rgb_convert
     2.37% 0x000000001d2170 (ld.so)
     2.31% jpeg_fill_bit_buffer
     1.97% __pthread_mutex_unlock_internal

libjpeg-turbo:

    10.84% SpinBriefly()
     9.49% qcms_transform_data_rgb_out_lut_sse2
     7.04% decode_mcu
     6.29% 0x00000000f90c10 (libxul.so)
     2.41% 0x000000001c8e80 (ld.so)
     2.36% nsJPEGDecoder::OutputScanlines(int*)
     2.14% __pthread_mutex_unlock_internal
     1.97% pthread_mutex_lock

I think the unidentified libxul function in the libjpeg-turbo profile is libjpeg-turbo's vectorized jpeg_idct_islow routine. If so, we can conclude that the speedup for this function is (11.93/8.94) / (6.29/10.84) = 2.30. The speedup for the JPEG decoder as a whole is of course smaller.

[1] http://commons.wikimedia.org/wiki/File:Schloss_Lichtenstein_04-2010.jpg

Revision history for this message
In , Dcommander (dcommander) wrote :

Justin,

The comparison you made would only be valid if the two tests were equal in terms of total execution time, but my read of the above is that the amount of work done was the same but the execution time for both tests was different. Is that correct?

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

The assumption is that SpinBriefly() takes the same amount of time in both tests. I don't actually know that -- all I know is that it does the same work. It's the same code in both cases. But with the assumption, I think we can compare execution times by normalizing the reported percentages against the percentage reported for SpinBriefly.

To illustrate, say we knew that SpinBriefly() takes 1s. Then in the original case, we could say that 8.94% = 1s, so 11.93% = (11.93/8.94)s = 1.3s. We'd similarly compute that the 6.29% reported for libjpeg-turbo counts for (6.29/10.84)s = .58s. And then we'd conclude that our speedup is 1.3s / .58s = 2.3 as before. But of course we don't need to know how long SpinBriefly is in order to do this computation.

Revision history for this message
In , Dcommander (dcommander) wrote :

Fair enough. My other question is why the slow IDCT is the only accelerated function. We've established that the accelerated Huffman decode can't be used, but aren't there other functions that can be accelerated using libjpeg-turbo?

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

(In reply to comment #39)
> My other question is why the slow IDCT is the only accelerated function.

Good question. Here's where the other functions in the profile live:

qcms_transform_data_rgb_out_lut_sse2 : gfx/qcms/qcmsint.c
decode_mcu : jpeg/jdhuff.c
nsJPEGDecoder::OutputScanlines(int*) : libpr0n/decoders/jpeg/nsJPEGDecoder.cpp

So only the slow IDCT and decode_mcu are functions in libjpeg. I was actually testing with libjpeg-turbo's jdhuff implementation here, and you can see that decode_mcu is substantially faster with libjpeg-turbo: (8.12/8.94) / (7.04/10.84) = 1.40x speedup.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

On 32-bit, yasm .8 (which is what Debian is shipping) hits an infinite loop. yasm 1.0 seems to work.

I imagine we wouldn't be willing to require yasm 1.0. But maybe we could disable JPEG SIMD if the machine's yasm isn't new enough.

Revision history for this message
In , Vladimir Vukicevic (vvuk) wrote :

Pretty sure we can require yasm 1.0, especially if there are easily installable packages for debian/ubuntu/fedora around (even if they're not stock provided).

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Debian unstable is still on yasm 0.8, fwiw. 1.0 was released April 8.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

*** Bug 416157 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #41)
> On 32-bit, yasm .8 (which is what Debian is shipping) hits an infinite loop.
> yasm 1.0 seems to work.

Can we work around the infinite loop?

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

(In reply to comment #45)
> Can we work around the infinite loop?

I think we can. jccol{mmx,ss2}.asm includes jcclr{mmx,ss2}.asm multiple times, and I think this is what's causing the hang. We should be able to split up jccol into separate files, one for each include of jcclr.

Do the libjpeg-turbo guys have any thoughts on whether or not this would be a good idea? I'd prefer to diverge from upstream as little as possible, of course.

Revision history for this message
In , Dcommander (dcommander) wrote :

I don't have a problem with making those changes to the libjpeg-turbo source, but prototype it first on your end and make sure it fixes the problem.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

I successfully got jccol{mmx,ss2} to assemble, but now yasm is hanging in jcsammmx.

I think yasm 0.8 may be a game of whack-a-mole we don't want to play.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

(I should add: The reason for the hang in jcsammmx is less obvious than the other files. I tried collapsing the includes, which aren't as tricky as the jccol includes, to no avail.)

Revision history for this message
In , Vladimir Vukicevic (vvuk) wrote :

(In reply to comment #48)
> I think yasm 0.8 may be a game of whack-a-mole we don't want to play.

Yup. Add a dependency to yasm 1.0. Worst case we build a static binary of it and include it as part of our build setup for linux.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Macports is shipping yasm 1.0.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Just got this working on OSX 64. Seeing a speedup similar to what was observed on Linux 64:

Original:

  3.8% SpinBriefly()
  7.8% jpeg_idct_islow()

libjpeg-turbo:

  5.4% SpinBriefly()
  4.2% decoding

Speedup: (7.8/3.8) / (4.2/5.4) = 2.6.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

yasm 1.0 on OSX 10.6 64-bit seems to be emitting incorrect 32-bit code. At least, it works when I use nasm, but the yasm build decodes all jpegs as garbage.

The nasm flags I've used are:

  -fmacho -DMACHO -DPIC

and the yasm flags I've tried are:

  -fmacho -DMACHO -DPIC -rnasm -pnasm

and

  -fmacho32 -DMACHO -DPIC -rnasm -pnasm

I'm looking into this, but I have basically no idea why this is happening. Any ideas would be appreciated...

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #53)
> I'm looking into this, but I have basically no idea why this is happening. Any
> ideas would be appreciated...

Do you know what the difference is in the generated code?

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

(In reply to comment #54)
> Do you know what the difference is in the generated code?

Well...here's *a* difference I've been looking at, in _jsimd_rgb_ycc_convert_sse2 (jccolss2.asm):

nasm:
  leal 0x00001bf3(%ecx,%ebp,8),%ebx

yasm:
  leal 0xffffffd3(%ecx,%ebp,8),%ebx

Here's more context:

..@12.adjust:
  pushl %ebp
  xorl %ebp,%ebp
  leal 0x00001bf3(%ecx,%ebp,8),%ebx <---
..@12.ref:
  popl %ebp
  movl %ebx,0xffffff7c(%ebp)
  movl 0x08(%eax),%ecx
  testl %ecx,%ecx
  je 0x00000412

Just the leal line is different between the two assemblers.

Here's the preprocessed source which is generating this. It's the same for both yasm and nasm.

..@26.adjust:
  push ebp
  xor ebp,ebp
  db 0x8D,0x9C <----
  jmp near const_base <----
..@26.ref:
  pop ebp
  mov dword [ebp-(8-(0))*16-4],ebx
  mov ecx, dword[(eax)+8]
  test ecx,ecx
  jz near .return

Before we try and understand how a db plus a jmp combined to form a leal, let's look at the unprocessed source, from jcclrss2.asm:

  get_GOT ebx ; get GOT address
  movpic POINTER [gotptr], ebx ; save GOT address
  mov ecx, JDIMENSION [img_width(eax)]
  test ecx,ecx
  jz near .return

get_GOT is a macro defined in jsimdext.inc. Here's the whole thing:

  ; At present, nasm doesn't seem to support PIC generation for Mach-O.
  ; The PIC support code below is a little tricky.

          SECTION SEG_CONST
  const_base:

  %define GOTOFF(got,sym) (got) + (sym) - const_base

  %imacro get_GOT 1
          ; NOTE: this macro destroys ecx resister.
          call %%geteip
          add ecx, byte (%%ref - $)
          jmp short %%adjust
  %%geteip:
          mov ecx, POINTER [esp]
          ret
  %%adjust:
          push ebp
          xor ebp,ebp ; ebp = 0
  %ifidni %1,ebx ; (%1 == ebx)
          ; db 0x8D,0x9C + jmp near const_base =
          ; lea ebx, [ecx+ebp*8+(const_base-%%ref)] ; 8D,9C,E9,(offset32)
          db 0x8D,0x9C ; 8D,9C
          jmp near const_base ; E9,(const_base-%%ref)
  %%ref:
  %else ; (%1 != ebx)
          ; db 0x8D,0x8C + jmp near const_base =
          ; lea ecx, [ecx+ebp*8+(const_base-%%ref)] ; 8D,8C,E9,(offset32)
          db 0x8D,0x8C ; 8D,8C <----
          jmp near const_base ; E9,(const_base-%%ref) <----
  %%ref: mov %1, ecx
  %endif ; (%1 == ebx)
          pop ebp
  %endmacro

It appears that we're taking the %1 == ebx branch in the macro above (compare the arguments to db).

The apparent key point here is that the db / jmp lines marked with arrows become a leal instruction. Why? I dunno. But perhaps this is what yasm is interpreting differently than nasm. I'll keep looking after lunch.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

I put an e-mail into the yasm-devel list about this.

Revision history for this message
In , Chris-pearce (chris-pearce) wrote :

(In reply to comment #29)
> Can you get it building with YASM? YASM is already a build requirement
> everywhere except Win32. It claims to support nasm-compatible syntax.

YASM won't work on Win32. YASM doesn't output object files which can link with /SAFESEH, so we can't link YASM object files into xul.dll. For libvpx we worked around this by converting the YASM assembly to MASM syntax using http://mxr.mozilla.org/mozilla-central/source/media/libvpx/yasm2masm-as.sh

Ideally we'd convince the YASM developers to fix their win32 safeseh section generation, or submit a patch which fixes that ourselves upstream.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Given that the code barely works under yasm, I'm not hopeful that we'll be able to easily translate to masm, unless the syntax for macros and whatnot is very close. But it's definitely worth looking at.

For now, I'm just going to use nasm on OSX-32. I think a new-enough version of nasm ships on OSX by default. (My understanding is that the version of nasm that ships isn't new enough to build libjpeg-turbo on 64-bit, but there, yasm works fine.) It would be really nice to get yasm to work, though.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

ted, bsmedberg, cpearce: Do you have any advice as to how to proceed here wrt Windows builds? It looks like the options are:

* Fix yasm so it works with /SAFESEH or otherwise work around this issue
* Ship nasm in Windows build package (assuming that some version of nasm works on Windows)
* Translate the libjpeg-turbo code to a msam-compatible syntax
* Ship libjpeg-turbo on Linux/Mac and use our current libjpeg implementation on Windows

Revision history for this message
In , Roc-ocallahan (roc-ocallahan) wrote :

I'm actually leaning towards the first option since we're going to be distributing yasm ourselves anyway...

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Fixing yasm certainly sounds to me like the Right Thing to do. It looks like there's some support for SAFESEH [1] [2], but perhaps it's insufficient [3].

I don't have a good grasp of what the issue is even after doing some reading, so I'm not sure I'm the right person to try and hack the assembler. But if nobody has time to hack it themselves, I can try, provided someone can sit down with me and explain what's going on.

[1] http://www.tortall.net/projects/yasm/ticket/130
[2] http://www.tortall.net/projects/yasm/changeset/2032
[3] http://www.tortall.net/projects/yasm/ticket/139

Revision history for this message
In , Roc-ocallahan (roc-ocallahan) wrote :

cpearce probably has the best idea at the moment

Revision history for this message
In , Dcommander (dcommander) wrote :

In response to the second option (shipping NASM on Windows), NASM does work fine to build libjpeg-turbo on Windows, but I don't know whether it also suffers from the /SAFESEH problem. As with other platforms, you'll need NASM 2.05 or later to build the 64-bit version.

In response to the third option, you'd be responsible for maintaining any such translation, since it is of no use to the upstream project.

Revision history for this message
In , Chris-pearce (chris-pearce) wrote :

(In reply to comment #59)
> * Fix yasm so it works with /SAFESEH or otherwise work around this issue

Ideally we should fix YASM, as we need a working YASM for libvpx as well.

YASM has partial support for SafeSeh, but it doesn't go far enough. YASM ticket 139 seems to diagnose the problem. I don't know much about object formats or assemblers, so I'd need to get up to speed before I could be much help fixing YASM. Maybe an email to the YASM mailing list be helpful?

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Looks like the yasm devs the OSX-32 bug in their trunk [1]. I'll spin a build and see for myself on Monday.

http://www.tortall.net/projects/yasm/changeset/2345

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

The SAFESEH issue may also have been fixed this weekend. Hooray!

http://www.tortall.net/projects/yasm/changeset/2343

Revision history for this message
In , Ted Mielczarek (ted-mielczarek) wrote :

FWIW, I agree with roc. We want to use YASM to build libvpx, so we should fix YASM and require it on Windows. bug 570477 is on file for adding it to MozillaBuild. If we can get a working version (even a nightly build or equivalent) I'd be happy to include it in MozillaBuild. I'd even spin a 1.5.1 release just for that, as long as you can provide me the binaries.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Success on Mac32! Speedup of jpeg_idct_islow is

(8.2/3.2) / (2.8/4.7) = 4.3

I'm not so good at reading Shark traces, but it looks like ycc_rgb_convert is also much faster:

  (8.3/3.2) / (1.3/4.7) = 9.3

But maybe the trace is somehow under-reporting for libjpeg-turbo. In any case, I think it's faster. :)

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

I managed to get xperf running on Windows, but khuey and I can't figure out how to get it to load symbols from my build. The article at [1] seems to describe only the settings for retrieving symbols from our symbol server.

I'm going to see if I have more luck with a different profiler (jst suggested eqtime), but if anyone has suggestions for making this work with xperf, I'm all ears.

[1] https://developer.mozilla.org/en/Profiling_with_Xperf

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

jrmuizel to the rescue. I have symbols on Windows.

Revision history for this message
In , Vladimir Vukicevic (vvuk) wrote :

Feel free to poke me if you have problems with xperf, but for symbols, note the bits in that article about making sure you have the latest xperf and the "For a local firefox build" line -- or did that not work?

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Benchmarks for Win32.

Methodology:

xperf -on latency -stackwalk profile
dist/bin/firefox -tp manifest (same manifest as before)
xperf -d out.etl

xperf outputs the raw sample counts, which is kind of nice. There's some variance in these (I've observed differences of 10-20% between runs), but it shouldn't be a big deal.

Unless I'm incorrectly counting some functions as belonging to libjpeg (please correct me if I am!), the speedup with libjpeg-turbo is 2.7x.

libjpeg-turbo:

  qcms_transform: 89 samples
  SpinBriefly: 72
  decode_mcu_slow: 50
  jsimd_idct_islow_sse2: 30
  jpeg_fill_bit_buffer: 19
  nsJPECDecoder::OutputScanlines: 15
  decompress_onepass: 12
  fetch_pixel_x8r8g8b8: 9
  jsimd_ycc_rgb_convert_sse2: 8

  JPEG:
    jsimd_idct_islow_sse2: 30
    jsimd_ycc_rgb_convert_sse2: 8
  TOTAL: 38

original:

  qcms_transform: 65 samples
  SpinBriefly: 65
  decode_mcu: 62
  ycc_rgb_convert: 47
  nsJPEGDecoder::OutputScanlines: 19
  dct_8x8_inv_16s: 18
  jpeg_fill_bit_buffer: 17
  bits_image_fetch_transformed: 16
  jpeg_idct_islow_sse2: 9
  decompress_onepass: 7
  ownpj_QuantInv_8x8_16s: 7
  ownpj_Add128_x8x_16s8u: 6
  fetch_pixel_x8r8g8b8: 6
  ...
  ippiDCTQuantInv8x8LS_JPEG_16s8u_C1R: 2

  JPEG:
    ycc_rgb_convert: 47
    dct_8x8_inv_16s: 18
    bits_image_fetch_transformed: 16
    jpeg_idct_islow_sse2: 9
    ownpj_QuantInv_8x8_16s: 7
    ownpj_Add128_x8x_16s8u: 6
    ...
    ippiDCTQuantInv8x8LS_JPEG_16s8u_C1R: 2
  TOTAL: 103

103/38 = 2.7

tags: added: needs-packaging
summary: - Evalutate and package libjpeg-turbo
+ [needs-packaging] libjpeg-turbo
Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 463214
Patch v1 - Part 1: Removing libjpeg

The full patch weighs in at 3.8MB, which is too large for Bugzilla. I've therefore split it into three parts: Removing libjpeg, adding libjpeg-turbo, and build changes.

There's little that's interesting in this patch; just a lot of - signs.

In the interests of full disclosure: I haven't tested this particular patch on all platforms. It's working locally, and I'll of course push to try once the assembler is updated on the remote boxes.

I think we probably want configure to try to assemble the optimized code only if it finds a sufficiently new version of yasm -- otherwise, people with outdated versions are going to be confused by the hangs and incorrect code generation. I have an XXX in configure.in for this; I'm not really sure how to do it.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 463217
Patch v1 - Part 2: Adding libjpeg-turbo

Unfortunately this part is still too big for bugzilla. Sorry about gzipping it.

Interesting bits here are probably just modules/libjpeg-turbo/Makefile.in and modules/libjpeg-turbo/simd/Makefile.in.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

Fwiw, if we can create three self-contained patches in this order: add libjpeg-turbo, build changes, remove libjpeg and push them like that, that would be better than the order listed in comment 73, right? And may be better than pushing a single mega-patch, though it's hard to tell.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 463225
Patch v1 - Part 3: Changes to the build

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

Is there a good reason to not put libjpeg-turbo on top of libjpeg? Personally I'd prefer that and it should make the patch smaller.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

I thought that it was kind of weird that jpeg got its own top-level directory, which is why I moved it to modules.

But I could move jpeg to modules/libjpeg-turbo and then put the libjpeg-turbo source on top of it, if we wanted. That would probably reduce the patch size...

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #78)
> I thought that it was kind of weird that jpeg got its own top-level directory,
> which is why I moved it to modules.
>
> But I could move jpeg to modules/libjpeg-turbo and then put the libjpeg-turbo
> source on top of it, if we wanted. That would probably reduce the patch
> size...

I would suggest doing the reverse. Patch libjpeg-turbo onto of jpeg in the current location, and then we can move it to saner place in a different bug.

Revision history for this message
In , Gervase Markham (gerv-mozilla) wrote :

brendan is king of repo locations, but my understanding is that jpeg-at-top-level is historical accident, and it could easily be somewhere more sane. (It's just that when we were on CVS, moving it was a big pain.)

Gerv

Revision history for this message
In , Ted Mielczarek (ted-mielczarek) wrote :

You should get an imglib peer to r+ the add/remove bits. I can review the build system changes.

Revision history for this message
In , Bobbyholley+bmo (bobbyholley+bmo) wrote :

Have you run reftests on this patch?

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

I'll run them now.

Revision history for this message
In , Bobbyholley+bmo (bobbyholley+bmo) wrote :

(In reply to comment #77)
> Is there a good reason to not put libjpeg-turbo on top of libjpeg? Personally
> I'd prefer that and it should make the patch smaller.

In my opinion, it depends on whether the patch is useful or not. If the delta is reasonable enough to make it useable in the case of "oh hey, something broke when we upgraded to libjpeg-turbo", then we should patch it. But if it's not a useful diff, I think we should first remove it and then add it.

At the moment I'd probably prefer to put this in modules/libimg/jpeg, even if we eventually want to get rid of libimg (so that at least in the mean time, everything's in the same place). Joe might disagree though.

Given that this review is going to be more or less a rubber stamp, I'd like to at least know that it passes all our tests on all platforms first. Specifically, I'm concerned about our jpeg reftests, little-endian architectures like ARM, and architectures where we're not running the asm you're testing with. As such, I'm going to cancel review until we've got all that sorted out.

How much code changed? Do we need a security audit for this? (Not like the one we got for LCMS did us any good...grumble grumble). Dveditz, can you weigh in?

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

(In reply to comment #84)
> Given that this review is going to be more or less a rubber stamp, I'd like to
> at least know that it passes all our tests on all platforms first.

I think that's fair. Releng is working on getting the requisite yasm version out to the build servers, and once it's out there, I'll push to try and see what happens.

Revision history for this message
In , Benjamin Smedberg (Mozilla) [:bs] (benjamin-smedbergs) wrote :

You mean for tryservers? I don't think we can push this to mozilla-central until there's a mozillabuild with the version of YASM we want people to use.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 463272
Patch v1 - Patching libjpeg-turbo on top of libjpeg

To the question of how much has changed, the answer is: A lot, but not everything. :)

The vast majority of changes are in the simd folder, which contains 22000 lines of assembly. We're of course not using all of that.

We could slim down the libjpeg-turbo patch a bit by removing files we don't need, but I'm not sure it's worth messing with.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

(In reply to comment #86)
> You mean for tryservers? I don't think we can push this to mozilla-central
> until there's a mozillabuild with the version of YASM we want people to use.

Yes, I think we're blocking on bug 570477.

Revision history for this message
In , Bobbyholley+bmo (bobbyholley+bmo) wrote :

ok, I think we separate remove/add patches. a 2.4 meg jpeg library is bigger than we'd like it to be (the current one is 1.5 megs), but not quite worth making it harder to apply upstream patches by yanking things out.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #87)
> Created attachment 463272 [details]
> Patch v1 - Patching libjpeg-turbo on top of libjpeg
>
> To the question of how much has changed, the answer is: A lot, but not
> everything. :)

The diff doesn't look too unwieldy. I can review this if you like.

Revision history for this message
In , Bobbyholley+bmo (bobbyholley+bmo) wrote :

(In reply to comment #90)

> The diff doesn't look too unwieldy. I can review this if you like.

If you care enough that you're willing to give it a non-rubber-stamp review, we can do it your way. ;-)

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #91)
> (In reply to comment #90)
>
> > The diff doesn't look too unwieldy. I can review this if you like.
>
> If you care enough that you're willing to give it a non-rubber-stamp review, we
> can do it your way. ;-)

I think we should actually require it. libjpeg-turbo hasn't had nearly the same security exposure as libjpeg so we should at least go through to look for areas of increased risk.

Revision history for this message
In , Roc-ocallahan (roc-ocallahan) wrote :

My understanding is that we don't want to add new stuff to modules/.

For video/audio we created a new toplevel directory media/ to contain third-party libraries (media/libogg, media/libvpx, etc). For gfx we've put third-party libraries under gfx/ (gfx/cairo, gfx/harfbuzz, gfx/ycbcr, etc). I think this policy is working well.

So I suggest creating a new toplevel directory for image codecs, say images/. Move libpr0n and libpng there as well.

Revision history for this message
In , Vladimir Vukicevic (vvuk) wrote :

This is a bit bikeshed-dy, but the fewer toplevel dirs we have the better; jpeg, png, etc. would fit in fine under 'media', I think.

Revision history for this message
In , Bobbyholley+bmo (bobbyholley+bmo) wrote :

(In reply to comment #93)
> My understanding is that we don't want to add new stuff to modules/.
>
> For video/audio we created a new toplevel directory media/ to contain
> third-party libraries (media/libogg, media/libvpx, etc). For gfx we've put
> third-party libraries under gfx/ (gfx/cairo, gfx/harfbuzz, gfx/ycbcr, etc). I
> think this policy is working well.
>
> So I suggest creating a new toplevel directory for image codecs, say images/.
> Move libpr0n and libpng there as well.

I think we're all on the same page here, but I don't think it's fair to make this patch implement that change. We've already got libpng in modules/libimg, and so I think it makes sense to keep libjpeg with libpng until we figure out the right place for both of them. This is something I plan on getting to "soonish" (ie, after the branch).

Revision history for this message
In , Pavlov (pavlov) wrote :

agreed on media/ for image decoders

Revision history for this message
In , Olli-pettay (olli-pettay) wrote :

Don't move existing code before hg (including hg web) can handle it properly, please.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

I've filled bug 584894 for moving the jpeg code.

Revision history for this message
In , Roc-ocallahan (roc-ocallahan) wrote :

(In reply to comment #96)
> agreed on media/ for image decoders

OK. This also might turn out to be handy if we ever start using a video codec for decoding static images as well (I've heard claims that VP8 can compress better than JPEG for example.)

(In reply to comment #95)
> I think we're all on the same page here, but I don't think it's fair to make
> this patch implement that change.

Sure.

(In reply to comment #97)
> Don't move existing code before hg (including hg web) can handle it properly,
> please.

Do you know if that's being worked on?

Revision history for this message
In , Olli-pettay (olli-pettay) wrote :

> (In reply to comment #97)
> > Don't move existing code before hg (including hg web) can handle it properly,
> > please.
>
> Do you know if that's being worked on?
http://mercurial.selenic.com/bts/issue1576 has been open for awhile.

Revision history for this message
Brian Murray (brian-murray) wrote :

*** This is an automated message ***

This bug is tagged needs-packaging which identifies it as a request for a new package in Ubuntu. As a part of the managing needs-packaging bug reports specification, https://wiki.ubuntu.com/QATeam/Specs/NeedsPackagingBugs, all needs-packaging bug reports have Wishlist importance. Subsequently, I'm setting this bug's status to Wishlist.

Changed in ubuntu:
importance: Undecided → Wishlist
Revision history for this message
In , Brendan-mozilla (brendan-mozilla) wrote :

(In reply to comment #93)
> So I suggest creating a new toplevel directory for image codecs, say images/.
> Move libpr0n and libpng there as well.

Sounds good to me (in case anyone cares :-P).

/be

Revision history for this message
In , Brendan-mozilla (brendan-mozilla) wrote :

Or under media -- comment 99 first para reply makes a good point.

Cc'ing djc about the Hg issue.

/be

Revision history for this message
In , Joe-drew (joe-drew) wrote :

I'd prefer we put libjpeg-turbo in jpeg/ initially, and then move it later.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

We're failing some of the jpeg reftests tests on Linux.

In all but one case, I can't distinguish the libjpeg-turbo image from the reference. The one exception is jpg-cmyk-2.jpg, where the libjpeg-turbo image looks a bit sharper than the reference. I can put the rendered image up here if anyone is curious.

How should we deal with these reftest failures? Do we just declare libjpeg-turbo to be the new gold standard of jepg decoding and change the references, or do we investigate what's causing the differences?

Revision history for this message
In , Joe-drew (joe-drew) wrote :

We definitely need to investigate what is causing these reftest failures. If there are bugs, they are likely to be very subtle, and so I'm afraid it'll be some hard slogging.

Start by finding out the magnitude of the differences. (1 pixel value - say r=125 to r=126 - can often be ignored as rounding differences, though not without determining where those rounding differences came from.) Use a screen magnifier that lets you view the pixel values under the cursor, or use something like Gimp that lets you sample any pixel on the screen.

Also, the reftest analyzer has code that'll circle the differences between two images. That might be helpful!

Finally, this doesn't need to block. I'll take a patch (on a relatively short leash), but I'd release without libjpeg-turbo.

Revision history for this message
In , Dirkjan Ochtman (dirkjan-ochtman) wrote :

smaug, bsmedberg: issue1576 is the only remaining issue for moving in hg, right? I'll try to conjure up some round tuits to get that into 1.6.3.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Comment on attachment 463225
Patch v1 - Part 3: Changes to the build

Canceling build review, since I need to at least move some things around.

I'll still need some help figuring out how to get configure to reject yasm binaries older than whatever version we require (likely to be 1.0.2).

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 463675
Comparison of libjpeg-turbo (left) and libjpeg rendering jpg-size-5x5.jpg

Interesting. There's actually a substantial difference in how libjpeg-turbo and libjpeg render this test image.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 463677
Patch v2 - libjpeg-turbo added on top of libjpeg in /jpeg directory

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 463692
Method dispatch fix

Ah. The rendering difference is due to what appears to be a typo in libjpeg-turbo.

Updated patch now passes the jpeg reftests locally. I'll roll up a patch with this fix and submit this patch upstream in a sec.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 463695
Patch v2.1 (v2 plus method dispatch fix)

Revision history for this message
In , Nomis101 (nomis101) wrote :

I've tried to build FF/TB with this patch, but it failed because of a missing yasm (yasm: no such file or directory). After installing yasm via MacPorts it now builds just fine. So, does this mean, with this patch, yasm gets a new dependency to build mozilla applications? Or will yasm be implemented into the mozilla source?

On Mac OS X JPEGs are OK with this patch (quick visual test) if you build FF in x86_64. But if I build FF/TB in i386, than all JPEGs are misrendered.
http://img291.imageshack.us/img291/3319/i386q.jpg

I've applied patch v2.1 and build with the 10.6 SDK.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

(In reply to comment #112)
> After installing yasm via MacPorts it now builds just fine. So, does this mean, with this patch, yasm gets a new dependency to build mozilla applications?

Please see comment #50.

> On Mac OS X JPEGs are OK with this patch (quick visual test) if you build FF in x86_64. But if I build FF/TB in i386, than all JPEGs are misrendered.
> http://img291.imageshack.us/img291/3319/i386q.jpg

See comment #65.

Revision history for this message
In , Nomis101 (nomis101) wrote :

Oh, seems I overlooked that...

> > On Mac OS X JPEGs are OK with this patch (quick visual test) if you build FF in x86_64. But if I build FF/TB in i386, than all JPEGs are misrendered.
> > http://img291.imageshack.us/img291/3319/i386q.jpg
>
> See comment #65.
Thanks, this fixed it. :-)

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

*** Bug 242145 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

*** Bug 244978 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Right now you can build FF to use the system libjpeg. I wonder if we'll want to disable this with this patch. libjpeg-turbo has a number of features we can use to make other code faster (see e.g. bug 584652, maybe bug 571739). We could write code to conditionally call into libjpeg-turbo only if it's enabled, but since we won't be testing with libjpeg, it's probably safer just to assume we have libjpeg-turbo.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 464561
Patch v1.3

Updated to tip of libjpeg-turbo svn, which includes the method dispatch fix.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Comment on attachment 464561
Patch v1.3

Tagging jrmuizel for review. Jeff, how do you think we should proceed here?

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

Comment on attachment 464561
Patch v1.3

I'm probably not going to have time to get to this review that quickly as this is lower priority than some of the other work I'm doing. One comment I do have so far is that you should fix up the licensing documentation to match the code that we're including. i.e. the LGPL and WXWindows licenses should not be needed.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

I can remove LICENSE.txt and LGPL.txt. The only other place I see wxWindows and (L)GPL referenced is in README-turbo, but I'm not sure it's worth patching README-turbo to remove the references to those licenses.

Do you agree?

Revision history for this message
In , Dcommander (dcommander) wrote :

I should really clarify the language in README-turbo anyhow. What I mean that file to say is that the overall distribution of LJT is covered by wxWindows if the distribution uses the accelerated Huffman codec but not otherwise.

Revision history for this message
Götz Christ (g-christ) wrote :

Fedora 14 will use libjpeg-turbo by default.

From http://fedoraproject.org/wiki/Features/libjpeg-turbo
"libjpeg-turbo is at least twice faster in JPEG compression/decompression than original libjpeg on platforms with MMX/SSE instruction set. It has same API/ABI like original libjpeg and also runs on non-SSE platforms where is around 25% faster."

"libjpeg-turbo has same API/ABI as libjpeg so no package needs to be rebuilt "

This should be considered for Maverick.

Revision history for this message
In , Fxtech (fxtech) wrote :

Todays win32 build works !

http://<email address hidden>/tryserver-win32/

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

(In reply to comment #123)
> Todays win32 build works !
>
> http://<email address hidden>/tryserver-win32/

That tryserver build doesn't include libjpeg-turbo. See http://hg.mozilla.org/try/log?rev=1bb8d265ded9

Revision history for this message
In , Fxtech (fxtech) wrote :

ops.. ..that's the reason becouse it work

Revision history for this message
In , Djcater+bugzilla (djcater+bugzilla) wrote :

Now that bug 563088 has landed, I think this bug is more important. Changing back to a tab that hasn't been active for a while, I can visibly see images being re-decoded. If the speed gains here really are significant then this will help.

Revision history for this message
spaetz (spaetz) wrote :

libjpeg-turbo is a drop-in replacement for libjpeg, so +1

Revision history for this message
In , Bobbyholley+bmo (bobbyholley+bmo) wrote :

jlebar - can you throw together a try build with libjpeg-turbo based onto trunk? We'd like to see how noticeable the post-discarding redecode speedup is tabbing back to:

http://flavors.me/johnath
and
http://demos.hacks.mozilla.org/openweb/CSSMAKESUSICK/

after waiting a period longer than image.mem.discarding_timeout_ms * 2 milliseconds.

If it's significant enough, we may re-evaluate the blocking status of libjpeg-turbo.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

I just pushed to try, but then I remembered that the build machines are still running the old, broken version of yasm. I believe we'll get working Linux and Mac 64-bit builds, and maybe we'll get a working Linux 32-bit build.

It might just be easier for you to pull the patch, install yasm 1.1, and build it yourself.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

My push to try is yielding debug, not opt, builds for some reason. I'll try and get this sorted out...

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Builds are up: http://<email address hidden>/

The Mac 32-bit build is busted -- it'll run but decode images as garbage. The Linux 32- and 64-bit and the Mac 64-bit builds should all work fine.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 471713
Patch v1.3.1

Fixed typo in configure.in.

Revision history for this message
In , Nomis101 (nomis101) wrote :

Nice patch, this also speeds up displaying jpeg-picture attachments in Thunderbird, especially for small files. I now see them immediately. For bigger attachments its nearly the same. (I've ported the config changes from this patch to c-c for my build)

Revision history for this message
In , Swsnyder (swsnyder) wrote :

FYI, v1.0.1 was released on September 9th.

http://sourceforge.net/projects/libjpeg-turbo/files/

Revision history for this message
ghepeu (ghepeu) wrote :

I second the request.

I'm usually very skeptic about purported performance improvements, but I did my tests on a couple of machines and I think that given the ubiquity of JPEG pictures the upgrade to libjpeg-turbo could be one of the most important user-visible speed improvements in the last few releases of Ubuntu.

I compiled libjpeg-turbo 1.0.1 and selected with LD_LIBRARY_PATH the new libjpeg.so.62.0.0 instead of the system library provided by Ubuntu 10.04.

On a Athlon 4850e (dual core 64bit cpu, 32 bit userspace) working at 2.5 Ghz, converting a not-so-large RPG PPM image (7200x4500) to and from JPEG (average of 5 runs):

time convert image.ppm image.jpg

libjpeg-classic real 2.709s user 2.364s sys 0.318s
libjpeg-turbo real 1.646s user 1.301s sys 0.317s

time convert image.jpg image.ppm

libjpeg-classic real 2.130s user 1.703s sys 0.359s
libjpeg-turbo real 1.292s user 0.872s sys 0.356s

On a Acer Aspire One 532h netbook with a Atom N450 cpu (64 bit userspace) working at 1.67 Ghz, the same test gives:

time convert image.ppm image.jpg

libjpeg-classic real 5.763s user 5.996s sys 0.770s
libjpeg-turbo real 2.612s user 2.900s sys 0.720s

time convert image.jpg image.ppm

libjpeg-classic real 4.588s user 3.816s sys 0.858s
libjpeg-turbo real 2.602s user 1.692s sys 0.868s

Occasionally I use the netbook to stream video from the integrated webcam to another pc with this gstreamer pipeline:

gst-launch-0.10 v4l2src ! video/x-raw-yuv,width=640,height=480 ! ffmpegcolorspace ! jpegenc ! multipartmux ! udpsink host=192.168.1.2 port=5000

With the frequency forced to 1GHz the cpu usage is:

libjpeg-classic 37-38%
libjpeg-turbo 10-11%

With the frequency forced to 1.67Ghz the cpu usage is:

libjpeg-classic 22-23%
libjpeg-turbo 6-7%

More importantly, when the default ondemand governor is used, with libjpeg-turbo the frequency stays mostly at 1 Ghz, while with libjpeg-classic it divides almost equally between 1 Ghz and 1.67Ghz, with the obvious consequences in terms of battery life.

On this machine more than on the faster desktop pc the difference between libjpeg-turbo and libjpeg-classic is clearly visible: when nautilus is thumbnailing the images in a directory, when I open an image with oeg, when I'm looking at a f-spot gallery, when I change wallpaper, etc. etc. The upgrade should be considered before 10.10.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Jeff, ping me before you review this and I'll update to whatever is libjpeg-turbo's tip of trunk at that time.

Revision history for this message
In , Dcommander (dcommander) wrote :

Quick note-- the libjpeg-turbo subversion trunk is used for the latest & greatest (and not necessarily stable) code. I would suggest continuing to base your in-tree copy of LJT on 1.0.1. I will create a 1.0.x stable branch as soon as it is needed (i.e. as soon as any bug fixes come in for LJT 1.0.1.)

Revision history for this message
rogerdpack (rogerpack2005) wrote :

+1 for wishing this were used as the default libjpeg distro so the various programs could benefit from it (zoneminder, for instance, benefits greatly).

Revision history for this message
Don Cady (doncady) wrote :

I concur. I know this is ancillary, but libjpeg-turbo sigificantly improved/decreased my proc usage on my zoneminder install.
Also, It looks like Mozilla is working on bundling this in FF now: https://bugzilla.mozilla.org/show_bug.cgi?id=573948
See that bug for their licensing considerations, as this is apparently under the 'wxWindows Library License'.

Changed in debian:
status: Unknown → New
Revision history for this message
Lenin (gagarin) wrote :

i hope it gets into debian and ubuntu soon. and i hope people don't start to do what mozilla wants to do (bundle more software into their huge monster of software). can't wait to run feh faster on my computers to view pretty large photo images...

Revision history for this message
David Robert Lewis (afrodeity) wrote :

more speed, lower profile, greater productivity please.

Revision history for this message
In , Thewolf86 (thewolf86) wrote :

Is this going to make into FF4? If so, can it be marked as blocking final at least? Don't mean to rush of course. Just wish set my expectations appropriately. ;-)

Revision history for this message
In , Benjamin Smedberg (Mozilla) [:bs] (benjamin-smedbergs) wrote :

This bug is *not* blocking, as the flag indicates. And at this point, I really don't think we should take it. We actually need to ship at some point.

Revision history for this message
williamts99 (williamts99) wrote :

This might need some more testing, I installed the latest libjpeg-turbo and could not open pdf documents with document viewer. Reverted to libjpeg62 and the problem went away.

Revision history for this message
In , jorgequinonez (jorgequinonez) wrote :

Does anyone foresee that the libjpeg-turbo code will be added to FF 4.1 as a feature in the future?

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #138)
> Does anyone foresee that the libjpeg-turbo code will be added to FF 4.1 as a
> feature in the future?

I think that's when it's likely to land.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

This isn't yet reviewed, and before it gets reviewed, it needs to be updated to the latest version of the library. It looks like the post2.0 tracking bug tracks bugs which are ready to land, so I'm removing that dependency.

Revision history for this message
In , Dcommander (dcommander) wrote :

FYI:

Stable branch for libjpeg-turbo 1.0.x has been created:

https://libjpeg-turbo.svn.sourceforge.net/svnroot/libjpeg-turbo/branches/1.0.x

This is probably what you want to keep your in-tree copy synced with for now, unless you need any of the features in 1.1.

Revision history for this message
Lenin (gagarin) wrote :

did you install from source or are there any debian (source) packages around?

Revision history for this message
Lenin (gagarin) wrote :
Revision history for this message
Steve Langasek (vorlon) wrote :

If anyone is seeing the same problem as williamts99 (who is not subscribed to this bug), could you please provide a copy of /proc/cpuinfo from the affected system, and a copy of an example pdf that the problem is reproducible with?

If possible, please also try building a package from the git branch at "git://git.linaro.org/people/tomgall/meego/libjpeg-turbo.git linaroize" to see if the problem is still reproducible for you.

I'm having no problems here viewing pdfs with this package built for amd64. My suspicion is that the only problem is for users trying to use this on pre-SSE x86 systems... which is not surprising since most of the improvements over jpeg upstream here involve hand-coded assembly to take advantage of vector instructions.

Revision history for this message
Tom Gall (tom-gall) wrote :

It's also available in my ppa:tom-gall/packages.

Like Steve, I haven't been able to replicate any issues but thus far I've been testing on armel.

Changed in firefox:
importance: Unknown → Medium
status: Unknown → In Progress
Revision history for this message
In , Bernesb (bernesb) wrote :

FYI: 1.1.0 version was released as stable some time ago

Revision history for this message
In , A2255930 (a2255930) wrote :

Bug on integration libjpeg-turbo in Google Chrome, already have status verified fixed...

http://code.google.com/p/chromium/issues/detail?id=48789

Revision history for this message
In , Lh-bennett (lh-bennett) wrote :

Maybe a different reviewer should be assigned to possibly get this in for Fx5.0?

According to the draft plan for the new development process, the 5.0 branch could be cut early(3 weeks).

Revision history for this message
In , Brendan-mozilla (brendan-mozilla) wrote :

(In reply to comment #144)
> Maybe a different reviewer should be assigned to possibly get this in for
> Fx5.0?

Jeff, you reviewing?

/be

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #145)
> (In reply to comment #144)
> > Maybe a different reviewer should be assigned to possibly get this in for
> > Fx5.0?
>
> Jeff, you reviewing?
>
> /be

I've completed a large chunk of the review. I haven't had time to get to the rest of it yet. I am hoping to for 5.

-Jeff

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

I'll be starting full-time on 3/28, so I'll be able to update the patch and whatnot then.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :
Download full text (5.8 KiB)

Comment on attachment 471713
Patch v1.3.1

I did not review the simd code as that's more work and lower risk because it's mostly just optimized versions of existing functions. I did go through the rest of the stuff, hopefully I was thorough enough (there was a lot of change).

A few of these comments are on Mozilla changes, the others are on the upstream code. I don't consider it critical that the upstream stuff be addressed before landing.

>diff --git a/jpeg/MOZCHANGES b/jpeg/MOZCHANGES
>--- a/jpeg/MOZCHANGES
>+++ b/jpeg/MOZCHANGES
>@@ -1,10 +1,42 @@
>+ jchuff.c, jdhuff.c, jdhuff.h
>+ since the versions of these files in libjpeg-turbo are also under the
>+ wxWindows license. (It would have been nicer to revert them to the new
>+ libjepg-8b code, but that doesn't easily integrate with libjepg-turbo.)
s/jepg/jpeg/ (both of them)

>diff --git a/jpeg/jpegint.h b/jpeg/jpegint.h
>--- a/jpeg/jpegint.h
>+++ b/jpeg/jpegint.h
>@@ -364,17 +364,17 @@ EXTERN(void) jinit_color_deconverter JPP
> /* Utility routines in jutils.c */
> EXTERN(long) jdiv_round_up JPP((long a, long b));
>-EXTERN(long) jround_up JPP((long a, long b));
>+EXTERN(size_t) jround_up JPP((size_t a, size_t b));

This change of type here is sort of interesting. It is extended to 64 bits on
Windows and changed to unsigned.
The checkin comment for the change was "Bleepin' Windows uses LLP64, not LP64",
however it seems like the only code that needs the new size is code added by to jpeg-6b.
(jmemmgr.c and huffman encoder/decoder not included in this patch)

This change removes the type parallelism between jround_up and its sister function jdiv_round_up,
it's also sort of an abuse of the size_t type because the function is used for more than just
sizes of objects. Some callers haven't been updated and still do jround_up((long) val)

I'd suggest using a less general align-to-power-of-two function in jmemmgr.c, I didn't investigate
why the huffman code needs to round 64bit values

>diff --git a/jpeg/jchuff.c b/jpeg/jchuff.c
>--- a/jpeg/jchuff.c
>+++ b/jpeg/jchuff.c
>@@ -14,16 +14,19 @@
> #include "jchuff.h" /* Declarations shared with jcphuff.c */
>
>+/* MOZILLA CHANGE: libjpeg-turbo doesn't define INLINE in its config file, so
>+ * we define it here. */
>+#define INLINE

This doesn't seem like a very good definition of INLINE, using
the one removed by this patch seems like a better idea.

>diff --git a/jpeg/jcphuff.c b/jpeg/jcphuff.c
>--- a/jpeg/jcphuff.c
>+++ b/jpeg/jcphuff.c
>@@ -218,17 +218,16 @@ dump_buffer (phuff_entropy_ptr entropy)
>
>-INLINE
> LOCAL(void)
> emit_bits (phuff_entropy_ptr entropy, unsigned int code, int size)
> /* Emit some bits, unless we are in gather mode */
>@@ -271,17 +270,16 @@ flush_bits (phuff_entropy_ptr entropy)
>
>-INLINE
> LOCAL(void)
> emit_symbol (phuff_entropy_ptr entropy, int tbl_no, int symbol)

I'm not sure why INLINE needs to go...

>diff --git a/jpeg/jpeglib.h b/jpeg/jpeglib.h
>--- a/jpeg/jpeglib.h
>+++ b/jpeg/jpeglib.h
>@@ -1,58 +1,41 @@
> /*
> * jpeglib.h
> *
> * Copyright (C) 1991-1998, Thomas G. Lane.
>+ * Copyright (C) 2009, D. R. Commander.
> * This file is part of the Independent JPEG Group's software.
> * For conditions of di...

Read more...

Revision history for this message
In , Bernesb (bernesb) wrote :
Download full text (3.1 KiB)

I strongly suggest updating libjpeg-turbo from 1.0.0 to 1.1.0 before taking time on complete reviewing. It will be wiser because many changes where made.

Significant changes since 1.0.0:

[1] The Huffman decoder will now handle erroneous Huffman codes (for instance, from a corrupt JPEG image.) Previously, these would cause libjpeg-turbo to crash under certain circumstances.

[2] Fixed typo in SIMD dispatch routines which was causing 4:2:2 upsampling to be used instead of 4:2:0 when decompressing JPEG images using SSE2 code.

[3] configure script will now automatically determine whether the INCOMPLETE_TYPES_BROKEN macro should be defined.

Significant changes since 1.0.1"

[1] Added emulation of the libjpeg v7 and v8b APIs and ABIs. See README-turbo.txt for more details. This feature was sponsored by CamTrace SAS.

[2] Created a new CMake-based build system for the Visual C++ and MinGW builds.

[3] TurboJPEG/OSS can now compress from/decompress to grayscale bitmaps.

[4] jpgtest can now be used to test decompression performance with existing JPEG images.

[5] If the default install prefix (/opt/libjpeg-turbo) is used, then 'make install' now creates /opt/libjpeg-turbo/lib32 and /opt/libjpeg-turbo/lib64 sym links to duplicate the behavior of the binary packages.

[6] All symbols in the libjpeg-turbo dynamic library are now versioned, even when the library is built with libjpeg v6b emulation.

[7] Added arithmetic encoding and decoding support (can be disabled with configure or CMake options)

[8] Added a TJ_YUV flag to TurboJPEG/OSS which causes both the compressor and decompressor to output planar YUV images.

[9] Added an extended version of tjDecompressHeader() to TurboJPEG/OSS which allows the caller to determine the type of subsampling used in a JPEG image.

[10] Added further protections against invalid Huffman codes.

Significant changes since 1.0.90 (1.1 beta1):

[1] The algorithm used by the SIMD quantization function cannot produce correct results when the JPEG quality is >= 98 and the fast integer forward DCT is used. Thus, the non-SIMD quantization function is now used for those cases, and libjpeg-turbo should now produce identical output to libjpeg v6b in all cases.

[2] Despite the above, the fast integer forward DCT still degrades somewhat for JPEG qualities greater than 95, so TurboJPEG/OSS will now automatically use the slow integer forward DCT when generating JPEG images of quality 96 or greater. This reduces compression performance by as much as 15% for these high-quality images but is necessary to ensure that the images are perceptually lossless. It also ensures that the library can avoid the performance pitfall created by [1].

[3] Ported jpgtest.cxx to pure C to avoid the need for a C++ compiler.

[4] Fixed visual artifacts in grayscale JPEG compression caused by a typo in the RGB-to-chrominance lookup tables.

[5] The Windows distribution packages now include the libjpeg run-time programs (cjpeg, etc.)

[6] All packages now include jpgtest.

[7] The TurboJPEG dynamic library now uses versioned symbols.

[8] Added two new TurboJPEG API functions, tjEncodeYUV() and tjDecompressToYUV(), to replace the somewhat hackish...

Read more...

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Per comment 147, I intend to update to the new version before checking in. That sound good to you, Jeff?

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #150)
> Per comment 147, I intend to update to the new version before checking in.
> That sound good to you, Jeff?

That's fine. An interdiff would to have for a quick look over. Also, one thing that I forgot to mention in the review comments. I'd prefer that we only include the files that are needed for actually building libjpeg-turbo and not checking in all of the extra programs and documentation.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

>diff --git a/jpeg/jpeglib.h b/jpeg/jpeglib.h
>-#ifdef XP_OS2
>-/*
>- * On OS/2, the system will have defined RGB_* so we #undef 'em to avoid warnings
>- * from jmorecfg.h.
>- */
>-#ifdef RGB_RED
>- #undef RGB_RED
>-#endif
>-#ifdef RGB_GREEN
>- #undef RGB_GREEN
>-#endif
>-#ifdef RGB_BLUE
>- #undef RGB_BLUE
>-#endif

> This is probably needed for OS/2 but I'd rather it be fixed upstream.

Do you mean upstream of libjpeg-turbo, i.e., in libjpeg? libjpeg-turbo is a fork of an old version of libjpeg, so there's really no "upstream" to speak of.

Revision history for this message
In , Swsnyder (swsnyder) wrote :

(In reply to comment #148)
> Comment on attachment 471713 [details]
> Patch v1.3.1
[snip]
> >diff --git a/jpeg/jcdctmgr.c b/jpeg/jcdctmgr.c
> >--- a/jpeg/jcdctmgr.c
> >+++ b/jpeg/jcdctmgr.c
> > /*
> >+ * Find the highest bit in an integer through binary search.
> >+ */
> >+LOCAL(int)
> >+flss (UINT16 val)
> >+{
>
> It would be good to use compiler intrinsics like BitScanReverse and
> __builtin_clz here when they're available.
[snip]

https://sourceforge.net/tracker/?func=detail&aid=3249446&group_id=303195&atid=1278160

It's an improvement in both speed and size, but probably not enough to make a compelling case for inclusion of the patch. We've seen better results in JS & NSPR because they both examine 32 bits. The libjpeg-turbo code only looks at 16 bits so there are less savings to be had from the BSR instruction.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #152)
> Do you mean upstream of libjpeg-turbo, i.e., in libjpeg? libjpeg-turbo is a
> fork of an old version of libjpeg, so there's really no "upstream" to speak of.

I meant upstream in libjpeg-turbo instead of patching our local copy.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Oh, I see; that's a change we made, but which isn't in libjpeg-turbo.

Until it's fixed upstream, why don't we just guard the #define's in jmorecfg.h? We already keep that file out of sync with libjpeg-turbo's mainline.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #155)
> Oh, I see; that's a change we made, but which isn't in libjpeg-turbo.
>
> Until it's fixed upstream, why don't we just guard the #define's in jmorecfg.h?
> We already keep that file out of sync with libjpeg-turbo's mainline.

That sounds fine.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 522445
Patch v2.0. Part 1: Upgrade to libjpeg-turbo v1.1.0

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 522446
Patch v2.0. Part 2: Address review comments.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 522449
Interdiff v1.3.1 -> v2.0 part 1

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 522459
Interdiff v1.3.1 -> v2.0 part 1 without deleted files

This interdiff is the same as attachment 522449, but it doesn't include the files I deleted, in an attempt to make it easier to read.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

DRC, what would be a good medium for handling Jeff's remaining review comments?

I'm going to push the latest patches to try now.

Revision history for this message
In , Dcommander (dcommander) wrote :

Created attachment 522544
Proposed upstream patch for addressing concern about jround_up() argument types.

Revision history for this message
In , Dcommander (dcommander) wrote :

I'll address the review comments that are relevant to the upstream code here:

(1) jround_up() argument types: this was the result of a bad game of Celebrity Whack-a-Mole that I played while trying to port the code to Win64. Jeff's idea of creating a local, power-of-2-specific round-up function for jmemmgr.c is a better idea, and I've done so in a new patch (attachment 522544) (actually, I made it a macro instead of a function.) Please let me know if that looks like a reasonable solution.

(2) RGB_{RED|GREEN|BLUE} macro conflicts on OS/2 (OS/2? Really?): This seems like an innocuous enough patch, although I'd rather have it in jmorecfg.h instead of jpeglib.h. I'd also like to know more of the back story as to why OS/2 is defining those macros. Is this a generic problem that anyone building libjpeg-turbo on OS/2 would encounter? My lingering doubt here is that libjpeg supports OS/2, so I wonder why they haven't encountered this problem on that platform. Is Mozilla including some header file that causes those macros to be defined?

(3) bitscan instructions: I tested the proposed patch to make jcdctmgr.c use GCC 3.4+ compiler intrinsics to replace flss() with bitscan instructions, but I cannot observe any measurable speedup from this (tested several different image types and platforms, as well as 32-bit and 64-bit.) If it were even 2% faster, I would agree to include it, but a performance patch that causes no appreciable increase in performance does nothing but needlessly obfuscate the code.

(4) The tables in jccolor.c are part of the optimized RGB-to-grayscale conversion routines. These are pre-computed R-to-luminance, G-to-luminance, and B-to-luminance conversion values. The problem with this approach is that it loses a bit of accuracy due to round-off, so I recently checked in code to the SVN trunk (1.2 working version) that uses new SIMD routines to perform RGB-to-grayscale conversion. This is faster and eliminates the round-off issues, as well as the need for the tables in jccolor.c and the #ifdef in rgb_gray_convert().

(5) The rest of the concerns regarding jccolor.c have already been addressed in trunk as well (pulling rgb_red, rgb_green, rgb_blue out of the loop, etc.)

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

> (2) RGB_{RED|GREEN|BLUE} macro conflicts on OS/2 (OS/2? Really?): This seems
> like an innocuous enough patch, although I'd rather have it in jmorecfg.h
> instead of jpeglib.h.

Attachment 522446 puts the #undef's in jmorecfg.h, right above the #defines.

Revision history for this message
In , Dcommander (dcommander) wrote :

And just FYI, another thing that I'm currently being contracted to do is investigate rewriting the performance optimizations in j{c|d}huff*. This will hopefully allow me to re-license those files in libjpeg-turbo 1.2.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

Comment on attachment 522544
Proposed upstream patch for addressing concern about jround_up() argument types.

>+#define PAD(size, align) (((size) + (align) - 1) & (~((align) - 1)))

I usually prefer functions over macros for the additional type safety and to avoid evaluating arguments twice (e.g. PAD(x, a++))

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #163)
> (2) RGB_{RED|GREEN|BLUE} macro conflicts on OS/2 (OS/2? Really?): This seems
> like an innocuous enough patch, although I'd rather have it in jmorecfg.h
> instead of jpeglib.h. I'd also like to know more of the back story as to why
> OS/2 is defining those macros. Is this a generic problem that anyone building
> libjpeg-turbo on OS/2 would encounter? My lingering doubt here is that libjpeg
> supports OS/2, so I wonder why they haven't encountered this problem on that
> platform. Is Mozilla including some header file that causes those macros to be
> defined?

I'd be fine with dropping this altogether. If OS/2 still needs it, we can always add it back.

>
> (3) bitscan instructions: I tested the proposed patch to make jcdctmgr.c use
> GCC 3.4+ compiler intrinsics to replace flss() with bitscan instructions, but I
> cannot observe any measurable speedup from this (tested several different image
> types and platforms, as well as 32-bit and 64-bit.) If it were even 2% faster,
> I would agree to include it, but a performance patch that causes no appreciable
> increase in performance does nothing but needlessly obfuscate the code.

It's probably worth adding a comment to flss saying that it's not performance critical enough to justify using compiler intrinsics.

> (4) The tables in jccolor.c are part of the optimized RGB-to-grayscale
> conversion routines. These are pre-computed R-to-luminance, G-to-luminance,
> and B-to-luminance conversion values. The problem with this approach is that
> it loses a bit of accuracy due to round-off, so I recently checked in code to
> the SVN trunk (1.2 working version) that uses new SIMD routines to perform
> RGB-to-grayscale conversion. This is faster and eliminates the round-off
> issues, as well as the need for the tables in jccolor.c and the #ifdef in
> rgb_gray_convert().

That sounds like a good solution to me.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #165)
> And just FYI, another thing that I'm currently being contracted to do is
> investigate rewriting the performance optimizations in j{c|d}huff*. This will
> hopefully allow me to re-license those files in libjpeg-turbo 1.2.

This sounds great.

Revision history for this message
In , Dcommander (dcommander) wrote :

Created attachment 522583
New proposed upstream patch for addressing concern about jround_up() argument types.

Revision history for this message
In , Dcommander (dcommander) wrote :

(In reply to comment #166)
> >+#define PAD(size, align) (((size) + (align) - 1) & (~((align) - 1)))
>
> I usually prefer functions over macros for the additional type safety and to
> avoid evaluating arguments twice (e.g. PAD(x, a++))

My experience has been that the compiler does a good job of optimizing out that duplication of effort, and the lack of overhead of the additional function call usually makes up for it, but in the specific case we're referring to, it doesn't matter enough to worry about, and using a function call is more consistent with the way it worked before. Re-submitted as attachment 522583.

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #170)
> (In reply to comment #166)
> > >+#define PAD(size, align) (((size) + (align) - 1) & (~((align) - 1)))
> >
> > I usually prefer functions over macros for the additional type safety and to
> > avoid evaluating arguments twice (e.g. PAD(x, a++))
>
> My experience has been that the compiler does a good job of optimizing out that
> duplication of effort, and the lack of overhead of the additional function call
> usually makes up for it, but in the specific case we're referring to, it
> doesn't matter enough to worry about, and using a function call is more
> consistent with the way it worked before. Re-submitted as attachment 522583

I was referring to the fact that PAD(x, a++) does a++ twice which is not what the caller is likely to expect. Regardless, I like the new version better.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

There are unfortunately reftest failures on Windows with this latest patch. Looking into it.

http://tbpl.mozilla.org/?tree=MozillaTry&rev=fceae635d617
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1301360568.1301363938.582.gz

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Ah, the orange is due to unexpected success. That's not so bad. :)

I think we can just back out bug 582850.

http://hg.mozilla.org/mozilla-central/rev/4643426a1523

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

Yep, and this should close bug 477728

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Created attachment 522674
Patch v2.0 Part 3: Re-enable reftests on Windows.

Revision history for this message
In , Swsnyder (swsnyder) wrote :

Why is support for arithmetic encoding/decoding disabled?

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

Don't we need to use the libjpeg v8 ABI in order to get this?

Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

(In reply to comment #176)
> Why is support for arithmetic encoding/decoding disabled?

We haven't decided whether to enabled this yet. This is a separate issue and needs a separate bug as well as a lot more thought/research.

Revision history for this message
In , Dcommander (dcommander) wrote :

(In reply to comment #177)
> Don't we need to use the libjpeg v8 ABI in order to get this?

No, it works when emulating the v6b ABI as well. It just adds two additional functions without breaking compatibility with the rest of the ABI.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :
Revision history for this message
In , Jmuizelaar (jmuizelaar) wrote :

Apparently this causes OOM in yasm 0.8.something we may need to up the min version

Revision history for this message
In , Gavin Sharp (gavin-sharp) wrote :

More specifically, with yasm 0.8.0.2194, which is the latest available by default with Ubuntu 10.04. yasm 1.1.0.2352 works fine.

Revision history for this message
In , Olli-pettay (olli-pettay) wrote :

Fedora 13 has old yasm too

Revision history for this message
In , Caillon (caillon) wrote :

(Though, honestly we'd really prefer to make it buildable with nasm, or gas. But that's another bug...)

Changed in firefox:
status: In Progress → Fix Released
Revision history for this message
In , Dcommander (dcommander) wrote :

NOTE: libjpeg-turbo trunk now contains a version of jdhuff.c and jdhuff.h (Huffman decoder) with the optimizations re-written and re-licensed under the same BSD-style license as the rest of libjpeg. Looking into jchuff* as well, but I figured you were probably more interested in the decoder than the encoder.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

I've filed bug 650899 for updating libjpeg-turbo.

Revision history for this message
In , RobertSayre (sayrer) wrote :

Jesse, why did you nominate this bug for tracking-firefox5?

Revision history for this message
In , Jruderman (jruderman) wrote :

Because bsmedberg told me to in http://groups.google.com/group/mozilla.dev.planning/msg/bfc2fabaecacdd97. Also because I think it should be listed on https://wiki.mozilla.org/Features/Release_Tracking to make sure it gets the right security reviews, license file additions, changelog entries, crash stats scrutiny, and whatever else people follow that page for.

Revision history for this message
In , Dcommander (dcommander) wrote :

NOTE: libjpeg-turbo trunk now contains a version of jchuff.c
(Huffman encoder) with the optimizations re-written and re-licensed under the
same BSD-style license as the rest of libjpeg.

Revision history for this message
In , Justin-lebar+bug (justin-lebar+bug) wrote :

... and the bug for tracking that is bug 650899.

Revision history for this message
In , Dcommander (dcommander) wrote :

That bug is for tracking the decoder. I'm talking about the encoder now.

Revision history for this message
In , Bsterne (bsterne) wrote :

FYI, the Chrome team has apparently done a security review of the library:
http://code.google.com/p/chromium/issues/detail?id=48789#c3

Revision history for this message
In , Jpr-mozilla (jpr-mozilla) wrote :

Security team finished fuzzing as planned, we are good here.

Revision history for this message
Oibaf (oibaf) wrote :

It looks like libjpeg-turbo is in Ubuntu since oneiric.

Changed in ubuntu:
status: New → Fix Released
Changed in debian:
status: New → Fix Committed
Changed in debian:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Related blueprints

Remote bug watches

Bug watches keep track of this bug in other bug trackers.