performance critical libyuv built with Os
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
firefox (Debian) | Fix Released | Unknown | | | |
firefox (Ubuntu) | Fix Released | Undecided | Unassigned | | |
Bug Description
Imported from Debian bug http://
From: Julian Taylor <email address hidden>
To: Debian Bug Tracking System <email address hidden>
Message-ID: <email address hidden>
Subject: performance critical libyuv built with Os
Package: firefox
Version: 53.0.is.52.0.2-1
Severity: normal
libyuv, which is a performance-critical library for Firefox, is built with
-Os, which is terrible for its performance.
This particularly affects row_common.cc, which contains the generic parts
of the color transformation code.
See:
https:/
=53.0.
/usr/bin/g++ -std=gnu++11 -o row_common.o -c ... -fPIC
-DMOZILLA_CLIENT -include
/<<
.deps/
-Wc++11-compat -Wempty-body -Wignored-
-Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code
-Wwrite-strings -Wno-invalid-
-Wno-error=
-Wno-error=
-Wformat -Werror=
-fno-delete-
-fno-rtti -ffunction-sections -fdata-sections -fno-exceptions
-fno-math-errno -pthread -pipe -g -freorder-blocks -Os
-fomit-
/<<
The problematic part is the YuvPixel function, which is called in loops
and in turn calls tiny clamp functions.
-Os discourages inlining that would grow the code, so these helpers remain
real function calls, which causes massive overhead.
This is the top of the CPU profile on sites that, for example, display videos:
17.25% libxul.so [.] YuvPixel
6.58% libxul.so [.] Clamp
6.46% libxul.so [.] clamp255
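
The call pattern behind this profile can be sketched roughly as follows. This is a simplified illustration, not libyuv's actual code: the function name YuvToArgbRowSketch and the fixed-point coefficients are assumptions made for the example, and only the names YuvPixel, Clamp and clamp255 come from the profile above. If the helpers are not inlined, every pixel pays for several function calls on top of a few multiplies and adds.

```cpp
#include <cstdint>

// Tiny clamp helpers, modelled on the names seen in the profile.
// The bodies are an assumption; libyuv's real versions may differ.
static inline int32_t clamp0(int32_t v) { return v < 0 ? 0 : v; }
static inline int32_t clamp255(int32_t v) { return v > 255 ? 255 : v; }
static inline uint32_t Clamp(int32_t v) {
  return static_cast<uint32_t>(clamp255(clamp0(v)));
}

// Convert one YUV pixel to B, G, R using illustrative BT.601-style
// fixed-point coefficients scaled by 64 (hypothetical values).
static inline void YuvPixel(uint8_t y, uint8_t u, uint8_t v,
                            uint8_t* b, uint8_t* g, uint8_t* r) {
  const int32_t y1 = (static_cast<int32_t>(y) - 16) * 74;
  const int32_t ud = static_cast<int32_t>(u) - 128;
  const int32_t vd = static_cast<int32_t>(v) - 128;
  *b = static_cast<uint8_t>(Clamp((y1 + 129 * ud + 32) / 64));
  *g = static_cast<uint8_t>(Clamp((y1 - 25 * ud - 52 * vd + 32) / 64));
  *r = static_cast<uint8_t>(Clamp((y1 + 102 * vd + 32) / 64));
}

// Hypothetical row converter: one YuvPixel call (and three Clamp calls)
// per output pixel. Output bytes are written in B, G, R, A order.
void YuvToArgbRowSketch(const uint8_t* src_y, const uint8_t* src_u,
                        const uint8_t* src_v, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    YuvPixel(src_y[x], src_u[x], src_v[x], dst + 0, dst + 1, dst + 2);
    dst[3] = 255;  // opaque alpha
    dst += 4;
  }
}
```

With inlining, the loop body collapses into straight-line arithmetic that a compiler has a chance to vectorize; without it, each iteration performs several calls into functions that are only a few instructions long.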
The problem is not as bad as it looks, as this generic code is only
executed on machines that do not have SSSE3, AVX2 or NEON (see
convert_argb.cc).
But there are still plenty of useful CPUs that lack these
instruction sets and are crippled by the compiler flags used.
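
The fallback behaviour described above can be pictured as a runtime dispatch along these lines. This is only a sketch under stated assumptions: the feature bits, the RowFn type, PickRow, and the three row functions are hypothetical names invented for the example, and libyuv's real CPU detection and kernels in convert_argb.cc are not reproduced here. The stand-in kernels simply copy the row so that the example stays self-contained.

```cpp
#include <cstdint>

// Hypothetical feature bits for illustration only.
enum CpuFeature : unsigned {
  kHasSSSE3 = 1u << 0,
  kHasAVX2 = 1u << 1,
};

using RowFn = void (*)(const uint8_t* src, uint8_t* dst, int width);

// Shared body for the stand-in kernels: a plain byte copy.
static void CopyRow(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) dst[x] = src[x];
}

// Stand-ins for the generic C path and the SIMD paths.
void RowC(const uint8_t* src, uint8_t* dst, int width) { CopyRow(src, dst, width); }
void RowSSSE3(const uint8_t* src, uint8_t* dst, int width) { CopyRow(src, dst, width); }
void RowAVX2(const uint8_t* src, uint8_t* dst, int width) { CopyRow(src, dst, width); }

// Start from the generic C row function and upgrade to the best
// SIMD variant the CPU reports. With no feature bits set, the
// generic path is what actually runs.
RowFn PickRow(unsigned features) {
  RowFn fn = RowC;
  if (features & kHasSSSE3) fn = RowSSSE3;
  if (features & kHasAVX2) fn = RowAVX2;
  return fn;
}
```

On a machine that reports none of the SIMD features, PickRow returns the generic C function, which is exactly the code whose speed depends on the optimization level it was compiled with.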
Would it be possible to compile this library with -O3, so that the compiler
can vectorize it with the best available generic instruction set (e.g. SSE2
on x64)?
cheers,
Julian Taylor
Changed in firefox (Debian):
importance: Undecided → Unknown
Changed in firefox (Debian):
status: New → Confirmed
Changed in firefox (Debian):
status: Confirmed → Fix Released
Changed in firefox (Ubuntu):
status: New → Fix Released