slow math sin function for some values on amd64
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
GLibC | In Progress | Wishlist | |
glibc (Ubuntu) | | Medium | Unassigned |
Bug Description
Hello.
I have found that the math sin function is unreasonably slow (400 µs on an Athlon 64 X2 4800+) for some values. It only happens on 64-bit distributions.
Used versions:
Kubuntu 7.10 amd64 (Gutsy Gibbon)
linux kernel 2.6.24
libc6 2.6.1-1ubuntu10
libc6-dev 2.6.1-1ubuntu10
Some of those values: -------
0.9334058229264
2.3328432680770
3.7439477503636
3.9225160069792
4.0711651639931
4.7858438478542
5.9840767662578
Short program for testing: -------
#include <stdlib.h>
#include <sstream>
#include <iostream>
#include <iomanip>
#include <math.h>
#include <sys/time.h>
using namespace std;
int main(int argc, char** argv) {
    // The argument is given either as raw bytes or as a decimal number.
    volatile union {
        double dbl;
        unsigned char hex[sizeof(double)];
    } value;
    if (argc == (int)sizeof(double)+1) {
        // One hex byte per argument, in memory order.
        for (size_t i=0; i < sizeof(double); i++) {
            stringstream s(argv[i+1]);
            int tmp;
            s >> hex >> tmp;
            value.hex[i] = (unsigned char)tmp;
        }
    } else if (argc == 2) {
        stringstream s(argv[1]);
        double tmp;
        s >> tmp;
        value.dbl = tmp;
    } else {
        cout << "usage: sintest 00 b0 6b e3 75 de ed 3f\n"
                "       sintest 0.9334058229264\n";
        return (EXIT_FAILURE);
    }
    // The original precision argument is cut off in this copy; 17 digits is an assumption.
    cout.precision(17);
    cout << value.dbl << endl;
    cout << "start\n";
    struct timeval time1, time2;
    gettimeofday(&time1, NULL);
    for (int i=0; i < 10000; i++) {
        volatile double out = sin(value.dbl);
    }
    gettimeofday(&time2, NULL);
    // Elapsed time in microseconds.
    long long diftime = 1000000ll * (time2.tv_sec - time1.tv_sec) + (time2.tv_usec - time1.tv_usec);
    cout << "end: " << diftime / 1000000ll << '.' << setw(6) << setfill('0') << diftime % 1000000ll << " s" << endl;
    return (EXIT_SUCCESS);
}
Petr Cervenka
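The usage line of the test program passes the argument as eight bytes in memory order, so that exactly the same double is reproduced without depending on decimal parsing. A minimal sketch of the same idea using memcpy instead of a union follows. The helper names `bits_to_double` and `double_to_bits` are made up for illustration, and the sketch assumes a 64-bit IEEE 754 double:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers, not part of the original test program.
// Reinterpret a 64-bit pattern as a double (assumes IEEE 754 binary64).
inline double bits_to_double(std::uint64_t bits) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch expects a 64-bit double");
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Inverse direction: expose the bit pattern of a double.
inline std::uint64_t double_to_bits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

On a little-endian machine the bytes `00 b0 6b e3 75 de ed 3f` correspond to the pattern 0x3fedde75e36bb000, so `bits_to_double(0x3fedde75e36bb000ULL)` gives the first slow value, roughly 0.9334058229264.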
Most of the double routines in libm come from the IBM Accurate Mathematical Library,
which ensures <= 0.5ulp error. Trigonometric etc. functions are computed using
floating point computations, but if the possible error from that is too high, it
uses slower multiprecision computation to guarantee an ultimately precise result.
Guess you just picked some worst-case values.
i386 uses the non-precise hardware instructions instead, so doesn't guarantee
the <= 0.5ulp precision.
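The fast-path/slow-path split described above can be illustrated with a rounding test: if an approximation, widened by its error bound in both directions, still rounds to one and the same double, the fast result is already correctly rounded; otherwise a more precise (and slower) computation is needed. This is only a sketch of the general idea and not glibc's actual code. The function name `rounds_unambiguously` and the way the error bound is supplied are assumptions:

```cpp
#include <cmath>

// Hypothetical illustration, not taken from glibc.
// approx: higher-precision approximation of the true result
// err:    assumed upper bound on |approx - true result|
// Returns true and stores the rounded value when both ends of
// [approx - err, approx + err] round to the same double.
inline bool rounds_unambiguously(long double approx, long double err, double* out) {
    double lo = static_cast<double>(approx - err);
    double hi = static_cast<double>(approx + err);
    if (lo == hi) {
        *out = lo;      // fast path: the rounding cannot change
        return true;
    }
    return false;       // caller would have to fall back to a slower method
}
```

The worst-case inputs in this report are presumably the ones where such a test fails, because the true result lies extremely close to the midpoint between two doubles.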
Subject: Re: Slow sine function for special values on AMD64
- second attempt
On Thu, 21 Feb 2008, jakub at redhat dot com wrote:
> which ensures <= 0.5ulp error. Trigonometric etc. functions are
> computed using floating point computations, but if the possible error
> from that is too high, it uses slower multiprecision computation to
> guarantee ultimate precise result. Guess you just picked some worst-case
> values.
Note that the crlibm developers were willing to contribute their code, an
advantage of which is *much* better worst-case performance.
Yeah, I'm aware of crlibm, I think if it proves itself that it won't be much
slower on average, has the same ultimate precision guarantees and faster
worst-cases, I don't see a reason why it can't be integrated. It will be a lot
of work to integrate it though.
|
#7 |
Is there any compile flag or #define that can disable the <=0.5 ulp precision
so that the math sin function uses only the fast built-in fp instructions?
For our real-time software it is necessary to be "quick", the ultra precision
has low priority.
Now we are using a workaround: I can put the original argument to long double
variable and call sinl function with long double result. Both, the new argument
and the result, have to be volatile to disable the compiler optimization of it
(probably uses the "fast" sin instead).
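A minimal sketch of the workaround just described, assuming that sinl on the target system does not take the slow multiprecision path (the thread reports this for x86_64 with glibc 2.6; other versions and architectures may behave differently). The wrapper name `sin_via_sinl` is hypothetical:

```cpp
#include <math.h>

// Sketch of the workaround: route the call through sinl.
// Both the argument and the result are volatile so that the compiler
// cannot simplify the sequence back into a plain double sin call.
inline double sin_via_sinl(double x) {
    volatile long double arg = x;
    volatile long double res = sinl(arg);
    return static_cast<double>(res);
}
```

The result is rounded twice (once by sinl, once by the conversion to double), so it can differ from the correctly rounded sin in the last bit.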
Results of sin(0.933405822
-------
distr function value result_type printf_format
-------
32 sin 0.8036514043877
32 sinl 0.8036514043877
32 sinl 0.8036514043877
64 sin 0.8036514043877
(~ -5.551115123125
64 sinl 0.8036514043877
(~ +5.551115123125
64 sinl 0.8036514043877
|
#8 |
I'm not the only one with such problems:
http://
I assume that for the 64-bit distribution (x86_64), it should use sin and sinf
from i386 arch (sysdeps\
sinl implementation is explicitly for x86_64. But sin and sinf are now used as
software versions (IBM library). And that is usually a bit slower, sometimes MUCH
slower (1000x).
The IBM library is perhaps meant only as a fallback implementation (if there is
no hw support) and is not used for "better" (<= 0.5ULP) precision.
"The First Step is to Admit You Have a Problem!"
Matthias Klose (doko) wrote : | #1 |
please could you recheck with 8.04 (hardy), and 8.10 (intrepid)?
Changed in glibc: | |
status: | New → Incomplete |
Petr Cervenka (grugh) wrote : | #2 |
I have tried it with Kubuntu KDE4 8.04 AMD64 and the error is still there. But it's as I expected, because the error is in glibc library and it has nothing to do with kubuntu or ubuntu. Link to my bug post to glibc (ignored successfully lately):
http://
Sorry for posting it here. It was when I didn't know the true source of the error.
Thank you for your effort anyway.
Changed in glibc: | |
status: | Unknown → Confirmed |
Changed in glibc: | |
importance: | Undecided → Medium |
status: | Incomplete → Triaged |
Changed in glibc: | |
importance: | Unknown → Medium |
|
#9 |
Confirmed with current sources. Suspending until a faster correctly rounding implementation (such as that proposed in http://
Changed in glibc: | |
status: | Confirmed → Incomplete |
|
#10 |
FWIW, the function now runs much faster after the multiple precision improvements. The worst case is only about a 100 times slower now instead of 1000 times.
I've not looked yet, but I think there is a case for capping maximum precision for worst case computation for sin (and all trigonometric functions) as well, so this could get even better.
|
#11 |
Opening this since I've been working on improvements to the multiple precision bits that should have positive effect here. In fact as I mentioned in comment 7, improvements are already evident.
Since optimization patches can go on forever, I'm going to put a cap on it for the resolution of this bug. The cap is to implement findings of [1] if applicable.
[1] http://
Changed in glibc: | |
status: | Incomplete → In Progress |
Changed in glibc: | |
importance: | Medium → Wishlist |
I have also come across a very similar issue on i7 Intel platforms, please see bug 16531. Calls to cos can take around 0.15 ms, 1000 times their normal time, which is a serious problem for the real-time system we are developing.
*** Bug 16531 has been marked as a duplicate of this bug. ***
|
#14 |
(In reply to John Wilkinson from comment #9)
> I have also come across a very similar issue on i7 Intel platforms, please
> see bug 16531. Calls to cos can take around 0.15 ms, 1000 times their normal
> time, which is a serious problem for the real-time system we are developing.
The default libm functions never guarantee constant runtime. You will have this same problem for many of the functions provided by the library.
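Because the runtime is not constant, an average over many calls can hide the problem on a real-time system; the slowest single call is the interesting number. A rough sketch of how that could be measured follows. The helper name `max_sin_latency_ns` is hypothetical, and the clock overhead is included in each measurement, so the result is only an upper estimate:

```cpp
#include <chrono>
#include <cmath>

// Hypothetical helper: time `calls` invocations of sin(x) one by one
// and return the slowest single call in nanoseconds.
inline long long max_sin_latency_ns(double x, int calls) {
    using clock = std::chrono::steady_clock;
    volatile double input = x;   // volatile so the call is not folded or hoisted
    volatile double sink = 0.0;  // volatile so the result is not discarded
    long long worst = 0;
    for (int i = 0; i < calls; ++i) {
        auto t0 = clock::now();
        sink = std::sin(input);
        auto t1 = clock::now();
        long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (ns > worst) worst = ns;
    }
    (void)sink;
    return worst;
}
```

Comparing the result for one of the reported values against an arbitrary nearby argument should show whether a given libm still has the slow path.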
However we are working on enhancing libm to include something like what you're looking for. Please have look at and comment:
https:/
|
#15 |
Really the issue for sin/cos/sincos is the same, so retitling the bug.
|
#16 |
*** Bug 14412 has been marked as a duplicate of this bug. ***
(In reply to Carlos O'Donell from comment #11)
> However we are working on enhancing libm to include something like what
> you're looking for. Please have look at and comment:
> https:/
Thanks that looks useful. Is there a release schedule?
|
#18 |
(In reply to John Wilkinson from comment #14)
> (In reply to Carlos O'Donell from comment #11)
> > However we are working on enhancing libm to include something like what
> > you're looking for. Please have look at and comment:
> > https:/
>
> Thanks that looks useful. Is there a release schedule?
Not yet. I'll update the wiki when I can commit resources. That doesn't stop others from joining in the discussion, or adding notes to the wiki like use cases and requirements.
|
#19 |
A simple workaround to get the fast computation is to use functions from a special header similar to the following:
#ifndef FAST_MATH_H
#define FAST_MATH_H
#include <cmath>
inline double fast_sin(long double x) {
return sinl(x);
}
inline double fast_cos(long double x) {
return cosl(x);
}
inline double fast_tan(long double x) {
return tanl(x);
}
inline double fast_asin(long double x) {
return asinl(x);
}
inline double fast_acos(long double x) {
return acosl(x);
}
inline double fast_atan(long double x) {
return atanl(x);
}
inline double fast_atan2(long double y, long double x) {
return atan2l(y, x);
}
inline double fast_sinh(long double x) {
return sinhl(x);
}
inline double fast_cosh(long double x) {
return coshl(x);
}
inline double fast_asinh(long double x) {
return asinhl(x);
}
inline double fast_acosh(long double x) {
return acoshl(x);
}
inline double fast_pow(long double x, long double y) {
return powl(x, y);
}
inline double fast_sqrt(long double x) {
return sqrtl(x);
}
inline double fast_exp(long double x) {
return expl(x);
}
inline double fast_log(long double x) {
return logl(x);
}
inline double fast_log10(long double x) {
return log10l(x);
}
#endif /* FAST_MATH_H */
I would like to repost my previously deleted bug by (lazy IMHO)
<email address hidden>. The math sin function is at least 1000x slower on 64bit
distributions for special numbers (and carlos doesn't care about it).
I can't try it with CVS head, because I cannot connect to cvs through our firewall.
But even when I tried the latest snapshot, I couldn't build it (maybe another bug):
a - elf/dl-vdso.os
: /home/inova/projects/glibc/build/libc_pic.a
gcc -nostdlib -nostartfiles -r -o
/home/inova/projects/glibc/build/elf/librtld.map.o '-Wl,-('
/home/inova/projects/glibc/build/elf/dl-allobjs.os
/home/inova/projects/glibc/build/libc_pic.a -lgcc '-Wl,-)'
-Wl,-Map,/home/inova/projects/glibc/build/elf/librtld.mapT
/home/inova/projects/glibc/build/libc_pic.a(init-first.os):(.data+0x0): multiple
definition of `__libc_multiple_libcs'
/home/inova/projects/glibc/build/elf/dl-allobjs.os:/home/inova/projects/glibc/src/glibc-20080218/elf/rtld.c:641:
first defined here
/home/inova/projects/glibc/build/libc_pic.a(dl-addr.os): In function
`_dl_addr_inside_object':
/home/inova/projects/glibc/src/glibc-20080218/elf/dl-addr.c:158: multiple
definition of `_dl_addr_inside_object'
/home/inova/projects/glibc/build/elf/dl-allobjs.os:/home/inova/projects/glibc/src/glibc-20080218/elf/dl-open.c:700:
first defined here
collect2: ld returned 1 exit status
make[2]: *** [/home/inova/projects/glibc/build/elf/librtld.map] Error 1
make[2]: Leaving directory `/home/inova/projects/glibc/src/glibc-20080218/elf'
make[1]: *** [elf/subdir_lib] Error 2
make[1]: Leaving directory `/home/inova/projects/glibc/src/glibc-20080218'
make: *** [all] Error 2
Please, anyone with 64bit distribution and glibc CVS head, could you try the
attached example and post the time results of it? (or help me to build the
snapshot...)
Thank you
====== Original bug report ======
The math sin(double) function is unreasonably slow (~400 microseconds on an
Athlon 64 X2 4800+!!!) for some special values on 64-bit distributions (Kubuntu
7.10 AMD64 and Fedora, unknown version). On 32-bit distributions everything is fine.
I captured some of those values:
0.93340582292648832662962377071381 0x3fedde75e36bb000
2.3328432680770916363144351635128 0x4002a9a9bb38add0
3.7439477503636453548097051680088 0x400df39ae0cdf500
3.9225160069792437411706487182528 0x400f615012801950
4.0711651639931289992091478779912 0x401048df854fdc20
4.7858438478542097982426639646292 0x401324b43fe92fc0
5.9840767662578002727968851104379 0x4017efb1d1df52a0
Example:
#include <math.h>
int main(int argc, char** argv) {
    /* volatile keeps the compiler from evaluating sin() at compile time */
    volatile double value = 0.93340582292648832662962377071381;
    volatile double out;
    int i;
    for (i=0; i < 20000; i++)
        out = sin(value);
    return 0;
}