Tested the following on bionic:
$ cat main.cpp
#include <benchmark/benchmark.h>
#include <vector>
static __attribute__ ((noinline)) int my_really_big_function()
{
    for(size_t i = 0; i < 1000; ++i)
    {
        benchmark::DoNotOptimize(i % 5);
    }
    return 0;
}
static __attribute__ ((noinline)) void caller1()
{
    for(size_t i = 0; i < 1000; ++i)
    {
        benchmark::DoNotOptimize(my_really_big_function());
        benchmark::DoNotOptimize(i % 5);
    }
}
static __attribute__ ((noinline)) void myfun(benchmark::State& state)
{
    while(state.KeepRunning())
    {
        caller1();
    }
}
BENCHMARK(myfun);
BENCHMARK_MAIN();
$ clang++ main.cpp -o main -fno-omit-frame-pointer -O0 -lpthread -lbenchmark
$ perf record -g ./main
$ perf report -g 'graph,0.5,caller'
Result:
- with linux-tools-common 4.15.0-46.49:
Samples: 3K of event 'cpu-clock', Event count (approx.): 757250000
  Children      Self  Command  Shared Object          Symbol
+   99.67%     0.00%  main     libbenchmark.so.1.3.0  [.] 0x0000000000015b58
+   99.67%     0.00%  main     main                   [.] _ZL5myfunRN9benchmark5StateE
+   99.64%     0.17%  main     main                   [.] _ZL7caller1v
+   99.50%    99.47%  main     main                   [.] _ZL22my_really_big_functionv
...
- with linux-tools-common 4.15.0-47.50 (from -proposed):
Samples: 3K of event 'cpu-clock', Event count (approx.): 755000000
  Children      Self  Command  Shared Object          Symbol
+   99.77%     0.00%  main     libbenchmark.so.1.3.0  [.] 0x0000000000015b58
+   99.77%     0.00%  main     main                   [.] myfun
+   99.70%     0.13%  main     main                   [.] caller1
+   99.64%    99.64%  main     main                   [.] my_really_big_function
...
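With linux-tools-common 4.15.0-47.50 from -proposed, perf report shows demangled symbol names (myfun, caller1, my_really_big_function) instead of the mangled ones, so the problem appears to be fixed.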
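For anyone who wants to cross-check which function a mangled symbol in the 4.15.0-46.49 output refers to, here is a minimal sketch using abi::__cxa_demangle from <cxxabi.h>, which GCC and Clang provide. The helper name demangle is hypothetical and is not part of perf or of the test case. It also prints the parameter list, which the perf output in this report omits.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (Itanium C++ ABI, GCC/Clang)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper (an assumption for illustration, not perf's code):
// returns the demangled form of an Itanium-ABI symbol name, or the input
// unchanged if the name cannot be demangled.
std::string demangle(const char* mangled)
{
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc, so the caller has to release it with free.
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr)
        return mangled;
    std::string result(out);
    std::free(out);
    return result;
}
```

Called with the symbols from the older perf output, demangle("_ZL7caller1v") gives caller1() and demangle("_ZL5myfunRN9benchmark5StateE") gives myfun(benchmark::State&). Piping the names through c++filt on the command line should give the same result.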