Few things about the oprofile report:

a) I don't see much of Galera in the output there. Was this run on a
single node or on multiple nodes (so that the Galera overhead is counted)?

b) Were both the FK and non-FK runs profiled at the same point in the benchmark
cycle? I ask because some of the symbols in the non-FK report, like my_qsort, are
missing from the FK one. sql_rnd_with_mutex is in both, though.
Also, the query cache seems to be enabled, which may be adding overhead.

c) A perf report may also be helpful.
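For the symbol comparison in b), a small script can list which symbols appear in only one of the two reports. This is a minimal sketch that assumes opreport-style text where each data line starts with a sample count and a percentage and the symbol name is the last whitespace-separated field. Demangled C++ signatures that contain spaces would need different parsing. The function names are hypothetical, and the sample report lines in the demo are made up for illustration, not taken from the actual profiles.

```python
import re

# Assumption: data lines start with an integer sample count followed by a
# percentage (plain or scientific notation); header lines do not.
DATA_LINE = re.compile(r"^\s*\d+\s+\d+(?:\.\d+)?(?:e[+-]?\d+)?\s+")


def symbols(report_text):
    """Return the set of symbol names in an opreport-style listing.

    Assumes the symbol name is the last whitespace-separated field.
    """
    names = set()
    for line in report_text.splitlines():
        if DATA_LINE.match(line):
            names.add(line.split()[-1])
    return names


def diff_symbols(non_fk_text, fk_text):
    """Compare the symbol sets of the non-FK and FK reports."""
    non_fk = symbols(non_fk_text)
    fk = symbols(fk_text)
    return {
        "only_in_non_fk": sorted(non_fk - fk),
        "only_in_fk": sorted(fk - non_fk),
        "in_both": sorted(non_fk & fk),
    }


if __name__ == "__main__":
    # Hypothetical input, made up for illustration only.
    non_fk_report = """\
samples  %        image name   symbol name
100      10.0000  mysqld       my_qsort
50        5.0000  mysqld       sql_rnd_with_mutex
"""
    fk_report = """\
samples  %        image name   symbol name
40        4.0000  mysqld       sql_rnd_with_mutex
"""
    result = diff_symbols(non_fk_report, fk_report)
    print(result["only_in_non_fk"])
```

In practice the two strings would be read from the saved report files. If a symbol shows up in only one report, that alone does not prove a code-path difference; it may just reflect a different phase of the benchmark, which is why the timing question matters.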