I ran some performance benchmarks with pybench on an ARMv7 board. To keep third-party processes from interfering, the board was running Ubuntu in single-user mode, with the stock glibc. I'll run another set of benchmarks with our glibc built with the proposed flags, and another set on my NSLU2 (XScale/ARMv5) to see what sort of performance hit we're going to see.

With our current CFLAGS:

* Round 1 done in 62.165 seconds.
* Round 2 done in 62.229 seconds.
* Round 3 done in 61.994 seconds.
* Round 4 done in 61.616 seconds.
* Round 5 done in 62.371 seconds.
* Round 6 done in 63.191 seconds.
* Round 7 done in 62.180 seconds.
* Round 8 done in 62.165 seconds.
* Round 9 done in 61.906 seconds.
* Round 10 done in 62.977 seconds.

Test                        minimum  average  operation   overhead
-------------------------------------------------------------------------------
BuiltinFunctionCalls:        1302ms   1317ms     2.58us    3.114ms
BuiltinMethodLookup:          871ms    871ms     0.83us    3.645ms
CompareFloats:                837ms    974ms     0.81us    4.171ms
CompareFloatsIntegers:        963ms   1052ms     1.17us    3.112ms
CompareIntegers:              657ms    659ms     0.37us    6.304ms
CompareInternedStrings:       667ms    670ms     0.45us   16.016ms
CompareLongs:                 564ms    566ms     0.54us    3.641ms
CompareStrings:               550ms    556ms     0.56us   10.780ms
CompareUnicode:               548ms    551ms     0.74us    8.163ms
ComplexPythonFunctionCalls:  1699ms   1750ms     8.75us    5.260ms
ConcatStrings:               1017ms   1099ms     2.20us    6.975ms
ConcatUnicode:               4336ms   4720ms    15.73us    4.897ms
CreateInstances:             1454ms   1463ms    13.06us    4.242ms
CreateNewInstances:          1267ms   1283ms    15.28us    3.627ms
CreateStringsWithConcat:      742ms    750ms     0.75us   10.567ms
CreateUnicodeWithConcat:      772ms    778ms     1.94us    4.172ms
DictCreation:                 695ms    695ms     1.74us    4.172ms
DictWithFloatKeys:           1081ms   1086ms     1.21us    7.899ms
DictWithIntegerKeys:          767ms    771ms     0.64us   10.566ms
DictWithStringKeys:           668ms    672ms     0.56us   10.567ms
ForLoops:                     671ms    672ms    26.88us    0.685ms
IfThenElse:                   533ms    534ms     0.40us    7.898ms
ListSlicing:                  557ms    563ms    40.19us    0.642ms
NestedForLoops:               764ms    771ms     0.51us    0.247ms
NestedListComprehensions:    1129ms   1148ms    95.63us    1.012ms
NormalClassAttribute:         761ms    771ms     0.64us    5.271ms
NormalInstanceAttribute:      697ms    697ms     0.58us    5.278ms
PythonFunctionCalls:          693ms    698ms     2.11us    3.125ms
PythonMethodCalls:           1568ms   1574ms     7.00us    1.566ms
Recursion:                   1037ms   1043ms    20.86us    5.249ms
SecondImport:                1380ms   1382ms    13.82us    2.052ms
SecondPackageImport:         1408ms   1411ms    14.11us    2.052ms
SecondSubmoduleImport:       1600ms   1602ms    16.02us    2.052ms
SimpleComplexArithmetic:     1581ms   1584ms     1.80us    4.171ms
SimpleDictManipulation:       778ms    782ms     0.65us    5.248ms
SimpleFloatArithmetic:       1415ms   1418ms     1.07us    6.303ms
SimpleIntFloatArithmetic:     659ms    660ms     0.50us    6.304ms
SimpleIntegerArithmetic:      659ms    661ms     0.50us    6.305ms
SimpleListComprehensions:     932ms    947ms    78.88us    1.015ms
SimpleListManipulation:       642ms    646ms     0.55us    6.840ms
SimpleLongArithmetic:         621ms    637ms     0.97us    3.111ms
SmallLists:                   984ms   1000ms     1.47us    4.172ms
SmallTuples:                 1038ms   1043ms     1.93us    4.700ms
SpecialClassAttribute:        754ms    755ms     0.63us    5.272ms
SpecialInstanceAttribute:     818ms    819ms     0.68us    5.277ms
StringMappings:              1426ms   1428ms     5.66us    4.474ms
StringPredicates:            1303ms   1326ms     1.89us   15.920ms
StringSlicing:                809ms    864ms     1.54us    9.299ms
TryExcept:                    658ms    660ms     0.29us    7.895ms
TryFinally:                  1530ms   1532ms     9.58us    4.263ms
TryRaiseExcept:              1196ms   1203ms    18.79us    4.172ms
TupleSlicing:                 728ms    733ms     2.79us    0.424ms
UnicodeMappings:              703ms    705ms    19.59us    3.839ms
UnicodePredicates:           1390ms   1396ms     2.58us   19.107ms
UnicodeProperties:           1723ms   1729ms     4.32us   15.925ms
UnicodeSlicing:               973ms   1467ms     2.99us    8.250ms
WithFinally:                 1457ms   1458ms     9.11us    4.259ms
WithRaiseExcept:             1663ms   1681ms    21.02us    5.357ms
-------------------------------------------------------------------------------
Totals:                     60691ms  62279ms

With the proposed CFLAGS:

* Round 1 done in 60.513 seconds.
* Round 2 done in 60.353 seconds.
* Round 3 done in 61.784 seconds.
* Round 4 done in 60.537 seconds.
* Round 5 done in 60.090 seconds.
* Round 6 done in 59.704 seconds.
* Round 7 done in 60.323 seconds.
* Round 8 done in 60.244 seconds.
* Round 9 done in 60.026 seconds.
* Round 10 done in 58.853 seconds.

Average of 60.243 seconds per test run

Test                        minimum  average  operation   overhead
-------------------------------------------------------------------------------
BuiltinFunctionCalls:        1234ms   1242ms     2.43us    1.986ms
BuiltinMethodLookup:          846ms    867ms     0.83us    2.322ms
CompareFloats:                918ms   1066ms     0.89us    2.654ms
CompareFloatsIntegers:        876ms    974ms     1.08us    1.986ms
CompareIntegers:              681ms    682ms     0.38us    3.988ms
CompareInternedStrings:       694ms    694ms     0.46us   10.148ms
CompareLongs:                 548ms    548ms     0.52us    2.320ms
CompareStrings:               564ms    564ms     0.56us    6.895ms
CompareUnicode:               562ms    562ms     0.75us    5.259ms
ComplexPythonFunctionCalls:  1632ms   1710ms     8.55us    3.338ms
ConcatStrings:                960ms   1099ms     2.20us    5.021ms
ConcatUnicode:               4146ms   4732ms    15.77us    3.836ms
CreateInstances:             1411ms   1433ms    12.79us    2.719ms
CreateNewInstances:          1296ms   1314ms    15.65us    2.496ms
CreateStringsWithConcat:      760ms    763ms     0.76us    6.674ms
CreateUnicodeWithConcat:      728ms    751ms     1.88us    2.655ms
DictCreation:                 644ms    645ms     1.61us    2.653ms
DictWithFloatKeys:            914ms    916ms     1.02us    4.999ms
DictWithIntegerKeys:          723ms    723ms     0.60us    6.674ms
DictWithStringKeys:           690ms    690ms     0.57us    6.676ms
ForLoops:                     687ms    687ms    27.48us    0.460ms
IfThenElse:                   553ms    553ms     0.41us    4.999ms
ListSlicing:                  558ms    560ms    40.00us    0.653ms
NestedForLoops:               780ms    781ms     0.52us    0.165ms
NestedListComprehensions:     987ms   1044ms    87.01us    0.669ms
NormalClassAttribute:         779ms    780ms     0.65us    3.342ms
NormalInstanceAttribute:      719ms    721ms     0.60us    3.350ms
PythonFunctionCalls:          713ms    715ms     2.17us    1.998ms
PythonMethodCalls:           1509ms   1523ms     6.77us    1.025ms
Recursion:                    944ms    945ms    18.90us    3.322ms
SecondImport:                1343ms   1346ms    13.46us    1.342ms
SecondPackageImport:         1352ms   1355ms    13.55us    1.375ms
SecondSubmoduleImport:       1589ms   1594ms    15.94us    1.373ms
SimpleComplexArithmetic:      909ms    914ms     1.04us    2.764ms
SimpleDictManipulation:       712ms    714ms     0.59us    3.474ms
SimpleFloatArithmetic:        961ms    966ms     0.73us    4.156ms
SimpleIntFloatArithmetic:     527ms    527ms     0.40us    4.162ms
SimpleIntegerArithmetic:      527ms    527ms     0.40us    4.165ms
SimpleListComprehensions:     874ms    887ms    73.88us    0.701ms
SimpleListManipulation:       588ms    589ms     0.50us    4.509ms
SimpleLongArithmetic:         610ms    611ms     0.93us    2.068ms
SmallLists:                   962ms    968ms     1.42us    2.768ms
SmallTuples:                 1024ms   1039ms     1.92us    3.112ms
SpecialClassAttribute:        775ms    776ms     0.65us    3.344ms
SpecialInstanceAttribute:     893ms    899ms     0.75us    3.349ms
StringMappings:              1379ms   1396ms     5.54us    3.128ms
StringPredicates:            2549ms   2549ms     3.64us   12.080ms
StringSlicing:                743ms    846ms     1.51us    6.583ms
TryExcept:                    728ms    728ms     0.32us    5.211ms
TryFinally:                  1459ms   1461ms     9.13us    2.770ms
TryRaiseExcept:              1217ms   1235ms    19.29us    2.763ms
TupleSlicing:                 650ms    679ms     2.59us    0.449ms
UnicodeMappings:              689ms    692ms    19.22us    4.300ms
UnicodePredicates:           1027ms   1028ms     1.90us   14.498ms
UnicodeProperties:           1294ms   1305ms     3.26us   12.075ms
UnicodeSlicing:               928ms   1316ms     2.69us    5.849ms
WithFinally:                 1377ms   1379ms     8.62us    2.769ms
WithRaiseExcept:             1626ms   1631ms    20.39us    3.467ms

That being said, the initial results, while showing an improvement (roughly 3% on the average round time), are not very impressive, and I suspect we'll see a reduction in performance on the NSLU2, since the build would be tuned for features its core doesn't have. I'll post more results once I have rebuilt glibc.
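
For anyone who wants to check the numbers, here is a small Python sketch that averages the ten round times from each run and works out the relative change. The variable and function names are my own and purely illustrative. The figures are copied from the pybench output above, and the three per-test rows are only a hand-picked sample of the "minimum" column, not a full comparison.

```python
from statistics import mean

# Round times in seconds, copied from the two pybench runs above.
current_rounds = [62.165, 62.229, 61.994, 61.616, 62.371,
                  63.191, 62.180, 62.165, 61.906, 62.977]
proposed_rounds = [60.513, 60.353, 61.784, 60.537, 60.090,
                   59.704, 60.323, 60.244, 60.026, 58.853]


def pct_change(before, after):
    """Relative change in percent; a negative value means 'after' is faster."""
    return (after - before) / before * 100.0


current_avg = mean(current_rounds)
proposed_avg = mean(proposed_rounds)
overall = pct_change(current_avg, proposed_avg)

# Hand-picked sample of per-test minimum times in ms: (current, proposed).
sample_minimums = {
    "SimpleComplexArithmetic": (1581, 909),
    "SimpleFloatArithmetic": (1415, 961),
    "StringPredicates": (1303, 2549),
}

print(f"current CFLAGS average:  {current_avg:.3f} s")
print(f"proposed CFLAGS average: {proposed_avg:.3f} s")
print(f"overall change:          {overall:+.2f}%")
for name, (before, after) in sample_minimums.items():
    print(f"{name}: {pct_change(before, after):+.1f}%")
```

The overall change comes out at a little over 3% in favour of the proposed flags. The sample rows show that the average hides quite a spread: the floating-point arithmetic tests get noticeably faster, while StringPredicates takes roughly twice as long.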