Hi Thomas, can you elaborate on your test? You wrote:

"The first three results are 2*100, 2*50 and 2*20 processes exchanging 100k, 200k and 1M messages over a pipe. The last three results are 2*100, 2*50, and 2*20 threads exchanging 100k, 200k and 1M messages with pthread_mutex and pthread_cond."

So I'm guessing you want the tests run like this:

./processtest 200 100000
./processtest 100 200000
./processtest 40 1000000
./threadtest 200 100000
./threadtest 100 200000
./threadtest 40 1000000

Is that correct? I just want to be sure I'm running the same tests. (Also, the code limits the number of processes to a maximum of 100, so I edited it to allow a maximum of 200.)

Here are our results:

2.6.15.7-ubuntu1-custom-1000HZ_CLK #1 SMP Thu Jan 15 19:06:30 PST 2009 x86_64 GNU/Linux
(Ubuntu 6.06.2 server LTS with CONFIG_HZ set to 1000)

min:0.004ms|avg:0.004-0.271ms|mid:0.005ms|max:42.049ms|duration:34.029s
min:0.004ms|avg:0.004-0.138ms|mid:0.035ms|max:884.865ms|duration:33.105s
min:0.004ms|avg:0.004-0.042ms|mid:0.004ms|max:2319.621ms|duration:62.438s
min:0.005ms|avg:0.010-0.026ms|mid:0.012ms|max:1407.923ms|duration:92.132s
min:0.005ms|avg:0.011-0.029ms|mid:0.013ms|max:1539.929ms|duration:97.034s
min:0.005ms|avg:0.010-0.031ms|mid:0.013ms|max:18669.095ms|duration:176.555s

2.6.24-23-server #1 SMP Thu Nov 27 18:45:02 UTC 2008 x86_64 GNU/Linux
(default 64-bit Ubuntu 8.04 server LTS at the default 100HZ clock)

min:0.004ms|avg:0.034-0.357ms|mid:0.324ms|max:39.789ms|duration:43.390s
min:0.004ms|avg:0.006-0.149ms|mid:0.131ms|max:79.430ms|duration:39.288s
min:0.004ms|avg:0.046-0.057ms|mid:0.052ms|max:52.427ms|duration:64.481s
min:0.005ms|avg:0.006-0.650ms|mid:0.330ms|max:22.120ms|duration:60.142s
min:0.005ms|avg:0.053-0.309ms|mid:0.276ms|max:21.560ms|duration:62.353s
min:0.004ms|avg:0.033-0.123ms|mid:0.112ms|max:22.007ms|duration:131.029s

Linux la 2.6.24.6-custom #1 SMP Thu Jan 15 23:34:10 UTC 2009 x86_64 GNU/Linux
(Ubuntu 8.04 server LTS with CONFIG_HZ custom set to 1000)
min:0.004ms|avg:0.054-0.364ms|mid:0.332ms|max:24.524ms|duration:42.522s
min:0.004ms|avg:0.125-0.156ms|mid:0.144ms|max:13.171ms|duration:33.573s
min:0.004ms|avg:0.046-0.058ms|mid:0.052ms|max:13.005ms|duration:64.388s
min:0.005ms|avg:0.006-0.594ms|mid:0.302ms|max:13.481ms|duration:61.105s
min:0.005ms|avg:0.109-0.336ms|mid:0.307ms|max:13.345ms|duration:65.000s
min:0.002ms|avg:0.070-0.130ms|mid:0.120ms|max:13.137ms|duration:133.786s

Side notes: we have been experiencing problems with MySQL, specifically with sync-binlog=1 and log-bin enabled while performing a high volume of concurrent transactions. Although we run RAID-1 with battery-backed cache on, our throughput is horrible. For some reason, we have found that by raising CONFIG_HZ in the kernel from 100 to 1000 we get much higher throughput; otherwise our benchmarks just sit around and have trouble context switching. The change:

#CONFIG_HZ_100=y
#CONFIG_HZ=100

changed to:

CONFIG_HZ_1000=y
CONFIG_HZ=1000

I do not know whether the clock problems we are experiencing are related to the bug listed here, but I wanted to submit our feedback showing the difference between kernels where our bottleneck runs better. We use sysbench for our tests (with vmstat -S M 3, iostat -dx 3, and mpstat 3 to monitor, all part of the sysstat suite).
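For reference, the config edit described above can be scripted rather than made by hand. This is only a sketch: the `config-fragment` file here stands in for your real kernel .config, whose path depends on your source tree.

```shell
# Sketch of the CONFIG_HZ 100 -> 1000 change described above.
# "config-fragment" is a stand-in for the real kernel .config file.
cat > config-fragment <<'EOF'
CONFIG_HZ_100=y
# CONFIG_HZ_250 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
EOF

# Switch the timer tick from 100HZ to 1000HZ.
sed -i \
    -e 's/^CONFIG_HZ_100=y/# CONFIG_HZ_100 is not set/' \
    -e 's/^# CONFIG_HZ_1000 is not set/CONFIG_HZ_1000=y/' \
    -e 's/^CONFIG_HZ=100$/CONFIG_HZ=1000/' \
    config-fragment

grep '^CONFIG_HZ' config-fragment
# prints:
# CONFIG_HZ_1000=y
# CONFIG_HZ=1000
```

After changing the real .config, the kernel still has to be rebuilt and installed as usual for the new tick rate to take effect.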
FYI, here are our sysbench commands (be sure to change your MySQL username and password and to create the database sbtest). You can get sysbench here: http://sysbench.sourceforge.net/

Compile it:

./configure --with-mysql --with-mysql-include=/usr/share/include --with-mysql-lib=/usr/share/lib
make
make install

Prepare it:

./sysbench --num-threads=50 --test=oltp --oltp-test-mode=complex --oltp-table-size=100000 --oltp-distinct-ranges=0 --oltp-order-ranges=0 --oltp-sum-ranges=0 --oltp-simple-ranges=0 --oltp-point-selects=0 --oltp-range-size=0 --mysql-table-engine=innodb --mysql-host=127.0.0.1 --mysql-user=ROOT --mysql-password=PASSWORD prepare

Run it:

./sysbench --num-threads=50 --test=oltp --oltp-test-mode=complex --oltp-table-size=100000 --oltp-distinct-ranges=0 --oltp-order-ranges=0 --oltp-sum-ranges=0 --oltp-simple-ranges=0 --oltp-point-selects=0 --oltp-range-size=0 --mysql-table-engine=innodb --mysql-host=127.0.0.1 --mysql-user=ROOT --mysql-password=PASSWORD run

The important lines of output are the read/write requests per second and the total time.

=== 2.6.15.7-ubuntu1-custom-1000HZ_CLK #1 SMP Thu Jan 15 19:06:30 PST 2009 x86_64 GNU/Linux
(Ubuntu 6.06.2 server LTS with CONFIG_HZ custom set to 1000)

read/write requests: 50000 (2394.13 per sec.)
total time: 20.8844s

vmstat -S M 3:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in     cs us sy id wa
 0  0      0   9043    142    559    0    0     1 30341  5020  25659  6 15 78  1

iostat -dx 3:
Device:  rrqm/s   wrqm/s   r/s     w/s  rsec/s    wsec/s  rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00  4320.74  0.00 4836.12    0.00  73254.85   0.00  36627.42    15.15     4.93   1.02   0.16  77.02
===

=== 2.6.24-23-server #1 SMP Thu Nov 27 18:45:02 UTC 2008 x86_64 GNU/Linux
(default 64-bit Ubuntu 8.04 server LTS at the default 100HZ clock)

read/write requests: 50000 (434.33 per sec.)
total time: 115.1207s

vmstat -S M 3:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0      0   1506    109    100    0    0   155  5011  531  4532  5  3 91  1

iostat -dx 3:
Device:  rrqm/s  wrqm/s    r/s     w/s  rsec/s    wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00  951.67  30.67  551.00  274.67  12021.33    21.14     1.18   2.03   1.60  93.00
===

=== Linux la 2.6.24.6-custom #1 SMP Thu Jan 15 23:34:10 UTC 2009 x86_64 GNU/Linux
(Ubuntu 8.04 server LTS with CONFIG_HZ custom set to 1000)

read/write requests: 50003 (2680.47 per sec.)
total time: 18.6546s

vmstat -S M 3:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo    in     cs us sy id wa
 1  0      0   1710     46     73    0    0  1296 27104  3474  31095  5  3 82  9

iostat -dx 3:
Device:  rrqm/s   wrqm/s     r/s      w/s   rsec/s    wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00  2432.33  159.33  2576.00  1632.00  40066.67    15.24     1.95   0.71   0.35  94.47
===

Note: our servers are 2x Intel Xeon 5110 dual-core 1.6GHz with 15k SAS RAID-1 and 2+GB RAM.

Not sure if this feedback is helping or not; my hope is that it is relevant to what you are trying to fix. My personal opinion is that the kernel should scale more uniformly than 434.33 requests/sec versus 2680.47 requests/sec; that seems like a very large difference. Even though the 100Hz clock setting is recommended for servers, it seems it may not actually be ideal for anyone running a MySQL server that needs safe transaction support via sync-binlog=1 (at least, that is what we are finding under high insert/update load). Perhaps you could look at sysbench, as it has a number of tests for threads, fileio, etc., to determine whether it can expose the kernel issue in a different way? Any feedback about an ideal kernel and kernel config for servers is much appreciated, as these issues are no doubt difficult to debug.
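In case it helps anyone reproducing this, the two numbers quoted per kernel above can be pulled out of a captured sysbench run mechanically. A small sketch, assuming output in the format shown in our results (the here-doc below stands in for a real "sysbench ... run > sysbench.out" capture, using the 100HZ figures as sample data):

```shell
# Extract the two key metrics from a captured sysbench run.
# The here-doc stands in for a real "sysbench ... run > sysbench.out".
cat > sysbench.out <<'EOF'
    read/write requests:                 50000  (434.33 per sec.)
    total time:                          115.1207s
EOF

# Requests per second: the parenthesized rate on the read/write line.
rate=$(awk '/read\/write requests:/ { gsub(/[()]/, "", $4); print $4 }' sysbench.out)

# Total wall-clock time for the run.
total=$(awk '/total time:/ { print $3 }' sysbench.out)

echo "rate=${rate} total=${total}"
# prints: rate=434.33 total=115.1207s
```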