The status update. We repeated tests with:
* increased net_ticktime=500 (from the default of 60),
* adjusted sysctls, see http://pastebin.com/6L2pM5ya
* increased rx/tx buffers = 4096,
but the results were the same: starting from some moment in time there was packet loss, despite the fact that max throughput was only ~3-4 Gbit/s out of 10 Gbit/s. Alexander Nevenchannyy (anevenchannyy) found the root cause of the packet loss: it was caused by incorrect IRQ balancing. He discovered that during a particular iperf test throughput hit a limit at ~3-4 Gbit/s while IRQ load was >91% on a single CPU core:
root@node-9:~# iperf --client 192.168.0.4 --format m --nodelay --len 8k --time 10 --parallel 1 -u -i1 -b10000M
------------------------------------------------------------
Client connecting to 192.168.0.4, UDP port 5001
Sending 8192 byte datagrams
UDP buffer size: 0.20 MByte (default)
------------------------------------------------------------
[ 3] local 192.168.0.6 port 52336 connected with 192.168.0.4 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 531 MBytes 4452 Mbits/sec
[ 3] 1.0- 2.0 sec 528 MBytes 4425 Mbits/sec
[ 3] 2.0- 3.0 sec 535 MBytes 4485 Mbits/sec
[ 3] 3.0- 4.0 sec 536 MBytes 4498 Mbits/sec
[ 3] 4.0- 5.0 sec 538 MBytes 4514 Mbits/sec
[ 3] 5.0- 6.0 sec 538 MBytes 4517 Mbits/sec
[ 3] 6.0- 7.0 sec 535 MBytes 4490 Mbits/sec
[ 3] 7.0- 8.0 sec 537 MBytes 4507 Mbits/sec
[ 3] 8.0- 9.0 sec 538 MBytes 4517 Mbits/sec
[ 3] 9.0-10.0 sec 539 MBytes 4524 Mbits/sec
[ 3] 0.0-10.0 sec 5356 MBytes 4492 Mbits/sec
... and the per-CPU load statistics were:
05:31:48 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
05:31:49 PM all 2.44 0.00 0.67 0.00 0.00 7.16 0.00 0.00 0.00 89.72
05:31:49 PM 0 7.14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 92.86
05:31:49 PM 1 3.03 0.00 1.01 0.00 0.00 0.00 0.00 0.00 0.00 95.96
05:31:49 PM 2 4.12 0.00 3.09 0.00 0.00 0.00 0.00 0.00 0.00 92.78
05:31:49 PM 3 3.92 0.00 1.96 0.98 0.00 0.00 0.00 0.00 0.00 93.14
05:31:49 PM 4 0.00 0.00 0.00 0.00 0.00 91.40 0.00 0.00 0.00 8.60
05:31:49 PM 5 4.04 0.00 1.01 0.00 0.00 0.00 0.00 0.00 0.00 94.95
05:31:49 PM 6 1.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 98.99
05:31:49 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:31:49 PM 8 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00
05:31:49 PM 9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:31:49 PM 10 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 98.00
05:31:49 PM 11 2.97 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 96.04
with one CPU core overloaded by the ksoftirqd thread:
52 root 20 0 0 0 0 R 49.4 0.0 2:47.76 ksoftirqd/4
Hence, won't fix. This is not a RabbitMQ/Mnesia issue; we have to address the CPU load by fixing the IRQ balance instead, see http://natsys-lab.blogspot.ru/2012/09/linux-scaling-softirq-among-many-cpu.html
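For reference, the fix direction could be sketched roughly as follows: inspect which cores service the NIC's IRQs, then spread them via smp_affinity and/or RPS. This is a minimal illustration, not the exact commands used here; the interface name (eth0), the assumption of one IRQ line per RX queue, and the "fff" mask (cores 0-11 on a 12-core box like the one above) are all assumptions to adjust for the actual hardware.

```shell
#!/bin/sh
# Sketch: distribute NIC interrupt/softirq load across cores instead
# of letting one core (core 4 above) saturate. Requires root.
# IFACE and the masks below are illustrative assumptions.
IFACE=eth0

# 1) See which IRQ lines belong to the NIC and which cores fire them.
grep "$IFACE" /proc/interrupts

# 2) Pin each of the NIC's IRQ lines to its own core by writing a hex
#    CPU bitmask to /proc/irq/<n>/smp_affinity (core i -> mask 1<<i).
i=0
for irq in $(grep "$IFACE" /proc/interrupts | awk -F: '{print $1}' | tr -d ' '); do
    mask=$(printf '%x' $((1 << i)))
    echo "$mask" > "/proc/irq/$irq/smp_affinity"
    i=$((i + 1))
done

# 3) Additionally (or if the NIC has a single RX queue), enable RPS so
#    receive softirq work is spread across cores 0-11 (mask fff).
for rxq in /sys/class/net/"$IFACE"/queues/rx-*; do
    echo fff > "$rxq/rps_cpus"
done
```

Note that irqbalance, if running, may rewrite smp_affinity; pinning by hand usually means disabling it first.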