Comment 4 for bug 1329029

Jeff Lane (bladernr) wrote:

So I have half the equation solved.

The problem with having multiple NICs on the same subnet, outside of
iperf itself, is that the kernel routes things funny: by default Linux
answers ARP requests for any local address on any interface.

So you get things like this:

ubuntu@critical-maas:~$ sudo arping -I eth0 10.0.0.123
ARPING 10.0.0.123 from 10.0.0.1 eth0
Unicast reply from 10.0.0.123 [00:30:48:65:5E:0C] 0.745ms
Unicast reply from 10.0.0.123 [00:30:48:65:5E:0C] 0.779ms
Unicast reply from 10.0.0.123 [00:30:48:65:5E:0C] 0.757ms

ubuntu@critical-maas:~$ sudo arping -I eth0 10.0.0.128
ARPING 10.0.0.128 from 10.0.0.1 eth0
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0C] 0.887ms
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0C] 0.901ms
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0C] 0.849ms

As you can see, I have 2 Ethernet devices on my 1U, but when I arping
their addresses (10.0.0.123 and 10.0.0.128) from another box, the MAC
from eth0 (00:30:48:65:5E:0C) replies for both... this poisons the ARP
table on the other box and can cause all sorts of fun when sending a
ton of packets.
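To tell which NIC actually answered, compare the MAC in each arping reply against the hardware address of every interface on the 1U. A minimal sketch, assuming a Linux host with sysfs mounted at /sys (the function name list_macs is just illustrative):

```shell
#!/bin/sh
# List every network interface with its MAC address, read from sysfs, so the
# MAC shown in an arping reply can be matched to a physical NIC.
list_macs() {
    for dev in /sys/class/net/*; do
        # Skip entries such as bonding_masters that have no address file.
        [ -r "$dev/address" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(cat "$dev/address")"
    done
}

list_macs
```

sysfs prints MACs in lowercase while arping prints them in uppercase, so compare them case-insensitively.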

So I did a LOT of playing around today and found the magic set of proc
settings to fix this:

net.ipv4.conf.all.arp_announce=1
net.ipv4.conf.all.arp_ignore=2

These two SHOULD be enough on their own on older kernels; I'm hoping
that covers the 3.2 kernel in 12.04, and maybe the 3.x kernel in 12.04.4.

However, that may not be enough: later kernels also changed the
behaviour of rp_filter, so you have to set that too:

net.ipv4.conf.all.rp_filter=0
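Since all three knobs are set through conf.all here, it may be worth checking what each interface actually ends up with. Per the kernel's ip-sysctl documentation, for arp_announce, arp_ignore and rp_filter the kernel uses the larger of the conf/all value and the per-interface value, so a higher per-interface value still takes precedence. A minimal sketch that prints the effective value per interface (it only reads /proc, so no root is needed; the function name is illustrative):

```shell
#!/bin/sh
# Print the effective arp_announce, arp_ignore and rp_filter value for each
# interface. For all three knobs the kernel uses the larger of
# conf/all/<knob> and conf/<iface>/<knob>, which is what is computed here.
effective_arp_settings() {
    base=/proc/sys/net/ipv4/conf
    for dir in "$base"/*; do
        iface=${dir##*/}
        # "all" and "default" are not real interfaces.
        case $iface in all|default) continue ;; esac
        line=$iface
        for knob in arp_announce arp_ignore rp_filter; do
            a=$(cat "$base/all/$knob")
            i=$(cat "$dir/$knob")
            # Effective value = max(all, iface).
            if [ "$i" -gt "$a" ]; then eff=$i; else eff=$a; fi
            line="$line $knob=$eff"
        done
        printf '%s\n' "$line"
    done
}

effective_arp_settings
```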

After setting these three on Trusty, we NOW get things correct:
ubuntu@critical-maas:~$ sudo arping -I eth0 10.0.0.128
[sudo] password for ubuntu:
ARPING 10.0.0.128 from 10.0.0.1 eth0
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0D] 0.937ms
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0D] 0.888ms
Unicast reply from 10.0.0.128 [00:30:48:65:5E:0D] 0.844ms
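Values set with sysctl -w are lost on reboot. One way to keep them, assuming the usual procps setup that reads /etc/sysctl.d/*.conf at boot, is a drop-in file; the filename below is only an example:

```
# /etc/sysctl.d/60-multi-nic-arp.conf (example name)
# arp_announce=1: try to avoid ARP source addresses that are not in the
# target's subnet on the outgoing interface.
net.ipv4.conf.all.arp_announce = 1
# arp_ignore=2: only reply if the target IP is configured on the interface
# the request came in on, and the sender is in that interface's subnet.
net.ipv4.conf.all.arp_ignore = 2
# rp_filter=0: no reverse-path source validation.
net.ipv4.conf.all.rp_filter = 0
```

It can be loaded without a reboot with sudo sysctl -p /etc/sysctl.d/60-multi-nic-arp.conf.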

Now the correct physical device (eth1, MAC 00:30:48:65:5E:0D) is
answering the ARP requests for 10.0.0.128... so to confirm this, on the
1U itself:
ubuntu@supermicro:~$ netstat -ni
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 47106 0 0 0 1483445 0 0 0 BMRU
eth1 1500 0 42805 0 0 0 8179 0 0 0 BMRU
lo 65536 0 0 0 0 0 0 0 0 0 LRU
ubuntu@supermicro:~$ sudo ping -c 10000 -I eth1 -f 10.0.0.1
PING 10.0.0.1 (10.0.0.1) from 10.0.0.128 eth1: 56(84) bytes of data.

--- 10.0.0.1 ping statistics ---
10000 packets transmitted, 10000 received, 0% packet loss, time 2496ms
rtt min/avg/max/mdev = 0.166/0.216/0.367/0.010 ms, ipg/ewma 0.249/0.216 ms
ubuntu@supermicro:~$ netstat -ni
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 57046 0 0 0 1503462 0 0 0 BMRU
eth1 1500 0 52811 0 0 0 18179 0 0 0 BMRU
lo 65536 0 0 0 0 0 0 0 0 0 LRU

Notice NOW that when I ping out eth1, all outgoing and incoming
packets are on eth1 (its TX-OK went up by exactly 10000 and its RX-OK
by 10006), no longer split between eth0 and eth1.
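Diffing two netstat -ni snapshots by eye is error-prone. A small sketch that subtracts RX-OK and TX-OK per interface between two saved snapshots (e.g. netstat -ni > before.txt, run the ping, netstat -ni > after.txt). It locates the columns from the "Iface" header line because the exact layout differs between net-tools versions; the function name is illustrative:

```shell
#!/bin/sh
# Print per-interface RX-OK / TX-OK deltas between two saved `netstat -ni`
# snapshots. Column positions are taken from the "Iface ..." header line.
netstat_delta() {
    awk '
        # Header: remember which field holds each counter.
        $1 == "Iface" {
            for (i = 1; i <= NF; i++) col[$i] = i
            rx = col["RX-OK"]; tx = col["TX-OK"]
            next
        }
        # Skip the banner, blank lines and anything without enough fields.
        !rx || NF < tx { next }
        # First file: store the counters per interface.
        FNR == NR { base_rx[$1] = $rx; base_tx[$1] = $tx; next }
        # Second file: print the difference for interfaces seen in both.
        ($1 in base_rx) {
            printf "%s rx=%d tx=%d\n", $1, $rx - base_rx[$1], $tx - base_tx[$1]
        }
    ' "$1" "$2"
}
```

Fed the two snapshots above, it reports eth1 rx=10006 tx=10000, which lines up with the 10000 flood pings.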

The next problem is getting iperf to bind to a specific NIC... I've
tried a couple of times with -B, but all outgoing packets STILL seem to
be going out eth0 (which is why you see the TX-OK count for eth0 so
high).
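A plausible explanation, not confirmed in this thread: binding a socket to an address (as iperf -B does) sets the source address, but the egress interface still comes from the routing lookup, where the first route to 10.0.0.0/24 (probably the eth0 one) wins. The kernel can be asked which interface it would pick for a given source/destination pair with ip route get; a minimal wrapper (the function name is hypothetical):

```shell
#!/bin/sh
# Show which interface the kernel's routing lookup picks for traffic from a
# given local source address to a destination. A socket bound to that source
# address is still routed out whatever device this lookup returns.
show_egress() {
    src=$1
    dst=$2
    ip route get "$dst" from "$src"
}

# Example for this bug: eth1 owns 10.0.0.128 and the peer is 10.0.0.1.
# show_egress 10.0.0.128 10.0.0.1
```

If the output names eth0 rather than eth1, that would be consistent with the TX-OK counts above.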