Comment 4 for bug 1832021

David Ames (thedac) wrote : Re: Checksum drop of metadata traffic on isolated provider networks with DPDK

Further testing shows the provider network is irrelevant. With DPDK and an isolated network (qdhcp only, no qrouter), whether GRE or provider, any traffic sent from the qdhcp netns, including response traffic, gets an incorrect TCP checksum.

The packet is put on the "wire", and it is the VM that drops it because of the invalid TCP checksum.

From the qdhcp netns in a DPDK isolated network environment, you can see this in action with an arbitrary netcat call:

nc -vz $VM_IP 73    # any TCP port

Running tcpdump on the VM side shows:

22:06:31.424716 IP (tos 0x0, ttl 64, id 14532, offset 0, flags [DF], proto TCP (6), length 60)
    172.20.0.2.39784 > 172.20.0.6.73: Flags [S], cksum 0x585f (incorrect -> 0x0437), seq 114680395, win 26880, options [mss 8960,sackOK,TS val 1660321155 ecr 0,nop,wscale 7], length 0
22:06:39.616633 IP (tos 0x0, ttl 64, id 14533, offset 0, flags [DF], proto TCP (6), length 60)
    172.20.0.2.39784 > 172.20.0.6.73: Flags [S], cksum 0x585f (incorrect -> 0xe436), seq 114680395, win 26880, options [mss 8960,sackOK,TS val 1660329347 ecr 0,nop,wscale 7], length 0
22:06:55.744502 IP (tos 0x0, ttl 64, id 14534, offset 0, flags [DF], proto TCP (6), length 60)
    172.20.0.2.39784 > 172.20.0.6.73: Flags [S], cksum 0x585f (incorrect -> 0xa536), seq 114680395, win 26880, options [mss 8960,sackOK,TS val 1660345475 ecr 0,nop,wscale 7], length 0

So the VM sees the response traffic from the qdhcp netns but drops it because the TCP checksum is invalid.
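
One detail in the capture that may help narrow this down: the bad checksum is the same 0x585f on every retransmit, even though the correct value changes with the TCP timestamp. As a rough check (a sketch, not a root cause), the Python below sums only the TCP pseudo-header (source, destination, protocol, TCP length) for this flow. The function names are mine, and the 20-byte IP header is an assumption based on the absence of IP options in the tcpdump output.

```python
import socket
import struct


def fold16(total):
    """Fold carries back in until the one's-complement sum fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header_sum(src, dst, proto, l4_len):
    """Folded (not inverted) one's-complement sum of the IPv4 pseudo-header."""
    pseudo = (
        socket.inet_aton(src)
        + socket.inet_aton(dst)
        + struct.pack("!BBH", 0, proto, l4_len)
    )
    words = struct.unpack("!6H", pseudo)  # 12 bytes -> six 16-bit words
    return fold16(sum(words))


# Values from the capture: IP total length 60, minus an assumed
# 20-byte IP header, leaves a 40-byte TCP segment; protocol 6 is TCP.
partial = pseudo_header_sum("172.20.0.2", "172.20.0.6", 6, 60 - 20)
print(hex(partial))
```

This prints 0x585f, which matches the value seen on the wire. That is consistent with the sender leaving the partial pseudo-header sum in the checksum field, as Linux does when it expects TX checksum offload to complete it, and nothing completing it afterwards. It does not by itself explain why the qrouter netns behaves differently.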

When we turn on DVR and create a virtual router (otherwise unused), the qrouter netns does not have this problem. I have not root-caused this, but the qrouter netns has a number of additional iptables settings that are absent from the qdhcp netns and may be required.

So what we are looking for are the differences in setup between the qdhcp netns and the qrouter netns.