Comment 3 for bug 1832021

David Ames (thedac) wrote : Re: Checksum drop of metadata traffic on isolated provider networks

We de-duplicated this bug as we have narrowed the focus. This is DPDK specific.

When using isolated provider networks AND DPDK, metadata traffic is dropped due to an incorrect TCP checksum. This happens specifically when the provider interface is a DPDK interface.

It is temporarily mitigated by adding the iptables rule:
iptables -t mangle -A OUTPUT -o ns-+ -p tcp --sport 80 -j CHECKSUM --checksum-fill
However, this is not sustainable, as any restart of openvswitch will clear this setting.

Here is an example of a tcpdump in the qdhcp netns:

    172.20.0.23.50060 > 169.254.169.254.80: Flags [S], cksum 0xb3c1 (correct), seq 3293184694, win 29200, options [mss 1460,sackOK,TS val 78090881 ecr 0,nop,wscale 7], length 0
23:36:43.932728 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    169.254.169.254.80 > 172.20.0.23.50060: Flags [S.], cksum 0x0057 (incorrect -> 0x2b02), seq 2915867972, ack 3293184695, win 28960, options [mss 1460,sackOK,TS val 4009971593 ecr 78090881,nop,wscale 7], length 0

This continues with retransmissions of the same packets. Note the incorrect cksum on the SYN-ACK from 169.254.169.254.
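
For anyone who wants to check the arithmetic, here is a minimal Python sketch (not part of the tooling used for this capture) that rebuilds the failing SYN-ACK from the fields tcpdump printed and runs the RFC 1071 checksum over it. The function names are made up for illustration, and the byte layout of the TCP options is an assumption based on the order tcpdump lists them. Under those assumptions the full checksum comes out as 0x2b02, the value tcpdump expected, while 0x0057 equals the folded sum of the pseudo-header alone. That would be consistent with a checksum left for offload that was never completed, but it is an inference from the numbers, not something confirmed here.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit one's-complement sum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, tcp_len: int) -> bytes:
    """IPv4 pseudo-header: src, dst, zero byte, protocol 6 (TCP), TCP length."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, 6, tcp_len))


# SYN-ACK from the failing capture, with the checksum field zeroed.
fixed_header = struct.pack(
    "!HHIIHHHH",
    80, 50060,               # source port, destination port
    2915867972, 3293184695,  # seq, ack
    (10 << 12) | 0x012,      # data offset 10 words, flags SYN+ACK
    28960,                   # window
    0, 0,                    # checksum (zeroed), urgent pointer
)
# Assumed option layout, in the order tcpdump printed them.
options = (
    struct.pack("!BBH", 2, 4, 1460)                       # mss 1460
    + struct.pack("!BB", 4, 2)                            # sackOK
    + struct.pack("!BBII", 8, 10, 4009971593, 78090881)   # TS val / ecr
    + struct.pack("!BBBB", 1, 3, 3, 7)                    # nop, wscale 7
)
segment = fixed_header + options  # 40 bytes: IP length 60 minus 20-byte IP header

pseudo = pseudo_header("169.254.169.254", "172.20.0.23", len(segment))

partial = ones_complement_sum(pseudo)                   # pseudo-header only
full = ~ones_complement_sum(pseudo + segment) & 0xFFFF  # complete TCP checksum

print(f"pseudo-header sum: {partial:#06x}")  # 0x0057, the value seen on the wire
print(f"full checksum:     {full:#06x}")     # 0x2b02, the value tcpdump expected
```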

With the mangle rule in place:

    172.20.0.4.46706 > 169.254.169.254.80: Flags [S], cksum 0xbf25 (correct), seq 4115688639, win 29200, options [mss 1460,sackOK,TS val 2390126443 ecr 0,nop,wscale 7], length 0
23:40:01.510745 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    169.254.169.254.80 > 172.20.0.4.46706: Flags [S.], cksum 0xe3f3 (correct), seq 1829822633, ack 4115688640, win 28960, options [mss 1460,sackOK,TS val 812998113 ecr 2390126443,nop,wscale 7], length 0
23:40:01.510919 IP (tos 0x0, ttl 64, id 38572, offset 0, flags [DF], proto TCP (6), length 52)
    172.20.0.4.46706 > 169.254.169.254.80: Flags [.], cksum 0x82fb (correct), seq 1, ack 1, win 229, options [nop,nop,TS val 2390126443 ecr 812998113], length 0
23:40:01.510974 IP (tos 0x0, ttl 64, id 38573, offset 0, flags [DF], proto TCP (6), length 229)
    172.20.0.4.46706 > 169.254.169.254.80: Flags [P.], cksum 0x5e40 (correct), seq 1:178, ack 1, win 229, options [nop,nop,TS val 2390126443 ecr 812998113], length 177: HTTP, length: 177
        GET /openstack HTTP/1.1
        Host: 169.254.169.254
        User-Agent: Cloud-Init/19.1-1-gbaa47854-0ubuntu1~18.04.1
        Accept-Encoding: gzip, deflate
        Accept: */*
        Connection: keep-alive

23:40:01.553043 IP (tos 0x0, ttl 64, id 62471, offset 0, flags [DF], proto TCP (6), length 52)
    169.254.169.254.80 > 172.20.0.4.46706: Flags [.], cksum 0x8219 (correct), seq 1, ack 178, win 235, options [nop,nop,TS val 812998156 ecr 2390126443], length 0
23:40:02.036984 IP (tos 0x0, ttl 64, id 62472, offset 0, flags [DF], proto TCP (6), length 252)
    169.254.169.254.80 > 172.20.0.4.46706: Flags [P.], cksum 0xd303 (correct), seq 1:201, ack 178, win 235, options [nop,nop,TS val 812998640 ecr 2390126443], length 200: HTTP, length: 200
        HTTP/1.1 200 OK
        Content-Type: text/plain; charset=UTF-8
        Content-Length: 83
        Date: Thu, 13 Jun 2019 23:40:02 GMT

        2012-08-10
        2013-04-04
        2013-10-17
        2015-10-15
        2016-06-30
        2016-10-06
        2017-02-22
        latest[!http]

We can provide tcpdump pcap files and any other information required.