Comment 5 for bug 1785189

Lu lei (lei-lu) wrote :

Hi Miguel, Liu Yulong. I'm sorry to reopen this bug again. I have tested it thoroughly again: when the MTU value is set to 64 KB, the egress bandwidth limit of the VM and the router gateway does not work correctly. Test results are shown in the attachment.
I asked my colleague to read the kernel tc police code and give an explanation. Indeed, if the packet size is greater than the MTU set on the police action, the packet skips the rate check in the function tcf_act_police and falls through to the overlimit path, where it is dropped when the action is TC_ACT_SHOT.
......
        if (qdisc_pkt_len(skb) <= police->tcfp_mtu) {
                if (!police->rate_present) {
                        spin_unlock(&police->tcf_lock);
                        return police->tcfp_result;
                }

                now = ktime_get_ns();
                toks = min_t(s64, now - police->tcfp_t_c,
                             police->tcfp_burst);
                if (police->peak_present) {
                        ptoks = toks + police->tcfp_ptoks;
                        if (ptoks > police->tcfp_mtu_ptoks)
                                ptoks = police->tcfp_mtu_ptoks;
                        ptoks -= (s64) psched_l2t_ns(&police->peak,
                                                     qdisc_pkt_len(skb));
                }
                toks += police->tcfp_toks;
                if (toks > police->tcfp_burst)
                        toks = police->tcfp_burst;
                toks -= (s64) psched_l2t_ns(&police->rate, qdisc_pkt_len(skb));
                if ((toks|ptoks) >= 0) {
                        police->tcfp_t_c = now;
                        police->tcfp_toks = toks;
                        police->tcfp_ptoks = ptoks;
                        spin_unlock(&police->tcf_lock);
                        return police->tcfp_result;
                }
        }

        police->tcf_qstats.overlimits++;
        if (police->tcf_action == TC_ACT_SHOT)
                police->tcf_qstats.drops++;
......
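
The decision made by the excerpt above can be sketched in user space as follows. This is only a simplified model under stated assumptions: the struct and function names (police_model, len_to_ns, police_conforms) are hypothetical, the peak-rate branch and locking are left out, and the rate is expressed in bytes per second rather than the kernel's precomputed rate table. It is meant to show that an oversized packet never reaches the token bucket at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the policer state. Field names are
 * illustrative and only loosely mirror the kernel structure. */
struct police_model {
    uint32_t mtu;       /* largest packet length accepted, in bytes */
    uint64_t rate_bps;  /* configured rate, in bytes per second */
    int64_t  burst_ns;  /* bucket depth, in ns of transmit time */
    int64_t  toks_ns;   /* tokens currently in the bucket, in ns */
    int64_t  last_ns;   /* timestamp of the last conforming packet */
};

/* Time in ns needed to send len bytes at the configured rate. */
static int64_t len_to_ns(const struct police_model *p, uint32_t len)
{
    assert(p->rate_bps > 0);
    return (int64_t)(((uint64_t)len * 1000000000ULL) / p->rate_bps);
}

/* Returns true if the packet conforms, false if it is over the limit. */
static bool police_conforms(struct police_model *p, uint32_t pkt_len,
                            int64_t now_ns)
{
    int64_t toks;

    /* Size check first: an oversized packet is over the limit
     * regardless of how many tokens are available. */
    if (pkt_len > p->mtu)
        return false;

    /* Refill from elapsed time, capped at the burst size. */
    toks = now_ns - p->last_ns;
    if (toks > p->burst_ns)
        toks = p->burst_ns;
    toks += p->toks_ns;
    if (toks > p->burst_ns)
        toks = p->burst_ns;

    /* Charge the packet; a negative balance means over the limit. */
    toks -= len_to_ns(p, pkt_len);
    if (toks < 0)
        return false;

    p->last_ns = now_ns;
    p->toks_ns = toks;
    return true;
}
```

With mtu set to 2048, a call such as police_conforms(&p, 3028, now) returns false even when the bucket is full, which matches the drops observed in this report.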

I inserted some debug code here with a jprobe and observed the result.
......
        /* Log every packet that fails the police MTU check. All values are unsigned, so print with %u. */
        if (qdisc_pkt_len(skb) > police->tcfp_mtu) {
                printk(KERN_INFO "gerald: qdisc skb len: %u, mtu %u, packet is too big\n", qdisc_pkt_len(skb), police->tcfp_mtu);
                printk(KERN_INFO "gerald: skb len: %u, data_len: %u\n", skb->len, skb->data_len);
                if (police->tcf_action == TC_ACT_SHOT)
                        printk(KERN_INFO "gerald: drop it, total %u\n", police->tcf_qstats.drops);
        }
......

Running iperf3 with the police MTU set to 2K, most of the packets are dropped. The same happens at 4K, 8K, and all the way up to 64K, even though IP fragmentation cuts the packets into smaller ones according to the size of the MTU.

The information I printed is as follows:
MTU 2K
[1251697.476412] gerald: skb len: 2962, data_len: 2896
[1251697.476415] gerald: qdisc skb len: 3028, mtu 2048, packet is too big
[1251697.476416] gerald: skb len: 2962, data_len: 2896
[1251697.476419] gerald: qdisc skb len: 3028, mtu 2048, packet is too big
[1251697.476421] gerald: skb len: 2962, data_len: 2896
[1251697.476423] gerald: qdisc skb len: 3028, mtu 2048, packet is too big
[1251697.476425] gerald: skb len: 2962, data_len: 2896
[1251697.477361] gerald: qdisc skb len: 4542, mtu 2048, packet is too big
[1251697.477364] gerald: skb len: 4410, data_len: 4344

MTU 8K
[1251822.171764] gerald: qdisc skb len: 9084, mtu 8192, packet is too big
[1251822.171767] gerald: skb len: 8754, data_len: 8688
[1251822.171776] gerald: qdisc skb len: 9084, mtu 8192, packet is too big
[1251822.171778] gerald: skb len: 8754, data_len: 8688
[1251822.171781] gerald: qdisc skb len: 9084, mtu 8192, packet is too big
[1251822.171783] gerald: skb len: 8754, data_len: 8688
[1251822.171786] gerald: qdisc skb len: 9084, mtu 8192, packet is too big
[1251822.171787] gerald: skb len: 8754, data_len: 8688

MTU 64K
[1251890.037405] gerald: qdisc skb len: 68130, mtu 65536, packet is too big
[1251890.037406] gerald: skb len: 65226, data_len: 63704
[1251890.037410] gerald: qdisc skb len: 68130, mtu 65536, packet is too big
[1251890.037412] gerald: skb len: 65226, data_len: 63704
[1251890.037416] gerald: qdisc skb len: 68130, mtu 65536, packet is too big
[1251890.037417] gerald: skb len: 65226, data_len: 63704
[1251890.037421] gerald: qdisc skb len: 68130, mtu 65536, packet is too big
[1251890.037423] gerald: skb len: 65226, data_len: 63704
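
The gap between the qdisc length and skb len in the logs above can be checked with some arithmetic. In every sample it is a multiple of 66: 66, 132, 330, and 2904. One possible reading, which is an assumption and not something confirmed in this report, is that these are segmentation-offload packets and the qdisc length counts one extra 66-byte header for each additional 1448-byte segment. The constants HDR_LEN and SEG_PAYLOAD and both function names below are hypothetical, inferred only from the logged numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Header bytes per segment, inferred from skb len - data_len in the
 * 2K and 8K samples. An assumption, not a value stated in the report. */
#define HDR_LEN 66u

/* Payload bytes per segment, inferred from data_len 2896 = 2 * 1448.
 * Also an assumption. */
#define SEG_PAYLOAD 1448u

/* Extra bytes counted in the qdisc length on top of skb->len. */
static uint32_t qdisc_overhead(uint32_t qdisc_len, uint32_t skb_len)
{
    assert(qdisc_len >= skb_len);
    return qdisc_len - skb_len;
}

/* Length predicted if every segment after the first is charged one
 * additional header. */
static uint32_t predicted_qdisc_len(uint32_t skb_len)
{
    uint32_t payload, segs;

    assert(skb_len > HDR_LEN);
    payload = skb_len - HDR_LEN;
    segs = (payload + SEG_PAYLOAD - 1) / SEG_PAYLOAD;  /* round up */
    return skb_len + (segs - 1) * HDR_LEN;
}
```

Under these assumptions, predicted_qdisc_len reproduces all four logged pairs: 2962 gives 3028, 4410 gives 4542, 8754 gives 9084, and 65226 gives 68130.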

After IP fragmentation has finished, the packets in the qdisc are still larger than the MTU, and the police uses the size of the skb in the qdisc for its check. The kernel's traffic control adds some information to the packet for packet scheduling, so the packet size grows again. I'm not sure whether sending the packet to the virtual NIC adds further information to it... These may be the reasons the packet grows after fragmentation is complete.
So the packet size in the qdisc exceeds the MTU, and the drop comes from the tc police itself, which calculates whether the packet exceeds the MTU.
Because the maximum TCP packet is 64K, the packet size with the extra information from kernel traffic control and the virtual NIC (this is still uncertain) can reach the 68130 seen in the log above. To keep packets from being blocked by the tc police MTU check, the tc police MTU should be set to 70K, or 80K to be safe. I set the tc police MTU to 70K, and the bandwidth limit seen by iperf is normal.
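
The headroom behind the 70K and 80K figures can be checked against the largest qdisc length in the 64K log sample. This is a minimal sketch: the helper names kib and passes_police_mtu are hypothetical, and the comparison simply mirrors the `qdisc_pkt_len(skb) <= police->tcfp_mtu` test from the kernel excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest qdisc packet length seen in the 64K log sample. */
#define MAX_OBSERVED_QDISC_LEN 68130u

/* Convert KiB to bytes, guarding against 32-bit overflow. */
static uint32_t kib(uint32_t n)
{
    assert(n <= UINT32_MAX / 1024u);
    return n * 1024u;
}

/* True if a packet of pkt_len bytes passes the police size check. */
static bool passes_police_mtu(uint32_t pkt_len, uint32_t police_mtu)
{
    return pkt_len <= police_mtu;
}
```

A 64K police MTU is 65536 bytes, so passes_police_mtu(MAX_OBSERVED_QDISC_LEN, kib(64)) is false. At 70K (71680 bytes) the same packet passes with 3550 bytes to spare, and 80K (81920 bytes) leaves a wider margin.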