Floating IP and router bandwidth speed limit failure

Bug #1785189 reported by Lu lei
This bug affects 1 person
Affects: neutron
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Environment version: CentOS 7.4
Neutron version: newton (also seen in pike and queens)

I have added these L3 QoS patches into newton branch:
https://review.openstack.org/#/c/453458/
https://review.openstack.org/#/c/424466/
https://review.openstack.org/#/c/521079/

But I don't think these patches work correctly. For large bandwidth limits, the speed limit does not work at all. As soon as a router or floating IP speed limit is applied, the scp transfer rate keeps falling from about 2 Mbps until the transfer is finally interrupted. The iperf test is extremely unstable, sometimes 10 Mbps, sometimes 0 bps.
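For reference, the kind of configuration under test can be created with the QoS CLI roughly as follows (illustrative only: the policy name, rate and burst values are made up, and on newton the floating IP / router gateway attachment comes from the backported patches rather than from the stock client):

# Illustrative sketch: create a ~1 Gbps egress bandwidth-limit policy and attach it
# to the floating IP under test (similarly for the router gateway with the backported patches).
openstack network qos policy create bw-limit-1g
openstack network qos rule create --type bandwidth-limit \
    --max-kbps 1000000 --max-burst-kbits 800000 --egress bw-limit-1g
openstack floating ip set --qos-policy bw-limit-1g <floating-ip-id>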

For example, the router's rate limit rule is set to 1 Gbps, the router netns is the iperf client, and the controller node is the iperf server. Here is the test result:

[root@node-1 ~]# ip netns exec qrouter-bf800d13-9ce6-4aa7-9259-fab54ec5ac05 tc -s -p filter show dev qg-d2e58140-fa
filter parent 1: protocol ip pref 1 u32
filter parent 1: protocol ip pref 1 u32 fh 800: ht divisor 1
filter parent 1: protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid :1 (rule hit 7557 success 7525)
  match IP src 172.18.0.133/32 (success 7525 )
 police 0x15a rate 1024Mbit burst 100Mb mtu 2Kb action drop overhead 0b
ref 1 bind 1

 Sent 12795449 bytes 8549 pkts (dropped 969, overlimits 969)

iperf tests:
[root@node-1 ~]# ip netns exec qrouter-bf800d13-9ce6-4aa7-9259-fab54ec5ac05 iperf3 -c 172.18.0.4 -i 1
Connecting to host 172.18.0.4, port 5201
[ 4] local 172.18.0.133 port 51674 connected to 172.18.0.4 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 119 KBytes 972 Kbits/sec 18 2.83 KBytes
[ 4] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec 5 2.83 KBytes
[ 4] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec 5 2.83 KBytes
[ 4] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec 5 2.83 KBytes
[ 4] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec 5 2.83 KBytes
[ 4] 5.00-6.00 sec 63.6 KBytes 522 Kbits/sec 37 2.83 KBytes
[ 4] 6.00-7.00 sec 1.64 MBytes 13.7 Mbits/sec 336 4.24 KBytes
[ 4] 7.00-8.00 sec 1.34 MBytes 11.2 Mbits/sec 279 2.83 KBytes
[ 4] 8.00-9.00 sec 1.96 MBytes 16.5 Mbits/sec 406 2.83 KBytes
[ 4] 9.00-10.00 sec 334 KBytes 2.73 Mbits/sec 75 2.83 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 5.44 MBytes 4.56 Mbits/sec 1171 sender
[ 4] 0.00-10.00 sec 5.34 MBytes 4.48 Mbits/sec receiver

iperf Done.

After deleting the tc filter rule with the command below, the bandwidth test is normal:

[root@node-1 ~]# ip netns exec qrouter-bf800d13-9ce6-4aa7-9259-fab54ec5ac05 tc filter del dev qg-d2e58140-fa parent 1: prio 1 handle 800::800 u32
[root@node-1 ~]# ip netns exec qrouter-bf800d13-9ce6-4aa7-9259-fab54ec5ac05 tc -s -p filter show dev qg-d2e58140-fa
[root@node-1 ~]# ip netns exec qrouter-bf800d13-9ce6-4aa7-9259-fab54ec5ac05 iperf3 -c 172.18.0.4 -i 1
Connecting to host 172.18.0.4, port 5201
[ 4] local 172.18.0.133 port 47530 connected to 172.18.0.4 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 88.2 MBytes 740 Mbits/sec 1 407 KBytes
[ 4] 1.00-2.00 sec 287 MBytes 2.41 Gbits/sec 354 491 KBytes
[ 4] 2.00-3.00 sec 1.04 GBytes 8.94 Gbits/sec 1695 932 KBytes
[ 4] 3.00-4.00 sec 1008 MBytes 8.45 Gbits/sec 4233 475 KBytes
[ 4] 4.00-5.00 sec 1.03 GBytes 8.85 Gbits/sec 1542 925 KBytes
[ 4] 5.00-6.00 sec 1008 MBytes 8.45 Gbits/sec 4507 748 KBytes
[ 4] 6.00-7.00 sec 1.05 GBytes 9.06 Gbits/sec 1550 798 KBytes
[ 4] 7.00-8.00 sec 1.06 GBytes 9.08 Gbits/sec 1251 933 KBytes
[ 4] 8.00-9.00 sec 1.02 GBytes 8.77 Gbits/sec 3595 942 KBytes
[ 4] 9.00-10.00 sec 1024 MBytes 8.59 Gbits/sec 3867 897 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 8.54 GBytes 7.33 Gbits/sec 22595 sender
[ 4] 0.00-10.00 sec 8.54 GBytes 7.33 Gbits/sec receiver

iperf Done.

I am not sure whether this is an isolated case or whether someone else has encountered it as well.

Lu lei (lei-lu)
information type: Private Security → Public
Revision history for this message
LIU Yulong (dragon889) wrote :
Changed in neutron:
status: New → Invalid
Revision history for this message
LIU Yulong (dragon889) wrote :

Marked as invalid because upstream neutron stable/newton does not support such a feature.

Lu lei (lei-lu)
Changed in neutron:
status: Invalid → New
Revision history for this message
Miguel Lavalle (minsel) wrote :

Hi,

As already pointed out by Liu Yulong, the Newton release does not support this feature. Marking the bug invalid. Please don't change it again unless an explanation and further details are added to this bug justifying that action.

Changed in neutron:
status: New → Invalid
Revision history for this message
Lu lei (lei-lu) wrote :

Hi Miguel, Liu Yulong. I have tested it again. When the mtu value is set to 64kb, everything goes well. Thanks.

Revision history for this message
Lu lei (lei-lu) wrote :

Hi Miguel, Liu Yulong. I'm sorry to reopen this bug again. I have tested it more deeply: even when the mtu value is set to 64kb, the egress bandwidth limit of the VM and of the router gateway is still invalid. Test results are shown in the attachment.
I asked my colleague to read the kernel tc police code and give an explanation. Indeed, if the packet size is greater than the mtu set on the police action, the packet is dropped in the function tcf_act_police().
......
        if (qdisc_pkt_len(skb) <= police->tcfp_mtu) {
                if (!police->rate_present) {
                        spin_unlock(&police->tcf_lock);
                        return police->tcfp_result;
                }

                now = ktime_get_ns();
                toks = min_t(s64, now - police->tcfp_t_c,
                             police->tcfp_burst);
                if (police->peak_present) {
                        ptoks = toks + police->tcfp_ptoks;
                        if (ptoks > police->tcfp_mtu_ptoks)
                                ptoks = police->tcfp_mtu_ptoks;
                        ptoks -= (s64) psched_l2t_ns(&police->peak,
                                                     qdisc_pkt_len(skb));
                }
                toks += police->tcfp_toks;
                if (toks > police->tcfp_burst)
                        toks = police->tcfp_burst;
                toks -= (s64) psched_l2t_ns(&police->rate, qdisc_pkt_len(skb));
                if ((toks|ptoks) >= 0) {
                        police->tcfp_t_c = now;
                        police->tcfp_toks = toks;
                        police->tcfp_ptoks = ptoks;
                        spin_unlock(&police->tcf_lock);
                        return police->tcfp_result;
                }
        }
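        /* Packets with qdisc_pkt_len(skb) > tcfp_mtu never enter the block
         * above; they fall through to the overlimit accounting below and are
         * dropped when the configured action is "drop" (TC_ACT_SHOT). */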

        police->tcf_qstats.overlimits++;
        if (police->tcf_action == TC_ACT_SHOT)
                police->tcf_qstats.drops++;
......

I inserted some debugging code here with jprobe and observed the behaviour:
......
       if (qdisc_pkt_len(skb) > police->tcfp_mtu) {
                printk(KERN_INFO "gerald: qdisc skb len: %d, mtu %d, packet is too big\n", qdisc_pkt_len(skb), police->tcfp_mtu);
                printk(KERN_INFO "gerald: skb len: %d, data_len: %d\n", skb->len, skb->data_len);
                if (police->tcf_action == TC_ACT_SHOT)
                        printk(KERN_INFO "gerald: drop it, total %d\n", police->tcf_qstats.drops);
        }
......

After running iperf3 with the police MTU set to 2K, most of the packets are dropped; the same happens at 4K and 8K, all the way up to 64K, because IP fragmentation only cuts the packet into smaller packets according to the size of the interface MTU.

The information I printed is as follows:
MTU 2K
[1251697.476412] gerald: skb len: 2962, data_len: 2896
[1251697.476415] gerald: qdisc skb len: 3028, mtu 2048, packet is too big
[1251697.476416] gerald: skb len: 2962, data_len: 2896
[1251697.476419] gerald: qdisc skb len: 3028, mtu 2048, packet is too big
[1251697.476421] gerald: skb len: 2962, data_len: 2896
[1251697.476423] gerald: qdisc skb len: 3028, mtu 2048, packet is too big
[1251697.476425] gerald: skb len: 2962, data_len: 2896
[1251697.477361] gerald: qdisc skb len: 4542, mtu 2048, packet is too big
[1251697.477364] gerald: skb len: 4410, da...


Revision history for this message
Lu lei (lei-lu) wrote :

Here is the attachment.

description: updated
Changed in neutron:
status: Invalid → New
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

Thanks for the detailed report.
So you are suggesting that increasing the hardcoded mtu=64kb for filters in l3_tc_lib: https://github.com/openstack/neutron/blob/master/neutron/agent/linux/l3_tc_lib.py#L120 up to e.g. 80kb should solve that issue? Do I understand you correctly?

Revision history for this message
Lu lei (lei-lu) wrote :

Hi Slawek. Setting the mtu value to 70kb is enough.
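
For illustration, the policing filter involved would then look roughly like the following (a sketch only, reusing the interface, rate and burst values from the report above; the actual fix would change the hardcoded mtu value that l3_tc_lib passes to tc, not add filters by hand):

# Sketch: after deleting the old filter (as shown earlier in this bug),
# re-add it with a larger police MTU.
ip netns exec qrouter-bf800d13-9ce6-4aa7-9259-fab54ec5ac05 \
    tc filter add dev qg-d2e58140-fa parent 1: protocol ip prio 1 u32 \
    match ip src 172.18.0.133/32 \
    police rate 1024mbit burst 100mb mtu 70kb drop flowid :1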

Lu lei (lei-lu)
Changed in neutron:
assignee: nobody → Lu lei (lei-lu)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/596637

Changed in neutron:
status: New → In Progress
Changed in neutron:
assignee: Lu lei (lei-lu) → Slawek Kaplonski (slaweq)
Changed in neutron:
assignee: Slawek Kaplonski (slaweq) → Brian Haley (brian-haley)
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :
Changed in neutron:
importance: Undecided → Medium
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

Uhmm, I'm not sure if what I posted is the same issue; in our case the speed is not being limited.

Changed in neutron:
assignee: Brian Haley (brian-haley) → nobody
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

As commented in the patch (https://review.openstack.org/#/c/596637):

I tried to reproduce this bug by injecting packets both from a VM and from the router namespace.

If the transmission is limited (by the VM interface MTU), there are no retransmissions or dropped packets. When executing iperf from the router namespace, I manually limit the packet size:

  iperf3 -c 192.168.222.2 --set-mss 2048 -l 2048 # MTU 2048

I tested several MTU sizes (2K, 9K and 64K), always limiting the write buffer size to a maximum of 64K. When the MSS is fixed and the iperf write buffer is limited, I don't see any performance reduction (bandwidth is kept at the limit defined in the filter) and the number of retransmissions stays at a reasonable level. Other considerations are that there is no IP fragmentation and there are no requests back to reduce the window size (PMTUD is a router functionality).

Regarding the packet size increase detected in [1], all I can say is that the TC filter marks the packets using fwmarks. Those marks are part of the skb, not of the IP packet or the Ethernet frame itself [2].

Because no IPv4 packet can exceed the 64 KB limit (the IPv4 total length field is 16 bits), increasing this number in the filter makes no sense.

[1] https://bugs.launchpad.net/neutron/+bug/1785189/comments/5
[2] https://docs.huihoo.com/hpc-cluster/linux-virtual-server/HOWTO/LVS-HOWTO.fwmark.html#LVS-HOWTO.fwmark

Revision history for this message
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
status: In Progress → New
tags: added: timeout-abandon
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/596637
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Bug closed due to lack of activity, please feel free to reopen if needed.

Changed in neutron:
status: New → Won't Fix