QoS max bandwidth rules not working for Neutron trunk ports

Bug #1639186 reported by Luis Tomas Bolivar
This bug affects 2 people
Affects: neutron
Status: In Progress
Importance: Low
Assigned to: Rodolfo Alonso
Milestone: none

Bug Description

When using QoS together with Neutron trunk ports, the max bandwidth limits are not applied with either ovs-hybrid or ovs-firewall.

The reason is that a new OVS bridge is created to handle the trunk (parent + subport) ports.
For instance:
    Bridge "tbr-c5402c58-3"
        Port "tpt-e739265b-2b"
            Interface "tpt-e739265b-2b"
                type: patch
                options: {peer="tpi-e739265b-2b"}
        Port "qvoe739265b-2b"
            Interface "qvoe739265b-2b"
        Port "spt-17c950c4-f5"
            tag: 101
            Interface "spt-17c950c4-f5"
                type: patch
                options: {peer="spi-17c950c4-f5"}
        Port "tbr-c5402c58-3"
            Interface "tbr-c5402c58-3"
                type: internal

Then _set_egress_bw_limit_for_port (https://github.com/openstack/neutron/blob/master/neutron/agent/common/ovs_lib.py#L553) is applied to tpi-e739265b-2b or spi-17c950c4-f5 (depending on whether the QoS rule is applied to the parent port or to the subport, respectively). However, these interfaces are of patch type, i.e., they are fully virtual and the kernel does not know about them, so the QoS rules are never actually enforced.
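
For illustration, here is roughly what that call does at the OVS level, expressed as the equivalent ovs-vsctl commands (a sketch: the agent actually goes through ovsdb, and the rate/burst values here are illustrative):
    # Egress limit from the VM's point of view = ingress policing on the
    # switch interface:
    ovs-vsctl set interface tpi-e739265b-2b ingress_policing_rate=10000
    ovs-vsctl set interface tpi-e739265b-2b ingress_policing_burst=8000
    # Ingress policing is enforced by the kernel (tc) on a real device; a
    # patch interface has no kernel device, so ovsdb accepts the values
    # but nothing ever enforces them.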

To reproduce it:
- Enable QoS in the devstack local.conf:
    enable_plugin neutron https://github.com/openstack/neutron
    enable_service q-qos
- Enable the trunk service plugin in neutron.conf:
    service_plugins = ... qos,trunk

- Create a QoS bandwidth-limit rule
- Apply the QoS rule to either the parent port or a subport
- Test the bandwidth limit, e.g., with iperf (a CLI sketch of these steps follows below)
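
For reference, a sketch of the last three steps with the openstack CLI (the policy name is illustrative; replace the port ID placeholder with a real parent or subport ID):
    openstack network qos policy create bw-limit
    openstack network qos rule create --type bandwidth-limit \
        --max-kbps 3000 --max-burst-kbits 300 bw-limit
    openstack port set --qos-policy bw-limit <parent-or-subport-id>
    # run iperf through the port and check whether ~3 Mbps is enforced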

Tags: qos trunk
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

Thanks for reporting. One way to fix this could be to change the patch ports into internal ports; that should come with a little throughput degradation compared to no flows.

Changed in neutron:
status: New → Confirmed
importance: Undecided → Low
milestone: none → ocata-1
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

compared to no flows = compared to patch ports. O:)

Changed in neutron:
assignee: nobody → Luis Tomas Bolivar (ltomasbo)
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Ajo: if we switch to internal ports, would everything still work in conjunction with DPDK? Those are patch ports for a reason.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

I am not sure I fully understand the implications of Ajo's proposal.

Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

@armax, yes, you're right, I also remember pointing out that patch ports were a better solution in those cases during the specs review.

Luis Tomas tried it, and internal ports can't be used to link OVS bridges; he's trying veths now.

I'm sure veths wouldn't work with DPDK, and they also come with a performance degradation, because packets will jump in/out (OVS), veth, in/out (OVS) ... :/

But apparently there are people interested in containers without DPDK who want to set QoS constraints on the ports.

We need to think about how to satisfy that case if veths finally work. I guess it boils down to two options:

1) Having a config flag to set them all as veths if the operator is interested in this specific case. That would work, but it would not be very friendly to mixed environments, and it adds another config switch (puaghhh).

2) Signaling (somehow) to trunk ports that we want the specific port wired as a veth, maybe through a callback (BEFORE_QOS_BWLIMIT_APPLIED???), so if the trunk extension is there it will drop the patch port and put a veth in place. If it is not there, the signal would be ignored. And if any other extension uses patch ports, we also give it the opportunity to move down to veths.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Ajo: thanks for the feedback. This isn't so much DPDK vs non-DPDK as userspace vs kernel space. What would it take to have QoS work with patch ports? I am reluctant to add complexity to the agent because of an OVS gap. We know how we ended up the last time we did that (hint hint)!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/397788

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

FWIW I read in [1] that QoS/DPDK were the focus of major improvements. The release notes [2] for 2.6 state that QoS functionality with a sample egress-policer implementation was enabled for DPDK. Can someone confirm what exactly the gap caused by patch ports is on 2.6?

DPDK aside, if this boils down to "QoS does not work with patch ports", what would it take to make it work with patch ports? I appreciate the use case, even in the presence of older versions of Open vSwitch, but we need a clear understanding of the entire problem space before we rush into stop-gap solutions that are difficult to move away from (because they impose a migration cost).

[1] http://openvswitch.org/support/ovscon2016/7/0930-pettit.pdf (slide 3)
[2] http://openvswitch.org/releases/NEWS-2.6.0
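
For context, the sample egress-policer from [2] is configured on a DPDK port roughly like this, per the OVS documentation (the port name and the cir/cbs values are illustrative):
    ovs-vsctl set port vhost-user0 qos=@qos -- \
        --id=@qos create qos type=egress-policer \
        other-config:cir=46000000 other-config:cbs=2048
    # cir (committed information rate) is in bytes/sec, cbs (committed
    # burst size) in bytes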

Revision history for this message
Luis Tomas Bolivar (ltomasbo) wrote :

Hi Armando, I fully agree with you: this is not the desired solution, and the fix should be provided by OVS and then consumed by neutron. I did not mean to push this as the solution; it was just to test whether it could work for the use case I have (QoS with trunk ports). That was the reason I only included the 'Related-Bug' tag instead of 'Closes-Bug'. But perhaps I should not have included any tag at all.

Regarding your comments:
- I see the 'egress-policer implementation for DPDK', but as this comes from the openvswitch release notes, does it mean egress from the OVS bridge perspective, or egress from the VM (and in that case ingress to the OVS bridge)? If the latter, then I need to figure out how to make use of it at the neutron level. If the former, then there is still a gap for the VM egress bandwidth.

- Note I was not targeting DPDK. What I tried is QoS with trunk ports, i.e., applying a max bandwidth limit to one of the subports. The problem is that, for a normal port, the egress-policer bandwidth rule (ingress from the OVS point of view) is applied on the tap device that connects the VM to the bridge (or on the veth device when the OVS firewall is not used). In the trunk port scenario, however, what is connected to br-int is not the tap or veth device but a patch port, which in turn connects to the OVS trunk bridge that connects to the VM. The QoS rules are applied at the kernel level, while patch ports are fully virtual, so the kernel does not know about them (see the sketch after this list).

- A possible use case of QoS with trunk ports is a Kubernetes or OpenShift deployment inside OpenStack, where you want to leverage neutron functionality by using kuryr. In such a case, you may want some of the containers deployed inside the VMs to have bandwidth limitations. Another use case is VNFs, where VLAN-aware VMs need to be used instead of multiple vNICs. In that case, it would also be desirable to have some QoS control for those VNF VMs.
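
A quick way to check that second point (a sketch: the patch-port name is from the bug description, the tap name is hypothetical):
    # On a kernel-backed device (tap/veth), OVS installs the policer via
    # tc, where it is visible:
    ovs-vsctl set interface tap1234abcd-ef ingress_policing_rate=10000
    tc -s filter show dev tap1234abcd-ef parent ffff:
    # On the patch port the same ovsdb write succeeds, but there is no
    # kernel device for tc to attach the policer to:
    ovs-vsctl set interface tpi-e739265b-2b ingress_policing_rate=10000
    ip link show tpi-e739265b-2b    # no such kernel device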

Changed in neutron:
milestone: ocata-1 → ocata-2
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

@Luis: thanks for your feedback. You did nothing wrong; I only cautiously blocked the patch, for fear of an accidental merge :)

I understand both the root cause and the use case we're discussing here. As for QoS capabilities in 2.6, my suggestion would be to reach out to the OVS/OVN community, if you have not done so already, to fully appreciate how to realize the use case.

Falling back on veth pairs would create another variable in the test matrix, and it means giving up security groups as provided by the OVS firewall. I'd rather not go down that path; we should not work around existing limitations of the OVS platform. That has been costly in the past, and we should have learned our lesson by now.

Revision history for this message
Russell Bryant (russellb) wrote :

I just did a quick review of this issue and of how trunk ports and QoS work in OVN. OVN implements both trunk ports and QoS in a different way and does not suffer from this issue.

QoS is implemented as queues on egress interfaces, and flows set the proper queue id based on the source of the packet.

If anyone wants to poke around at how it works, QoS with devstack and networking-ovn should work: http://docs.openstack.org/developer/networking-ovn/testing.html
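
In plain OVS terms, that queue-based approach looks roughly like this (a sketch with illustrative port names, queue ids and rates, not OVN's actual implementation):
    # Attach an HTB queue capped at 10 Mbps to the egress interface:
    ovs-vsctl set port eth1 qos=@qos -- \
        --id=@qos create qos type=linux-htb queues:1=@q1 -- \
        --id=@q1 create queue other-config:max-rate=10000000
    # Flows then pick the queue based on the packet's source port:
    ovs-ofctl add-flow br-int "in_port=5,actions=set_queue:1,normal"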

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Then one could argue that another potential fix for this incompatibility is to make QoS management look a bit more like OVN's.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/397788
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.

Changed in neutron:
milestone: ocata-2 → ocata-3
Changed in neutron:
milestone: ocata-3 → ocata-rc1
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

Something similar to what russellb describes could be done if we get queues in place, which is also necessary to implement OVS minimum bandwidth.

Changed in neutron:
assignee: Luis Tomas Bolivar (ltomasbo) → nobody
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

Ralonsh and I will talk about this at the PTG.

Changed in neutron:
milestone: ocata-rc1 → pike-1
Changed in neutron:
milestone: pike-1 → pike-2
QunyingRan (ran-qunying)
Changed in neutron:
assignee: nobody → QunyingRan (ran-qunying)
assignee: QunyingRan (ran-qunying) → nobody
Changed in neutron:
milestone: pike-2 → none
Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/839523

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/839523
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
