[RFE] support a port-behind-port API

Bug #1730845 reported by Omer Anson
This bug affects 1 person
Affects: neutron
Status: Invalid
Importance: Wishlist
Assigned to: Unassigned

Bug Description

This RFE requests a unified API for a port-behind-port behaviour. This behaviour has a few use-cases:

* MACVLAN - Identify that a port is behind a port using Allowed Address Pairs, and identify the behaviour based on MAC.

* HA Proxy behind Amphora - Identify that a port is behind a port using Allowed Address Pairs, and identify the behaviour based on IP.

* Trunk Port (VLAN aware VMs) - Identify that a port is behind a port using the Trunk Port API, and identify the behaviour based on VLAN tags.

This RFE proposes to extend the Trunk Port API to support the first two use-cases. The rationale is that in an SDN environment, it makes more sense to explicitly state the intent, rather than have the implementation infer the intent by matching Allowed Address Pairs and other existing ports.

This will allow implementations to handle these use cases in a simpler, more flexible, and more robust manner than is possible today.

Miguel Lavalle (minsel) wrote :

Hi Omer,

1) What is "port-behind-port behaviour"?

2) How do you envision the extension of the API?

Armando Migliaccio (armando-migliaccio) wrote :

The RFE description is intriguing but it's rather abstract and hard to wrap my head around. Can you elaborate on what you mean by 'simpler, flexible and more robust manner' by identifying what are the complex, inflexible and brittle ways of addressing the use cases you presented?

Changed in neutron:
importance: Undecided → Wishlist
status: New → Incomplete
Omer Anson (omer-anson) wrote :

1. port-behind-port behaviour: A situation where one port is not bound, but is reachable by sending the packet to another port (possibly on a different network). The RFE contains three examples of this behaviour (MACVLAN, amphora, trunk ports).

2. I was thinking of extending the trunk API, and extending the segmentation type to include other values (MACVLAN, IPVLAN, etc.), and include other fields if necessary (Specifically, MACVLAN and IPVLAN can make do with no additional information, but future types might).

simpler, flexible, and more robust: Once the tenant (or another project, e.g., Octavia or Kuryr) specifies what they *want* to do, rather than *how* to do it, the API becomes simpler to understand (and we humans no longer need to infer the topology either; it is given to us). This also allows the API to be extended more easily when new related features are needed (flexible).

Robustness comes from the implementation knowing exactly *what* needs to happen, rather than inferring it from the *how*. For example, instead of inferring that, because one port's IP/MAC is in another port's allowed address pairs, traffic to the former should be passed to the latter, the implementation knows that the former is 'behind' the latter. This makes it less dependent on heuristics, and therefore leaves less room for error.
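
To make the contrast concrete, here is a minimal Python sketch of the kind of inference described above. It is not Neutron code: the `Port` class, the `infer_parents` helper, and the addresses are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Port:
    name: str
    ip: str
    mac: str
    # Each pair is (ip_address, mac_address or None), mirroring allowed address pairs.
    allowed_pairs: list = field(default_factory=list)


def infer_parents(child, ports):
    """Heuristic: guess which ports front for `child` by matching address pairs."""
    parents = []
    for port in ports:
        if port is child:
            continue
        for ip, mac in port.allowed_pairs:
            # A pair without a MAC matches on IP alone.
            if ip == child.ip and mac in (None, child.mac):
                parents.append(port.name)
                break
    return parents


container = Port("port1", "10.0.0.5", "fa:16:3e:00:00:01")
vm_a = Port("port2", "10.0.0.10", "fa:16:3e:00:00:02", [("10.0.0.5", None)])
vm_b = Port("port3", "10.0.0.11", "fa:16:3e:00:00:03", [("10.0.0.5", None)])

# Two ports list the same address, so the heuristic returns two candidates.
candidates = infer_parents(container, [container, vm_a, vm_b])

# An explicit (hypothetical) relation leaves nothing to infer.
explicit_parent = {"port1": "port2"}
```

With the explicit mapping, the lookup is a single dictionary access, while the heuristic has to scan every port and may still return more than one answer.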

Miguel Lavalle (minsel) wrote :

Hi Omer,

To tell you the truth, I still find this pretty abstract. To help simple mortals like me understand this better, could we walk through a specific example, let's say based on macvlan:

1) what CLI commands / API calls would be used today to implement it?

2) what CLI commands / API calls would be used under the new API?

3) benefits comparing the new case vs. the old case

Thanks

Omer Anson (omer-anson) wrote :

So assume you have a container with port1 behind a VM port (or nested in the VM port) port2.

Today you'd create both ports, and then add one to the allowed address pair of the other:
openstack port create --network net1 port1
openstack port create --network net2 port2
openstack port set --allowed-address ip-address=<ip of port1> port2

In the new API, you would state explicitly that port1 is behind port2. e.g.
openstack port create --network net1 port1
openstack port create --network net2 port2
openstack nested-port create --child port2 --parent port1 --type ipvlan

ipvlan type is selected since the MAC was not specified. Otherwise, the two commands would be:
openstack port set --allowed-address ip-address=<ip of port1>,mac-address=<mac of port1> port2
and
openstack nested-port create --child port2 --parent port1 --type macvlan

Since this RFE also proposes to extend trunk's API, creating a trunk port can be rewritten as:
openstack nested-port create --child port2 --parent port1 --type vlan --segmentation-id 7

(It should go without saying that removing support for the old trunk API is *not* proposed)

The benefit is as I outlined above. Instead of the implementation having to heuristically infer the logic (and from my understanding of the DVR code, this is done specifically for lbaas), the desired behaviour is given explicitly.
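
The two approaches can also be compared at the level of request bodies. In the Python sketch below, the first function builds the body of a Neutron port update that sets `allowed_address_pairs`. The second builds a body for the proposed nested-port resource, which does not exist: its name and fields are assumptions made up for this illustration.

```python
def allowed_address_pair_update(ip, mac=None):
    """Body for today's approach: PUT /v2.0/ports/{port_id} on the Neutron API."""
    pair = {"ip_address": ip}
    if mac is not None:
        # Including the MAC corresponds to the macvlan-style case above.
        pair["mac_address"] = mac
    return {"port": {"allowed_address_pairs": [pair]}}


def nested_port_create(child_id, parent_id, nesting_type, segmentation_id=None):
    """HYPOTHETICAL body for the proposed nested-port resource (not a real API)."""
    body = {"child": child_id, "parent": parent_id, "type": nesting_type}
    if segmentation_id is not None:
        # Only types such as vlan need extra information.
        body["segmentation_id"] = segmentation_id
    return {"nested_port": body}


today = allowed_address_pair_update("10.0.0.5")
proposed = nested_port_create("child-port-id", "parent-port-id", "ipvlan")
```

The first body only says which addresses a port may use. The second names both ports and the kind of nesting, which is the intent the RFE wants to capture.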

Miguel Lavalle (minsel) wrote :

Thank you for the explanation. It is much clearer now.

Do we actually need a new API? Based on your description above, we can already do all the things the API would do. Could this be achieved by adding commands to the client and having the client translate them to the already existing APIs? If I am understanding correctly, that is what the API would do, right?

Armando Migliaccio (armando-migliaccio) wrote :

I would agree with Miguel here, and I am not sure I appreciate the strength of the proposed model. The command you propose maps exactly to a trunk one:

openstack nested-port create --child port2 --parent port1 --type vlan --segmentation-id 7

maps to:

openstack trunk create --parent-port port1 --subport port=port2,segmentation-type=vlan,segmentation-id=7

There's a reason why the segmentation type/ID is per subport (and that is to potentially open up to L2GW-type of scenarios), which seems to be the only difference with your proposal.
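
The mapping above can be sketched in Python as a translation into the body of a trunk create request (`POST /v2.0/trunks`), where `segmentation_type` and `segmentation_id` sit on each entry of `sub_ports`. The input dictionaries describing nested ports are a hypothetical shape invented for this example, and real requests use port UUIDs rather than names.

```python
def trunk_create_body(parent_port_id, nested_ports):
    """Translate hypothetical nested-port records into a trunk create body."""
    sub_ports = []
    for nested in nested_ports:
        # Segmentation details are carried per subport, not per trunk.
        sub_ports.append({
            "port_id": nested["child"],
            "segmentation_type": nested["type"],
            "segmentation_id": nested["segmentation_id"],
        })
    return {"trunk": {"port_id": parent_port_id, "sub_ports": sub_ports}}


body = trunk_create_body(
    "port1",
    [{"child": "port2", "type": "vlan", "segmentation_id": 7}],
)
```

Because each subport carries its own segmentation fields, one parent can in principle front several children with different types or IDs.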

As for new segmentation types, the API today is not prescriptive about which types are allowed, and it should not be hard to add extra fields to the sub-port resource (beyond today's segmentation type and ID). That said, I'd be wary of bloating the API without a solid use case to justify it.

Back to you Omer, ball is in your court :)

Omer Anson (omer-anson) wrote :

Starting with Armando's comment, yes. The RFE proposes to extend the trunk API. As such, the special case of a trunk port should stay the same. The change in naming only exists to show that the API is generalised to support all types of nested ports: e.g., trunk, macvlan, or ipvlan. This is not where the strength of the model lies.

The strength of the model is providing objects similar to today's trunk and subport for macvlan and ipvlan, as they are used e.g. by Octavia (HA proxy port behind amphora VM port), or Kuryr and Kubernetes (containers are nested as MACVLANs within a VM). The plan is not to translate these commands to Allowed Address Pairs commands. On the contrary - the plan is to move away from allowed address pairs, and encode everything in trunk and subport objects. The reason for this is to give the network implementation an idea of *what* to do, rather than *how* to do it. This will allow the implementations to implement the exact behaviour rather than guessing that this behaviour is wanted. Additionally, this API allows the nested ports (or subports) to behave as ports when attaching, e.g., security groups and floating IPs, rather than having an unbound port that happens to have the same IP/MAC as the allowed address pair on another port.

The existing API also raises some questions, e.g. how to handle IP overlap. With the proposed API, the specific port is given, and therefore its network is also known, so this problem doesn't arise.
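
A small Python sketch of the overlap problem, under made-up data: two ports on different networks share an IP address, so a lookup by address alone is ambiguous, while a lookup by port ID is not. The port records and helper names are hypothetical.

```python
ports = [
    {"id": "port-a", "network": "net1", "ip": "10.0.0.5"},
    {"id": "port-b", "network": "net3", "ip": "10.0.0.5"},  # overlapping address
]


def ports_matching_ip(ip):
    """What an address-pair entry gives us: an IP, and every port that has it."""
    return [p["id"] for p in ports if p["ip"] == ip]


def network_of(port_id):
    """What an explicit port reference gives us: exactly one port and its network."""
    for p in ports:
        if p["id"] == port_id:
            return p["network"]
    raise KeyError(port_id)


ambiguous = ports_matching_ip("10.0.0.5")
unambiguous = network_of("port-a")
```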

The implementation in DVR for this behaviour is limited to Loadbalancer and LoadbalancerV2. With an explicit API, the system can know exactly when this scenario is relevant.

This also helps to debug the system. By knowing which port is referenced where, it is easier to keep track of everything rather than guessing that one port is accessed via another, since they both have the same IP (in effect).

Lastly, as I said above, using a heuristic to guess the intended behaviour is prone to making mistakes. Knowing the intention in advance allows for more robust solutions.

Armando Migliaccio (armando-migliaccio) wrote :

I think I understand partially the problem you are stating: the (ab)use of allowed address pairs to address a number of scenarios (e.g. VRRP with tenant instances) has been brittle at best. If you are calling for first class resources in the API to describe some of the scenarios you have in mind (e.g. container networking in virtualized environments) that would lead to a more declarative vs a more imperative/procedural approach to building the required topologies, then I am all in for it.

@Omer: do you think I am slowly getting at what you're hinting all along?

If so, I am starting to like your thinking :) but I see two issues with your current description:

a) We need clearer, simpler use cases to adopt as a framework for reasoning about the proposal of new API extensions.
b) We do not necessarily need to operate within the constraint of an existing API, such as the trunk one, unless the proposals/extensions slot in seamlessly. We should be able to judge more wisely here once we have identified cases at point a).

Thoughts?

Omer Anson (omer-anson) wrote :

> @Omer: do you think I am slowly getting at what you're hinting all along?
Yes. This is exactly it!

a) These are the scenarios I ran into when using allowed address pairs. This RFE refers only to the first two:

a.1. MACVLAN - The VM knows the packet's final destination by its destination MAC. e.g. container network within a VM

a.2. IPVLAN - The VM knows the packet's final destination by its destination IP (The MAC is the VM's). e.g. HAProxy behind Amphora VMs, as used in Octavia.

a.3. As mentioned by @Armando, load-balancing with VRRP, where both ports share the same VIP as an allowed address pair.

a.4. A port behaves as a router. By placing a subnet in a port's allowed address pair, the routing to that subnet is done via that port.

I was hoping that the clear/simple scenario would be the general case of nested ports, where one port (that is known to Neutron) is not actually bound, but reachable by sending the packet (perhaps with some mangling, e.g., adding VLAN tags, or changing the destination MAC address) to another port (also known to Neutron). This covers the three use-cases listed in the original bug description.
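
As a rough illustration of this general case, the Python sketch below models how a frame destined for a nested (child) port might be adjusted before being sent to the parent port, depending on the nesting type. The frame and port dictionaries, the function name, and the exact handling per type are assumptions for this example, not an existing implementation.

```python
def deliver_via_parent(frame, child, parent, nesting_type, segmentation_id=None):
    """Return a copy of `frame` as it would be handed to the parent port."""
    out = dict(frame)
    if nesting_type == "vlan":
        # Trunk case: tag the frame so the VM can pick the right subport.
        out["vlan"] = segmentation_id
    elif nesting_type == "macvlan":
        # The VM tells destinations apart by MAC, so keep the child's MAC.
        out["dst_mac"] = child["mac"]
    elif nesting_type == "ipvlan":
        # The VM tells destinations apart by IP; the MAC is the VM's own.
        out["dst_mac"] = parent["mac"]
    else:
        raise ValueError(f"unknown nesting type: {nesting_type}")
    out["out_port"] = parent["name"]
    return out


child = {"name": "port1", "mac": "fa:16:3e:00:00:01"}
parent = {"name": "port2", "mac": "fa:16:3e:00:00:02"}
frame = {"dst_ip": "10.0.0.5", "dst_mac": "fa:16:3e:00:00:01"}

via_ipvlan = deliver_via_parent(frame, child, parent, "ipvlan")
via_vlan = deliver_via_parent(frame, child, parent, "vlan", segmentation_id=7)
```

In every branch the frame leaves through the parent port; only the mangling differs, which is why a single resource with a type field could describe all three cases.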

b) The idea to extend the trunk API was proposed due to the similarities in general behaviour (port behind port). The gist of this RFE is creating a declarative (and extensible) API for e.g. MACVLAN and IPVLAN scenarios, not necessarily enforcing this API on other features.

Armando Migliaccio (armando-migliaccio) wrote :

Omer, I think in principle this is similar to what I was stating in [1]: formalizing the relationship between these entities improves our ability to reason about logical artefacts and improve the robustness of the overall solution, which now relies on implicit connections between entities based on second-level attributes (e.g. IP, MACs).

My suggestion is to start without taking into consideration the constraints of operating within a predefined API, only to see at a later date whether there is an opportunity (while we iterate on the proposal) to 'refactor' an existing approach like the trunk one.

[1] https://bugs.launchpad.net/neutron/+bug/1583694/comments/31

Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired
Miguel Lavalle (minsel)
Changed in neutron:
status: Expired → Incomplete
Miguel Lavalle (minsel)
Changed in neutron:
status: Incomplete → Confirmed
Akihiro Motoki (amotoki)
tags: added: rfe-confirmed
Miguel Angel Ajo (mangelajo) wrote :

I believe we have to stick to the trunk port API as much as we can, and extend it to provide the new functionality.

Michael Johnson (johnsom) wrote :

I am not sure I see the application of this in the Octavia/amphora case.

Octavia uses allowed address pairs in a very simple use case. We need to allow a secondary IP address on a given port. This secondary IP address is then used for VRRP as the VIP of the load balancer. VRRP GARPs (Hey, I'm over here now!) this secondary IP on the instance that currently has ownership of the IP address to update the ARP tables on the hosts located on the subnet, directing future L2 packets to the new MAC.

We are not really stacking ports on top of each other or using IP based VLANs, we are simply adding a secondary IP address to a neutron port. The current mechanism for this is via the allowed address pairs implementation.

On Linux the equivalent is adding an "eth0:1" secondary IP or enabling non-local binding in the kernel.

My interpretation of a.2 IPVLAN above is that it would actually break our amphora VRRP implementation by bypassing or trying to "manage" the MAC that currently owns the IP address, as opposed to continuing to use the GARP method we use today (also widely used outside of OpenStack). It is imperative that the IP migration between instances be autonomous and expedient to allow for a fast failover. We cannot call out to neutron to announce the migration of the IP from one instance to another, as this would introduce excessive latency in the migration of the IP. This failover is sub-second.
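
The GARP-based failover described here can be pictured with a toy Python model of a neighbour's ARP cache. It is only a sketch: the addresses are invented, and real ARP handling lives in the hosts' network stacks, not in application code.

```python
# A neighbour's ARP cache: the VIP currently resolves to the active instance.
arp_cache = {"203.0.113.10": "fa:16:3e:aa:aa:aa"}


def on_gratuitous_arp(cache, ip, mac):
    """A gratuitous ARP announces ip -> mac; the neighbour updates its cache."""
    cache[ip] = mac
    return cache


# The standby instance takes over the VIP and announces itself.
on_gratuitous_arp(arp_cache, "203.0.113.10", "fa:16:3e:bb:bb:bb")
```

No control-plane call appears anywhere in this exchange, which is the property the comment above wants to preserve.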

Miguel Lavalle (minsel) wrote :

We discussed this RFE during the Dublin PTG. Please see the "Port behind port" section here: http://lists.openstack.org/pipermail/openstack-dev/2018-March/128183.html. The next step is to develop a spec.

Ian Wells (ijw-ubuntu) wrote :

I don't know whether the bonding spec relates to this or not - I can't actually work it out in my head whether it would be upside down to this model - but it does seem to have that same parent-child property (multiple physical links and VIFs and one logical port and IP address, in this instance).

Miguel Lavalle (minsel) wrote :

@Omer,

Is there still interest in pursuing this RFE? Per the conversation in Dublin, the next step is to put together a spec.

Miguel Lavalle (minsel) wrote :

We haven't heard back from the submitter for several months. I will mark this RFE as invalid. If there is still a desire to pursue it, please mark it as new again and we will resume the conversation.

Changed in neutron:
status: Confirmed → Invalid
tags: removed: rfe rfe-confirmed