Comment 0 for bug 1905295

Bence Romsics (bence-romsics) wrote:

I'd like to bring the following idea to the drivers' meeting. If it still looks like a good idea after that discussion, I'll open a spec so it can be properly reviewed in Gerrit. Until then, feel free to comment here.

# Problem Description

A general router can be configured to connect and route to multiple external networks for higher availability and/or load balancing. However, the current Neutron API allows exactly one external gateway per router, as the create-router request body shows:

https://docs.openstack.org/api-ref/network/v2/?expanded=create-router-detail#create-router

{
    "router": {
        "name": "router1",
        "external_gateway_info": {
            "network_id": "ae34051f-aa6c-4c75-abf5-50dc9ac99ef3",
            "enable_snat": true,
            "external_fixed_ips": [
                {
                    "ip_address": "172.24.4.6",
                    "subnet_id": "b930d7f6-ceb7-40a0-8b81-a425dd994ccf"
                }
            ]
        },
        "admin_state_up": true
    }
}

However, consider the following (simplified) network architecture as an example:

```
+--+     +--+
|R3|     |R4|
+--+     +--+
 | \     / |
 |  \   /  |
 |    X    |
 |  /   \  |
 | /     \ |
+--+     +--+
|R1|     |R2|
+--+     +--+
 | \     / |
 |  \   /  |
 |    X    |
 |  /   \  |
 | /     \ |
+--+     +--+
|C1|     |C2| ...
+--+     +--+
```

Here C1, C2, ... are compute nodes, R1 and R2 are OpenStack-managed routers, and R3 and R4 are provider edge routers. Equal-Cost Multi-Path (ECMP) routing is used between R1/R2 and R3/R4 so that all links are utilized in an active-active manner. In such an architecture it makes sense to represent R1 and R2 as two logical routers with two external gateways each or, in some cases (depending on other architectural choices), even as one logical router with four external gateways. Neither is possible with the current API.
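As a rough illustration of what active-active ECMP means on the R1/R2 to R3/R4 links, here is a minimal sketch of per-flow next-hop selection. It is not how any particular router implements ECMP: real devices use their own (often hardware) hash functions, and the names `Flow` and `ecmp_next_hop`, as well as the use of SHA-256, are assumptions made purely for illustration.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """A simplified 5-tuple identifying one flow (illustrative only)."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def ecmp_next_hop(flow: Flow, next_hops: Sequence[str]) -> str:
    """Pick one of several equal-cost next hops by hashing the flow.

    All packets of the same flow map to the same next hop (so they are not
    reordered), while different flows are spread across all next hops.
    SHA-256 is an arbitrary, deterministic stand-in for a router's hash.
    """
    if not next_hops:
        raise ValueError("no next hops available")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]
```

For example, on R1 `ecmp_next_hop(flow, ["R3", "R4"])` keeps every packet of a given flow on one uplink while different flows use both uplinks.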

# Proposed Change

Extend the router API object with a new attribute 'additional_external_gateways', for example:

{
   "router" : {
      "name" : "router1",
      "admin_state_up" : true,
      "external_gateway_info" : {
         "enable_snat" : false,
         "external_fixed_ips" : [
            {
               "ip_address" : "172.24.4.6",
               "subnet_id" : "b930d7f6-ceb7-40a0-8b81-a425dd994ccf"
            }
         ],
         "network_id" : "ae34051f-aa6c-4c75-abf5-50dc9ac99ef3"
      },
      "additional_external_gateways" : [
         {
            "enable_snat" : false,
            "external_fixed_ips" : [
               {
                  "ip_address" : "172.24.5.6",
                  "subnet_id" : "62da64b0-29ab-11eb-9ed9-3b1175418487"
               }
            ],
            "network_id" : "592d4716-29ab-11eb-a7dd-4f4b5e319915"
         },
         ...
      ]
   }
}
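The proposal does not define validation rules for the new attribute. As a hedged illustration only, a server-side check of such a request body might look like the sketch below; the function name `validate_router_body` and every rule in it are assumptions, not part of the proposal.

```python
from typing import Any


def validate_router_body(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a proposed router request body.

    Hypothetical checks only; the proposal itself defines no validation.
    """
    problems: list[str] = []
    router = body.get("router", {})
    additional = router.get("additional_external_gateways", [])
    if not isinstance(additional, list):
        return ["additional_external_gateways must be a list"]
    if additional and not router.get("external_gateway_info"):
        # Assumption: additional gateways extend the primary one, so a
        # primary external_gateway_info must be present as well.
        problems.append("additional gateways require external_gateway_info")
    for i, gw in enumerate(additional):
        # Assumption: like external_gateway_info, each entry needs a network.
        if not isinstance(gw, dict) or "network_id" not in gw:
            problems.append(f"additional_external_gateways[{i}] lacks network_id")
    return problems
```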

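The new attribute would be edited via the following HTTP PUT methods, which have diff semantics (the request body lists only the gateways to add or remove, not the complete list):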

PUT /v2.0/routers/{router_id}/add_additional_external_gateways
PUT /v2.0/routers/{router_id}/remove_additional_external_gateways
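A minimal sketch of how a server might apply these diff-style requests to the stored list. The helper names `add_gateways` and `remove_gateways` are hypothetical, and two behaviors are assumptions not stated in the proposal: a gateway is identified by its `network_id`, and adding an already present gateway is a no-op rather than an error.

```python
from typing import Any

Gateway = dict[str, Any]


def add_gateways(current: list[Gateway], to_add: list[Gateway]) -> list[Gateway]:
    """Diff-style add: append gateways whose network_id is not yet present.

    Assumption: a gateway is identified by its network_id, and re-adding
    an existing one is silently ignored.
    """
    result = list(current)
    known = {gw["network_id"] for gw in result}
    for gw in to_add:
        if gw["network_id"] not in known:
            result.append(gw)
            known.add(gw["network_id"])
    return result


def remove_gateways(current: list[Gateway], to_remove: list[Gateway]) -> list[Gateway]:
    """Diff-style remove: drop every gateway whose network_id is listed."""
    drop = {gw["network_id"] for gw in to_remove}
    return [gw for gw in current if gw["network_id"] not in drop]
```

This mirrors the style of the existing `add_router_interface` / `remove_router_interface` router actions, where each request carries only the change.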

We keep 'external_gateway_info' for backwards compatibility. When additional_external_gateways is an empty list, everything behaves as before. When it is non-empty, the effective list of external gateways is (in Python-like pseudo-code): [external_gateway_info] + additional_external_gateways.
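The pseudo-code above translates directly into a small helper. The name `effective_gateways` is hypothetical, and treating a missing primary gateway as contributing nothing (instead of a `None` entry) is an assumption.

```python
from typing import Any, Optional

Gateway = dict[str, Any]


def effective_gateways(external_gateway_info: Optional[Gateway],
                       additional_external_gateways: list[Gateway]) -> list[Gateway]:
    """Return the router's full list of external gateways.

    Implements [external_gateway_info] + additional_external_gateways.
    Assumption: a missing primary gateway adds no entry to the list.
    """
    gateways = [] if external_gateway_info is None else [external_gateway_info]
    return gateways + list(additional_external_gateways)
```

With an empty `additional_external_gateways` the result is exactly `[external_gateway_info]`, so existing single-gateway routers see no change, and the primary gateway always stays first in the list.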

Unless otherwise specified, all external IPs that are not directly connected are routed towards the original external_gateway_info. This behavior may be overridden either by (static) extraroutes or by running (dynamic) routing protocols and routing each prefix towards the external gateway it was learned from.
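A minimal sketch of this forwarding decision, assuming routes from extraroutes or from a routing protocol are collected into one table that maps a CIDR prefix to a gateway, and that the longest matching prefix wins (standard IP forwarding behavior). The function `select_gateway` and the gateway labels are hypothetical; directly connected subnets could be modeled as additional entries in the same table.

```python
import ipaddress


def select_gateway(dst_ip: str,
                   routes: dict[str, str],
                   default_gateway: str) -> str:
    """Pick the external gateway for a destination address.

    routes maps a CIDR prefix (from static extraroutes or learned via a
    routing protocol) to the gateway it points to; the longest matching
    prefix wins. Unmatched destinations go to the primary gateway
    (external_gateway_info), passed in as default_gateway.
    """
    addr = ipaddress.ip_address(dst_ip)
    best_len = -1
    best_gateway = default_gateway
    for cidr, gateway in routes.items():
        net = ipaddress.ip_network(cidr)
        # Membership is simply False when IP versions differ.
        if addr in net and net.prefixlen > best_len:
            best_len = net.prefixlen
            best_gateway = gateway
    return best_gateway
```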

# Alternatives

1) Using 4 logical routers with 1 external gateway each. However, in this case the API lacks the information about which (2 or 4) logical routers represent the same backend router.

2) Using a VRRP HA router. However, this provides a different level of high availability, and it is active-passive rather than active-active.

3) Adding router interfaces (since their number is not limited in the API) instead of external gateways. However, this creates confusion by blurring the line between what is internal and what is external to the cloud deployment.