[RFE] allow to have no default route in DHCP host routes

Bug #1717560 reported by Thomas Morin
This bug affects 3 people
Affects: neutron
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

When a user wants a VM with multiple interfaces and DHCP gives a default route for all of the corresponding subnets, the interface that will actually be used as the default is not easily predictable (it depends on the order in which interfaces are enabled in the VM, and on when DHCP offers are received and processed).

One solution is to *not* set a default gateway on the subnets we don't want to use as the default, but this is only applicable if there is no need to use these interfaces to reach one or more (non-default) prefixes.

When one interface needs to be the default and one or more other interfaces are used to reach other subnets via a router, what people most often do is apply custom tweaks via cloud-init that fix routing, but this is of course cumbersome.

This is an RFE for introducing an API extension adding a new 'default_route' attribute to the subnet resource. This attribute would default to true (current behavior) and could be set to false by a user whenever there is a need to *not* have a default route via the subnet's router.
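To make the proposed semantics concrete, here is a minimal Python sketch of which routes a subnet would advertise if it carried the proposed attribute. The `Subnet` class, its fields and the `advertised_routes()` helper are hypothetical illustrations, not Neutron's data model; only the attribute name `default_route` comes from this RFE.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv4Network
from typing import List, Optional, Tuple

# A route is (destination prefix, next hop).
Route = Tuple[IPv4Network, IPv4Address]

DEFAULT_DEST = IPv4Network("0.0.0.0/0")


@dataclass
class Subnet:
    """Hypothetical, simplified subnet model (not Neutron's real class)."""
    cidr: IPv4Network
    gateway_ip: Optional[IPv4Address]
    host_routes: List[Route] = field(default_factory=list)
    default_route: bool = True  # attribute proposed by this RFE; True keeps current behavior


def advertised_routes(subnet: Subnet) -> List[Route]:
    """Routes a DHCP server would hand out for this subnet under the proposal."""
    routes = list(subnet.host_routes)
    # The gateway stays usable as a next hop for host routes; only the
    # 0.0.0.0/0 entry depends on the proposed flag.
    if subnet.gateway_ip is not None and subnet.default_route:
        routes.append((DEFAULT_DEST, subnet.gateway_ip))
    return routes


# Example: subnet B keeps its gateway for 10.20.0.0/16 but gives no default route.
subnet_b = Subnet(
    cidr=IPv4Network("10.0.2.0/24"),
    gateway_ip=IPv4Address("10.0.2.1"),
    host_routes=[(IPv4Network("10.20.0.0/16"), IPv4Address("10.0.2.1"))],
    default_route=False,
)
print(advertised_routes(subnet_b))
```

The sketch keeps gateway_ip and the default route separate, which is what distinguishes this proposal from simply setting the subnet gateway to None.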

Thomas Morin (tmmorin-orange) wrote :

see also bug 1718954

Changed in neutron:
status: New → Confirmed
importance: Undecided → Wishlist
summary: - allow to have no default route in DHCP host routes
+ [RFE] allow to have no default route in DHCP host routes
Armando Migliaccio (armando-migliaccio) wrote :

The proposal sounds problematic in that a subnet with no default route may lead a VM with a single interface to lack a default gateway :(

Armando Migliaccio (armando-migliaccio) wrote :

How does this work in 'real life'? At first glance I feel we would be complicating our lives too much, though I appreciate the source of contention that may arise when a host connects to multiple networks.

Armando Migliaccio (armando-migliaccio) wrote :

I mean, there are steps one can take to address this issue, and I am not sure that an API to alter the behavior of the DHCP service is the right approach.

Changed in neutron:
status: Confirmed → Triaged
Thomas Morin (tmmorin-orange) wrote :

Armando,

For me, the goal of this RFE is to open the discussion; the outcome may well be a different proposal than my initial one. For the sake of discussion, let me answer, to see how far the proposal holds...

To comment #2: the intent is that 'default_route' would default to True, and would be set to False by a tenant only when a subnet is *not* used to connect to anything beyond it. In that case the tenant would know what to expect in terms of connectivity (i.e. no connectivity beyond the subnet).

To comment #3:
A 'real life' scenario would be a VM that needs connectivity to networks/subnets A, B and C, with A preferred for default routing. Subnets B and C would be created with default_route=False. Does this complicate life a lot?

To comment #4:
Let's see what would be the alternative to avoid cumbersome network scripts in VMs...
A generic way for Linux cloud distros to choose which interface should be the default, based on a standard way of controlling this through user-data? (But how long before this is usable generically in the wild...?)
Other ideas ?

Dr. Jens Harbott (j-harbott) wrote :

I do not understand how your new option would behave differently than setting "gateway=None" on your subnet. You can still attach a router to the subnet and use host routes via that router if you do so. This sequence of commands seems to do just the right thing:

openstack network create priv3
openstack subnet create --network priv3 --subnet-range 10.10.42.0/24 --gateway none --host-route destination=10.10.43.0/24,gateway=10.10.42.1 priv3a --allocation-pool start=10.10.42.2,end=10.10.42.252
openstack port create rp3 --fixed-ip subnet=priv3a,ip-address=10.10.42.1 --network priv3
openstack router create r3
openstack router add port r3 rp3
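With this setup the subnet has no gateway and a single host route, so the DHCP server should hand out only 10.10.43.0/24 via 10.10.42.1. Host routes travel in DHCP as classless static routes (option 121, RFC 3442), and RFC 3442 also requires a client to ignore the plain Router option when option 121 is present. The Python sketch below encodes a route list in the RFC 3442 wire format; the function name is hypothetical, and it assumes that Neutron's DHCP agent maps host routes to this option.

```python
from ipaddress import IPv4Address, IPv4Network
from typing import Iterable, Tuple


def encode_classless_routes(routes: Iterable[Tuple[str, str]]) -> bytes:
    """Encode (destination CIDR, router) pairs as a DHCP option 121 payload (RFC 3442)."""
    out = bytearray()
    for dest, router in routes:
        net = IPv4Network(dest)
        width = net.prefixlen
        # Only the significant octets of the destination are sent: ceil(width / 8).
        significant = (width + 7) // 8
        out.append(width)
        out += net.network_address.packed[:significant]
        out += IPv4Address(router).packed
    return bytes(out)


# The host route from the commands above, without any 0.0.0.0/0 entry.
payload = encode_classless_routes([("10.10.43.0/24", "10.10.42.1")])
print(payload.hex())  # 180a0a2b0a0a2a01
```

A default route, when present, would be encoded as width 0 with no destination octets, followed by the router address.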

Armando Migliaccio (armando-migliaccio) wrote :

@Thomas: I may have articulated my point very poorly. My bad.

What I meant by 'real-life scenario' was: in a world without cloud orchestration and APIs, how would one go about solving the problem of ensuring traffic takes the right path when multiple interfaces are involved? It sounds like j-harbott said in CLI what I meant with words :)

I think there's a solution to this problem that does not require expressing the user's intention via an API flag.

Thomas Morin (tmmorin-orange) wrote :

Ah yes, OK, my "custom tweaks via cloud-init that fix routing" was too elliptical.

There are various ways you can tweak things, and the details will depend on the distro, and possibly on whether DHCP is used or not. The ingredients of these recipes are typically:
- controlling the order in which interfaces are brought up (and waiting for one to be completely up before bringing the next one up)
- distro-specific tweaks (e.g. GATEWAYDEV=ethX in /etc/sysconfig/network on CentOS [1], or DEFROUTE="no" on RHEL)
- letting the base system bring the network up and fixing the default afterwards with "ip route replace default via 1.2.3.4 dev x"
- adding two routes (0.0.0.0/1 and 128.0.0.0/1) on the preferred interface, which will override the default route
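The last trick works because route lookups prefer the longest matching prefix: every IPv4 address falls in either 0.0.0.0/1 or 128.0.0.0/1, and both are more specific than 0.0.0.0/0. A minimal Python sketch of that lookup, ignoring metrics and policy routing (the table and interface names are made up for illustration):

```python
from ipaddress import IPv4Address, IPv4Network
from typing import List, Optional, Tuple

# Hypothetical routing table: (prefix, outgoing interface).
TABLE: List[Tuple[str, str]] = [
    ("0.0.0.0/0", "eth1"),      # default route learned via DHCP on the "wrong" interface
    ("0.0.0.0/1", "eth0"),      # override, lower half of the address space
    ("128.0.0.0/1", "eth0"),    # override, upper half of the address space
    ("10.20.0.0/16", "eth1"),   # specific prefix that must still go via eth1
]


def select_interface(dest: str, table: List[Tuple[str, str]]) -> Optional[str]:
    """Return the interface of the longest prefix containing dest (no metrics)."""
    addr = IPv4Address(dest)
    best: Optional[Tuple[IPv4Network, str]] = None
    for prefix, dev in table:
        net = IPv4Network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, dev)
    return best[1] if best else None


for dest in ("8.8.8.8", "200.1.2.3", "10.20.3.4"):
    print(dest, "->", select_interface(dest, TABLE))
```

The specific 10.20.0.0/16 route still beats the /1 overrides, which is why this trick coexists with routes to non-default prefixes.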

In these recipes you often run into one issue: you would usually like to avoid hardcoding subnet addresses in your scripts, but the alternative, relying on interface names and order, doesn't always work (again, this is distro-specific).

In a context where cloud-init is used, there are probably ways to do all this via cloud-init config, but I haven't explored them (see e.g. [2]).

In all the cases above, the solution remains something that people have to master, with distro-specific details to get right, versus something that OpenStack Neutron would get right for them. This is why I believe we could improve Neutron to get this right for people.

[1] https://thornelabs.net/2014/09/03/configure-multiple-network-interfaces-on-an-openstack-instance.html
[2] https://github.com/coreos/bugs/issues/212

Lukas Stehlik (stelucz) wrote :

Hello,

The default route is advertised even though you set "Disable gateway". Check the following network_data.json: http://paste.ubuntu.com/25725747/

This causes cloud-init (Ubuntu 16 and later) to create the following config: http://paste.ubuntu.com/25725748/

The default route from the ens3 interface is ignored, but only because the route command fails with "0.0.0.0" as the gateway.

Thus, an option for having no default route does serve a purpose.

see https://bugs.launchpad.net/nova/+bug/1718954
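The pasted files are not reproduced here, but the pattern described is a 0.0.0.0/0 route whose gateway is itself 0.0.0.0. As a sketch, assuming the usual network_data.json layout (a "networks" list whose entries carry "routes" with "network", "netmask" and "gateway" keys), the following Python helper flags such entries; the helper name and the sample data are hypothetical.

```python
import json
from typing import Any, Dict, List


def unusable_default_routes(network_data: Dict[str, Any]) -> List[str]:
    """Return ids of networks whose metadata holds a 0.0.0.0/0 route via 0.0.0.0."""
    hits: List[str] = []
    for net in network_data.get("networks", []):
        for route in net.get("routes", []):
            is_default = route.get("network") == "0.0.0.0" and route.get("netmask") == "0.0.0.0"
            if is_default and route.get("gateway") == "0.0.0.0":
                hits.append(net.get("id", "<no id>"))
    return hits


# Hypothetical sample shaped like network_data.json (not the pasted file).
sample = json.loads("""
{"networks": [
  {"id": "network0", "routes": [
    {"network": "0.0.0.0", "netmask": "0.0.0.0", "gateway": "0.0.0.0"}]},
  {"id": "network1", "routes": [
    {"network": "0.0.0.0", "netmask": "0.0.0.0", "gateway": "192.168.1.1"}]}
]}
""")
print(unusable_default_routes(sample))
```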

Thomas Morin (tmmorin-orange) wrote :

> default route is advertised even though you set "Disable gateway". Check following network_data.json
> http://paste.ubuntu.com/25725747/

> This will cause that cloud-init (Ubuntu 16 and higher) will create following config:
> http://paste.ubuntu.com/25725748/

(I am surprised that a default route is still present even with disable_gateway=True; this is certainly worth clarifying/solving. We are lucky that the corresponding default route can be ignored, but this really is a separate issue.)

The contexts I'm looking at are those where we don't want a gateway to be the default, but we *do* *want* a gateway on the subnet, because the subnet needs to be used to reach some destinations; "disable_gateway" does not address this need.

Lukas Stehlik (stelucz) wrote :

Thomas: +1

Yes, I know; that's the reason I initiated bug https://bugs.launchpad.net/neutron/+bug/1718954 (created by Scott). We probably have the same scenario, or at least a similar one.

I just wanted to point out that setting "Disable gateway" is not a solution for this case.

Thomas Morin (tmmorin-orange) wrote :

OK Lukas, we're on the same page.

YAMAMOTO Takashi (yamamoto) wrote :

I skimmed all the comments but failed to understand what's wrong with j-harbott's example.

Lukas Stehlik (stelucz) wrote :

The problem is that the default route is advertised even though you set "Disable gateway".

Armando Migliaccio (armando-migliaccio) wrote :

We revived the discussion about this request during today's drivers meeting. That prompted me to look into this more closely. I must admit I have not tried this myself in practice yet, but I wonder if the extra DHCP options on ports can come to the rescue here [1,2,3]. If I understand correctly, it should be possible to clear the dhcp-option = option:router on a given port, which should make dnsmasq stop advertising the default route. Even if that does not work, my point is that we could address your use case by leveraging the existing bits we have rather than coming up with a new API. Have you by any chance looked into that already?
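For reference, extra DHCP options are set per port through the regular port API. A minimal Python sketch of the request body for updating a port (sent as PUT /v2.0/ports/{port_id}); it only builds the JSON, the helper name is hypothetical, and whether an empty value actually suppresses the option is exactly what needs testing.

```python
import json
from typing import Any, Dict


def router_opt_update_body(router_value: str, ip_version: int = 4) -> Dict[str, Any]:
    """Build a port-update body overriding the DHCP 'router' option for one port."""
    return {
        "port": {
            "extra_dhcp_opts": [
                {"opt_name": "router", "opt_value": router_value, "ip_version": ip_version}
            ]
        }
    }


print(json.dumps(router_opt_update_body("20.0.0.111"), indent=2))
```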

Thanks,
Armando

[1] https://specs.openstack.org/openstack/neutron-specs/specs/api/extra_dhcp_options__extra-dhcp-opt_.html
[2] https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/definitions/extra_dhcp_opt.py
[3] http://git.openstack.org/cgit/openstack/neutron/tree/doc/source/admin/archives/use.rst#n179

Armando Migliaccio (armando-migliaccio) wrote :

Thomas: ping?

Miguel Lavalle (minsel) wrote :

@Thomas,

We reviewed this RFE during the meeting of December 22nd. Do you have any feedback on the alternative proposed by Armando? Are you still interested in this?

Miguel Lavalle (minsel) wrote :

@Thomas,

We discussed this RFE during today's drivers meeting. Any feedback? Are you still interested in this? If we don't hear back from you by January 11th, we will conclude you are no longer interested.

Brian Haley (brian-haley) wrote :

Just adding a comment since there is one other issue here: since Neutron port security adds anti-spoofing rules, some kind of configuration is still needed on the instance, because trying to use the IP of one interface on another would lead to packets being dropped on egress. This can be worked around with routing rules and sysctl settings, but it is more likely to come up in this situation.
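The anti-spoofing check described here can be modeled, very roughly, as: a packet leaving a port is accepted only if its source IP is one of the port's fixed IPs or falls within one of its allowed address pairs. A hypothetical Python sketch (it ignores MAC matching and is not how the iptables/OVS firewall drivers are implemented):

```python
from ipaddress import ip_address, ip_network
from typing import Iterable


def egress_allowed(src_ip: str, fixed_ips: Iterable[str], allowed_pairs: Iterable[str] = ()) -> bool:
    """Rough model of port-security source-IP filtering on egress."""
    src = ip_address(src_ip)
    if any(src == ip_address(ip) for ip in fixed_ips):
        return True
    # Allowed address pairs may be single IPs or CIDRs.
    return any(src in ip_network(pair, strict=False) for pair in allowed_pairs)


# eth1's port owns 10.0.2.5; a packet sourced from eth0's 10.0.1.5 would be dropped
# unless an allowed address pair covers it.
print(egress_allowed("10.0.1.5", ["10.0.2.5"]))
print(egress_allowed("10.0.1.5", ["10.0.2.5"], ["10.0.1.0/24"]))
```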

Thomas Morin (tmmorin-orange) wrote :

(Sorry for the delayed answer, I took some time off around the end and beginning of the year.)

@Armando: your suggestion is very interesting. I was not aware of the extra-dhcp-opt extension, and beyond the fact that it may be a means to do what I was suggesting we do with an additional extension, it would actually be a better solution, because the choice would be per-port rather than per-network, hence giving tenants full control of the behavior they want, including when using shared networks.

So I've added testing this to my list, and I'll report back here.

[@Brian: I don't think this is usually a problem: the kernel IP stack (in a VM) selects the source IP based on the outgoing interface (unless it is forced at socket creation, but I believe this is rarely done; when it is, the tenant still has the possibility to tune allowed address pairs or toggle port security).]

Armando Migliaccio (armando-migliaccio) wrote :

@Thomas: happy new year! Glad I was able to give you a homework ;)

Looking forward to hearing from you!

Thomas Morin (tmmorin-orange) wrote :

@Armando: I've done a bit of testing, and while I could successfully change the 'routers' option via DHCP, I could not get it omitted by using an empty value.

$ neutron port-create net2 --extra-dhcp-opt opt_name=router,opt_value=20.0.0.111

=> the 'routers' option in the resulting DHCP lease is set to 20.0.0.111

$ neutron port-create net2 --extra-dhcp-opt opt_name=router,opt_value=

+-----------------+-------------------------------------------------+
| Field           | Value                                           |
+-----------------+-------------------------------------------------+
[...]
| extra_dhcp_opts | ip_version='4', opt_name='router', opt_value='' |
[...]
+-----------------+-------------------------------------------------+

This results in dnsmasq complaining:

Jan 11 10:13:01 tm-devstack-master-02 dnsmasq[6833]: bad IP address at line 3 of /opt/stack/data/neutron/dhcp/133c5a3e-4502-4643-8685-b661426d2aa7/opts

line 3 being:
tag:f44d7673-e1a9-4a2b-80dd-c78406a3944b,option:router,

And, of course, the 'routers' option in the resulting DHCP lease remains set to the default gateway IP for the network.

I replaced the problematic line in the dnsmasq opts file as follows:

tag:f44d7673-e1a9-4a2b-80dd-c78406a3944b,option:router

(without the ',')

This results in actually removing the 'routers' option from the DHCP lease!

I believe that, to use the approach you propose, it would possibly be sufficient to change the code at [1] so that the trailing comma is omitted when the option value is empty. We may need to be careful about whether to do that only when the option is None and not when it is an empty string (but that would raise a secondary issue: how to provide a None value via the CLI, which I couldn't achieve in my tests, possibly just because I don't know how, or possibly because no CLI client supports it).
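A minimal Python sketch of the suggested change, assuming opts lines of the form shown above (tag:<port id>,option:<name>,<value>). The helper `format_opt_line()` is hypothetical and does not reproduce the actual code at [1]; it simply omits the trailing comma when there is no value.

```python
from typing import Optional


def format_opt_line(tag: str, opt_name: str, opt_value: Optional[str]) -> str:
    """Render one dnsmasq opts line for a port-specific DHCP option."""
    line = f"tag:{tag},option:{opt_name}"
    # In the test reported above, "option:router," (trailing comma, empty value)
    # made dnsmasq report "bad IP address", while "option:router" with no value
    # removed the 'routers' option from the lease.
    if opt_value is None or opt_value == "":
        return line
    return f"{line},{opt_value}"


port_tag = "f44d7673-e1a9-4a2b-80dd-c78406a3944b"
print(format_opt_line(port_tag, "router", "20.0.0.111"))
print(format_opt_line(port_tag, "router", ""))
```

This sketch treats None and the empty string alike; as noted above, restricting the behavior to None might be safer, at the cost of needing a way to send None through the API or CLI.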

[1] http://git.openstack.org/cgit/openstack/neutron/tree/neutron/agent/linux/dhcp.py#n1037

Armando Migliaccio (armando-migliaccio) wrote :

@Thomas, thanks for following up on it. Let me mull this over a bit; I wonder if there's something else we don't recall or are misusing, but the initial result is certainly promising!

Nice one.

Miguel Lavalle (minsel) wrote :

@Thomas,

So fixing the piece of code in the agent would be sufficient? In that case, we can transform this RFE into a normal bug and just go ahead and fix it....

Thomas Morin (tmmorin-orange) wrote :

@Miguel: yes, I'd agree.
@Armando: did your mulling over bring up anything?

tags: added: rfe-triaged
YAMAMOTO Takashi (yamamoto) wrote :

I guess it would be nice to have a Tempest scenario for this, to ensure consistent behavior among implementations.
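Such a scenario would ultimately check, inside the guest, whether a 'routers' option was received. As a hypothetical sketch of that check only (not Tempest code), assuming the guest uses ISC dhclient, whose lease files contain lines such as `option routers 20.0.0.111;` (the sample leases are made up):

```python
import re
from typing import List

ROUTERS_RE = re.compile(r"^\s*option\s+routers\s+([^;]+);", re.MULTILINE)


def lease_routers(lease_text: str) -> List[str]:
    """Return the routers listed in the first 'option routers' line, or [] if absent."""
    match = ROUTERS_RE.search(lease_text)
    if match is None:
        return []
    return [r.strip() for r in match.group(1).split(",")]


with_router = """lease {
  interface "eth1";
  fixed-address 20.0.0.5;
  option subnet-mask 255.255.255.0;
  option routers 20.0.0.111;
}"""
without_router = """lease {
  interface "eth1";
  fixed-address 20.0.0.5;
  option subnet-mask 255.255.255.0;
}"""
print(lease_routers(with_router), lease_routers(without_router))
```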

Miguel Lavalle (minsel) wrote :

Discussed briefly during today's drivers meeting. Reclassifying as a normal bug so a fix can be submitted for it

Changed in neutron:
importance: Wishlist → Medium
status: Triaged → Confirmed
Miguel Lavalle (minsel)
tags: removed: rfe rfe-triaged
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Bug closed due to lack of activity, please feel free to reopen if needed.

Changed in neutron:
status: Confirmed → Won't Fix
kay (kay-diam) wrote :

Looks like I reported a similar bug report: https://bugs.launchpad.net/neutron/+bug/1979528
It would be nice to have this bug fixed.
