Bug #1888256 “Neutron start radvd and mess up the routing table ...” : Bugs : neutron

Revision history for this message

Brian Haley (brian-haley) wrote on 2020-07-20:

#1

What type of network is this? Is it marked external? I'm just wondering if a neutron router is supposed to be attached at all.

It could also be that a bug was introduced since I'm not sure this specific case is tested.

tags:

removed: ra-mode

Peter (fazy) wrote on 2020-07-20:

#2

This is a shared, but not external router (so no NAT-ing). We allocate public v4/v6 addresses to instances from it.
Actually, we use it as a "VPS" network (it's a network of the admin user anyway).

So we have routers here because we use v4 DHCP and metadata from it.

For metadata, we use the qrouter-based one, with a static route on IPv4:

(Don't be confused by the "Flat" naming. It's a VLAN-based network; "Flat" is just the name.)

Regards:
 Peter

Brian Haley (brian-haley) wrote on 2020-07-24:

#3

Sorry for the slow response.

I have more questions as I'm trying to understand the use case.

1) You have a shared provider network, but instead of being external it's internal.

2) You allocated public v4/v6 addresses to instances on it, so don't require NAT.

3) What purpose does the neutron router perform? Is it routing between subnets on this network, or multiple shared internal networks? It almost doesn't seem like you need the neutron router.

When I tried to recreate this using a subnet I created with ra-mode=None/address-mode=slaac, and then adding an interface to a router I get:

Error: Failed to add interface: Bad router request: IPv6 subnet 6c7c4a89-15fd-4627-b30f-92f306c8e11f configured to receive RAs from an external router cannot be added to Neutron Router.. Neutron server returns request_ids: ['req-ef172dc4-b5ea-4edc-84f1-53173287b4bc']

I could successfully add interfaces for IPv6 subnets with None/None for the modes, and I don't see radvd advertising a prefix, so I'm not sure how you did this yet.
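The two results above can be summarised as a small check. This is only a sketch that mirrors the two cases observed in this comment (ra-mode=None with address-mode=slaac is rejected, None/None is accepted). The function name `can_attach_to_router` and the dict layout are assumptions made for illustration, not Neutron's actual code.

```python
def can_attach_to_router(subnet):
    """Return True if the subnet may be added as a router interface.

    Hypothetical helper: it only reproduces the two cases observed above.
    """
    if subnet.get("ip_version") != 6:
        return True
    ra_mode = subnet.get("ipv6_ra_mode")
    address_mode = subnet.get("ipv6_address_mode")
    # RA mode unset but address mode set means "RAs come from an external
    # router", which is the case the server refused.
    return not (ra_mode is None and address_mode is not None)


external_ra = {"ip_version": 6, "ipv6_ra_mode": None, "ipv6_address_mode": "slaac"}
no_modes = {"ip_version": 6, "ipv6_ra_mode": None, "ipv6_address_mode": None}

print(can_attach_to_router(external_ra))  # False
print(can_attach_to_router(no_modes))     # True
```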

Peter (fazy) wrote on 2020-07-25:

#4

1) Yes. We want to have a "simple" or, as we call it, "Flat" network, which can be used by all projects and gives public v4/v6 addresses. (It sits next to our "Smart" networks, where dual qrouter, floating IP, VPNaaS etc. are available, but no IPv6 for now.)
Since the project administrators cannot manage ports, they get a new IP with each new instance.

2) Yes.

3) We use the router only for metadata in our "Flat" named networks. (I cannot remember why, but we have used this method since Kilo.)
As you can see, the "Flat1-subnet-v4" has a static route (destination='169.254.169.254/32', gateway='193.224.218.251'), where 193.224.218.251 is the floating IP of the qrouter.

Maybe I misunderstood you, but should I try with None/None?

I thought address-mode=slaac makes neutron allocate the proper address (which is calculated from the MAC), and that this is a must for the iptables rules behind the qrouters.
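For reference, the "calculated from the MAC" part is the standard modified EUI-64 derivation used by SLAAC. The sketch below is a minimal illustration of it; the function name `slaac_address`, the documentation prefix 2001:db8::/64 and the MAC address are example values chosen here, not taken from this deployment.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Build the modified EUI-64 address for a /64 prefix and a MAC address."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("EUI-64 SLAAC addresses need a /64 prefix")
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe in the middle of the MAC to get a 64-bit interface ID.
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    interface_id = int.from_bytes(bytes(eui64), "big")
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


# Example values only (documentation prefix, made-up MAC).
print(slaac_address("2001:db8::/64", "fa:16:3e:00:00:01"))
# 2001:db8::f816:3eff:fe00:1
```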

However, I used the documentation (https://docs.openstack.org/neutron/rocky/admin/config-ipv6.html), and it clearly says that with RA mode=none and address mode=slaac: "Guest instance obtains IPv6 address from non-OpenStack router using SLAAC."

For some reason, the radvd process is spawned with the qrouter in this configuration, and I cannot really understand why.

The other odd thing is that our RegionOne, which was upgraded from Kilo to Rocky over the past few years, works with the old networks configured in this way (our Flat1 and Flat2, which were created in Kilo). Only our new "Flat3" network in RegionOne and the two new "Flat" networks in RegionTwo show this behaviour.

Brian Haley (brian-haley) wrote on 2020-07-26:

#5

I don't think you should attach the router, in which case the dhcp-agent should add a route in its reply so that it is used for metadata. I don't think you'll need to change dhcp_agent.ini, but if this fails you can set enable_isolated_metadata=True.

As far as the docs go, I thought they were correct, but this bug is making me think we need to revisit them to double-check.

Peter (fazy) wrote on 2020-07-26:

#6

If I remember well, the router-based metadata was chosen because of High Availability considerations.
As I mentioned, we started with Kilo in 2015.
Our design consists of two dedicated network nodes, which run the neutron-{ovs,l3,dhcp,metadata} agents.

And... I may not remember the next part well (or it's just not true now, and may never have been :) )

So, DHCP high availability is provided by running two independent DHCP agents (1-1 qdhcp on the network nodes/network with different IP addresses).

The router (1-1 router on network nodes/network), however, has only one floating IP address, managed with VRRP.

With DHCP-based metadata, we were not able (at least back in Kilo) to add both qDHCP instances to the guests with double static routes, therefore HA was not guaranteed (or we missed something back then).

With the router-based metadata, we added one static route with the floating IP of the "side" qrouter pair, so if one of our network nodes restarted/died/etc., the other one got the floating IP via keepalived, and the metadata worked just fine.

Since our setup runs in production, this kind of change (qrouter to qdhcp metadata) is hard, because this config change will have an impact on all of our networks (but it is not impossible).

That's the reason why I want to figure out our qrouter - radvd problem.

Maybe our unsuccessful attempts with qdhcp metadata were caused by a bug [*], which just works fine with Rocky, or we missed something in the configuration.

[*] https://bugzilla.redhat.com/show_bug.cgi?id=1256816

Back to the radvd problem:
I tried to understand the neutron code and the radvd behaviour, and I may have a guess...
Relevant neutron code (in master, it's the same): https://github.com/openstack/neutron/blob/8c80267bb6699c86e10aade13c54b715e1eae1bf/neutron/agent/linux/ra.py

I have a few thoughts:

1)
AdvSendAdvert on|off
A flag indicating whether or not the router sends periodic router advertisements and responds to router solicitations.

This option no longer has to be specified first, but it needs to be on to enable advertisement on this interface.

If I'm right, this option makes radvd "work", and because my ra_mode is none, no prefix specification is generated in radvd.conf, so radvd possibly listens and advertises on all interfaces that have an IPv6 address.
Since this parameter is hardcoded in the Jinja template, it cannot be avoided.

2) The _spawn_radvd(self, radvd_conf) function.
There is no condition that checks the ra_mode, so even if I set it to none, the process will be spawned.
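
To make the two points above concrete, here is a rough sketch of the situation I am guessing at. It is not the code from ra.py: the helper names `render_interface_block` and `should_spawn_radvd`, the subnet dict layout, the interface name and the prefix are all made up for illustration, and only the slaac RA mode is handled.

```python
def render_interface_block(interface_name, subnets):
    """Render a simplified radvd interface block (hypothetical layout)."""
    # AdvSendAdvert is always "on", as in point 1.
    lines = ["interface %s" % interface_name, "{", "    AdvSendAdvert on;"]
    for subnet in subnets:
        # Only subnets whose RA mode is slaac get a prefix stanza here;
        # the other RA modes are left out of this sketch.
        if subnet.get("ipv6_ra_mode") == "slaac":
            lines += [
                "    prefix %s" % subnet["cidr"],
                "    {",
                "        AdvOnLink on;",
                "        AdvAutonomous on;",
                "    };",
            ]
    lines.append("};")
    return "\n".join(lines)


def should_spawn_radvd(subnets):
    """Guard of the kind point 2 says is missing: spawn only if some subnet wants RAs."""
    return any(s.get("ipv6_ra_mode") is not None for s in subnets)


# Example subnet with ra_mode none / address mode slaac (example prefix).
flat3 = [{"cidr": "2001:db8::/64", "ipv6_ra_mode": None,
          "ipv6_address_mode": "slaac"}]

print(render_interface_block("qr-example", flat3))
print(should_spawn_radvd(flat3))  # False
```

With ra_mode none, the rendered block still contains "AdvSendAdvert on;" but no prefix stanza, and a guard like `should_spawn_radvd` would skip starting radvd for it.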