Comment 3 for bug 1957189

Yusuf Güngör (yusuf2) wrote: Re: DVR Router Update Error

Hi Rodolfo, sorry for the late response.

Thank you for your test.

We have observed something interesting.

Yes, the L3 agent mode is "dvr_snat" on the controllers and "dvr" on the compute nodes. We also upgraded to Wallaby, but the issue still persists.

Can you try with a vlan provider network attached to the router as its GW? The provider network subnet pool and the tenant network subnet pool also reside in the same address scope so that bgp can be used. Details: https://docs.openstack.org/neutron/wallaby/admin/config-bgp-dynamic-routing.html

You are right: in our case the qrouter ns has the routes but does not have the ip rules ("$ ip rule").

Also, the fip ns does not have any routes at all!
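
For anyone who wants to compare nodes, the two checks above can be scripted roughly as follows. This is only a sketch under assumptions: it relies on the usual Neutron namespace naming (qrouter-<router id> and fip-<external network id>), the helper names are made up for illustration, and the `ip netns exec` calls need root on the node. It only looks at the main routing table.

```shell
#!/bin/sh
# Hypothetical helpers; the function names are not part of Neutron.
# Assumption: namespaces are named qrouter-<router-id> and fip-<external-network-id>.

# Build the namespace names from the IDs.
qrouter_ns() { printf 'qrouter-%s\n' "$1"; }
fip_ns() { printf 'fip-%s\n' "$1"; }

# Print the policy routing rules of a router namespace (needs root).
show_router_rules() {
    ip netns exec "$(qrouter_ns "$1")" ip rule
}

# Print the main-table routes of a fip namespace (needs root).
show_fip_routes() {
    ip netns exec "$(fip_ns "$1")" ip route
}

# Read "ip rule" or "ip route" output on stdin and succeed only if the
# CIDR given as $1 appears in it as a whole word.
has_cidr() {
    grep -qwF -- "$1"
}
```

Example use on a network node, with the IDs filled in by hand:

    show_router_rules <router-id> | has_cidr 10.10.12.0/26 || echo "ip rule missing"
    show_fip_routes <external-network-id> | has_cidr 10.10.12.0/26 || echo "fip route missing"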

Assume there are 5 network nodes, max_l3_agents_per_router is 3, and a dvr ha router is running on network nodes 01, 02 and 03. The router has a vlan provider network as its GW. The provider network and the tenant network have subnets from two different subnet pools that are in the same address scope.

When a vxlan tenant network is attached to the router, the tenant network cidr route is updated in the fip netns on the network nodes that do not host the router (network nodes 04 and 05), but not on network nodes 01, 02 and 03 (which host the router). We have seen the "Starting router update for..." and "Finished a router update..." logs from the l3 agents on all network nodes. But somehow the l3 agent skips adding the qrouter netns ip rules and the fip netns ip routes for the attached tenant network.

Note: the first network attachment after router creation may succeed, but subsequent attach operations run into this situation.

Restarting the l3 agent fixes all of these issues!

Also, when detaching the network we get errors, because the l3 agent tries to remove routes that were never added.

We are using dvr, address scopes and bgp. Only the dns requests of instances are NAT'ted over the controller nodes. If all of the dhcp agents reside on network nodes that do not have those fip ns routes, then dns queries from our instances fail.

Scenario

BGP Provider Network
  10.10.10.0/24

BGP Tenant Network
  10.10.12.0/24

# openstack address scope create --ip-version 4 test-scope
# openstack subnet pool create --pool-prefix 10.10.10.0/24 --address-scope test-scope test-provider-subnet-pool-01
# openstack subnet pool create --pool-prefix 10.10.12.0/24 --address-scope test-scope test-tenant-subnet-pool-01

# Create Provider Network and Subnet
# openstack network create \
--external \
--provider-network-type vlan \
--provider-physical-network physnet1 \
--provider-segment 118 test-provider-net-01

# openstack subnet create test-provider-subnet-01 \
--network test-provider-net-01 \
--subnet-pool test-provider-subnet-pool-01 \
--prefix-length 24

===> We also change the allocation pool and default gw after subnet creation, because some of these IPs are used by the router (for BGP).
    --allocation-pool start=10.10.10.20,end=10.10.10.253
    GW: 10.10.10.12
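
For reference, the allocation pool and gateway change could be applied with something like the following. This is a sketch, not the exact command we ran: the function name is made up for illustration, and it assumes the standard `openstack subnet set` options, where passing --no-allocation-pool together with --allocation-pool replaces the existing pool instead of adding a second one.

```shell
#!/bin/sh
# Hypothetical wrapper; only the values (pool range, gateway, subnet name)
# come from the scenario above.
adjust_provider_subnet() {
    # Clear the default pool, set the new range, and move the gateway.
    openstack subnet set \
        --no-allocation-pool \
        --allocation-pool start=10.10.10.20,end=10.10.10.253 \
        --gateway 10.10.10.12 \
        test-provider-subnet-01
}
```

Run adjust_provider_subnet once test-provider-subnet-01 exists.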

# Create vxlan tenant network and subnet
# openstack network create tenant-vxlan-net
# openstack subnet create tenant-vxlan-subnet-01 \
--network tenant-vxlan-net \
--subnet-pool test-tenant-subnet-pool-01 \
--prefix-length 26

# Create a router and set the provider network as its external gw. Also attach the tenant-vxlan-net subnet to this router
# openstack router create test-router-01
# openstack router add subnet test-router-01 tenant-vxlan-subnet-01
# openstack router set --external-gateway test-provider-net-01 test-router-01
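
Since the first attachment after router creation may succeed, the problem is more likely to show up on a later attachment. A possible follow-up step is sketched below. The second network and subnet names (tenant-vxlan-net-02, tenant-vxlan-subnet-02) and the function name are hypothetical; the subnet pool and router are the ones created above.

```shell
#!/bin/sh
# Hypothetical reproduction step: attach a second tenant network to the
# same router. Names ending in -02 are made up for this sketch.
attach_second_tenant_net() {
    openstack network create tenant-vxlan-net-02 &&
    openstack subnet create tenant-vxlan-subnet-02 \
        --network tenant-vxlan-net-02 \
        --subnet-pool test-tenant-subnet-pool-01 \
        --prefix-length 26 &&
    openstack router add subnet test-router-01 tenant-vxlan-subnet-02
}
```

After calling attach_second_tenant_net, check the qrouter ns ip rules and the fip ns routes on the network nodes that host the router.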