L3 HA router ports 'host' field do not point to the active router replica

Bug #1494866 reported by xin wu
Affects: neutron
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

We are using Kilo. Our setup has 3 neutron controllers, with an l3 agent running on each of them. We set l3_ha = true in neutron.conf on all 3 controllers.

We notice that when we attach a network to a router, the gateway IP of that network becomes active in the router namespace on a controller node that does not match the host recorded in the neutron db. The following is one example.

Create a router and a network, then attach the network to the router.

1. Neutron reports that the gateway IP 1.1.1.1 is on controller-1 (see binding:host_id):
[stack@c5220-01 ~]$ neutron port-show 3306c360-5a3d-4a08-aa92-017498758963
+-----------------------+--------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | |
| binding:host_id | overcloud-controller-1.localdomain |
| binding:profile | {} |
| binding:vif_details | {"port_filter": true, "ovs_hybrid_plug": true} |
| binding:vif_type | ovs |
| binding:vnic_type | normal |
| device_id | 934f0b90-2d98-4d54-b9ca-5222aac2199d |
| device_owner | network:router_interface |
| extra_dhcp_opts | |
| fixed_ips | {"subnet_id": "463c2f0c-5d56-4abb-8b30-8450d8306f46", "ip_address": "1.1.1.1"} |
| id | 3306c360-5a3d-4a08-aa92-017498758963 |
| mac_address | fa:16:3e:72:34:4c |
| name | |
| network_id | 98f125b6-6d4d-4417-a0b3-e8d9ff530d6f |
| security_groups | |
| status | ACTIVE |
| tenant_id | 4ef11838925940eb9d177ae9345711ee |
+-----------------------+--------------------------------------------------------------------------------+

2. However, the gateway IP is actually configured on controller-2 (see qr-3306c360-5a):
[heat-admin@overcloud-controller-2 ~]$ sudo ip netns exec qrouter-934f0b90-2d98-4d54-b9ca-5222aac2199d ifconfig
ha-6d47f13a-b7: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 169.254.192.6 netmask 255.255.192.0 broadcast 169.254.255.255
        inet6 fe80::f816:3eff:fe43:9b80 prefixlen 64 scopeid 0x20<link>
        ether fa:16:3e:43:9b:80 txqueuelen 1000 (Ethernet)
        RX packets 20 bytes 1638 (1.5 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 309 bytes 16926 (16.5 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 0 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

qg-22431202-eb: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 10.8.87.25 netmask 255.255.255.0 broadcast 0.0.0.0
        inet6 fe80::f816:3eff:febd:56ad prefixlen 64 scopeid 0x20<link>
        ether fa:16:3e:bd:56:ad txqueuelen 1000 (Ethernet)
        RX packets 36 bytes 2746 (2.6 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 43 bytes 2890 (2.8 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

qr-3306c360-5a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 1.1.1.1 netmask 255.255.255.0 broadcast 0.0.0.0
        inet6 fe80::f816:3eff:fe72:344c prefixlen 64 scopeid 0x20<link>
        ether fa:16:3e:72:34:4c txqueuelen 1000 (Ethernet)
        RX packets 95 bytes 5856 (5.7 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 90 bytes 4200 (4.1 KiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

3. On controller-1, the qr-3306c360-5a interface has no such IP:
[heat-admin@overcloud-controller-1 ~]$ sudo ip netns exec qrouter-934f0b90-2d98-4d54-b9ca-5222aac2199d ifconfig
ha-7ff9abd2-bd: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 169.254.192.5 netmask 255.255.192.0 broadcast 169.254.255.255
        inet6 fe80::f816:3eff:fe9d:275c prefixlen 64 scopeid 0x20<link>
        ether fa:16:3e:9d:27:5c txqueuelen 1000 (Ethernet)
        RX packets 321 bytes 19678 (19.2 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 12 bytes 1008 (1008.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 0 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

qg-22431202-eb: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether fa:16:3e:bd:56:ad txqueuelen 1000 (Ethernet)
        RX packets 42 bytes 3360 (3.2 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1 bytes 110 (110.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

qr-3306c360-5a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether fa:16:3e:72:34:4c txqueuelen 1000 (Ethernet)
        RX packets 105 bytes 6456 (6.3 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1 bytes 110 (110.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

4. On controller-0, the qr-3306c360-5a interface has no such IP either:
[heat-admin@overcloud-controller-0 ~]$ sudo ip netns exec qrouter-934f0b90-2d98-4d54-b9ca-5222aac2199d ifconfig
ha-8dccf24a-2e: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 169.254.192.4 netmask 255.255.192.0 broadcast 169.254.255.255
        inet6 fe80::f816:3eff:fe98:83dd prefixlen 64 scopeid 0x20<link>
        ether fa:16:3e:98:83:dd txqueuelen 1000 (Ethernet)
        RX packets 1140 bytes 68618 (67.0 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 12 bytes 1008 (1008.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 0 (Local Loopback)
        RX packets 0 bytes 0 (0.0 B)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 0 bytes 0 (0.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

qg-22431202-eb: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether fa:16:3e:bd:56:ad txqueuelen 1000 (Ethernet)
        RX packets 42 bytes 3244 (3.1 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1 bytes 110 (110.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

qr-3306c360-5a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        ether fa:16:3e:72:34:4c txqueuelen 1000 (Ethernet)
        RX packets 1753 bytes 105336 (102.8 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1 bytes 110 (110.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
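
The comparison in steps 2 to 4 can be scripted. The following is only a minimal sketch, assuming the ifconfig output format shown above; the function name ipv4_by_interface and the trimmed sample strings are illustrative and not part of neutron.

```python
import re
from typing import Dict, Optional


def ipv4_by_interface(ifconfig_output: str) -> Dict[str, Optional[str]]:
    """Map each interface name to its IPv4 address, or None if it has none."""
    addresses: Dict[str, Optional[str]] = {}
    current = None
    for line in ifconfig_output.splitlines():
        # An unindented "<name>: flags=..." line starts a new interface stanza.
        header = re.match(r"^(\S+): flags=", line)
        if header:
            current = header.group(1)
            addresses[current] = None
            continue
        # "inet " with a trailing space skips the "inet6" lines.
        inet = re.match(r"^\s+inet (\S+)", line)
        if inet and current is not None:
            addresses[current] = inet.group(1)
    return addresses


# Trimmed copies of the qr- stanzas from the namespace output in this report.
outputs = {
    "controller-2": (
        "qr-3306c360-5a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500\n"
        "        inet 1.1.1.1 netmask 255.255.255.0 broadcast 0.0.0.0\n"
        "        ether fa:16:3e:72:34:4c txqueuelen 1000 (Ethernet)\n"
    ),
    "controller-1": (
        "qr-3306c360-5a: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500\n"
        "        ether fa:16:3e:72:34:4c txqueuelen 1000 (Ethernet)\n"
    ),
}

# Controllers whose namespace actually carries the gateway IP.
holders = [
    name for name, text in outputs.items()
    if "1.1.1.1" in ipv4_by_interface(text).values()
]
print(holders)
```

With the sample data, only controller-2 is listed, which is the node that differs from binding:host_id in step 1.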

Tags: l3-ha
James Denton (james-denton) wrote :

If I'm not mistaken, VRRP mechanisms are determining which router has the qg and qr addresses configured at any given time. I don't believe the port is updated on a failover event or even after the initial election.

xin wu (xin-wu) wrote :

Is it possible to update the port table every time VRRP converges to a new active router? Right now, the port table gives the wrong information.

Assaf Muller (amuller) wrote :

Please use:
neutron l3-agent-list-hosting-router <router_id|router_name>

This shows where the active router replica is hosted. That is where the IPs will be present.
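
To make the mismatch from the report explicit, the agent listing can be compared with the port binding. This is a rough sketch only: it assumes the listing exposes a per-agent HA state (modelled here as an ha_state key), the data is hand-built rather than fetched from neutron, and the .localdomain names for controller-0 and controller-2 are assumed by analogy with controller-1.

```python
from typing import Dict, List, Optional


def active_host(agents: List[Dict[str, str]]) -> Optional[str]:
    """Return the host of the single 'active' replica, or None if ambiguous."""
    active = [a["host"] for a in agents if a.get("ha_state") == "active"]
    return active[0] if len(active) == 1 else None


def binding_is_stale(port: Dict[str, str], agents: List[Dict[str, str]]) -> bool:
    """True when the port's binding:host_id differs from the active replica's host."""
    host = active_host(agents)
    return host is not None and port.get("binding:host_id") != host


# Hand-built data mirroring the report; not real neutron API output.
agents = [
    {"host": "overcloud-controller-0.localdomain", "ha_state": "standby"},
    {"host": "overcloud-controller-1.localdomain", "ha_state": "standby"},
    {"host": "overcloud-controller-2.localdomain", "ha_state": "active"},
]
port = {
    "id": "3306c360-5a3d-4a08-aa92-017498758963",
    "binding:host_id": "overcloud-controller-1.localdomain",
}

print(active_host(agents))
print(binding_is_stale(port, agents))
```

If no agent, or more than one agent, claims to be active, the sketch reports no mismatch rather than guessing.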

Changed in neutron:
status: New → Invalid
Kevin Benton (kevinbenton) wrote :

Assaf, this is a legitimate bug. L3 HA should be updating the port binding details to correctly reflect where the port is being used. The OVS mech driver just happens to work with an incorrect host_id.

Changed in neutron:
status: Invalid → Confirmed
Kevin Benton (kevinbenton) wrote :

Otherwise L3 HA is not really compatible with ML2 drivers that push VLANs to ports based on host location (e.g. Big Switch, Arista).

tags: added: l3-ha
Assaf Muller (amuller)
summary:
- router allocation doesn't match the record in neutron db when l3_ha is true
+ l3_ha router ports 'host' field do not point to the active router replica
summary:
- l3_ha router ports 'host' field do not point to the active router replica
+ L3 HA router ports 'host' field do not point to the active router replica
Assaf Muller (amuller) wrote :

Since https://review.openstack.org/#/q/I8475548947526d8ea736ed7aa754fd0ca475cae2,n,z we actually do update the port bindings 'host' field when HA router states change. That patch was backported to Kilo, so I am assuming the reporter observed this behavior on an older Kilo version.
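
The idea of following HA state changes can be illustrated with a toy model. This is not the code from the patch referenced above, only a hypothetical in-memory sketch: rebind_router_ports and the dictionary layout are invented for illustration, and the controller-2 host name is assumed.

```python
from typing import Dict, List


def rebind_router_ports(ports: List[Dict[str, str]], router_id: str,
                        host: str, new_state: str) -> List[str]:
    """Point the router's ports at `host` when its replica becomes active.

    Returns the ids of the ports whose binding:host_id was changed.
    """
    if new_state != "active":
        # Standby transitions leave the bindings alone.
        return []
    changed = []
    for port in ports:
        if port.get("device_id") == router_id and port.get("binding:host_id") != host:
            port["binding:host_id"] = host
            changed.append(port["id"])
    return changed


ROUTER_ID = "934f0b90-2d98-4d54-b9ca-5222aac2199d"
ports = [
    {   # The router interface port from the report.
        "id": "3306c360-5a3d-4a08-aa92-017498758963",
        "device_id": ROUTER_ID,
        "binding:host_id": "overcloud-controller-1.localdomain",
    },
    {   # Hypothetical port owned by another device; it must not be touched.
        "id": "unrelated-port",
        "device_id": "some-other-device",
        "binding:host_id": "overcloud-controller-1.localdomain",
    },
]

# Simulate the replica on controller-2 reporting that it became active.
changed = rebind_router_ports(
    ports, ROUTER_ID, "overcloud-controller-2.localdomain", "active")
print(changed)
print(ports[0]["binding:host_id"])
```

After the simulated transition, the router port from the report points at controller-2, while the unrelated port keeps its binding.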

Changed in neutron:
status: Confirmed → Fix Released
Assaf Muller (amuller) wrote :

Setting to fixed. If the reporter still sees this behavior with the latest stable/kilo or master, please re-open.
