[SNAT][HA]snat traffic broken after restarting network nodes

Bug #1587831 reported by kingbird
This bug affects 1 person
Affects   Status      Importance   Assigned to    Milestone
neutron   Confirmed   Medium       venkata anil

Bug Description

After restarting both network nodes (l3 agent_mode=dvr_snat) at the same time, the snat namespaces on the two nodes can't talk to each other, and each one promotes itself to active. As a result, there are 2 active snat namespaces.

Then, once the node that is actually carrying the SNAT traffic goes down, the other one does not take over.

[root@zk22-01 ~]# neutron router-list
+--------------------------------------+------+---------------------------------------------------------------------------------------------------------------------------------+-------------+------+
| id | name | external_gateway_info | distributed | ha |
+--------------------------------------+------+---------------------------------------------------------------------------------------------------------------------------------+-------------+------+
| c497892b-8ff4-441d-9f4e-43fd30401930 | rt | {"network_id": "c892d21d-fea9-4d4b-b5f6-276345c7901f", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "129df259-0104 | True | True |
| | | -400e-8c76-a4d9250eb9c9", "ip_address": "192.168.122.4"}]} | | |
+--------------------------------------+------+---------------------------------------------------------------------------------------------------------------------------------+-------------+------+

[root@zk22-01 ~]# neutron l3-agent-list-hosting-router c497892b-8ff4-441d-9f4e-43fd30401930
+--------------------------------------+---------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+---------+----------------+-------+----------+
| be5526ce-ad40-46af-9dc8-898cf08ebe9b | zk22-01 | True | :-) | active |
| dcdfc230-c5d1-4dd3-b541-a6abac6531ba | zk22-02 | True | :-) | active |
+--------------------------------------+---------+----------------+-------+----------+
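A minimal sketch for spotting this condition from the CLI output above: it parses the table printed by `neutron l3-agent-list-hosting-router` and reports every host whose ha_state is "active". The function name `active_hosts` is made up for illustration, and the parsing assumes the pipe-delimited layout shown in this report.

```python
def active_hosts(table_text):
    """Return the hosts whose ha_state column reads 'active'."""
    header = None
    hosts = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe-delimited row is the column header
            continue
        row = dict(zip(header, cells))
        if row.get("ha_state") == "active":
            hosts.append(row["host"])
    return hosts


# Table copied from this report.
sample = """
+--------------------------------------+---------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+---------+----------------+-------+----------+
| be5526ce-ad40-46af-9dc8-898cf08ebe9b | zk22-01 | True | :-) | active |
| dcdfc230-c5d1-4dd3-b541-a6abac6531ba | zk22-02 | True | :-) | active |
+--------------------------------------+---------+----------------+-------+----------+
"""

hosts = active_hosts(sample)
if len(hosts) > 1:
    print("more than one active agent for this router:", hosts)
```

For an HA router, one would normally expect a single active agent, so more than one entry in the result is the symptom described here.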

[root@zk22-01 ~]# ip netns exec snat-c497892b-8ff4-441d-9f4e-43fd30401930 tcpdump -nn -i ha-004331fc-9f
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ha-004331fc-9f, link-type EN10MB (Ethernet), capture size 65535 bytes
18:59:03.574554 IP 169.254.192.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
18:59:05.575500 IP 169.254.192.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
18:59:07.576432 IP 169.254.192.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
18:59:09.577361 IP 169.254.192.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
18:59:11.578293 IP 169.254.192.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
18:59:13.579243 IP 169.254.192.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20

[root@zk22-02 ~]# ip netns exec snat-c497892b-8ff4-441d-9f4e-43fd30401930 tcpdump -nn -i ha-dda33de1-3e
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ha-dda33de1-3e, link-type EN10MB (Ethernet), capture size 65535 bytes
18:59:15.918725 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
18:59:17.919038 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
18:59:19.920036 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
18:59:21.921004 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
18:59:23.922007 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
18:59:25.923017 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype simple, intvl 2s, length 20
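The two captures can be compared mechanically. The sketch below extracts the VRRP advertisement source from each capture and checks whether the two nodes see different senders. With a working HA link, both captures would be expected to show the same single sender (the master); different senders on each side suggest that both instances are advertising and neither hears the other. The helper name `advertisers` is hypothetical, and the regular expression assumes the tcpdump line format shown above.

```python
import re

# Matches lines such as:
# 18:59:03.574554 IP 169.254.192.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, ...
VRRP_LINE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+) > 224\.0\.0\.18: VRRPv2, Advertisement, "
    r"vrid (?P<vrid>\d+), prio (?P<prio>\d+)"
)


def advertisers(capture_text):
    """Return the set of (source IP, vrid, priority) tuples seen in a capture."""
    seen = set()
    for line in capture_text.splitlines():
        m = VRRP_LINE.search(line)
        if m:
            seen.add((m.group("src"), int(m.group("vrid")), int(m.group("prio"))))
    return seen


# One line from each capture in this report is enough to illustrate the check.
capture_zk22_01 = (
    "18:59:03.574554 IP 169.254.192.2 > 224.0.0.18: VRRPv2, Advertisement, "
    "vrid 1, prio 50, authtype simple, intvl 2s, length 20"
)
capture_zk22_02 = (
    "18:59:15.918725 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, "
    "vrid 1, prio 50, authtype simple, intvl 2s, length 20"
)

senders_01 = {src for src, _, _ in advertisers(capture_zk22_01)}
senders_02 = {src for src, _, _ in advertisers(capture_zk22_02)}

if senders_01.isdisjoint(senders_02):
    print("each node sees a different VRRP sender:", senders_01, senders_02)
```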

kingbird (zjbcareer) wrote :

After comparing the flows in br-tun before and after rebooting the network node, I found that some arp_responder flows related to the HA network are missing.
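One way to do this kind of comparison, as a rough sketch: save the output of `ovs-ofctl dump-flows br-tun` before and after the reboot, strip the fields that change on every dump, and take the set difference. The sample flow lines below are invented for illustration (the table number, match fields and actions are placeholders, not flows taken from this deployment), and `normalize` and `missing_flows` are hypothetical helper names.

```python
import re

# Counters, ages and the cookie differ between dumps even for identical flows,
# so they are removed before comparing.
VOLATILE = re.compile(
    r"\b(?:cookie|duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]+,?\s*"
)


def normalize(dump_text):
    """Return the set of flows in a dump with volatile fields removed."""
    flows = set()
    for line in dump_text.splitlines():
        line = line.strip()
        # Skip blanks and the reply header printed at the top of a dump.
        if not line or line.startswith(("NXST_FLOW", "OFPST_FLOW")):
            continue
        flows.add(VOLATILE.sub("", line).strip())
    return flows


def missing_flows(before, after):
    """Flows present in the 'before' dump but absent from the 'after' dump."""
    return sorted(normalize(before) - normalize(after))


# Illustrative dumps only: placeholder table number and actions.
before = """
 cookie=0x1, duration=120.5s, table=21, n_packets=4, n_bytes=168, idle_age=30, priority=1,arp,dl_vlan=1,arp_tpa=169.254.192.1 actions=drop
 cookie=0x1, duration=120.5s, table=21, n_packets=2, n_bytes=84, idle_age=45, priority=1,arp,dl_vlan=1,arp_tpa=169.254.192.2 actions=drop
"""
after = """
 cookie=0x2, duration=15.1s, table=21, n_packets=0, n_bytes=0, idle_age=15, priority=1,arp,dl_vlan=1,arp_tpa=169.254.192.1 actions=drop
"""

for flow in missing_flows(before, after):
    print("missing after reboot:", flow)
```

Comparing sets rather than running a line-by-line diff keeps the result independent of the order in which the switch prints its flows.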

Miguel Angel Ajo (mangelajo) wrote :

I suspect this is related to:

https://review.openstack.org/#/c/282874/

Changed in neutron:
assignee: nobody → venkata anil (anil-venkata)
importance: Undecided → Medium
status: New → Confirmed

Miguel Angel Ajo (mangelajo) wrote :

In fact, I believe this is a duplicate of:

bug 1522980
