Liu, this is not for an HA router. Also, it is not for centralized FIPs.
1. This is a compute node where the l3-agent is in dvr_snat mode. We have multiple such nodes with the l3-agent in dvr_snat mode for regular failover.
2. Router is a regular DVR router, not HA. We have no centralized FIPs.
3. There are VMs on the same node with and without floating IPs.
So to reproduce, have 2 or more nodes with the l3-agent in dvr_snat mode. These should also be compute nodes, so nova-compute etc. runs on the same nodes.
Create a DVR but non-HA router, so that one snat namespace gets scheduled to one of the 2+ nodes. Create a VM and some floating IPs on each node, so that on all nodes the qrouter namespace is created, the fip namespace is created, and the rfp/fpr link is created.
At this point, snat has been scheduled to one of these dvr_snat nodes as well.
Now, restart l3-agent on one of the OTHER nodes.
You will see that on init the snat namespace gets created on that node, then deleted again in the code paths I listed before. The deletion code triggers deletion of the gateway, which ends up deleting the rfp/fpr link between the qrouter and fip namespaces.
Prior to the fix, the snat namespace was not created and then deleted on dvr_snat nodes that did not host the snat router.
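
To check a node for the symptom after the l3-agent restart, something like the following rough sketch may help. It is not taken from Neutron; it assumes the usual namespace naming (qrouter-<router_id>, snat-<router_id>), assumes the rfp/fpr device names are the prefix plus router ID truncated to 14 characters, and needs root to run `ip netns`. The helper names (expected_names, check_node, etc.) are made up for illustration.

```python
import subprocess

# Assumption: device names are truncated to 14 characters.
DEV_NAME_LEN = 14


def expected_names(router_id):
    """Build the namespace and device names expected for a DVR router."""
    return {
        "qrouter_ns": "qrouter-" + router_id,
        "snat_ns": "snat-" + router_id,
        "rfp_dev": ("rfp-" + router_id)[:DEV_NAME_LEN],
        "fpr_dev": ("fpr-" + router_id)[:DEV_NAME_LEN],
    }


def parse_netns_list(output):
    """Parse `ip netns list` output, e.g. 'qrouter-<id> (id: 3)' per line."""
    return {line.split()[0] for line in output.splitlines() if line.strip()}


def parse_link_names(output):
    """Parse `ip -o link show` output, e.g. '2: rfp-abc@if3: <...>' per line."""
    names = set()
    for line in output.splitlines():
        parts = line.split(":", 2)
        if len(parts) >= 2:
            names.add(parts[1].strip().split("@")[0])
    return names


def _run(cmd):
    # Runs the command and returns its stdout as text (needs root for netns).
    return subprocess.check_output(cmd, text=True)


def check_node(router_id, run=_run):
    """Report whether the snat namespace and the rfp device exist on this node."""
    names = expected_names(router_id)
    namespaces = parse_netns_list(run(["ip", "netns", "list"]))
    rfp_present = False
    if names["qrouter_ns"] in namespaces:
        links = parse_link_names(
            run(["ip", "netns", "exec", names["qrouter_ns"],
                 "ip", "-o", "link", "show"])
        )
        rfp_present = names["rfp_dev"] in links
    return {
        "qrouter_ns_present": names["qrouter_ns"] in namespaces,
        "snat_ns_present": names["snat_ns"] in namespaces,
        "rfp_present": rfp_present,
    }
```

Run check_node("<router_id>") as root on the restarted node before and after the restart. On a node that has VMs with floating IPs but does not host snat, rfp_present going from True to False would match the behaviour described above.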