qrouter ns ip rules not deleted when fip removed from vm
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Ubuntu Cloud Archive | Fix Released | High | Unassigned | | |
Queens | Fix Released | High | Unassigned | | |
Rocky | Fix Released | High | Unassigned | | |
Stein | Fix Released | High | Unassigned | | |
Train | Fix Released | High | Unassigned | | |
Ussuri | Fix Released | High | Unassigned | | |
Victoria | Fix Released | High | Unassigned | | |
neutron | Fix Released | High | Edward Hope-Morley | | |
neutron (Ubuntu) | Fix Released | High | Unassigned | | |
Bionic | Fix Released | High | Unassigned | | |
Focal | Fix Released | High | Unassigned | | |
Groovy | Fix Released | High | Unassigned | | |
Bug Description
[Impact]
Restarting neutron-l3-agent causes the agent to lose part of its fip information. If a fip is then removed from a vm, its ip rules are left behind, which breaks external network access for that vm.
[Test Case]
* deploy openstack with dvr enabled
* create distributed router, network etc
* create a vm and attach a floating ip
* go to compute host on which vm is running and restart neutron-l3-agent
* tail -f /var/log/
* remove fip from vm
* run https:/
* should return with "nothing to do"
[Regression Potential]
The patch reloads, on agent startup, the information associated with floating ips, specifically the information needed to delete the ip rules and rule priorities associated with a floating ip. Since that is essentially read-only, I don't envisage a regression potential. When the l3-agent later uses that information to delete a floating ip, an error could occur if the thing it is trying to delete no longer exists, but that would not be a problem introduced by this patch. The patch doesn't change behavior in any way other than allowing the l3-agent to behave as if it hadn't been restarted.
[Other Info]
A patched neutron l3 agent will reload info for *used* floating ips when restarted, BUT if there are ip rules left behind from fips that were removed before the patched neutron was in use, then manual cleanup is still required, and for that you can use https:/
-------
With Bionic Stein using dvr_snat, if I add a floating ip to a vm and then remove it, the corresponding ip rules in the associated qrouter ns local to the instance are not deleted. As a result the instance can no longer reach the external network, because packets are still sent to the fip namespace (via rfp-/fpr-). For example, on my compute host running a vm whose address is 192.168.21.28, and from which I have removed the fip, I still see:
# ip netns exec qrouter-
0: from all lookup local
32765: from 192.168.21.28 lookup 16
32766: from all lookup main
32767: from all lookup default
3232240897: from 192.168.21.1/24 lookup 3232240897
3232241231: from 192.168.22.79/24 lookup 3232241231
And table 16 leads to:
# ip netns exec qrouter-
default via 169.254.109.249 dev rfp-5e45608f-3
Which results in the instance no longer being able to reach the external network (packets are never sent to the snat- ns in my case).
The workaround is to delete that ip rule but neutron should be taking care of this. Looks like the culprit is in neutron/
Note that the NAT rules were successfully removed from iptables so looks like it is just this bit that is left behind.
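A minimal sketch of how the leftover rule could be spotted and the workaround command assembled, assuming `ip rule` output shaped like the listing in this description. The helper names, the `qrouter-<router-id>` namespace placeholder, and the set of fixed ips that still own a fip are all hypothetical; none of this is part of neutron. The command is only built as an argument list and is not executed.

```python
import re

# Matches lines such as "32765: from 192.168.21.28 lookup 16".
RULE_RE = re.compile(r"^(\d+):\s+from\s+(\S+)\s+lookup\s+(\S+)$")

# Output copied from the bug description.
SAMPLE = """\
0: from all lookup local
32765: from 192.168.21.28 lookup 16
32766: from all lookup main
32767: from all lookup default
3232240897: from 192.168.21.1/24 lookup 3232240897
3232241231: from 192.168.22.79/24 lookup 3232241231
"""


def find_fip_rules(ip_rule_output, table="16"):
    """Return (priority, source) pairs for rules that look up `table`."""
    found = []
    for line in ip_rule_output.splitlines():
        match = RULE_RE.match(line.strip())
        if match and match.group(3) == table:
            found.append((int(match.group(1)), match.group(2)))
    return found


def stale_rules(ip_rule_output, fixed_ips_with_fip):
    """Rules whose source address no longer has a floating ip attached."""
    return [(prio, src) for prio, src in find_fip_rules(ip_rule_output)
            if src not in fixed_ips_with_fip]


def delete_command(namespace, priority, source, table="16"):
    """Build (but do not run) the ip rule deletion for one stale rule."""
    return ["ip", "netns", "exec", namespace, "ip", "rule", "del",
            "from", source, "table", table, "priority", str(priority)]


if __name__ == "__main__":
    # The fip was removed, so no fixed ip should still have a rule.
    for prio, src in stale_rules(SAMPLE, fixed_ips_with_fip=set()):
        print(" ".join(delete_command("qrouter-<router-id>", prio, src)))
```

Here the vm at 192.168.21.28 no longer has a fip, so its rule at priority 32765 is reported as stale. Passing that address in `fixed_ips_with_fip` would instead treat the rule as legitimate.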
tags: | added: sts |
Changed in neutron: | |
assignee: | nobody → Edward Hope-Morley (hopem) |
Changed in neutron: | |
status: | In Progress → Confirmed |
importance: | Undecided → High |
tags: | added: l3-dvr-backlog |
Changed in neutron: | |
status: | Confirmed → In Progress |
description: | updated |
tags: | added: sts-sru-needed |
Changed in neutron (Ubuntu Bionic): | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in neutron (Ubuntu Focal): | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in neutron (Ubuntu Groovy): | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in neutron (Ubuntu Bionic): | |
status: | Incomplete → Triaged |
Changed in neutron (Ubuntu Focal): | |
status: | Incomplete → Triaged |
Changed in neutron (Ubuntu Groovy): | |
status: | Incomplete → Triaged |
Changed in neutron (Ubuntu Groovy): | |
status: | Triaged → Fix Committed |
Changed in neutron (Ubuntu Groovy): | |
status: | Fix Committed → Fix Released |
tags: | added: neutron-proactive-backport-potential |
tags: | removed: neutron-proactive-backport-potential |
tags: | added: neutron-proactive-backport-potential |
OK, a bit more info: this problem only occurs if you restart the neutron-l3-agent between adding and removing the floating ip. Looking at the code, this is because the information needed to delete the fip's ip rule is held only in memory. It is stored in floating_ips_dict when the fip is added, and that dict is never repopulated. So if you restart your l3-agent, you lose all records of floating ips that need their ip rules deleted.
    def _add_floating_ip_rule(self, floating_ip, fixed_ip):
        rule_pr = self.fip_ns.allocate_rule_priority(floating_ip)
        self.floating_ips_dict[floating_ip] = (fixed_ip, rule_pr)

    def _remove_floating_ip_rule(self, floating_ip):
        if floating_ip in self.floating_ips_dict:
            fixed_ip, rule_pr = self.floating_ips_dict[floating_ip]
            ip_lib.delete_ip_rule(self.ns_name, ip=fixed_ip,
                                  table=dvr_fip_ns.FIP_RT_TBL,
                                  priority=int(str(rule_pr)))
            self.fip_ns.deallocate_rule_priority(floating_ip)
            # TODO(rajeev): Handle else case - exception/log?
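The effect of the restart can be illustrated with a toy model. This is only a sketch under stated assumptions, not neutron code and not the actual patch: `ToyRouter`, `reload_floating_ips`, the `saved` mapping, and the floating ip 10.5.150.9 are all hypothetical. A plain set stands in for the ip rules in the qrouter namespace, which survive an agent restart, while `floating_ips_dict` does not.

```python
class ToyRouter:
    """Toy stand-in for the agent-side router object (hypothetical)."""

    def __init__(self, namespace_rules):
        # Models the ip rules in the namespace; these outlive the agent.
        self.namespace_rules = namespace_rules
        # In-memory only: floating_ip -> (fixed_ip, rule_priority).
        self.floating_ips_dict = {}

    def add_floating_ip_rule(self, floating_ip, fixed_ip, rule_pr):
        self.floating_ips_dict[floating_ip] = (fixed_ip, rule_pr)
        self.namespace_rules.add((fixed_ip, rule_pr))

    def remove_floating_ip_rule(self, floating_ip):
        # Same guard as the snippet above: unknown fips are silently skipped.
        if floating_ip not in self.floating_ips_dict:
            return False
        fixed_ip, rule_pr = self.floating_ips_dict.pop(floating_ip)
        self.namespace_rules.discard((fixed_ip, rule_pr))
        return True

    def reload_floating_ips(self, saved):
        # Hypothetical startup step: repopulate the dict from saved state.
        for floating_ip, (fixed_ip, rule_pr) in saved.items():
            self.floating_ips_dict[floating_ip] = (fixed_ip, rule_pr)


def simulate(reload_on_start):
    """Add a fip, restart the agent, remove the fip; return leftover rules."""
    namespace_rules = set()
    agent = ToyRouter(namespace_rules)
    agent.add_floating_ip_rule("10.5.150.9", "192.168.21.28", 32765)
    saved = dict(agent.floating_ips_dict)  # stand-in for recoverable state

    agent = ToyRouter(namespace_rules)  # restart: the in-memory dict is empty
    if reload_on_start:
        agent.reload_floating_ips(saved)
    agent.remove_floating_ip_rule("10.5.150.9")
    return namespace_rules


if __name__ == "__main__":
    print("without reload:", simulate(reload_on_start=False))
    print("with reload:", simulate(reload_on_start=True))
```

Without the reload step the rule for 192.168.21.28 stays in the set after removal, which matches the behaviour reported here. With it, the set ends up empty.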