[SRU] Fix Race between L3 agent and neutron-ns-cleanup
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Fix Released | Undecided | Unassigned |
neutron | Invalid | Undecided | Unassigned |
neutron (Ubuntu) | Fix Released | Undecided | Unassigned |
Xenial | Fix Released | Undecided | Unassigned |
Yakkety | Fix Released | Undecided | Unassigned |
Zesty | Fix Released | Undecided | Unassigned |
neutron-lbaas (Ubuntu) | Fix Released | Undecided | Unassigned |
Xenial | Fix Released | Undecided | Unassigned |
Yakkety | Fix Released | Undecided | Unassigned |
Bug Description
I suspect a race between the neutron L3 agent and the neutron-
2016-08-03 03:30:03.392 2595 ERROR neutron.
2016-08-03 03:30:03.393 2595 ERROR neutron.
[traceback truncated in extraction: 40 lines logged by PID 2595 between 03:30:03.392 and 03:30:03.393, each cut off after "ERROR neutron."]
In this case, the cleanup first deleted the qrouter namespace it found to be empty (not containing any netdevs other than lo). The router delete flow attempts to delete iptables rules within the namespace before deleting the namespace itself. However, if the namespace is deleted first, the iptables-save command on a non-existent namespace fails. The resulting exception prevents the router delete flow from succeeding and the L3 agent gets stuck in a failure loop.
Can somebody confirm whether this is a known issue, or whether I've misunderstood the problem? Assuming my analysis is correct, would the following fix work?
diff --git a/neutron/
index b096091..8d3e8ae 100644
--- a/neutron/
+++ b/neutron/
@@ -358,8 +358,16 @@ class L3NATAgent(
return
- registry.
- self, router=ri)
+ try:
+ registry.
+ self, router=ri)
+ except Exception as e:
+ ns_err = "Cannot open network namespace qrouter-" + router_id
+ if ns_err not in str(e):
+ raise
+ else:
+ LOG.warn(
+ router_id)
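To make the intent of the patch easier to discuss, here is a minimal, self-contained sketch of the same idea: swallow only the "namespace already gone" error and re-raise everything else. It does not use the real neutron code. `notify_before_delete` and `safe_router_removed` are hypothetical stand-ins for the truncated callback and handler in the diff, and the set of existing namespaces is an assumption made purely so the example runs on its own.

```python
import logging

LOG = logging.getLogger(__name__)


def notify_before_delete(router_id, existing_namespaces):
    """Hypothetical stand-in for the pre-delete notification.

    It fails the way the report describes: if the qrouter namespace has
    already been removed, the command run inside it cannot open it.
    """
    ns = "qrouter-" + router_id
    if ns not in existing_namespaces:
        raise RuntimeError("Cannot open network namespace " + ns)


def safe_router_removed(router_id, existing_namespaces,
                        notify=notify_before_delete):
    """Run the pre-delete step, tolerating an already-deleted namespace.

    Returns True if the step ran normally and False if the namespace was
    already gone. Any other exception is re-raised unchanged.
    """
    ns_err = "Cannot open network namespace qrouter-" + router_id
    try:
        notify(router_id, existing_namespaces)
    except Exception as e:
        # Compare against the message text, not the exception object.
        if ns_err not in str(e):
            raise
        LOG.warning("Namespace for router %s was already deleted; "
                    "continuing with router removal", router_id)
        return False
    return True
```

With this shape, a router whose namespace was removed by the cleanup script no longer blocks the delete flow, while unrelated failures still surface. Matching on the error message is fragile, so a dedicated exception type would be preferable if one is available.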
summary:
- Race between L3 agent and neutron-ns-cleanup
+ [SRU] Race between L3 agent and neutron-ns-cleanup
summary:
- [SRU] Race between L3 agent and neutron-ns-cleanup
+ [SRU] Fix Race between L3 agent and neutron-ns-cleanup
I don't think running the netns-cleanup script in a cron job is the desired behavior.