neutron: l3-agent stop causes failover of router

Bug #1845900 reported by ZhouHeng
This bug affects 1 person
Affects   Status    Importance   Assigned to       Milestone
kolla     Triaged   Wishlist     Michal Nasiadka
neutron   Invalid   Undecided    Unassigned

Bug Description

Environment:
Deployed with kolla, with three controllers.

Action:
Select one router whose VIP is on control01.
Execute docker stop neutron_l3_agent on control01.

Found:
Both control01 and control02 have the VIP.
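
One way to confirm a finding like this is to collect the output of `ip -o addr show` from the router namespace on each controller and check which hosts list the VIP. The following is a minimal sketch of that check; the sample outputs, interface names and the address 203.0.113.10 are made up for illustration and are not taken from this report.

```python
from ipaddress import ip_interface
from typing import Dict, List, Set


def addresses_in_output(ip_addr_output: str) -> Set[str]:
    """Return every address found after an 'inet'/'inet6' keyword."""
    found = set()
    for line in ip_addr_output.splitlines():
        fields = line.split()
        for i, field in enumerate(fields[:-1]):
            if field in ("inet", "inet6"):
                # The token after the keyword is "address/prefix".
                found.add(str(ip_interface(fields[i + 1]).ip))
    return found


def hosts_holding_vip(outputs_by_host: Dict[str, str], vip: str) -> List[str]:
    """Return the sorted names of hosts whose output contains the VIP."""
    return sorted(
        host
        for host, output in outputs_by_host.items()
        if vip in addresses_in_output(output)
    )


# Hypothetical, abbreviated outputs (one string per controller).
SAMPLE_OUTPUTS = {
    "control01": "12: qg-example    inet 203.0.113.10/24 scope global qg-example",
    "control02": "14: qg-example    inet 203.0.113.10/24 scope global qg-example",
    "control03": "9: ha-example    inet 169.254.192.3/18 scope global ha-example",
}

holders = hosts_holding_vip(SAMPLE_OUTPUTS, "203.0.113.10")
if len(holders) > 1:
    print("VIP present on more than one host:", ", ".join(holders))
```

More than one host in the result corresponds to the situation described under "Found".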

I suspect that the neutron account does not have permission to send a signal to keepalived, which causes keepalived to exit forcibly.

Slawek Kaplonski (slaweq) wrote :

Stopping the L3 agent shouldn't automatically cause failover of the router. If the router's keepalived process is still running, it can keep the router active on the control01 node even if the L3 agent is stopped there.

By VIP address, do you mean the IP address that is managed by keepalived and moved between the HA interfaces of the router? Are all other IPs, such as the attached subnets' interfaces, the external gateway, etc., configured on control02 after this?
Please also tell us what the router's status is in Neutron's API. Is it master on both control01 and control02?
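
The status question can be checked mechanically once the list of L3 agents hosting the router has been fetched. This is a rough sketch that assumes each agent record is a dict with `host` and `ha_state` keys and that an active instance is reported as either "active" or "master"; the field names, state labels and sample records are assumptions for illustration and should be verified against the actual API response.

```python
from typing import Dict, Iterable, List

# Assumption: labels that mean "this node considers itself the active router".
ACTIVE_STATES = {"active", "master"}


def hosts_reporting_active(agents: Iterable[Dict[str, str]]) -> List[str]:
    """Return the sorted hosts whose record reports an active HA state."""
    return sorted(
        agent["host"]
        for agent in agents
        if agent.get("ha_state", "").lower() in ACTIVE_STATES
    )


def is_split(agents: Iterable[Dict[str, str]]) -> bool:
    """True when more than one host reports the router as active."""
    return len(hosts_reporting_active(agents)) > 1


# Hypothetical records for a three-controller deployment.
sample_agents = [
    {"host": "control01", "ha_state": "active"},
    {"host": "control02", "ha_state": "active"},
    {"host": "control03", "ha_state": "standby"},
]

if is_split(sample_agents):
    print("Active on:", ", ".join(hosts_reporting_active(sample_agents)))
```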

Also, can you attach some logs from keepalived, keepalived-state-change-monitor and the L3 agent from both nodes (control01 and control02)?

tags: added: l3-dvr-backlog
removed: api-ref
Changed in neutron:
status: New → Incomplete
ZhouHeng (zhouhenglc) wrote :

I stopped the L3 agent with ``docker stop``. The keepalived process runs in the L3 agent container, so when the container exits, keepalived is killed as well.

The VIP is the external gateway IP.

Slawek Kaplonski (slaweq) wrote :

If you stopped the L3 agent and the keepalived process together, then keepalived on another node will decide to become the new master, and the L3 agent on that node will configure everything for the router. That is how failover works.
But in such a case there is nothing that could clean up the router's config on the "old" node (control01 in your case), as you just stopped the L3 agent there. So IMO this is not a neutron issue, as neutron works here as expected.
Maybe Kolla shouldn't run the keepalived processes for routers in the same container as the L3 agent?
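
The sequence described in this comment can be illustrated with a toy model. This is only a sketch of the reasoning, not how keepalived or the L3 agent is implemented: the `Node` class and both functions are hypothetical, and VRRP priorities and timers are ignored (the first node with a running keepalived simply takes over).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    keepalived_running: bool = True
    l3_agent_running: bool = True
    holds_vip: bool = False


def stop_container(node: Node) -> None:
    """Model stopping the container that runs both processes.

    Both processes stop together, so nothing is left on the node to
    remove the VIP: holds_vip is deliberately left unchanged.
    """
    node.keepalived_running = False
    node.l3_agent_running = False


def elect_master(nodes: List[Node]) -> Optional[Node]:
    """Promote a node if no running keepalived currently owns the VIP."""
    if any(n.keepalived_running and n.holds_vip for n in nodes):
        return None  # a live master still exists, so no failover
    for node in nodes:
        if node.keepalived_running:
            node.holds_vip = True  # the new master configures the VIP
            return node
    return None


nodes = [Node("control01", holds_vip=True), Node("control02"), Node("control03")]
stop_container(nodes[0])
new_master = elect_master(nodes)
vip_holders = [n.name for n in nodes if n.holds_vip]
print("new master:", new_master.name if new_master else None)
print("nodes with VIP:", vip_holders)
```

In this model the VIP ends up on both the old and the new node, which matches the behaviour reported above.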

Changed in neutron:
status: Incomplete → Invalid
Michal Nasiadka (mnasiadka) wrote :

We could adopt the TripleO approach of spawning those processes in separate containers; stopping neutron-l3-agent would then not affect those services.

https://review.opendev.org/#/c/566559/1

Changed in kolla:
status: New → Triaged
importance: Undecided → Wishlist
milestone: none → 11.0.0
assignee: nobody → Michal Nasiadka (mnasiadka)
Changed in kolla:
milestone: 11.0.0 → none
summary: - ha router appear double vip
+ neutron: l3-agent stop causes failover of router