Many stale neutron-keepalived-state-change processes left after upgrade to native pyroute2 state-change

Bug #2052681 reported by LIU Yulong
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
neutron   New      Medium       Unassigned

Bug Description

A post-upgrade script is needed to remove the stale "ip -o monitor" processes and the legacy "neutron-keepalived-state-change" processes that are left behind after the upgrade.

Tags: l3-ha
Changed in neutron:
importance: Undecided → Medium
Brian Haley (brian-haley) wrote :

Liu - will you be proposing a patch for this? Can I assign it to you? Thanks

tags: added: l3-ha
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

I've tested with an older version of Neutron (Stein), and the problem you are describing is not related to the new implementation of the "neutron-keepalived-state-change" process, but to how this process was stopped.

The "ip -o monitor" processes are child processes of "neutron-keepalived-state-change" and are (or should be) stopped when "neutron-keepalived-state-change" is stopped. If the parent process is killed, the child processes won't be stopped correctly. If "neutron-keepalived-state-change" is then started again (with the old or the new implementation), the leftover "ip -o monitor" processes will remain in the system.

Please check how you are stopping the "neutron-keepalived-state-change" processes and how you are upgrading your system.

Regards.

Changed in neutron:
status: New → Invalid
LIU Yulong (dragon889)
Changed in neutron:
status: Invalid → New
LIU Yulong (dragon889) wrote :

Hi,

@Rodolfo, thank you for the tests. So, IMO, you reproduced the issue.

The questions "How should the upgrade be done?" and "How should the traditional neutron-keepalived-state-change processes be stopped?" should, IMHO, be part of the pyroute2 implementation work. As long as we neither provide upgrade tools nor stop the processes automatically and implicitly, we should update the documentation to tell users how to upgrade, or at least warn them about this upgrade issue.

So, let's answer the question: how should the upgrade be done, and how should the processes be stopped?
Is a simple kill of "<pid of ip -o monitor>" and "<pid of neutron-keepalived-state-change>" enough? The order matters here, IMO: pkill -f "ip -o monitor" should be run first, and then pkill -f neutron-keepalived-state-change. These commands should be run one after the other on every l3-agent host.
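The ordering proposed above could be sketched in Python roughly as follows. This is only an illustration under assumptions, not a tested upgrade tool: the function names are hypothetical, matching is a plain substring test on the full command line (a simplification of what pkill -f does), and, like the pkill commands, it cannot tell legacy processes apart from ones the upgraded agent has already started.

```python
import os
import signal

# Patterns from the comment above, in the order they should be handled:
# first the "ip -o monitor" children, then the state-change daemons.
KILL_ORDER = ("ip -o monitor", "neutron-keepalived-state-change")


def plan_kill_order(processes):
    """Return PIDs ordered so that monitors precede state-change daemons.

    `processes` is an iterable of (pid, cmdline) pairs.
    """
    processes = list(processes)
    ordered = []
    for pattern in KILL_ORDER:
        for pid, cmdline in processes:
            if pattern in cmdline and pid not in ordered:
                ordered.append(pid)
    return ordered


def read_process_table(proc_root="/proc"):
    """Collect (pid, cmdline) pairs from a Linux /proc tree."""
    table = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        table.append((int(entry), cmdline))
    return table


def kill_stale(processes, sig=signal.SIGTERM, kill=os.kill):
    """Signal the matching processes in the planned order.

    `kill` is injectable so the ordering can be checked without
    signalling anything.
    """
    killed = []
    for pid in plan_kill_order(processes):
        try:
            kill(pid, sig)
        except ProcessLookupError:
            continue  # already gone
        killed.append(pid)
    return killed
```

On a real host this would be called as kill_stale(read_process_table()) with root privileges. Running plan_kill_order(read_process_table()) first gives a dry run of which PIDs would be signalled.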

Another issue I noticed is that the new pyroute2 implementation changed the PID file name from:
<router_id>.monitor.pid
to
<router_id>.monitor.pid.neutron-keepalived-state-change-monitor

So, after the upgrade, neutron-l3-agent will start a new "neutron-keepalived-state-change" process for every router because the new PID file is empty. All traditional neutron-keepalived-state-change processes and their "ip -o monitor" processes then remain running.

Please confirm this.
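
One way to target only the pre-upgrade processes, instead of matching by command line, would be to read the PIDs recorded in the old-style PID files. The following is a minimal sketch under assumptions: the function name and the pid_dir argument are hypothetical, and the directory that actually holds these files depends on the deployment's configuration.

```python
from pathlib import Path

# File name suffixes quoted in the comment above.
LEGACY_SUFFIX = ".monitor.pid"
NEW_SUFFIX = ".monitor.pid.neutron-keepalived-state-change-monitor"


def find_legacy_pid_files(pid_dir):
    """Map router_id -> PID for every old-style PID file in `pid_dir`.

    New-style files end in NEW_SUFFIX, so the endswith() check on
    LEGACY_SUFFIX skips them. Files that are empty or do not contain
    a number are ignored.
    """
    legacy = {}
    for path in sorted(Path(pid_dir).iterdir()):
        if not path.name.endswith(LEGACY_SUFFIX):
            continue
        router_id = path.name[: -len(LEGACY_SUFFIX)]
        text = path.read_text().strip()
        if text.isdigit():
            legacy[router_id] = int(text)
    return legacy
```

Before signalling any PID found this way, it would be prudent to check that the process still exists and that its command line really is neutron-keepalived-state-change, since PIDs can be reused.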

LIU Yulong (dragon889) wrote :

Alright, confirmed:
https://review.opendev.org/c/openstack/neutron/+/661760/7/neutron/agent/l3/ha_router.py#362
This line passes the service param to the ProcessManager and no pid_file. The ProcessManager then builds the new PID path as <router_id>.monitor.pid.neutron-keepalived-state-change-monitor.
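
The effect described above can be illustrated with a simplified model of the path selection. This is not Neutron's actual code: the function name and signature are hypothetical, and the "<router_id>.monitor" identifier is inferred from the file names quoted earlier in this report.

```python
import os


def pid_file_path(pids_path, uuid, service=None, pid_file=None):
    """Simplified, hypothetical model of how the PID file path is chosen.

    An explicit pid_file wins; otherwise the name is derived from the
    identifier, with the service name appended when one is given.
    """
    if pid_file:
        return pid_file
    suffix = "pid." + service if service else "pid"
    return os.path.join(pids_path, uuid) + "." + suffix


# Old behaviour: no service name, so the file is "<router_id>.monitor.pid".
old = pid_file_path("/pids", "ROUTER.monitor")

# New behaviour: a service name and no pid_file, so the name changes and
# the PID recorded in the old file is never consulted.
new = pid_file_path(
    "/pids", "ROUTER.monitor",
    service="neutron-keepalived-state-change-monitor",
)
```

Under this model the two calls return different paths for the same router, which matches the observation that the upgraded agent finds no PID and spawns a new process.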

So, IMO, after an upgrade that includes https://review.opendev.org/c/openstack/neutron/+/661760, many Neutron external processes will be re-spawned, and the stale processes will remain.
