neutron

Activity log for bug #2023993

Date	Who	What changed	Old value	New value	Message
2023-06-15 12:54:40	Ihtisham ul Haq	bug			added bug
2023-06-16 07:28:49	Lajos Katona	tags		l3-ha ovn
2023-06-19 08:47:02	Lajos Katona	neutron: importance	Undecided	Medium
2023-06-20 09:48:49	Ihtisham ul Haq	description	Consider the following (Router, Priority, Chassis) assignments: (r_a, 5, gtw06), (r_a, 4, gtw05), (r_a, 3, gtw04), (r_a, 2, gtw03), (r_a, 1, gtw02). Note that r_a doesn't have any priority on gtw01. If we now stop gtw06 (using ovn-appctl exit) for maintenance reasons, the situation becomes: (r_a, 5, gtw05), (r_a, 4, gtw04), (r_a, 3, gtw03), (r_a, 2, gtw02), (r_a, 1, gtw01). So basically neutron slides down the priorities for that router when it detects that the chassis (gtw06) is down. I believe it does that to avoid moving the active LRP more than once: the router has already failed over to priority 4 (gtw05) when gtw06 goes down, so afterwards neutron only updates gtw05 to priority 5, and similarly for the other priorities below 5. The issue that arises is that when we have many priority 5 routers on gtw06, the rescheduling (due to the failover of the chassis) doesn't result in a balanced distribution of the routers. To resolve that, we currently have to run another external script to rebalance the LRPs. I am not yet sure whether that is by design and the operator has to make sure the routers are rebalanced manually, or whether there is a better solution that rebalances the LRPs while keeping the number of LRP failovers to a minimum. Neutron version: Yoga	Consider the following (Router, Priority, Chassis) assignments: (r_a, 5, gtw06), (r_a, 4, gtw05), (r_a, 3, gtw04), (r_a, 2, gtw03), (r_a, 1, gtw02). Note that r_a doesn't have any priority on gtw01. If we now stop gtw06 (using ovn-appctl exit) for maintenance reasons, the situation becomes: (r_a, 5, gtw05), (r_a, 4, gtw04), (r_a, 3, gtw03), (r_a, 2, gtw02), (r_a, 1, gtw01). So basically neutron promotes the priorities for that router when it detects that the chassis (gtw06) is down. I believe it does that to avoid moving the active LRP more than once: the router has already failed over to priority 4 (gtw05) when gtw06 goes down, so afterwards neutron only updates gtw05 to priority 5, and similarly for the other priorities below 5. The issue that arises is that when we have many priority 5 routers on gtw06, the rescheduling (due to the failover of the chassis) doesn't result in a balanced distribution of the routers. To resolve that, we currently have to run another external script to rebalance the LRPs. I am not yet sure whether that is by design and the operator has to make sure the routers are rebalanced manually, or whether there is a better solution that rebalances the LRPs while keeping the number of LRP failovers to a minimum. Neutron version: Yoga
2023-06-20 12:35:45	Rodolfo Alonso	neutron: assignee		Rodolfo Alonso (rodolfo-alonso-hernandez)
2023-09-04 14:07:40	OpenStack Infra	neutron: status	New	In Progress
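
The behaviour reported in the description can be modelled with a small, self-contained sketch. This is not Neutron's scheduler code: the function names (promote_after_chassis_down, as_priorities), the extra routers r_b and r_c, and the rule that the first unused chassis fills the freed lowest slot are all assumptions made for illustration. It only reproduces the r_a example from the report and shows how routers sharing the same failed chassis can pile up on one survivor.

```python
from collections import Counter


def promote_after_chassis_down(schedule, down, all_chassis):
    """Model the reported priority shift (illustrative, not Neutron code).

    schedule: chassis names ordered from highest to lowest priority.
    Returns a new ordering with the failed chassis removed, the survivors
    shifted up, and one unused chassis (if any) appended as lowest priority.
    """
    survivors = [c for c in schedule if c != down]
    if len(survivors) == len(schedule):
        return list(schedule)  # router had no priority on the failed chassis
    spare = [c for c in all_chassis if c != down and c not in survivors]
    return survivors + spare[:1]


def as_priorities(schedule):
    """Map each chassis to its priority; the first entry gets the highest."""
    top = len(schedule)
    return {chassis: top - i for i, chassis in enumerate(schedule)}


ALL_CHASSIS = ["gtw01", "gtw02", "gtw03", "gtw04", "gtw05", "gtw06"]

# r_a is taken from the bug description; r_b and r_c are hypothetical.
routers = {
    "r_a": ["gtw06", "gtw05", "gtw04", "gtw03", "gtw02"],
    "r_b": ["gtw06", "gtw05", "gtw03", "gtw02", "gtw01"],
    "r_c": ["gtw06", "gtw05", "gtw04", "gtw02", "gtw01"],
}

after = {
    name: promote_after_chassis_down(sched, "gtw06", ALL_CHASSIS)
    for name, sched in routers.items()
}

# Priorities of r_a after gtw06 is stopped, matching the report.
r_a_after = as_priorities(after["r_a"])
print(r_a_after)

# Count how many routers are active (highest priority) on each chassis.
active_per_chassis = Counter(sched[0] for sched in after.values())
print(active_per_chassis)
```

In this model every router that had gtw06 at priority 5 and gtw05 at priority 4 ends up active on gtw05, which is the unbalanced outcome the reporter currently corrects with an external rebalancing script.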