Router State standby on all l3 agent when create

Bug #1902211 reported by Nguyen Thanh Cong
This bug affects 1 person
Affects: neutron
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi all

When I create a router, it stays in the standby state forever on all agents. I logged in to an agent and checked: /var/lib/neutron/ha_confs/<router-id> does not contain a keepalived.conf file, so every agent stays in standby.

Once this happens, I can no longer create routers; I have to restart neutron-l3-agent to fix it. After the restart, the agent creates keepalived.conf, the router becomes active on one agent, and I can create routers again.
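
The check described above can be scripted. This is only a rough sketch: the helper name routers_missing_keepalived_conf is made up for illustration, and it assumes each router has its own subdirectory under the ha_confs directory, as in the path quoted above.

```python
from pathlib import Path


def routers_missing_keepalived_conf(ha_confs_dir):
    """Return the names of router directories that have no keepalived.conf.

    Hypothetical helper: it assumes one subdirectory per router ID under
    ha_confs_dir (for example /var/lib/neutron/ha_confs).
    """
    base = Path(ha_confs_dir)
    missing = []
    for router_dir in sorted(base.iterdir()):
        # Only look at directories; skip any stray files in ha_confs.
        if not router_dir.is_dir():
            continue
        if not (router_dir / "keepalived.conf").is_file():
            missing.append(router_dir.name)
    return missing
```

Calling routers_missing_keepalived_conf("/var/lib/neutron/ha_confs") on an affected agent should list the router IDs that never got a keepalived.conf.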

I am running OpenStack Train

How to reproduce:
- I cannot reproduce it on demand. The error appears occasionally, and then I cannot create routers until I restart neutron-l3-agent. Some time later (maybe 1-3 days, I am not sure), I can't create routers again.

Debug in code:
1. When the router is added to the agent (add_router), the code hangs while the StrongSwan IPSec driver syncs status with the server side.
https://github.com/openstack/neutron-vpnaas/blob/2bea568b4cd4968dcbe64f55247a970545e911af/neutron_vpnaas/services/vpn/agent.py#L67

2. From the line above, execution never reaches the sync function:
https://github.com/openstack/neutron-vpnaas/blob/2bea568b4c/neutron_vpnaas/services/vpn/device_drivers/ipsec.py#L1085

3. I think it is stuck in the decorator:
https://github.com/openstack/oslo.concurrency/blob/80a6e1d489c5d650ea1ce47f4d81bd98bc803542/oslo_concurrency/lockutils.py#L351

4. Finally, this line blocks forever, and I don't know how to fix it:
https://github.com/openstack/oslo.concurrency/blob/80a6e1d489c5d650ea1ce47f4d81bd98bc803542/oslo_concurrency/lockutils.py#L264
Values at that line:
name=vpn-agent
lock_file_prefix=neutron-
external=False
lock_path=None
do_log=False
semaphores=None
delay=0.01
fair=False
int_lock=<threading.Semaphore object at 0x7fa2e04b00f0>
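
To illustrate why a call can block forever at that point, here is a minimal sketch using only the Python standard library. It does not use oslo.concurrency or neutron-vpnaas code; the names vpn_agent_lock and try_locked are made up for the example, and it only assumes what the dump above shows: the internal lock is a threading.Semaphore. A semaphore with one slot is not reentrant, so if anything holds it and never releases it, every later acquire waits indefinitely unless a timeout is given.

```python
import threading


def try_locked(lock, timeout):
    """Try to take the lock, giving up after `timeout` seconds.

    Returns True if the lock was acquired (and released again),
    False if it was still held by someone else when the timeout expired.
    """
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()
    return acquired


# Stand-in for the internal "vpn-agent" lock: one slot, not reentrant.
vpn_agent_lock = threading.Semaphore()

# While the lock is free, acquiring it succeeds immediately.
free_result = try_locked(vpn_agent_lock, timeout=0.1)

# Simulate a holder that never releases the lock.
vpn_agent_lock.acquire()

# A second acquire now cannot succeed. Without the timeout it would
# block forever, which matches the symptom described above.
held_result = try_locked(vpn_agent_lock, timeout=0.1)

print("free lock acquired:", free_result)
print("held lock acquired:", held_result)
```

This does not show which code path holds the lock in the agent; it only shows the blocking behaviour once something does.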

Please help me fix this bug. Thank you all

Tags: l3-ha vpnaas
summary: - Create router standby on all l3 agent
+ Router State standby on all l3 agent when create
tags: added: vpnaas
Changed in neutron:
status: New → Incomplete
Slawek Kaplonski (slaweq) wrote :

Hi Nguyen Thanh Cong,

Can you give us debug logs from a stuck neutron-l3-agent? Also, can you check whether you get a similar lock issue (in a different place, of course) without neutron-vpnaas enabled?

tags: added: l3-ha
Nguyen Thanh Cong (congnt95) wrote :

Hi Slawek Kaplonski,

This is the full debug log from when I created a router and it got stuck in the standby state (router_id: 896da4e3-f085-4300-92d3-6cbdfd9ebec1). While it is stuck, no keepalived.conf file is created in the directory /var/lib/neutron/ha_confs/896da4e3-f085-4300-92d3-6cbdfd9ebec1

https://pikab.in/f688029f37

I tried to find a solution for this bug elsewhere, but had no luck.

Thanks!

LIU Yulong (dragon889) wrote :

Please try disabling all L3 agent extensions to see whether the router HA state recovers. Then enable the extensions one by one to get more information about the router processing procedure.

Nguyen Thanh Cong (congnt95) wrote :

Hi LIU Yulong,

I disabled the vpnaas plugin and no longer see the error.

Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired