Neutron Linux bridge agents wildly fluctuating

Bug #2017518 reported by YG Kumar
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
neutron   Invalid   Undecided    Unassigned

Bug Description

Hi All,

We have an OSA (OpenStack-Ansible) Yoga setup. The neutron Linux bridge agents are fluctuating wildly: they keep going up and down in the `neutron agent-list` output. The number of agents reported as down changes every few seconds, as shown below:

----
38
root@utility-container-:~# neutron agent-list | grep Linux | grep xxx | wc -l
neutron CLI is deprecated and will be removed in the Z cycle. Use openstack CLI instead.
34
root@utility-container-:~# neutron agent-list | grep Linux | grep xxx | wc -l
neutron CLI is deprecated and will be removed in the Z cycle. Use openstack CLI instead.
43
root@utility-container-:~# neutron agent-list | grep Linux | grep xxx | wc -l
neutron CLI is deprecated and will be removed in the Z cycle. Use openstack CLI instead.
2
root@utility-container-:~# neutron agent-list | grep Linux | grep xxx | wc -l
neutron CLI is deprecated and will be removed in the Z cycle. Use openstack CLI instead.
2
root@utility-container-:~# neutron agent-list | grep Linux | grep xxx | wc -l
neutron CLI is deprecated and will be removed in the Z cycle. Use openstack CLI instead.
82
root@utility-container-:~# neutron agent-list | grep Linux | grep xxx | wc -l
neutron CLI is deprecated and will be removed in the Z cycle. Use openstack CLI instead.
54
-------

As shown above, the count of down agents fluctuates within a few seconds between runs of the same command. The logs on the network nodes do not indicate anything wrong. Why is this happening?

Tags: linuxbridge
Brian Haley (brian-haley) wrote :

Two comments:

1) If you could provide any tracebacks the agent(s) are logging, it might help isolate the issue. You might need to look on the compute nodes for this agent.

2) The Linux Bridge agent has been unmaintained for a while, and in Zed it was marked Experimental, so it is unclear how much support the community will be able to give depending on the severity of the issue.

Changed in neutron:
status: New → Incomplete
tags: added: linuxbridge
YG Kumar (ygk-kmr) wrote :

I have attached the Linux bridge agent log. Please check it.

YG Kumar (ygk-kmr) wrote :

Also, how is an agent detected as up or down in neutron? What happens in the background to assess the agent's status?

Brian Haley (brian-haley) wrote :

I didn't see anything in the log that would explain the failure, sorry.

The agent sends a "report" to the server every 30 seconds by default, which updates its status in the DB, and the call you are making shows that info.
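
As a rough illustration of the mechanism described above, the sketch below models the up/down decision as a comparison between the last recorded report and a timeout. It is not Neutron's actual code: `is_agent_alive`, `REPORT_INTERVAL`, and `AGENT_DOWN_TIME` are hypothetical names, the 30-second interval comes from the comment above, and the 75-second timeout is an assumed value that should be checked against your own configuration.

```python
from datetime import datetime, timedelta, timezone

REPORT_INTERVAL = timedelta(seconds=30)  # interval mentioned in the comment above
AGENT_DOWN_TIME = timedelta(seconds=75)  # assumed timeout; verify in your config


def is_agent_alive(last_heartbeat, now, down_time=AGENT_DOWN_TIME):
    """Return True if the last heartbeat is newer than the timeout."""
    return (now - last_heartbeat) < down_time


now = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)  # arbitrary instant
fresh = now - REPORT_INTERVAL       # last report arrived one interval ago
stale = now - 3 * REPORT_INTERVAL   # two consecutive reports were missed

print(is_agent_alive(fresh, now))  # True
print(is_agent_alive(stale, now))  # False
```

Under these assumptions an agent can miss one report and still be shown as up, but not two in a row.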

The code that logs the messages isn't very verbose. I could maybe fix that, but it won't solve the problem; it would just help show that the agent is doing work at every interval.

YG Kumar (ygk-kmr) wrote :

So, what could be the next step?

Brian Haley (brian-haley) wrote :

You should try and find a system that is alternating between states and see what the log shows. You could also look at the neutron-server log to see if it is having any issues receiving the RPC messages from the agents.

But since the code is now experimental I'm not sure anyone will dig into this much more than the steps I've done already. I will try and assist since I'm watching this bug.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/881645

YG Kumar (ygk-kmr) wrote :

Shall I apply the patch to the file neutron/plugins/ml2/drivers/agent/_common_agent.py, restart the agent, and check the logs?

YG Kumar (ygk-kmr) wrote :

Restarting rabbitmq and syncing the time did the trick. You can close this bug.

YG Kumar (ygk-kmr) wrote :

Thanks, Brian, for your time. I appreciate your effort.

Brian Haley (brian-haley) wrote :

The patch would not have fixed the issue, just added some logging so it was obvious what the agent was doing.

And yes, syncing time is important; it could have just been the time difference between the agents and the server causing things to seem broken. Glad you solved your issue.
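
To illustrate how a clock difference alone could produce this symptom, the sketch below evaluates the same heartbeat from two observers whose clocks disagree. This is a simplified model, not Neutron's implementation: `seen_as_alive`, the 75-second timeout, and the 60-second skew are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

AGENT_DOWN_TIME = timedelta(seconds=75)  # assumed timeout, not taken from this report


def seen_as_alive(heartbeat, true_now, observer_skew):
    """Evaluate liveness with an observer clock offset by observer_skew."""
    observer_now = true_now + observer_skew
    return (observer_now - heartbeat) < AGENT_DOWN_TIME


true_now = datetime(2023, 1, 1, 12, 0, 0, tzinfo=timezone.utc)  # arbitrary instant
heartbeat = true_now - timedelta(seconds=20)  # agent reported 20 s ago

in_sync = seen_as_alive(heartbeat, true_now, timedelta(0))
ahead = seen_as_alive(heartbeat, true_now, timedelta(seconds=60))

print(in_sync)  # True: the heartbeat looks 20 s old
print(ahead)    # False: the heartbeat looks 80 s old to the skewed observer
```

If successive requests were evaluated against clocks that disagree like this, the same healthy agent could appear up on one call and down on the next, which would be consistent with the fluctuating counts in the description.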

Changed in neutron:
status: Incomplete → Invalid
YG Kumar (ygk-kmr) wrote :

You can close this bug. Found the issue: it was a time sync problem between the agents and the neutron server. After fixing it and restarting rabbitmq, the agents stopped fluctuating.

Thanks, Brian, for your time. I appreciate it.
