MultipleAgentFoundByTypeHost: Multiple agents with agent_type=Open vSwitch agent

Bug #1322228 reported by Timur Nurlygayanov
This bug affects 2 people
Affects             Status         Importance  Assigned to         Milestone
Fuel for OpenStack  Fix Committed  High        Timur Nurlygayanov
4.1.x               Won't Fix      High        Timur Nurlygayanov
5.0.x               Fix Committed  High        Timur Nurlygayanov

Bug Description

Environment:
Fuel 4.1a, baremetal, HA, CentOS, Neutron GRE

Fuel Version:
{"build_id": "2014-04-08_13-34-02", "mirantis": "yes", "build_number": "272", "nailgun_sha": "7a05e365240ab27c492b20585ef8ac8557102cc0", "ostf_sha": "de0222fed646525d248dc6892eeceab139d5c469", "fuelmain_sha": "306a50188d07bc1f09b4f3411fc5ebe057adf569", "astute_sha": "55df06b2e84fa5d71a1cc0e78dbccab5db29d968", "release": "4.1A", "fuellib_sha": "52e7f57695f33bafa5d84d524d77f1bc3a2289b2"}

Steps To Reproduce:
1. Deploy the HA OpenStack cloud with Fuel 4.1a on many baremetal servers (49 servers in my case)
2. Create 20 VMs from the Fedora-17 image in parallel.

Observed Result:
Some VMs cannot get IP addresses (see the attached screenshot), and the following error appears in /var/log/neutron-all.log:

MultipleAgentFoundByTypeHost: Multiple agents with agent_type=Open vSwitch agent and host=node-16.domain.tld found

________________________________________

The Fuel snapshot is available at the following link: https://drive.google.com/file/d/0Byup6hoNUUUeTzJaWXFWSUFleU0/edit?usp=sharing

----------------------------------------
Sergey Vasilenko:
# neutron agent-list | awk -F'|' '{print $4 $3}' | sort
node-19.domain.tld Open vSwitch agent
node-19.domain.tld Open vSwitch agent
node-19.domain.tld Open vSwitch agent

----------------------------------------
Full log of Neutron: http://paste2.org/YLZeghvI
----------------------------------------

<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp Traceback (most recent call last):
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/amqp.py", line 438, in _process_data
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp **args)
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/common/rpc.py", line 45, in dispatch
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp neutron_ctxt, version, method, namespace, **kwargs)
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/db/agents_db.py", line 190, in report_state
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp self.plugin.create_or_update_agent(context, agent_state)
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/db/agents_db.py", line 153, in create_or_update_agent
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp context, agent['agent_type'], agent['host'])
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/db/agents_db.py", line 136, in _get_agent_by_type_and_host
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp host=host)
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp MultipleAgentFoundByTypeHost: Multiple agents with agent_type=Open vSwitch agent and host=node-19.domain.tld found
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.817 23292 TRACE neutron.openstack.common.rpc.amqp
<167>May 22 14:14:29 node-13 neutron-server 2014-05-22 14:14:28.982 23436 INFO urllib3.connectionpool [-] Starting new HTTP connection (1): 192.168.0.11
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 ERROR neutron.openstack.common.rpc.amqp [-] Exception during message handling
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp Traceback (most recent call last):
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/amqp.py", line 438, in _process_data
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp **args)
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/common/rpc.py", line 45, in dispatch
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp neutron_ctxt, version, method, namespace, **kwargs)
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/db/agents_db.py", line 190, in report_state
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp self.plugin.create_or_update_agent(context, agent_state)
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/db/agents_db.py", line 153, in create_or_update_agent
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp context, agent['agent_type'], agent['host'])
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/neutron/db/agents_db.py", line 136, in _get_agent_by_type_and_host
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp host=host)
<167>May 22 14:14:39 node-13 neutron-server 2014-05-22 14:14:30.358 23292 TRACE neutron.openstack.common.rpc.amqp MultipleAgentFoundByTypeHost: Multiple agents with agent_type=Open vSwitch agent and host=node-21.domain.tld found

tags: added: backports-4.1.1
Timur Nurlygayanov (tnurlygayanov) wrote :

Please see this screenshot for more detail on how it looks in Horizon.

description: updated
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 4.1.1
assignee: nobody → Fuel Library Team (fuel-library)
Sergey Vasilenko (xenolog) wrote :

Please show the output of 'neutron agent-list'.

description: updated
Sergey Vasilenko (xenolog) wrote :

Neutron has a strange bug: sometimes it duplicates the agent records for a node in its own database.

We can check for this with:

# neutron agent-list | awk -F'|' '{print $4 $3}' | sort
.....
 node-19.domain.tld Open vSwitch agent
 node-19.domain.tld Open vSwitch agent
 node-19.domain.tld Open vSwitch agent
......

Remove all invalid records with:
# neutron agent-delete <<AGENT-ID>>
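
The check above can also be sketched in Python. This is a minimal illustration, not part of Neutron or Fuel: the function name `find_duplicate_agents` is hypothetical, and it assumes the input is the ASCII table printed by `neutron agent-list` with the column order id, agent_type, host, alive, admin_state_up. The sample rows are copied from the listing posted later in this report.

```python
from collections import Counter


def find_duplicate_agents(table_text):
    """Return {(host, agent_type): count} for pairs listed more than once.

    Hypothetical helper: expects the ASCII table printed by
    `neutron agent-list` (id | agent_type | host | alive | admin_state_up).
    """
    counts = Counter()
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +-----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 3 or cells[0] == "id":
            continue  # skip the header row and malformed lines
        agent_type, host = cells[1], cells[2]
        counts[(host, agent_type)] += 1
    return {pair: n for pair, n in counts.items() if n > 1}


# Excerpt of the `neutron agent-list` output from this bug report.
SAMPLE = """\
+--------------------------------------+--------------------+--------------------+-------+----------------+
| id | agent_type | host | alive | admin_state_up |
+--------------------------------------+--------------------+--------------------+-------+----------------+
| 003fbafc-880a-4158-ae43-caa6463f5a22 | Open vSwitch agent | node-27.domain.tld | :-) | True |
| 09d5e216-ed7f-4add-8d35-739f322eae80 | Open vSwitch agent | node-16.domain.tld | xxx | True |
| 28a340e8-19e0-4e5f-97df-82c7d40bdef5 | Open vSwitch agent | node-19.domain.tld | xxx | True |
| 319658ed-a76b-4a33-b9cd-9c7e568de671 | Open vSwitch agent | node-19.domain.tld | xxx | True |
| 4b7db2ed-bc00-44ae-81db-7afeed66549c | Open vSwitch agent | node-19.domain.tld | xxx | True |
"""

duplicates = find_duplicate_agents(SAMPLE)
for (host, agent_type), n in sorted(duplicates.items()):
    print(host, agent_type, n)
```

Unlike the awk pipeline, this reports only the pairs that occur more than once, which is easier to read on a deployment with dozens of nodes.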

Ryan Moe (rmoe)
Changed in fuel:
status: Confirmed → Incomplete
Ryan Moe (rmoe) wrote :

We need the full logs from Fuel and the output of neutron agent-list to finish debugging this.

Timur Nurlygayanov (tnurlygayanov) wrote :

[root@node-13 ~]# neutron agent-list | awk -F'|' '{print $4 $3}' | sort

 host agent_type
 node-11.domain.tld Open vSwitch agent
 node-12.domain.tld Open vSwitch agent
 node-13.domain.tld Open vSwitch agent
 node-14.domain.tld Open vSwitch agent
 node-15.domain.tld Open vSwitch agent
 node-16.domain.tld Open vSwitch agent
 node-16.domain.tld Open vSwitch agent
 node-17.domain.tld Open vSwitch agent
 node-18.domain.tld Open vSwitch agent
 node-19.domain.tld Open vSwitch agent
 node-19.domain.tld Open vSwitch agent
 node-19.domain.tld Open vSwitch agent
 node-1.domain.tld Open vSwitch agent
 node-20.domain.tld Open vSwitch agent
 node-21.domain.tld Open vSwitch agent
 node-21.domain.tld Open vSwitch agent
 node-22.domain.tld Open vSwitch agent
 node-23.domain.tld Open vSwitch agent
 node-24.domain.tld L3 agent
 node-24.domain.tld Open vSwitch agent
 node-25.domain.tld DHCP agent
 node-25.domain.tld Open vSwitch agent
 node-26.domain.tld Open vSwitch agent
 node-27.domain.tld Open vSwitch agent
 node-28.domain.tld Open vSwitch agent
 node-29.domain.tld Open vSwitch agent
 node-2.domain.tld Open vSwitch agent
 node-30.domain.tld Open vSwitch agent
 node-30.domain.tld Open vSwitch agent
 node-31.domain.tld Open vSwitch agent
 node-32.domain.tld Open vSwitch agent
 node-33.domain.tld Open vSwitch agent
 node-34.domain.tld Open vSwitch agent
 node-35.domain.tld Open vSwitch agent
 node-36.domain.tld Open vSwitch agent
 node-37.domain.tld Open vSwitch agent
 node-38.domain.tld Open vSwitch agent
 node-39.domain.tld Open vSwitch agent
 node-39.domain.tld Open vSwitch agent
 node-3.domain.tld Open vSwitch agent
 node-40.domain.tld Open vSwitch agent
 node-40.domain.tld Open vSwitch agent
 node-41.domain.tld Open vSwitch agent
 node-41.domain.tld Open vSwitch agent
 node-42.domain.tld Open vSwitch agent
 node-43.domain.tld Open vSwitch agent
 node-44.domain.tld Open vSwitch agent
 node-45.domain.tld Open vSwitch agent
 node-46.domain.tld Open vSwitch agent
 node-47.domain.tld Open vSwitch agent
 node-48.domain.tld Open vSwitch agent
 node-49.domain.tld Open vSwitch agent
 node-4.domain.tld Open vSwitch agent
 node-50.domain.tld Open vSwitch agent
 node-5.domain.tld Open vSwitch agent
 node-6.domain.tld Open vSwitch agent
 node-7.domain.tld Open vSwitch agent
 node-8.domain.tld Open vSwitch agent
 node-9.domain.tld Open vSwitch agent

Timur Nurlygayanov (tnurlygayanov) wrote :

[root@node-13 ~]# neutron agent-list
+--------------------------------------+--------------------+--------------------+-------+----------------+
| id | agent_type | host | alive | admin_state_up |
+--------------------------------------+--------------------+--------------------+-------+----------------+
| 003fbafc-880a-4158-ae43-caa6463f5a22 | Open vSwitch agent | node-27.domain.tld | :-) | True |
| 07d453b8-d431-4408-b984-bc23b522f4ef | Open vSwitch agent | node-46.domain.tld | :-) | True |
| 09ada459-5948-4550-bc14-2e80d95049be | Open vSwitch agent | node-33.domain.tld | :-) | True |
| 09d5e216-ed7f-4add-8d35-739f322eae80 | Open vSwitch agent | node-16.domain.tld | xxx | True |
| 0b7834eb-07b4-4176-953f-e00177e9f3c1 | Open vSwitch agent | node-24.domain.tld | :-) | True |
| 0c36c627-adb1-4801-b4b7-001cbc45da02 | Open vSwitch agent | node-31.domain.tld | :-) | True |
| 18826bae-3e6c-4a28-b3bf-b3ab54e2b051 | Open vSwitch agent | node-38.domain.tld | :-) | True |
| 28a340e8-19e0-4e5f-97df-82c7d40bdef5 | Open vSwitch agent | node-19.domain.tld | xxx | True |
| 2fb225e0-d472-431f-afce-81075eb5a979 | Open vSwitch agent | node-3.domain.tld | :-) | True |
| 30e4d978-c783-43c9-a02b-dfbe3b18d9ba | Open vSwitch agent | node-28.domain.tld | :-) | True |
| 319658ed-a76b-4a33-b9cd-9c7e568de671 | Open vSwitch agent | node-19.domain.tld | xxx | True |
| 349d9447-b944-4d74-b963-8371471eacb9 | Open vSwitch agent | node-21.domain.tld | xxx | True |
| 3af4b2af-65bc-4559-8c6f-f0d5502feb67 | Open vSwitch agent | node-23.domain.tld | :-) | True |
| 43cbb09a-838a-4d6f-bb52-9f18976ba91d | Open vSwitch agent | node-50.domain.tld | :-) | True |
| 45072e81-4651-4668-aba9-9de9212da989 | Open vSwitch agent | node-36.domain.tld | :-) | True |
| 472483bb-c75b-46af-b354-04a13f3432fe | Open vSwitch agent | node-29.domain.tld | :-) | True |
| 492225f9-b7a0-4754-b2a9-a30d53764436 | Open vSwitch agent | node-12.domain.tld | :-) | True |
| 4b7db2ed-bc00-44ae-81db-7afeed66549c | Open vSwitch agent | node-19.domain.tld | xxx | True |
| 4ee8c886-4c2e-46be-a989-dfc39a677715 | Open vSwitch agent | node-35.domain.tld | :-) | True |
| 5129db1c-1d04-4e2d-a791-7d34ce099a68 | Open vSwitch agent | node-42.domain.tld | :-) | True |
| 52c9ce24-1284-4e3c-96f5-0b84f3f2b2d1 | Open vSwitch agent | node-2.domain.tld | :-) | True |
| 535154d6-5151-4ee1-8dd7-5a3c23f6372f | Open vSwitch agent | node-11.domain.tld | :-) | True |
| 58ef89af-7010-4d61-89ee-45b6cf6a161b | Open vSwitch agent | node-39.domain.tld | xxx | True |
| 5c2e43ab-68ed-482d-912e-7fb153132b8a | DHCP agent | node-25.domain.tld | :-) | True |
| 5dbe409a-04ae-4454-98a3-0a77e8d62547 | Open vSwitch agent | node-48.domain.tld | :-) | True |
| 60787cbf-6158-4fb0-9e57-68745d3783d3 | Open vSwitch agent | node-30.domain.tld | xxx | True |
| 704d8e0b-74c8-49f8-bde3-e48f1...


Timur Nurlygayanov (tnurlygayanov) wrote :

Ryan, the Fuel snapshot is available at the following link: https://drive.google.com/file/d/0Byup6hoNUUUeTzJaWXFWSUFleU0/edit?usp=sharing

description: updated
Timur Nurlygayanov (tnurlygayanov) wrote :

Additional information: I manually removed all agents with 'xxx' status and then ran Rally load tests for Nova (creating and deleting VMs).
During the tests, agents randomly switched to 'xxx' status and Pacemaker returned them to the active state. But this does not help: the agents fail again every few seconds.
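
The manual cleanup described above can be sketched as follows. This is only an illustration under stated assumptions: `stale_duplicate_ids` is a hypothetical helper, the rows are (id, agent_type, host, alive) tuples taken from the `neutron agent-list` output in this report, and the sketch is narrower than the cleanup described, since it selects only 'xxx' agents whose (host, agent_type) pair is duplicated. It prints `neutron agent-delete` commands for review instead of running them.

```python
from collections import defaultdict


def stale_duplicate_ids(rows):
    """Return ids of agents marked 'xxx' within duplicated (host, type) groups.

    Hypothetical helper: `rows` is an iterable of
    (agent_id, agent_type, host, alive) tuples.
    """
    groups = defaultdict(list)
    for agent_id, agent_type, host, alive in rows:
        groups[(host, agent_type)].append((agent_id, alive))

    stale = []
    for entries in groups.values():
        if len(entries) < 2:
            continue  # a single record is not a duplicate, even if it is dead
        stale.extend(agent_id for agent_id, alive in entries if alive == "xxx")
    return stale


# Rows copied from the agent listing in this bug report.
ROWS = [
    ("003fbafc-880a-4158-ae43-caa6463f5a22", "Open vSwitch agent", "node-27.domain.tld", ":-)"),
    ("09d5e216-ed7f-4add-8d35-739f322eae80", "Open vSwitch agent", "node-16.domain.tld", "xxx"),
    ("28a340e8-19e0-4e5f-97df-82c7d40bdef5", "Open vSwitch agent", "node-19.domain.tld", "xxx"),
    ("319658ed-a76b-4a33-b9cd-9c7e568de671", "Open vSwitch agent", "node-19.domain.tld", "xxx"),
    ("4b7db2ed-bc00-44ae-81db-7afeed66549c", "Open vSwitch agent", "node-19.domain.tld", "xxx"),
]

# Print the commands for manual review rather than executing them.
for agent_id in stale_duplicate_ids(ROWS):
    print("neutron agent-delete", agent_id)
```

Note that when every record in a duplicated group is marked 'xxx', as with node-19 in this excerpt, all of them are selected.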

description: updated
Ryan Moe (rmoe) wrote :

I see lots of AMQP errors in the logs for most services on all nodes. This is probably the reason the agents seem to go offline randomly. I don't know if this could cause multiple agents to be registered or not though.

Ryan Moe (rmoe) wrote :

This looks similar to an issue fixed in Icehouse. https://bugs.launchpad.net/neutron/+bug/1254246. See this commit: https://review.openstack.org/#/c/58814/

Changed in fuel:
status: Incomplete → Confirmed
Changed in fuel:
status: Confirmed → Triaged
Timur Nurlygayanov (tnurlygayanov) wrote :

This is a different bug; the fix for https://bugs.launchpad.net/fuel/+bug/1315338 doesn't fix this issue.

When a user deploys OpenStack with 20+ nodes in HA mode, many errors about duplicated Neutron agents appear after the deployment.

This is a Fuel problem, and we should fix it in 4.1.2 and 5.x.

Changed in fuel:
milestone: 4.1.1 → 4.1.2
milestone: 4.1.2 → 5.0.1
Roman Podoliaka (rpodolyaka) wrote :

Timur, this seems to be a duplicate of https://bugs.launchpad.net/neutron/+bug/1254246 . Backporting the fix to Havana is problematic, though, as it requires you to change the DB schema and it's not clear how to upgrade to Icehouse after that (you would have to provide your own havana->icehouse migration script for neutron db schema).

As long as you are not adding new hosts after the cluster has been deployed, a simple workaround could be to remove the duplicated entries manually by means of neutron agent-delete.
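
To illustrate why a schema change can close this kind of race, here is a minimal sketch. It rests on an assumption not stated in this thread: that the schema change amounts to a uniqueness constraint on the (agent_type, host) pair. It uses an in-memory SQLite database, not Neutron's real schema or migration code, and the table layout and the `register_agent` function are hypothetical.

```python
import sqlite3


def register_agent(conn, agent_type, host):
    """Insert an agent row; return False if the pair is already registered.

    Hypothetical stand-in for the "create" half of a create-or-update step.
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO agents (agent_type, host) VALUES (?, ?)",
                (agent_type, host),
            )
        return True
    except sqlite3.IntegrityError:
        # The database rejected the duplicate, so the caller can fall back
        # to updating the existing row instead.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE agents (
        id INTEGER PRIMARY KEY,
        agent_type TEXT NOT NULL,
        host TEXT NOT NULL,
        UNIQUE (agent_type, host)  -- assumed constraint, see lead-in
    )
    """
)

first = register_agent(conn, "Open vSwitch agent", "node-19.domain.tld")
second = register_agent(conn, "Open vSwitch agent", "node-19.domain.tld")
row_count = conn.execute("SELECT COUNT(*) FROM agents").fetchone()[0]
print(first, second, row_count)
```

Without such a constraint, two workers that both check for an existing row and then insert can each create a record, after which a lookup that expects exactly one match fails in the way the traceback above shows.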

Roman Podoliaka (rpodolyaka) wrote :

5.x shouldn't be affected as this has been fixed upstream

Timur Nurlygayanov (tnurlygayanov) wrote :

Roman, thank you!
This issue is marked as 'Fix Released'; I will recheck it on my environment with Fuel 5.1.

Changed in fuel:
status: Triaged → Fix Released
Changed in fuel:
status: Fix Released → Won't Fix
status: Won't Fix → In Progress
assignee: Fuel Library Team (fuel-library) → Timur Nurlygayanov (tnurlygayanov)
Changed in fuel:
milestone: 5.0.1 → 4.1.2
Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/5.1.x