[neutron] neutron 2013.2.1 heisenbug

Bug #1263922 reported by Leontii Istomin
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Mirantis OpenStack 4.1.x | Won't Fix | Medium | MOS Neutron |
Mirantis OpenStack 6.0.x | Won't Fix | Medium | MOS Neutron |

Bug Description

I installed OpenStack using Mirantis OpenStack 4.0:
Ubuntu, HA, Neutron with GRE.
The only error I got was about uploading the image.

The problem is that the Neutron DHCP and L3 agents have not been started.

I was able to start the DHCP agent after running the following commands:
root@node-1:~# crm_resource --resource p_neutron-dhcp-agent --cleanup --node `uname -n`
Cleaning up p_neutron-dhcp-agent on node-1
Waiting for 1 replies from the CRMd. OK
root@node-1:~# crm_resource --resource p_neutron-l3-agent --cleanup --node `uname -n`
Cleaning up p_neutron-l3-agent on node-1
Waiting for 1 replies from the CRMd. OK
root@node-1:~# crm resource manage p_neutron-l3-agent
root@node-1:~# crm resource manage p_neutron-dhcp-agent
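
The recovery sequence above can be scripted. The following is a minimal sketch that only builds the command lines and does not run them; the function name `cleanup_commands` is hypothetical, and the flags are copied from the transcript above rather than from the Pacemaker documentation.

```python
import shlex


def cleanup_commands(resources, node):
    """Build the crm_resource/crm command lines from the transcript (dry run).

    Nothing is executed here; the caller decides whether to run the commands.
    """
    commands = []
    # First clear the failure state of every resource on the given node.
    for name in resources:
        commands.append(
            ["crm_resource", "--resource", name, "--cleanup", "--node", node]
        )
    # Then hand the resources back to the cluster manager.
    for name in resources:
        commands.append(["crm", "resource", "manage", name])
    return commands


if __name__ == "__main__":
    agents = ["p_neutron-dhcp-agent", "p_neutron-l3-agent"]
    for argv in cleanup_commands(agents, "node-1"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping this as a dry run makes it easy to review the commands before applying them on a controller.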

However, the L3 agent is still down. I can't find the root cause of the problem. Snapshot attached.
root@node-1:~# crm status
Last updated: Tue Dec 24 12:55:57 2013
Last change: Tue Dec 24 10:48:06 2013 via cibadmin on node-1
Stack: openais
Current DC: node-1 - partition with quorum
Version: 1.1.8-f722cf1
3 Nodes configured, 3 expected votes
17 Resources configured.

Online: [ node-1 node-4 node-5 ]

 vip__management_old (ocf::heartbeat:IPaddr2): Started node-1
 vip__public_old (ocf::heartbeat:IPaddr2): Started node-4
 Clone Set: clone_p_haproxy [p_haproxy]
     Started: [ node-1 node-4 node-5 ]
 Clone Set: clone_p_mysql [p_mysql]
     Started: [ node-1 node-4 node-5 ]
 Clone Set: clone_p_neutron-plugin-openvswitch-agent [p_neutron-plugin-openvswitch-agent]
     Started: [ node-1 node-4 node-5 ]
 Clone Set: clone_p_neutron-metadata-agent [p_neutron-metadata-agent]
     Started: [ node-1 node-4 node-5 ]
 p_neutron-dhcp-agent (ocf::mirantis:neutron-agent-dhcp): Started node-5
 p_neutron-l3-agent (ocf::mirantis:neutron-agent-l3): Started node-5 (unmanaged) FAILED
 heat-engine (ocf::mirantis:heat-engine): Started node-5

Failed actions:
    p_neutron-plugin-openvswitch-agent_monitor_20000 (node=node-1, call=17563, rc=1, status=complete): unknown error
    p_haproxy_start_0 (node=node-1, call=17371, rc=1, status=complete): unknown error
    p_neutron-plugin-openvswitch-agent_monitor_20000 (node=node-4, call=16966, rc=1, status=complete): unknown error
    p_haproxy_start_0 (node=node-4, call=16808, rc=1, status=complete): unknown error
    p_haproxy_start_0 (node=node-5, call=14278, rc=1, status=complete): unknown error
    p_neutron-plugin-openvswitch-agent_monitor_20000 (node=node-5, call=14601, rc=1, status=complete): unknown error
    p_neutron-l3-agent_stop_0 (node=node-5, call=14384, rc=1, status=Timed Out): unknown error

root@node-1:~# neutron agent-list
+--------------------------------------+--------------------+--------+-------+----------------+
| id | agent_type | host | alive | admin_state_up |
+--------------------------------------+--------------------+--------+-------+----------------+
| 0f9e29e7-5dde-4d6e-9699-313ca1379025 | Open vSwitch agent | node-3 | xxx | True |
| 13ea42d4-3f61-4668-80a6-eff0baebde53 | L3 agent | node-5 | xxx | True |
| 43b5e6b8-a142-4f5c-aee1-85474d0fef5b | Open vSwitch agent | node-1 | :-) | True |
| 5b5a5cb2-5de5-4083-8935-5a61c58c1dfd | Open vSwitch agent | node-2 | xxx | True |
| 7474b68f-b923-437f-92b9-2ce98e306a4c | Open vSwitch agent | node-2 | xxx | True |
| 7638334b-bf78-47fb-bfb1-d45af876d7e8 | Open vSwitch agent | node-5 | :-) | True |
| 94b3bf72-c9e4-4205-9a02-9c304a4910b7 | Open vSwitch agent | node-2 | xxx | True |
| ad2df5b8-1fc9-4603-8753-2551a17bc8ac | Open vSwitch agent | node-2 | xxx | True |
| bcd56820-7a1b-4d45-bbb6-87986e520b6b | Open vSwitch agent | node-4 | :-) | True |
| ddc0c0d6-2dc0-4c0e-a795-cfba8b285d86 | Open vSwitch agent | node-3 | xxx | True |
| e367b228-47c3-4325-a1df-9380f7cd1f31 | DHCP agent | node-5 | :-) | True |
| f7a477a8-c992-42f9-9069-654ea12f0f03 | Open vSwitch agent | node-2 | xxx | True |
+--------------------------------------+--------------------+--------+-------+----------------+

Tags: neutron
Leontii Istomin (listomin) wrote :

I found the following line in the L3 agent log:
2013-12-24T13:12:56.489508+00:00 debug: 2013-12-24 13:12:51.436 30418 TRACE neutron.openstack.common.rpc.amqp MultipleAgentFoundByTypeHost: Multiple agents with agent_type=Open vSwitch agent and host=node-3 found
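
A log line like the one above can be searched for mechanically. This is a minimal sketch, assuming the message wording is exactly as in that line (other Neutron versions may phrase it differently); `find_duplicate_agent_errors` is a hypothetical helper name.

```python
import re

# Pattern built from the message text in the log line above.
PATTERN = re.compile(
    r"MultipleAgentFoundByTypeHost: Multiple agents with "
    r"agent_type=(?P<agent_type>.+?) and host=(?P<host>\S+) found"
)


def find_duplicate_agent_errors(lines):
    """Return the set of (agent_type, host) pairs reported in the log lines."""
    hits = set()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.add((match.group("agent_type"), match.group("host")))
    return hits


sample = [
    "2013-12-24 13:12:51.436 30418 TRACE neutron.openstack.common.rpc.amqp "
    "MultipleAgentFoundByTypeHost: Multiple agents with "
    "agent_type=Open vSwitch agent and host=node-3 found",
    "2013-12-24 13:12:52.000 30418 DEBUG unrelated line",
]

if __name__ == "__main__":
    for agent_type, host in sorted(find_duplicate_agent_errors(sample)):
        print(f"{agent_type} duplicated on {host}")
```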

After that I deleted the duplicate agents, leaving only one per host:
root@node-1:~# neutron agent-list
+--------------------------------------+--------------------+--------+-------+----------------+
| id | agent_type | host | alive | admin_state_up |
+--------------------------------------+--------------------+--------+-------+----------------+
| 0f9e29e7-5dde-4d6e-9699-313ca1379025 | Open vSwitch agent | node-3 | :-) | True |
| 43b5e6b8-a142-4f5c-aee1-85474d0fef5b | Open vSwitch agent | node-1 | :-) | True |
| 5b5a5cb2-5de5-4083-8935-5a61c58c1dfd | Open vSwitch agent | node-2 | :-) | True |
| 7638334b-bf78-47fb-bfb1-d45af876d7e8 | Open vSwitch agent | node-5 | :-) | True |
| bcd56820-7a1b-4d45-bbb6-87986e520b6b | Open vSwitch agent | node-4 | :-) | True |
| bf5d85ff-be5e-4399-a400-cc1aa1872ebb | DHCP agent | node-1 | :-) | True |
| cc4ee7f6-94d5-47b8-9dc8-b0833c06cc9b | L3 agent | node-4 | :-) | True |
+--------------------------------------+--------------------+--------+-------+----------------+

Mike Scherbakov (mihgen) wrote :

Can you please provide the ISO build number as well?

Leontii Istomin (listomin) wrote :

I use the fuel-4.0-191-2013-12-22_00-01-41.iso

Vladimir Kuklin (vkuklin) wrote :

Unable to reproduce on ISO 202

Changed in fuel:
status: New → Incomplete
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 4.0
Sergey Vasilenko (xenolog) wrote :

I have seen this behavior about 3 times, on ISOs before #200, but I can't reproduce this heisenbug.

The distinctive features of this bug:
* MultipleAgentFoundByTypeHost exception in the Neutron server log
* more than one Neutron OVS agent registered for one of the controllers (see the output of the neutron agent-list command)

Steps to cure on a live environment:
* ssh to any controller node.
* remove ALL duplicated OVS agents from the Neutron database (neutron agent-delete AGENT_ID)

I think this problem should be investigated in the 4.1 release cycle.
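
The second symptom can be checked by parsing the table printed by neutron agent-list. The following is a minimal sketch, assuming the table layout shown elsewhere in this report; the helper names `parse_agent_table` and `duplicate_agents` are hypothetical, and the sample rows are taken from one of the outputs in this report. It only reports candidates and deletes nothing.

```python
from collections import defaultdict


def parse_agent_table(text):
    """Parse the ASCII table printed by `neutron agent-list` into dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def duplicate_agents(rows):
    """Group rows by (agent_type, host) and keep groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["agent_type"], row["host"])].append(row)
    return {key: agents for key, agents in groups.items() if len(agents) > 1}


sample = """
+------+------------+------+-------+----------------+
| id | agent_type | host | alive | admin_state_up |
+------+------------+------+-------+----------------+
| 2d33bd59-1b6c-4fb5-b9e5-2ccd4333653b | Open vSwitch agent | node-36.domain.tld | xxx | True |
| 49fa9961-a8d3-48ba-bd34-509e46e2efed | Open vSwitch agent | node-36.domain.tld | xxx | True |
| 5951d378-9727-4c37-ad80-4d1faa49ac6c | L3 agent | node-37.domain.tld | :-) | True |
| a39b09dd-41dd-4df2-bab1-ff47cee22f42 | Open vSwitch agent | node-37.domain.tld | :-) | True |
+------+------------+------+-------+----------------+
"""

if __name__ == "__main__":
    for (agent_type, host), agents in duplicate_agents(parse_agent_table(sample)).items():
        print(f"{agent_type} on {host}:")
        for agent in agents:
            print(f"  {agent['id']} alive={agent['alive']}")
```

The printed IDs are the candidates for the delete step listed above.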

Changed in fuel:
status: Incomplete → Confirmed
milestone: 4.0 → 4.1
summary: - neutron dhcp/l3 agents haven't started
+ neutron 2013.2.1 heisenbug
Changed in fuel:
assignee: nobody → Sergey Vasilenko (xenolog)
Changed in fuel:
importance: Undecided → Low
status: Confirmed → Triaged
milestone: 4.1 → 5.0
Sergey Vasilenko (xenolog) wrote : Re: neutron 2013.2.1 heisenbug

We should test it on 2013.2.2.
As far as I remember, the Neutron community fixed it in Neutron.

Vladimir Kuklin (vkuklin) wrote :

Could you please post info on the upstream bug?

Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #225

"build_id": "2014-02-28_01-17-30",
"mirantis": "yes",
"build_number": "225",
"nailgun_sha": "12a7e7a99557f2bc302f0806ad3beef02e94b974",
"ostf_sha": "ceb3ea8c2c0da27306b30b9936f27dbc5044d2c6",
"fuelmain_sha": "ba019bf15a9597a154e7c1d6ecc840614d21414c",
"astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a",
"release": "4.1",
"fuellib_sha": "61d3a150402da3ce1160836c8d659f6d9d1f9640"

1. Create new environment (CentOS, HA mode)
2. Choose VLAN segmentation
3. Add 3 controllers, compute and cinder node
4. Start deployment. It was successful
5. Add any compute node
6. Redeploy environment. It was successful
7. Check one controller and click button "Delete". Node has status "Pending Deletion"
8. Add any controller node
9. Start deployment. It has failed

[root@node-1 ~]# neutron agent-list
+--------------------------------------+--------------------+-------------------+-------+----------------+
| id | agent_type | host | alive | admin_state_up |
+--------------------------------------+--------------------+-------------------+-------+----------------+
| 092aeb94-db67-4e97-aaa2-79ff25cd9c20 | Open vSwitch agent | node-1.domain.tld | xxx | True |
| 0d5b316a-4d5b-4d89-9c9a-f6d208f82b8d | Open vSwitch agent | node-1.domain.tld | xxx | True |
| 1b97ef47-78b2-45b7-a82b-70210f2b930e | L3 agent | node-7.domain.tld | :-) | True |
| 24490cf5-e9ff-4991-8e7b-915f0b35a909 | Open vSwitch agent | node-1.domain.tld | xxx | True |
| 2f025fab-bdbe-44f4-87cd-844aff6f14f2 | Open vSwitch agent | node-8.domain.tld | :-) | True |
| 3585d023-7ffa-4fee-8e51-2e9acd389e4f | Open vSwitch agent | node-6.domain.tld | :-) | True |
| 37e9aa81-7353-4aac-8f51-3e8e6d915a15 | Open vSwitch agent | node-4.domain.tld | :-) | True |
| 55c5d7d6-c3d5-478b-a166-30ac77544b19 | Open vSwitch agent | node-1.domain.tld | xxx | True |
| c0c6af93-e8de-4661-a1cd-0aacbbb01eea | DHCP agent | node-8.domain.tld | :-) | True |
| c70960ed-0c20-4dac-9175-cb98733c07ea | DHCP agent | node-1.domain.tld | xxx | True |
| d0fd4369-2b35-4ed5-8b90-9c7f882a1265 | Open vSwitch agent | node-3.domain.tld | :-) | True |
| f7205d9d-11e4-46aa-9f7d-248c2bc2518b | Open vSwitch agent | node-7.domain.tld | :-) | True |
+--------------------------------------+--------------------+-------------------+-------+----------------+

Anastasia Palkina (apalkina) wrote :

Reproduced on ISO #235
"build_id": "2014-03-05_07-31-01",
"mirantis": "yes",
"build_number": "235",
"nailgun_sha": "f58aad317829112913f364347b14f1f0518ad371",
"ostf_sha": "dc54d99ddff2f497b131ad1a42362515f2a61afa",
"fuelmain_sha": "16637e2ea0ae6fe9a773aceb9d76c6e3a75f6c3b",
"astute_sha": "f15f5615249c59c826ea05d26707f062c88db32a",
"release": "4.1",
"fuellib_sha": "73313007c0914e602246ea41fa5e8ca2dfead9f8"

1. Create new environment (CentOS, HA mode)
2. Choose VLAN segmentation
3. Choose installing Ceilometer
4. Add 3 controllers, compute and cinder node
5. Start deployment. It was successful

[root@node-36 ~]# neutron agent-list
+--------------------------------------+--------------------+--------------------+-------+----------------+
| id | agent_type | host | alive | admin_state_up |
+--------------------------------------+--------------------+--------------------+-------+----------------+
| 2d33bd59-1b6c-4fb5-b9e5-2ccd4333653b | Open vSwitch agent | node-36.domain.tld | xxx | True |
| 49fa9961-a8d3-48ba-bd34-509e46e2efed | Open vSwitch agent | node-36.domain.tld | xxx | True |
| 5951d378-9727-4c37-ad80-4d1faa49ac6c | L3 agent | node-37.domain.tld | :-) | True |
| 8c77098b-574a-4748-8452-95aa5bb3c3f8 | Open vSwitch agent | node-36.domain.tld | xxx | True |
| a39b09dd-41dd-4df2-bab1-ff47cee22f42 | Open vSwitch agent | node-37.domain.tld | :-) | True |
| c3dd3665-9e6d-4c8a-b50a-f2a51b0c1dc6 | Open vSwitch agent | node-39.domain.tld | :-) | True |
| ce934944-808c-445b-98b2-b6489ccad0c7 | DHCP agent | node-38.domain.tld | :-) | True |
| d2ec7a1b-1bb4-4b08-a70b-330ab7308051 | Open vSwitch agent | node-38.domain.tld | :-) | True |
+--------------------------------------+--------------------+--------------------+-------+----------------+

Anastasia Palkina (apalkina) wrote :

"build_id": "2014-03-21_01-01-16",
"mirantis": "yes",
"build_number": "36",
"nailgun_sha": "102bfc56d2e7c9d7295066cc30c21053fca44d79",
"ostf_sha": "608f109b6f07b695305468627941b8505abe3c7f",
"fuelmain_sha": "b853f8ad06516d83ad4956e945be923b5aa3ec9d",
"astute_sha": "f52db1262401cfb2beaf41e7d6eeaeb456d5fe95",
"release": "5.0",
"fuellib_sha": "54072cce48fc539e4d99f4862f9a5568b3ef3384"

1. Create new environment (CentOS, HA mode)
2. Choose VLAN segmentation
3. Add 3 controllers, compute and cinder
4. Start deployment. It was successful

[root@node-35 ~]# neutron agent-list
+--------------------------------------+--------------------+--------------------+-------+----------------+
| id | agent_type | host | alive | admin_state_up |
+--------------------------------------+--------------------+--------------------+-------+----------------+
| 2eb45d86-6518-4fc2-841f-0fb71f028639 | Open vSwitch agent | node-35.domain.tld | xxx | True |
| 3b5a02b2-1446-49e6-b729-4392ddc38dbe | Open vSwitch agent | node-35.domain.tld | xxx | True |
| 75c66912-9f2b-483a-8447-49eb61f66b16 | Open vSwitch agent | node-35.domain.tld | xxx | True |
| 87c2b0b5-4eaa-46eb-bc38-c489eb0661ad | DHCP agent | node-37.domain.tld | :-) | True |
| 947ceb71-3efb-4537-adfc-6b2cab17900e | Open vSwitch agent | node-38.domain.tld | :-) | True |
| c6a95a1f-6a0e-4f15-9dfe-8529d688fe16 | L3 agent | node-36.domain.tld | :-) | True |
| d4b1d7ca-7810-478e-8ba8-5c122de93704 | Open vSwitch agent | node-37.domain.tld | :-) | True |
| df559224-3831-45f2-b05d-4cddae3d371e | Open vSwitch agent | node-35.domain.tld | xxx | True |
| dfebb740-91d6-40a2-ac11-0cbddf443277 | Open vSwitch agent | node-36.domain.tld | :-) | True |
+--------------------------------------+--------------------+--------------------+-------+----------------+

Changed in fuel:
milestone: 5.0 → 5.1
Anastasia Palkina (apalkina) wrote :

Tried to reproduce on ISO #262
"build_id": "2014-06-20_00-31-14",
"mirantis": "yes",
"build_number": "262",
"ostf_sha": "2f30e5cab5bec1f1e2fd80e26e4da771a8ffe2d4",
"nailgun_sha": "0c5e3b94fdd6bc9a50d5f840bf5151f95a23d908",
"production": "docker",
"api": "1.0",
"fuelmain_sha": "4f547561532baf5f26733bf66db692dc5b61806d",
"astute_sha": "694b5a55695e01e1c42185bfac9cc7a641a9bd48",
"release": "5.1",
"fuellib_sha": "25eb618a33a2ec87bc56f6bad16dc25b1837f0f0"

1. Create new environment (Ubuntu, HA mode)
2. Choose VLAN segmentation
3. Add 3 controllers, compute and cinder
4. Start deployment. It was successful

The situation is the same on all controllers (node-11, node-12, node-13).

root@node-13:~# neutron agent-list
+--------------------------------------+--------------------+---------+-------+----------------+
| id | agent_type | host | alive | admin_state_up |
+--------------------------------------+--------------------+---------+-------+----------------+
| 1a1e6615-c964-4b91-8db9-daeb5bc38b58 | Open vSwitch agent | node-13 | :-) | True |
| 1af28b2b-e24d-4e26-b7d0-62765e6f31bd | Metadata agent | node-11 | :-) | True |
| 1daee4df-1891-4d77-baea-aace1bd47d7a | Open vSwitch agent | node-11 | :-) | True |
| 256d4bbf-164e-44f1-a747-871ceb0b6a58 | Open vSwitch agent | node-12 | :-) | True |
| 797a7120-ac43-4a42-9e9e-7fd7f0801c8d | DHCP agent | node-12 | :-) | True |
| 92c65a90-ede2-47fe-b3d6-d8a34994010e | L3 agent | node-11 | xxx | True |
| 9b68e787-2a7a-4b3f-8a8d-ef5b788d0127 | Metadata agent | node-13 | :-) | True |
| acd88b0e-a63e-420b-a503-f224da757383 | Open vSwitch agent | node-14 | :-) | True |
| c34d24b4-e578-4aa3-a4a8-7fc6229e0db0 | L3 agent | node-13 | :-) | True |
| ce7e0a66-0827-4828-8506-beb21a6330f9 | Metadata agent | node-12 | :-) | True |
+--------------------------------------+--------------------+---------+-------+----------------+

Changed in fuel:
assignee: Sergey Vasilenko (xenolog) → Eugene Nikanorov (enikanorov)
Dmitry Ilyin (idv1985)
summary: - neutron 2013.2.1 heisenbug
+ [library] neutron 2013.2.1 heisenbug
summary: - [library] neutron 2013.2.1 heisenbug
+ [neutron] neutron 2013.2.1 heisenbug
Mike Scherbakov (mihgen)
no longer affects: fuel
Changed in mos:
milestone: none → 5.1
assignee: nobody → Eugene Nikanorov (enikanorov)
tags: added: neutron
Changed in mos:
milestone: 5.1 → 6.0
Changed in mos:
status: New → Triaged
importance: Undecided → Medium
Eugene Nikanorov (enikanorov) wrote :

I suggest closing this bug as Invalid for 5.0.1+.
This was probably fixed upstream (the issue about the correct constraint in the agents table), and we got the fix by merging stable/icehouse.
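
To illustrate why a constraint on the agents table would prevent the duplicates seen in this report, here is a minimal sketch using an in-memory SQLite database. The schema is a simplified stand-in and not Neutron's actual agents table; the column set and the `register_agent` helper are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in table: at most one row per (agent_type, host) pair.
conn.execute(
    """
    CREATE TABLE agents (
        id TEXT PRIMARY KEY,
        agent_type TEXT NOT NULL,
        host TEXT NOT NULL,
        UNIQUE (agent_type, host)
    )
    """
)


def register_agent(conn, agent_id, agent_type, host):
    """Insert an agent row; return False if (agent_type, host) already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO agents (id, agent_type, host) VALUES (?, ?, ?)",
                (agent_id, agent_type, host),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    print(register_agent(conn, "agent-a", "Open vSwitch agent", "node-3"))
    print(register_agent(conn, "agent-b", "Open vSwitch agent", "node-3"))
```

With such a constraint, a second registration for the same agent type and host is rejected by the database instead of producing a duplicate row.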

Changed in mos:
status: Triaged → Incomplete
Changed in mos:
status: Incomplete → Won't Fix
assignee: Eugene Nikanorov (enikanorov) → MOS Neutron (mos-neutron)
no longer affects: mos
Alexander Ignatov (aignatov) wrote :

Marked as Won't Fix for 4.1.x, since we will not release new versions of the 4.x branch. Also Won't Fix for 6.0, since it is a heisenbug and has not occurred in 5.x or 6.x so far. If it occurs again, we will reopen this bug.
