neutron l3 agent doesn't migrate correctly

Bug #1379272 reported by Stanislav Makar
This bug affects 1 person
Affects              Status    Importance  Assigned to       Milestone
Fuel for OpenStack   Invalid   High        Stanislav Makar
5.1.x                Invalid   High        Stanislav Makar
6.0.x                Invalid   High        Stanislav Makar

Bug Description

I found this on the release 5.1 ISO with patch fuel-5.1_neutron_fix_20141001.patch applied.

1. Create a new environment (CentOS, HA mode)
2. Choose Neutron with VLAN segmentation
3. Add 3 controllers and 1 compute
4. Start deployment; it completed successfully
5. Create an instance for the admin tenant
6. Pause (suspend) the primary controller
7. Wait some time
8. p_neutron-l3-agent migrates to the third controller
9. Resume the primary controller
10. The l3-agent migrates back to the primary node, but the router namespaces are not created there (a namespace check is sketched below)
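
A quick way to confirm the missing namespaces (a sketch only; <router-id> is a placeholder, and the agent ID is the node-1 L3 agent from the listing below):

 # On the primary controller (node-1), list the routers Neutron has bound to
 # its L3 agent:
 neutron router-list-on-l3-agent 985644a8-c766-40a9-bc1f-b6fda8281953
 # ...and check that a qrouter-<router-id> namespace exists for each of them:
 ip netns list | grep qrouter
 # In this case no qrouter namespaces show up on the primary node.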

 neutron agent-list
+--------------------------------------+--------------------+--------------------------+-------+----------------+
| id                                   | agent_type         | host                     | alive | admin_state_up |
+--------------------------------------+--------------------+--------------------------+-------+----------------+
| 230bb3ea-08fa-446f-b42f-056f07279660 | Open vSwitch agent | node-5.test.domain.local | :-)   | True           |
| 3501b3a0-d0a2-41d5-82ea-4b6f2aa1da69 | Metadata agent     | node-1.test.domain.local | xxx   | True           |
| 41722b78-d62c-4a43-bc19-96da19379e92 | Open vSwitch agent | node-3.test.domain.local | :-)   | True           |
| 5de585e0-8078-4412-8cfd-22e3c8a584fd | Metadata agent     | node-2.test.domain.local | :-)   | True           |
| 7544292b-d8fb-4b03-96f4-c205dd02caf7 | Metadata agent     | node-5.test.domain.local | :-)   | True           |
| 8ab79c73-4711-4e70-a9bc-e9d576c38cc9 | L3 agent           | node-5.test.domain.local | :-)   | True           |
| 985644a8-c766-40a9-bc1f-b6fda8281953 | L3 agent           | node-1.test.domain.local | :-)   | True           |
| 9b2172f7-2d60-41a0-8b9c-58e67e75456c | DHCP agent         | node-2.test.domain.local | :-)   | True           |
| c7b859bd-53e2-4498-aa07-04b49c99ebcf | Open vSwitch agent | node-2.test.domain.local | :-)   | True           |
| d0bbf0ef-a489-4460-96f0-0a1fd23e90f1 | Open vSwitch agent | node-4.test.domain.local | :-)   | True           |
| efaa9f45-ecd1-4047-9d0f-9723e83c526e | Open vSwitch agent | node-1.test.domain.local | :-)   | True           |
+--------------------------------------+--------------------+--------------------------+-------+----------------+

As we can see, there are two active L3 agents here; what is the root cause of this?
Rescheduling itself works, as the logs show:
2014-10-09 09:38:53,738 - INFO - Started: /usr/bin/q-agent-cleanup.py --agent=l3 --reschedule --remove-dead --admin-auth-url=http://10.108.7.2:35357/v2.0 --auth-token=508MsThA
2014-10-09 09:38:54,142 - INFO - found alive L3 agent: 8ab79c73-4711-4e70-a9bc-e9d576c38cc9
2014-10-09 09:38:54,142 - INFO - found alive L3 agent: 985644a8-c766-40a9-bc1f-b6fda8281953
2014-10-09 09:38:54,176 - INFO - _reschedule_agent_l3: rescheduling orphaned routers
2014-10-09 09:38:54,176 - INFO - _reschedule_agent_l3: ended rescheduling of orphaned routers
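
If a router were still scheduled to the wrong agent, it could also be moved by hand with the standard neutron client (a sketch only; <router-id> is a placeholder, the agent IDs are the two L3 agents from the listing above):

 # See which L3 agent Neutron believes hosts the router:
 neutron l3-agent-list-hosting-router <router-id>
 # Move it from the node-5 agent to the node-1 (primary) agent:
 neutron l3-agent-router-remove 8ab79c73-4711-4e70-a9bc-e9d576c38cc9 <router-id>
 neutron l3-agent-router-add 985644a8-c766-40a9-bc1f-b6fda8281953 <router-id>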

Revision history for this message
Stanislav Makar (smakar) wrote :
description: updated
Revision history for this message
Stanislav Makar (smakar) wrote :

After some time (~20-30 min) only one L3 agent is still alive, but the namespaces are still not present on the new (primary) node:

neutron agent-list
+--------------------------------------+--------------------+--------------------------+-------+----------------+
| id                                   | agent_type         | host                     | alive | admin_state_up |
+--------------------------------------+--------------------+--------------------------+-------+----------------+
| 230bb3ea-08fa-446f-b42f-056f07279660 | Open vSwitch agent | node-5.test.domain.local | :-)   | True           |
| 3501b3a0-d0a2-41d5-82ea-4b6f2aa1da69 | Metadata agent     | node-1.test.domain.local | :-)   | True           |
| 41722b78-d62c-4a43-bc19-96da19379e92 | Open vSwitch agent | node-3.test.domain.local | :-)   | True           |
| 5de585e0-8078-4412-8cfd-22e3c8a584fd | Metadata agent     | node-2.test.domain.local | :-)   | True           |
| 7544292b-d8fb-4b03-96f4-c205dd02caf7 | Metadata agent     | node-5.test.domain.local | :-)   | True           |
| 8ab79c73-4711-4e70-a9bc-e9d576c38cc9 | L3 agent           | node-5.test.domain.local | xxx   | True           |
| 985644a8-c766-40a9-bc1f-b6fda8281953 | L3 agent           | node-1.test.domain.local | :-)   | True           |
| 9b2172f7-2d60-41a0-8b9c-58e67e75456c | DHCP agent         | node-2.test.domain.local | :-)   | True           |
| c7b859bd-53e2-4498-aa07-04b49c99ebcf | Open vSwitch agent | node-2.test.domain.local | :-)   | True           |
| d0bbf0ef-a489-4460-96f0-0a1fd23e90f1 | Open vSwitch agent | node-4.test.domain.local | :-)   | True           |
| efaa9f45-ecd1-4047-9d0f-9723e83c526e | Open vSwitch agent | node-1.test.domain.local | :-)   | True           |
+--------------------------------------+--------------------+--------------------------+-------+----------------+
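
Since the agent is managed by Pacemaker (p_neutron-l3-agent), one possible way to make it recreate the namespaces would be to restart the resource on the primary node and re-check. This is only a sketch and assumes crmsh is available on the controllers:

 # Assumption: the L3 agent runs under Pacemaker as p_neutron-l3-agent and
 # crmsh is installed on the controller.
 crm resource restart p_neutron-l3-agent
 # Give the agent a moment to resync, then verify that the namespaces appear:
 ip netns list | grep qrouter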

Changed in fuel:
importance: Undecided → High
Stanislav Makar (smakar)
tags: added: fuel-lib-neutron
tags: added: to-be-covered-by-tests
Mike Scherbakov (mihgen)
tags: added: neutron
removed: fuel-lib-neutron
Revision history for this message
Stanislav Makar (smakar) wrote :

This looks like a rare case, as I have not been able to reproduce it again on my new environments.
I have found that it could be connected with the controller VM's clock after resuming: the VM wakes up with the time it had when it was paused, and keeps that offset until ntpd corrects it (service ntpd restart speeds this up). This also causes a "christmas lights" syndrome, where nova services flap between up and down because the controllers have different times.
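
A quick way to check the clock skew across the controllers (a minimal sketch, assuming passwordless ssh between the nodes; the hostnames are the ones from this environment):

 # Compare the wall-clock time on the three controllers:
 for h in node-1 node-2 node-3; do echo -n "$h: "; ssh "$h" date +%s; done
 # On the resumed controller, ask ntpd how far off it thinks the clock is:
 ntpq -pn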

Pausing and resuming VMs is an artificial case that does not correspond to real-world operation,
so I will close this bug.
