[MOS 8.0][Neutron][L3 HA] After resetting primary controller with active l3-agent the active l3-agent is not changed

Bug #1572207 reported by Alexander Gromov
This bug affects 1 person
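Mirantis OpenStack (status tracked in 10.0.x)
+---------+---------+------------+-----------------------+-----------+
| Affects | Status  | Importance | Assigned to           | Milestone |
+---------+---------+------------+-----------------------+-----------+
| 10.0.x  | Invalid | High       | Kristina Berezovskaia |           |
| 9.x     | Invalid | High       | Kristina Berezovskaia |           |
+---------+---------+------------+-----------------------+-----------+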

Bug Description

Environment:
MOS 8.0
3 controllers + 2 computes + VLAN L3 HA or VxLAN L2POP L3 HA

Steps to reproduce:
1. Create network1, network2
2. Create router1 and connect it with network1, network2 and external net
3. Boot vm1 in network1
4. Boot vm2 in network2 and associate floating ip
5. Add rules for ping
6. Find node with active ha_state for router
7. If the node from step 6 isn't the primary controller, reschedule router1 to the primary controller by banning all the others and then clearing the bans
8. Start pinging vm2 from vm1 by its floating IP
9. Reset primary controller
10. Stop ping

Agents before test:
+--------------------------------------+--------------------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------------+----------------+-------+----------+
| b66c0707-1ab6-4e75-ad94-eab6f1134769 | node-3.test.domain.local | True | :-) | active |
| 759a3b60-645b-46df-bc05-43cefe8f4760 | node-5.test.domain.local | True | :-) | standby |
| 30e36618-dcae-44d9-89e8-1ce78c23be3f | node-4.test.domain.local | True | :-) | standby |
+--------------------------------------+--------------------------+----------------+-------+----------+

Expected results:
ping loses NO MORE than 10 packets
ANOTHER agent has ACTIVE ha_state, the other two have STANDBY ha_state
+--------------------------------------+--------------------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------------+----------------+-------+----------+
| b66c0707-1ab6-4e75-ad94-eab6f1134769 | node-3.test.domain.local | True | :-) | standby |
| 759a3b60-645b-46df-bc05-43cefe8f4760 | node-5.test.domain.local | True | :-) | standby |
| 30e36618-dcae-44d9-89e8-1ce78c23be3f | node-4.test.domain.local | True | :-) | active |
+--------------------------------------+--------------------------+----------------+-------+----------+
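
The packet-loss criterion above can be checked mechanically. The following is a minimal sketch, not the actual test code: it assumes the ping output ends with the usual summary line ("N packets transmitted, M received" or "M packets received"), and the function names are hypothetical.

```python
import re

# Threshold taken from the expected result in this report.
MAX_LOST_PACKETS = 10


def lost_packets(ping_output: str) -> int:
    """Return transmitted minus received, read from the ping summary line.

    Assumption: the summary looks like
    "60 packets transmitted, 52 received, ..." (iputils) or
    "60 packets transmitted, 52 packets received, ..." (busybox).
    """
    match = re.search(
        r"(\d+) packets transmitted, (\d+) (?:packets )?received", ping_output
    )
    if match is None:
        raise ValueError("no ping summary line found")
    transmitted, received = map(int, match.groups())
    return transmitted - received


def ping_loss_acceptable(ping_output: str, limit: int = MAX_LOST_PACKETS) -> bool:
    """True when no more than `limit` packets were lost."""
    return lost_packets(ping_output) <= limit
```

Counting packets rather than reading the loss percentage keeps the check aligned with the "10 packets" wording, independent of how long the ping ran.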

Actual result:
ping loses MORE than 10 packets
THE SAME l3-agent becomes active after some time, the other two have STANDBY ha_state
+--------------------------------------+--------------------------+----------------+-------+----------+
| id | host | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------------+----------------+-------+----------+
| b66c0707-1ab6-4e75-ad94-eab6f1134769 | node-3.test.domain.local | True | :-) | active |
| 759a3b60-645b-46df-bc05-43cefe8f4760 | node-5.test.domain.local | True | :-) | standby |
| 30e36618-dcae-44d9-89e8-1ce78c23be3f | node-4.test.domain.local | True | :-) | standby |
+--------------------------------------+--------------------------+----------------+-------+----------+
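
The difference between the expected and actual tables can be expressed as a check on which host holds the active ha_state before and after the reset. This is a rough sketch that parses the ASCII table layout shown in this report; the helper names are hypothetical, and it only reflects what the Neutron server reports, not the keepalived state on the nodes.

```python
def parse_ha_states(table_text: str) -> dict:
    """Map host -> ha_state from an ASCII table like the ones in this report."""
    states = {}
    header = None
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+----+" borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first "|" row holds the column names
            continue
        row = dict(zip(header, cells))
        states[row["host"]] = row["ha_state"]
    return states


def active_hosts(table_text: str) -> list:
    """Hosts whose ha_state is reported as active."""
    return sorted(
        host for host, state in parse_ha_states(table_text).items()
        if state == "active"
    )


def failover_happened(before: str, after: str) -> bool:
    """True when exactly one agent is active afterwards and it is a different one."""
    old, new = active_hosts(before), active_hosts(after)
    return len(new) == 1 and new != old
```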

Tags: area-mos
Alexander Gromov (agromov) wrote :
Dmitry Klenov (dklenov) wrote :

Neutron folks, please verify whether this one is a duplicate of https://bugs.launchpad.net/fuel/+bug/1572165.

Changed in fuel:
assignee: nobody → MOS Neutron (mos-neutron)
tags: added: area-mos
Vadim Rovachev (vrovachev) wrote :

Need to increase timeout in tests.

Changed in fuel:
status: New → Incomplete
assignee: MOS Neutron (mos-neutron) → nobody
milestone: none → 8.0-updates
assignee: nobody → MOS QA Team (mos-qa)
Changed in fuel:
assignee: MOS QA Team (mos-qa) → Kristina Kuznetsova (kkuznetsova)
Changed in mos:
assignee: nobody → Kristina Kuznetsova (kkuznetsova)
importance: Undecided → High
status: New → In Progress
milestone: none → 9.0
no longer affects: fuel/mitaka
no longer affects: fuel
Alexander Gromov (agromov) wrote :

Reproduced on CI again for the following configuration: 8GB RAM, 2 CPU (for all nodes), 3 controllers 2 computes-cinder, VxLAN L2POP L3HA

Alexander Gromov (agromov) wrote :

After the reset, the same l3-agent was active.

Ann Taraday (akamyshnikova) wrote :

I looked into this issue. Although "neutron l3-agent-list-hosting-router router-id" shows standby for all agents, ping works, and "cat /var/lib/neutron/ha_confs/<router-id>/state" shows that another agent has become master. neutron-keepalived-state-change.log also contains a message that this agent became master: http://paste.openstack.org/show/497923/.

The l3-agent logs (and all other neutron logs) contain http://paste.openstack.org/show/497925/, so it seems the connection to RabbitMQ was lost and, as a result, the agent's state change never reached l3-agent-list-hosting-router.

Workaround for tests: check "cat /var/lib/neutron/ha_confs/<router-id>/state" for the state change.
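
A minimal sketch of how a test might apply this workaround, reading the state file instead of relying on the agent list. The path is the one quoted above; the function names are hypothetical, the file is assumed to contain a single word such as "master", and collecting the value from each controller (for example over SSH) is left out.

```python
from pathlib import Path
from typing import Optional

# Directory quoted in the comment above.
HA_CONFS_DIR = Path("/var/lib/neutron/ha_confs")


def read_ha_state(router_id: str, ha_confs_dir: Path = HA_CONFS_DIR) -> Optional[str]:
    """Return the stripped content of <ha_confs_dir>/<router_id>/state.

    Returns None when the file does not exist on this node.
    """
    state_file = ha_confs_dir / router_id / "state"
    try:
        return state_file.read_text().strip()
    except FileNotFoundError:
        return None


def masters(states_by_host: dict) -> list:
    """Hosts whose state file reports "master" (assumed file content)."""
    return sorted(
        host for host, state in states_by_host.items() if state == "master"
    )
```

A test could then assert that `masters(...)` contains exactly one host after the reset and that it differs from the host that was master before.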

Changed in mos:
milestone: 9.0 → 9.0-updates
Kristina Berezovskaia (kkuznetsova) wrote :