[Neutron][L3 HA] After destroying controller with active ha_state all agents are standby

Bug #1572165 reported by Alexander Gromov
This bug affects 1 person
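Mirantis OpenStack (status tracked in 10.0.x)
+---------+---------------+------------+-------------+-----------+
| Affects | Status        | Importance | Assigned to | Milestone |
+---------+---------------+------------+-------------+-----------+
| 10.0.x  | Fix Committed | Medium     | Ann Taraday |           |
| 8.0.x   | Won't Fix     | Medium     | MOS Neutron |           |
| 9.x     | Won't Fix     | Medium     | Ann Taraday |           |
+---------+---------------+------------+-------------+-----------+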

Bug Description

Environment:
MOS 8.0
3 controllers + 2 computes + VLAN L3 HA or VxLAN L2 POP L3 HA

Steps to reproduce:
1. Create network1, network2
2. Create router1 and connect it with network1, network2 and external net
3. Boot vm1 in network1
4. Boot vm2 in network2 and associate floating ip
5. Add rules for ping
6. Find node with active ha_state for router
7. If the node from step 6 isn't the primary controller, reschedule router1 to the primary controller by banning all the others and then clearing the bans
8. Start ping vm2 from vm1 by floating ip
9. Destroy primary controller (l3 agent on it should be with ACTIVE ha_state)
10. Stop ping

Expected results:
ping loses no more than 10 packets
another l3-agent becomes active
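
The packet-loss criterion can be checked mechanically from the ping summary line. The following is a minimal sketch, not the test suite's actual code: the function names and the `MAX_LOST` constant are hypothetical, and it assumes the summary line has the usual iputils or BusyBox form ("N packets transmitted, M received" or "N packets transmitted, M packets received").

```python
import re

# Assumption: the threshold from "Expected results" (no more than 10 lost packets).
MAX_LOST = 10

# Matches both iputils ("5 received") and BusyBox ("5 packets received") summaries.
_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def lost_packets(ping_output):
    """Return transmitted minus received, read from the ping summary line."""
    match = _SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    return sent - received


def within_loss_budget(ping_output, max_lost=MAX_LOST):
    """True if the ping run lost no more than max_lost packets."""
    return lost_packets(ping_output) <= max_lost
```

Counting lost packets rather than comparing the loss percentage keeps the check independent of how long the ping ran.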

Actual result:
ping lost no more than 10 packets
but all l3-agents are standby

root@node-4:~# neutron l3-agent-list-hosting-router fb6a7255-9b5c-46f5-a883-28426ebc99d8
+--------------------------------------+--------------------------+----------------+-------+----------+
| id                                   | host                     | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------------+----------------+-------+----------+
| 759a3b60-645b-46df-bc05-43cefe8f4760 | node-5.test.domain.local | True           | :-)   | standby  |
| b66c0707-1ab6-4e75-ad94-eab6f1134769 | node-3.test.domain.local | True           | xxx   | standby  |
| 30e36618-dcae-44d9-89e8-1ce78c23be3f | node-4.test.domain.local | True           | :-)   | standby  |
+--------------------------------------+--------------------------+----------------+-------+----------+

This problem also exists when the destroyed node is a non-primary controller whose l3 agent has ACTIVE ha_state.
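
The failure condition in the table above (no alive agent reporting an active ha_state) can be detected by parsing the CLI output. This is a minimal sketch under assumptions: the helper names are hypothetical, the column names and the `:-)` / `xxx` alive markers are taken from the output shown in this report, and the sample text is copied from it.

```python
SAMPLE = """\
+--------------------------------------+--------------------------+----------------+-------+----------+
| id                                   | host                     | admin_state_up | alive | ha_state |
+--------------------------------------+--------------------------+----------------+-------+----------+
| 759a3b60-645b-46df-bc05-43cefe8f4760 | node-5.test.domain.local | True           | :-)   | standby  |
| b66c0707-1ab6-4e75-ad94-eab6f1134769 | node-3.test.domain.local | True           | xxx   | standby  |
| 30e36618-dcae-44d9-89e8-1ce78c23be3f | node-4.test.domain.local | True           | :-)   | standby  |
+--------------------------------------+--------------------------+----------------+-------+----------+
"""


def parse_l3_agent_table(text):
    """Parse the ASCII table into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, shell prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first "|" line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def active_hosts(rows):
    """Hosts whose agent is alive (':-)') and reports ha_state 'active'."""
    return [
        row["host"]
        for row in rows
        if row["alive"] == ":-)" and row["ha_state"] == "active"
    ]


if __name__ == "__main__":
    rows = parse_l3_agent_table(SAMPLE)
    if not active_hosts(rows):
        print("no alive l3 agent is active for this router")
```

An empty result from `active_hosts` corresponds to the state reported here; exactly one entry would be the expected healthy state after failover.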

Tags: area-mos
Alexander Gromov (agromov) wrote :
Dmitry Klenov (dklenov)
tags: added: area-mos
Vadim Rovachev (vrovachev) wrote :

Need to increase timeout in tests.
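
One way a test could tolerate a slow failover is to poll for the expected state up to a deadline instead of checking once. The test code itself is not shown in this report, so the following is only a hypothetical sketch: `wait_for` and its parameters are illustrative names, not part of any existing test suite.

```python
import time


def wait_for(predicate, timeout, interval=1.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns a truthy value or timeout seconds pass.

    Returns the truthy value; raises TimeoutError if the deadline is reached.
    clock and sleep are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        sleep(interval)
```

A caller would pass a predicate that queries the agents hosting the router and returns a truthy value once one of them is active; a suitable timeout value is not stated in this report.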

Changed in mos:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Ann Kamyshnikova (akamyshnikova)
no longer affects: fuel
no longer affects: fuel/8.0.x
no longer affects: fuel/mitaka
no longer affects: fuel/newton
Changed in mos:
milestone: none → 8.0-updates
Kristina Berezovskaia (kkuznetsova) wrote :

This situation was reproduced (the controller nodes now have 8 GB of RAM and 2 CPUs).

In namespaces we can see the following information:
https://paste.mirantis.net/show/2207/

Alexander Gromov (agromov) wrote :

Note to description: configuration was 3 controllers + 2 computes-cinder + VxLAN L2 POP L3 HA

Vitaly Sedelnik (vsedelnik) wrote :

Targeted to 9.0 and 10.0, Won't Fix for 8.0-updates because of Medium importance

Alexander Ignatov (aignatov) wrote :

Won't fix in 9.0 since it has Medium priority. It is also unclear whether the issue reproduces in MOS >= 9.0.

Changed in mos:
milestone: 9.0 → 9.1
Alexander Ignatov (aignatov) wrote :

This bug was originally found in MOS 8.0, and there are no logs, snapshots, or other artifacts with which to continue the investigation. Kristina or Alexander Gromov, could you please reproduce this issue in MOS 9.1 with your specific test cases, or otherwise close this bug as Invalid?

Changed in mos:
status: Confirmed → Incomplete
assignee: Ann Taraday (akamyshnikova) → Kristina Kuznetsova (kkuznetsova)
Eugene Nikanorov (enikanorov) wrote :
Changed in mos:
status: Incomplete → Confirmed
Changed in mos:
assignee: Kristina Berezovskaia (kkuznetsova) → Ann Taraday (akamyshnikova)
Ann Taraday (akamyshnikova) wrote :
Alexander Ignatov (aignatov) wrote :

Moving to 9.3 since it's unlikely to be fixed upstream before the 9.2 HCF.

Changed in mos:
milestone: 9.1 → 9.3
summary: - [MOS 8.0][Neutron][L3 HA] After destroying controller with active
- ha_state all agents are standby
+ [Neutron][L3 HA] After destroying controller with active ha_state all
+ agents are standby
Ann Taraday (akamyshnikova) wrote :

The fix was merged and backported to Newton and Mitaka: https://review.openstack.org/#/q/If5596eb24041ea9fae1d5d2563dcaf655c5face7,n,z
