L3 HA: Unstable behavior

Bug #1563298 reported by Ann Taraday
This bug affects 2 people
Affects             Status                      Importance  Assigned to   Milestone
Mirantis OpenStack  (status tracked in 10.0.x)
  10.0.x            Fix Committed               Medium      Ann Taraday
  9.x               Won't Fix                   Medium      Ann Taraday

Bug Description

During manual testing of L3 HA at scale (env-11: 3 controllers, 46 computes, VxLAN), unstable router rescheduling behavior was found. As a result, connectivity was established after a longer period of time than expected.

During multiple reboots of (primary and non-primary) controllers (i.e. multiple restarts of the L3 agents) with a large number of routers (>=175), after the rebooted node recovers (the banned L3 agent is started again), multiple agents try to become active (master):

http://paste.openstack.org/show/491593/
http://paste.openstack.org/show/491598/

After around 5 minutes only one agent remains "active" and connectivity is established normally.

The issue is intermittent: it was reproduced with 175 routers but not with 200 routers (while rebooting controllers), and then it was reproduced with 200 routers but not with 250 routers (ban/clear of the L3 agent).
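
For reference, one way to watch the convergence described above is to poll the ha_state that each L3 agent reports for a given router. The snippet below is a minimal sketch, assuming python-neutronclient and keystoneauth1 are available and admin credentials are exported as the usual OS_* environment variables; ROUTER_ID is a hypothetical placeholder. During the unstable window more than one agent may report "active":

    # Minimal sketch: poll the ha_state of the L3 agents hosting one HA router.
    # Assumes admin credentials in OS_* environment variables; ROUTER_ID is a
    # hypothetical placeholder, not a value from this bug report.
    import os
    import time

    from keystoneauth1 import loading, session
    from neutronclient.v2_0 import client

    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url=os.environ['OS_AUTH_URL'],
        username=os.environ['OS_USERNAME'],
        password=os.environ['OS_PASSWORD'],
        project_name=os.environ['OS_PROJECT_NAME'],
        user_domain_name=os.environ.get('OS_USER_DOMAIN_NAME', 'Default'),
        project_domain_name=os.environ.get('OS_PROJECT_DOMAIN_NAME', 'Default'))
    neutron = client.Client(session=session.Session(auth=auth))

    ROUTER_ID = '<router-uuid>'

    while True:
        agents = neutron.list_l3_agent_hosting_routers(ROUTER_ID)['agents']
        states = [(a['host'], a.get('ha_state')) for a in agents]
        print(states)
        # A healthy HA router should have exactly one agent in "active" state.
        if sum(1 for _, s in states if s == 'active') == 1:
            break
        time.sleep(10)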

Changed in mos:
importance: Undecided → Medium
Revision history for this message
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

steps to reproduce

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Changed in mos:
status: New → Confirmed
status: Confirmed → New
Changed in mos:
status: New → Confirmed
Revision history for this message
Ann Taraday (akamyshnikova) wrote :

On the Neutron labs I was not able to reproduce this problem. I will continue the investigation on the scale lab.

Changed in mos:
status: Confirmed → Won't Fix
Revision history for this message
Dina Belova (dbelova) wrote :

Added the move-to-10.0 tag because the bug was transferred from 9.0 to 10.0.

tags: added: move-to-10.0
Revision history for this message
John Schwarz (jschwarz) wrote :

Do we have any server/agent logs as an example of this error? Were there any exceptions noticeable when encountering this?

Revision history for this message
Assaf Muller (amuller) wrote :

@Ann, do you have https://review.openstack.org/#/c/162260/ applied in the environments you're testing? The issue you're seeing *might* be related to the bug described in https://bugs.launchpad.net/neutron/+bug/1525901 and is mitigated by the patch I linked.

Revision history for this message
Ann Taraday (akamyshnikova) wrote :

@Assaf, thanks for pointing out this patch and the bug! Yes, it seems that it was missing at the time and was synced later. I will recheck this; however, on a virtual env with 5 nodes running Mitaka I once saw something similar, but it is really hard to reproduce :(

Neutron server logs from that moment: http://paste.openstack.org/show/495528/

tags: added: 10.0-reviewed
Revision history for this message
Ann Taraday (akamyshnikova) wrote :

Scale testing on 9.0 helped to find the root cause of this issue. It is described in the upstream bug https://bugs.launchpad.net/neutron/+bug/1597461.

Revision history for this message
Ann Taraday (akamyshnikova) wrote :

The bug https://bugs.launchpad.net/neutron/+bug/1597461 is fixed upstream, but the fix is not backportable, so it is set to Won't Fix for 9.x.

Changed in mos:
status: In Progress → Won't Fix
Revision history for this message
Alexander Ignatov (aignatov) wrote :