All HA routers become active on the same agent

Bug #1365429 reported by Assaf Muller
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Assaf Muller
Milestone: 2014.2

Bug Description

How to reproduce:
On a setup with two L3 agents, create ten HA routers. The scheduler places them on both agents, but the same agent ends up hosting the active instance of all ten routers, which defeats the idea of sharing traffic load across all L3 agents.

Solutions:
This can be solved in one of two ways:
1) Enable preemptive elections for HA routers. Keepalived exposes a configuration option that enables VRRP preemptive elections. We can set a random VRRP priority for each router instance, and the election process will then distribute the active routers randomly across the available agents. Preemptive elections have a major downside: if an agent hosting a master instance drops, the backup router comes into play, but when the node is fixed the old master re-assumes its role. This second state transition is costly and redundant.
2) With non-preemptive elections, the first router instance to come up becomes the master. We can exploit this by having the server send the notification to the agents in a random order.
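Option 2 can be sketched in a few lines. This is a minimal illustration, not neutron's actual scheduler code: the function name, the `agents` list, and the `notify` callback are all hypothetical. The only idea it demonstrates is shuffling the recipient order per router, so that under non-preemptive VRRP the first instance up (and hence the master) lands on a random agent.

```python
import random


def notify_agents_in_random_order(agents, router_id, notify):
    """Sketch of solution 2: shuffle the agent list before sending the
    router notification. With non-preemptive VRRP elections, the first
    instance to come up becomes the master, so randomizing the order
    spreads the master instances across agents.

    `agents`, `router_id`, and `notify` are hypothetical placeholders
    for whatever the server-side scheduler actually uses.
    """
    shuffled = list(agents)      # don't mutate the caller's list
    random.shuffle(shuffled)
    for agent in shuffled:
        notify(agent, router_id)
    return shuffled              # shuffled[0] is the likely master
```

Over many routers, each agent should come first roughly half the time on a two-agent setup, which is exactly the load-sharing property the bug asks for.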

Tags: l3-ha
Assaf Muller (amuller)
tags: removed: vrrp
Revision history for this message
Kevin Benton (kevinbenton) wrote:

Why is option 1 costly and redundant? A VRRP takeover isn't that bad. The other advantage is that the load will be shared again across the agents, whereas without preemption everything stays stuck on one agent after the failure.

Assaf Muller (amuller) wrote:

A takeover is bad because it disrupts connectivity for a minimum of ~8 seconds. If you don't *have* to perform a failover, don't... You do bring up a good point, though. If the failed node doesn't take back a few routers when it comes back online, we'll also be in a less than ideal state when it comes to load sharing, which is what we're trying to solve. Definitely something to think about...

Assaf Muller (amuller)
Changed in neutron:
assignee: nobody → Assaf Muller (amuller)
Assaf Muller (amuller)
Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/121620

Changed in neutron:
importance: Undecided → High
Kyle Mestery (mestery)
Changed in neutron:
milestone: none → juno-rc1
OpenStack Infra (hudson-openstack) wrote: Fix merged to neutron (master)

Reviewed: https://review.openstack.org/121620
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=0bd4472ef7bdb9d94f988669f34f7eaa53ca0a89
Submitter: Jenkins
Branch: master

commit 0bd4472ef7bdb9d94f988669f34f7eaa53ca0a89
Author: Assaf Muller <email address hidden>
Date: Mon Sep 15 18:11:17 2014 +0300

    HA routers master state now distributed amongst agents

    We're currently running with no pre-emption, meaning that
    the first router in a cluster to go up will be the master,
    regardless of priority. Since the order in which we sent
    notifications was constant, the same agent hosted the
    master instances of all HA routers, defeating the idea
    of load sharing.

    Closes-Bug: #1365429
    Change-Id: Ia6fe2bd0317c241bf7eb55915df7650dfdc68210

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: juno-rc1 → 2014.2