Bug #1365470 “Allow an admin to evacuate a L3 agent from HA rout...” : Bugs : neutron

Akhila C (chetlapalle-akhila-b) on 2014-09-08

Changed in neutron:
assignee:	nobody → tcs_openstack_group (tcs-openstack-group)

Assaf Muller (amuller) wrote on 2014-09-08:

#1

Not 100% sure if we should lower the priority or shut down all of the interfaces of each HA router.

Carl Baldwin (carl-baldwin) on 2014-09-16

Changed in neutron:
importance:	Undecided → Medium

Mounika (mounika-pandhiri) on 2014-10-08

Changed in neutron:
assignee:	tcs_openstack_group (tcs-openstack-group) → Mounika (mounika-pandhiri)

Mounika (mounika-pandhiri) wrote on 2014-10-08:

#2

Unable to replicate the bug, as the HARouterScheduler class is not present in l3_agent_scheduler.py.
Code merges are still being done.

Assaf Muller (amuller) wrote on 2014-10-08:

#3

All of the code for the feature has been merged :)

If you check out l3_agent_scheduler, you'll see that the random scheduler and the least-routers scheduler each have a _choose_router_agents_for_ha method, which bind_ha_router uses.
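To make the scheduling discussion concrete, here is a rough, hypothetical sketch of what a "least routers" choice of agents for one HA router could look like. Agent, choose_agents_for_ha and max_agents are illustrative names only, not neutron's actual _choose_router_agents_for_ha signature, and skipping agents whose admin state is down is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    # Hypothetical stand-in for an L3 agent record; not neutron's model class.
    host: str
    admin_state_up: bool
    router_count: int


def choose_agents_for_ha(agents: List[Agent], max_agents: int) -> List[Agent]:
    """Pick up to max_agents agents for one HA router, least-loaded first.

    Assumption: agents whose admin state is down are skipped, which is one
    way an evacuated agent would stop receiving new HA router instances.
    """
    candidates = [a for a in agents if a.admin_state_up]
    # Fewest hosted routers first; host name breaks ties deterministically.
    candidates.sort(key=lambda a: (a.router_count, a.host))
    return candidates[:max_agents]


agents = [
    Agent("net1", True, 5),
    Agent("net2", True, 1),
    Agent("net3", False, 0),  # admin state down: never chosen
    Agent("net4", True, 1),
]
print([a.host for a in choose_agents_for_ha(agents, 2)])  # ['net2', 'net4']
```

Sorting on a (load, host) tuple keeps the choice deterministic when several agents host the same number of routers.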

On a side note, why do we care about scheduling within the context of this bug?

Mounika (mounika-pandhiri) on 2014-10-08

Changed in neutron:
assignee:	Mounika (mounika-pandhiri) → nobody

Rounak (rounak-pramanik) wrote on 2014-10-08:

#5

"neutron router-create router_name --ha=True" is also not working (it throws a Bad Request error) after setting l3_ha=True in /etc/neutron/neutron.conf on the latest master.

Assaf Muller (amuller) wrote on 2014-10-08:

#6

Please paste the error.

Rounak (rounak-pramanik) wrote on 2014-10-09:

#7

"neutron router-create router_name --ha=True" is throwing an error:
Bad Request (HTTP 400) (Request-ID: req-d86a7ac7-ad55-4d46-9577-670f679b3954)

Assaf Muller (amuller) wrote on 2014-10-09:

#8

Can you paste the trace from the server log?

Sridhar Gaddam (sridhargaddam) on 2015-07-19

Changed in neutron:
assignee:	nobody → Sridhar Gaddam (sridhargaddam)

Sridhar Gaddam (sridhargaddam) wrote on 2015-07-24:

#9

@Assaf, I had a look at this, and the following are my observations.

In the latest code, when we set the admin_state of an agent to False, the HA router on that agent gets deleted.
In such situations, one of the backup HA routers takes over the role of Master.
So I believe the requirements of this bug are met, and it no longer applies to the present code.
If my understanding is wrong, please let me know.

OTOH, I wanted to see the behavior of keepalived (version 1.2.16) when we update the priority and SIGHUP the process.
I see that when we lower the priority of the master HA router and SIGHUP, it continues to serve the role of Master* even with the lower priority.
This is a bit strange.

[*] With a small outage (a few seconds) during which one of the backup routers takes over the role of Master and then goes back to Backup.

10:25:25.316856 IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
10:25:27.317914 IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
10:25:34.123811 IP 169.254.192.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
10:25:34.124913 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
10:25:34.125432 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
...
<SNIP> During this period the HA router with IP 169.254.192.1 was acting as master.
...
10:26:06.943902 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
10:26:08.944739 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
10:26:08.946743 IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
10:26:08.947452 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
10:26:10.949838 IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 10, authtype none, intvl 2s, length 20
10:26:12.950664 IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 10, authtype none, intvl 2s, length 20
10:26:14.952486 IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 10, authtype none, intvl 2s, length 20

Note: when we lower the priority of a backup HA router and SIGHUP, it remains in the backup state, as expected.
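Traces like the one above are easier to read once consecutive advertisements are collapsed into transitions. The following is a hedged sketch that parses tcpdump VRRPv2 advertisement lines in exactly the format shown and keeps only the points where the sender or its advertised priority changes. Because in steady state only the VRRP Master sends advertisements, the collapsed list roughly tracks who held the Master role. parse_adverts, advertiser_changes and the regular expression are hypothetical helpers that assume this exact output format.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple


class VrrpAdvert(NamedTuple):
    time: str
    src: str
    vrid: int
    prio: int


# Assumes the exact tcpdump output format shown in the trace above.
ADVERT_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+) IP (?P<src>\d+\.\d+\.\d+\.\d+) > 224\.0\.0\.18: "
    r"VRRPv2, Advertisement, vrid (?P<vrid>\d+), prio (?P<prio>\d+)"
)


def parse_adverts(lines: Iterable[str]) -> List[VrrpAdvert]:
    """Extract VRRPv2 advertisements, skipping '...' and <SNIP> annotation lines."""
    adverts = []
    for line in lines:
        m = ADVERT_RE.match(line.strip())
        if m:
            adverts.append(
                VrrpAdvert(m["time"], m["src"], int(m["vrid"]), int(m["prio"]))
            )
    return adverts


def advertiser_changes(adverts: List[VrrpAdvert]) -> List[Tuple[str, str, int]]:
    """Keep only the adverts where the sender or its priority changed."""
    changes: List[Tuple[str, str, int]] = []
    for a in adverts:
        if not changes or changes[-1][1:] != (a.src, a.prio):
            changes.append((a.time, a.src, a.prio))
    return changes


trace = """\
...
10:26:08.944739 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
10:26:08.946743 IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
10:26:08.947452 IP 169.254.192.1 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 50, authtype none, intvl 2s, length 20
10:26:10.949838 IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 10, authtype none, intvl 2s, length 20
10:26:12.950664 IP 169.254.192.3 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 10, authtype none, intvl 2s, length 20
"""

for change in advertiser_changes(parse_adverts(trace.splitlines())):
    print(change)
```

On this excerpt the last transition shows 169.254.192.3 advertising alone at prio 10, which matches the observation that the lowered-priority master kept the role.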

Assaf Muller (amuller) wrote on 2015-07-26:

#10

Changed to Fix Committed according to Sridhar's comment.

@Sridhar: We use no-preemption, so I don't think the priority settings are being taken into account. If you'd like to fiddle around with it locally, you can enable preemption in the keepalived.conf template we use and then play around with the priorities.
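The effect of disabling preemption can be illustrated with the Backup-state rule from the VRRPv2 specification (RFC 3768, section 6.4.2), the protocol version seen in the trace. This sketch models only that rule, not keepalived's code, and keepalived's nopreempt handling may differ in detail between versions; backup_will_take_over is a hypothetical name. With preemption disabled, a Backup accepts any non-zero-priority advertisement from the current Master, so lowering the Master's priority alone does not move the Master role.

```python
def backup_will_take_over(advert_priority: int, local_priority: int, preempt: bool) -> bool:
    """Model of how a VRRPv2 Backup reacts to the Master's advertisement.

    Based on RFC 3768, section 6.4.2 (Backup state); not keepalived's code.
    """
    if advert_priority == 0:
        # Master is resigning: Backup sets Master_Down_Timer to Skew_Time.
        return True
    if not preempt or advert_priority >= local_priority:
        # Advertisement accepted: Master_Down_Timer is reset, Backup stays Backup.
        return False
    # Preemption enabled and the Master has lower priority: the advertisement
    # is discarded, Master_Down_Timer expires and the Backup becomes Master.
    return True


# Master lowered from 50 to 10 (as in the trace), Backups still at 50:
print(backup_will_take_over(10, 50, preempt=False))  # False
print(backup_will_take_over(10, 50, preempt=True))   # True
```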

Changed in neutron:
status:	New → Fix Committed
description:	updated

Sridhar Gaddam (sridhargaddam) wrote on 2015-07-27:

#11

Yes @Assaf, I agree with you. The no-preempt flag would have an effect on the priority settings.

Doug Hellmann (doug-hellmann) on 2015-07-29

Changed in neutron:
milestone:	none → liberty-2
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2015-10-15

Changed in neutron:
milestone:	liberty-2 → 7.0.0

neutron

Allow an admin to evacuate a L3 agent from HA routers
