l3-ha: a router can be stuck in the ALLOCATING state
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
neutron | Fix Released | Medium | John Schwarz | | |
Bug Description
The scenario is a simple one: during the creation of a router, the server handling the request crashes after creating the router in the ALLOCATING state [1] but before changing it to ACTIVE [2]. In this case, the router is "stuck" in the ALLOCATING state, and the only admin action that changes the router back to ACTIVE (and allows it to be scheduled to agents) is:
1. set admin-state-up to False
2. set ha to False
3. set ha to True
4. set admin-state-up to True
That is, a full migration of the HA router to legacy and back to HA is required. This triggers the code in [3] and fixes the issue. However, these 4 steps aren't intuitive at all: why should a user re-set the router as HA to fix a weird router state?
Skipping steps 2 and 3 (only re-setting admin-state-up) won't work because, as mentioned above, the scheduling happens in steps 2 and 3 (i.e. when the router is set to ha=False it is unscheduled, and when it is set to ha=True it is scheduled as if it were a new router). In fact, this means the problem is more severe: if the server crashes in the middle of setting up the resources of an HA router, all 4 steps must be done to ensure the router is made valid again.
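For reference, the 4-step workaround can be sketched as an ordered sequence of router updates. This is only an illustration: `recover_stuck_ha_router`, `FakeRouterAPI` and the `update_router(router_id, attrs)` callable are hypothetical names, not Neutron or client code. Only the attribute names (`admin_state_up`, `ha`) and the order of the steps come from the description above.

```python
# Hypothetical sketch of the workaround; not actual Neutron or client code.
# `update_router` is assumed to be any callable (router_id, attrs) that sends
# one router update request carrying the given attributes.

WORKAROUND_STEPS = (
    {"admin_state_up": False},  # step 1
    {"ha": False},              # step 2: router is unscheduled
    {"ha": True},               # step 3: router is scheduled like a new one
    {"admin_state_up": True},   # step 4
)


def recover_stuck_ha_router(update_router, router_id):
    """Apply the four updates in order, one request per step."""
    for attrs in WORKAROUND_STEPS:
        update_router(router_id, dict(attrs))


class FakeRouterAPI:
    """In-memory stand-in that only records the requests it receives."""

    def __init__(self):
        self.calls = []

    def update_router(self, router_id, attrs):
        self.calls.append((router_id, attrs))


api = FakeRouterAPI()
recover_stuck_ha_router(api.update_router, "router-1")
for router_id, attrs in api.calls:
    print(router_id, attrs)
```

The steps are sent as separate updates because the order matters: steps 2 and 3 are the ones that trigger rescheduling.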
The proposed solution is to add a new state, such that if admin-state-up is changed to False then the router's status will be changed to "DOWN" (as opposed to the current "ACTIVE", which doesn't make much sense since admin-state-up is False). This will help mitigate the "stuck ALLOCATING status" portion of the problem.
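A toy model of the proposed status change, to make the intent concrete. The `Router` class and `set_admin_state_up` function are hypothetical and do not reflect Neutron's data model. The sketch covers only the rule stated above (admin-state-up set to False moves the status to "DOWN") and deliberately leaves the status untouched when admin-state-up is set back to True.

```python
from dataclasses import dataclass


@dataclass
class Router:
    # Hypothetical stand-in for a router record; not Neutron's model.
    admin_state_up: bool = True
    status: str = "ALLOCATING"


def set_admin_state_up(router: Router, up: bool) -> Router:
    """Apply the proposed rule: disabling the router sets status to DOWN."""
    router.admin_state_up = up
    if not up:
        router.status = "DOWN"
    # Re-enabling is intentionally not modelled here.
    return router


# A router left in ALLOCATING by a server crash during creation.
stuck = Router()
set_admin_state_up(stuck, False)
print(stuck.status)
```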
In addition to changing the status, we will need to change the logic such that a router is unscheduled on admin-state-
[1]: https:/
[2]: https:/
[3]: https:/
Changed in neutron: | |
assignee: | nobody → John Schwarz (jschwarz) |
description: | updated |
tags: | added: neutron-proactive-backport-potential |
Related fix proposed to branch: master
Review: https://review.openstack.org/352081