juju add-unit often complains of "inconsistent state" when you need to replace a failed unit
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Low | Unassigned |
juju-core | Won't Fix | Undecided | Unassigned |
1.25 | Won't Fix | Undecided | Unassigned |
Bug Description
One of the most common use cases for juju add-unit is when an existing unit is in an unrecoverable state (such as a problem with the compute node it is hosted on), and you want to take advantage of the cloud to just spin up a replacement. Unfortunately, juju often refuses to add one because it is not happy with the state of the environment at that point:
ERROR cannot add unit 1/1 to service "example": cannot add unit to service "example": inconsistent state
Typically this operation is one you're performing under time pressure while a production service is down, and you are taking this action in order to repair a fault.
For a start, the error message suggests to the uninitiated that juju wants you to fix the unfixable unit before adding any more. It would help to clarify what exactly is inconsistent, and whether waiting and trying again may help (in practice it seems to).
Second, it would be nice if this could be removed as a stumbling block to repairing production environments during crises.
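Until that happens, the "wait and try again" workaround mentioned above can be scripted. What follows is only a minimal sketch, assuming the error really is transient (this report suggests it but does not confirm it). The function name `add_unit_with_retry`, the attempt count, and the delay are all hypothetical choices made for illustration; only the `juju add-unit` command and the "inconsistent state" text come from this report.

```python
import subprocess
import time

# Substring taken from the error message quoted in this report.
TRANSIENT_MARKER = "inconsistent state"


def add_unit_with_retry(service, attempts=5, delay=10.0, run=None, sleep=time.sleep):
    """Run `juju add-unit <service>`, retrying only on the 'inconsistent state' error.

    `run` and `sleep` are injectable so the retry logic can be exercised
    without a juju environment. Returns the number of attempts used; raises
    RuntimeError on any other failure or once the attempts are exhausted.
    """
    if run is None:
        def run(cmd):
            # Default runner: invoke the real CLI and capture stderr.
            proc = subprocess.run(cmd, capture_output=True, text=True)
            return proc.returncode, proc.stderr

    stderr = ""
    for attempt in range(1, attempts + 1):
        code, stderr = run(["juju", "add-unit", service])
        if code == 0:
            return attempt
        if TRANSIENT_MARKER not in stderr:
            # Any other error is not assumed to be transient, so give up now.
            raise RuntimeError(stderr.strip())
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(f"still failing after {attempts} attempts: {stderr.strip()}")
```

Calling `add_unit_with_retry("example")` would try up to five times, ten seconds apart. Retrying only on the one known message keeps genuine failures (a bad service name, lost credentials) from being masked by the loop.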
tags: added: canonical-is
Changed in juju:
status: Incomplete → New
Changed in juju:
status: New → Confirmed
Changed in juju:
milestone: 2.0.3 → 2.2.0
Changed in juju:
status: Expired → Triaged
tags: added: add-unit
@Nick Moffitt,
Could you please provide more information?
A reproducible scenario would go a long way too \o/
Were you running any commands before add-unit? If yes, what were they?
Does waiting and trying again help? If yes, for how long do you usually wait to try again?
Does it happen every time? That is, does add-unit fail the first time you run it and then succeed consistently when you wait and run it again?