juju add-unit often complains of "inconsistent state" when you need to replace a failed unit

Bug #1646385 reported by Nick Moffitt
This bug affects 8 people
Affects         Status     Importance  Assigned to  Milestone
Canonical Juju  Triaged    Low         Unassigned
juju-core       Won't Fix  Undecided   Unassigned
1.25            Won't Fix  Undecided   Unassigned

Bug Description

One of the most common use cases for juju add-unit is when an existing unit is in an unrecoverable state (such as a problem with the compute node it is hosted on), and you want to take advantage of the cloud to just spin up a replacement. Unfortunately juju often refuses to add one for you because it's not happy with the state of the environment at that point:

    ERROR cannot add unit 1/1 to service "example": cannot add unit to service "example": inconsistent state

Typically this operation is one you're performing under time pressure while a production service is down, and you are taking this action in order to repair a fault.

For a start, the error message suggests to the uninitiated that juju wants you to fix the unfixable unit before adding any more. It would help to clarify what exactly is inconsistent, and whether waiting and trying again may help (in practice it seems to).

Second, it would be nice if this could be removed as a stumbling block to repairing production environments during crises.

tags: added: canonical-is
Anastasia (anastasia-macmood) wrote :

@Nick Moffitt,

Could you please provide more information?
A reproducible scenario would go a long way too \o/

Were you running any commands before add-unit? If yes, what were they?
Does waiting and trying again help? If yes, how long do you usually wait before trying again?
Does it happen every time? That is, does add-unit consistently fail on the first run and then succeed when you wait and run it again?

Changed in juju:
status: New → Incomplete
Nick Moffitt (nick-moffitt) wrote :

I can't really say how to build a reproducible case, but it doesn't take a particularly large environment to hit this. If you start up an environment on 1.25.6 or so and shut down a unit from outside juju, this usually happens (particularly if a hook failed earlier).

Yeah, usually if you keep retrying eventually it works. But we didn't know this until we got really desperate.
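
For anyone who needs to automate the retry workaround described above, here is a minimal sketch in Python. It assumes the `juju` client is on the PATH and that the failure text contains "inconsistent state", as in the error quoted in this report. The function name `add_unit_with_retry`, the attempt count and the 30-second delay are illustrative assumptions, not values taken from Juju or from this thread.

```python
import subprocess
import time


def add_unit_with_retry(application, extra_args=(), attempts=10, delay=30.0,
                        run=subprocess.run, sleep=time.sleep):
    """Run `juju add-unit`, retrying while it reports "inconsistent state".

    Returns the number of attempts it took. `attempts` and `delay` are
    guesses; this report does not say how long the condition lasts.
    """
    cmd = ["juju", "add-unit", application, *extra_args]
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        # Check both streams, since we only assume where the error is printed.
        output = (result.stdout or "") + (result.stderr or "")
        if "inconsistent state" not in output:
            # Some other failure: retrying blindly is unlikely to help.
            raise RuntimeError(output.strip() or "juju add-unit failed")
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"still 'inconsistent state' after {attempts} attempts")


if __name__ == "__main__":
    tries = add_unit_with_retry("cinder", extra_args=["--to", "lxd:1"])
    print(f"unit added after {tries} attempt(s)")
```

The sketch only retries on the specific "inconsistent state" message and stops on any other error, so that a genuinely different failure is not hidden behind a retry loop.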

Jacek Nykis (jacekn)
Changed in juju:
status: Incomplete → New
Changed in juju:
status: New → Confirmed
Anastasia (anastasia-macmood) wrote :

@Nick,

As you have discovered this on Juju 1.x which is currently only open to Critical bugs, we will not be fixing it there. It's great that you have a workaround \o/

We will, however, fix it in Juju 2.x. Thank you for the additional info!

Changed in juju-core:
status: New → Won't Fix
Changed in juju:
status: Confirmed → Triaged
importance: Undecided → High
milestone: none → 2.0.4
milestone: 2.0.4 → 2.0.3
Changed in juju:
milestone: 2.0.3 → 2.2.0
Anastasia (anastasia-macmood) wrote :

@Nick Moffitt (nick-moffitt),
Could you please re-test with Juju 2.1?
In 2.1-rc1 we addressed an issue with removing units/applications that are in an error state.

I think your experience would improve too \o/

Changed in juju:
status: Triaged → Incomplete
milestone: 2.2.0 → none
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
Paul Goins (vultaire) wrote :

On 2019-06-07, we encountered this when working on a customer's cloud, trying to add a new mysql unit to replace one which had failed. We got the same "inconsistent state" issues, and we were able to force it to go through by just continuing to retry. This was with juju client 2.6.3-trusty-amd64, juju model at 2.4.3.

That environment has since been upgraded, so I can't readily re-test this; just wanted to provide a data point that this issue might still be persisting to some degree.

Changed in juju:
status: Expired → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
David O Neill (dmzoneill) wrote :

Same

    juju add-unit cinder --to lxd:1
    ERROR cannot add unit 1/1 to application "cinder": cannot add unit to application "cinder": inconsistent state
    juju add-unit cinder --to lxd:1
    ERROR cannot add unit 1/1 to application "cinder": cannot add unit to application "cinder": inconsistent state
    juju add-unit cinder --to lxd:1

Tim Penhey (thumper)
Changed in juju:
status: Expired → Triaged
tags: added: add-unit
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: High → Low
tags: added: expirebugs-bot