juju subordinate not allocating and can't be destroyed

Bug #1582463 reported by Brad Marshall
This bug affects 2 people
Affects         Status     Importance  Assigned to  Milestone
Canonical Juju  Expired    High        Unassigned
juju-core       Won't Fix  Undecided   Unassigned
1.25            Won't Fix  Undecided   Unassigned

Bug Description

I'm deploying using Juju 1.25.5 to Xenial via MAAS 1.9.2, and have had an issue where a subordinate charm didn't allocate.

After a conversation on #juju-dev it was suggested to try removing the parent charm and subordinate, and spin up a replacement. I've tried doing that, and have ended up with:

[Units]
ID                            WORKLOAD-STATE AGENT-STATE VERSION MACHINE PORTS PUBLIC-ADDRESS  MESSAGE
infra/0                       terminated     executing   1.25.5  0             maas-infra.maas (stop)
  landscape-client-physical/5 unknown        allocating                        maas-infra.maas Waiting for agent initialization to finish

Is there any way to destroy these units so I have a clean juju status? Ideally it'd be good to figure out why it didn't deploy, but I understand that may be harder to work out.

I've attached the machine-0 and infra/0 logs; please let me know if you need any more.

$ dpkg-query -W maas
maas 1.9.2+bzr4568-0ubuntu1~trusty1

$ dpkg-query -W juju-core
juju-core 1.25.5-0ubuntu1~14.04.2~juju1

Brad Marshall (brad-marshall) wrote :
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.25.6
no longer affects: juju-core/2.0
Changed in juju-core:
milestone: 1.25.6 → none
Brad Marshall (brad-marshall) wrote :

FWIW, as per fwereade, I've tried removing any trace of the busted subordinate and restarting both the machine-0 agent and the unit-infra-0 agent; neither seemed to help.

Benjamin Kaehne (ben-kaehne) wrote :

I also experienced this:

juju-core 1.25.5-0ubuntu3~14.04.1

I tried removing both the parent and subordinate services, and the parent and subordinate units, all to no avail.

Eventually I was left with the parent unit in an executing (stop) state and the subordinate still waiting for agent initialisation.

Changed in juju-core:
milestone: none → 2.0-beta14
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 2.0-beta14 → 2.0-beta15
Changed in juju-core:
milestone: 2.0-beta15 → 2.0.0
affects: juju-core → juju
Changed in juju:
milestone: 2.0.0 → none
milestone: none → 2.0.0
Changed in juju-core:
status: New → Won't Fix
Alexis Bruemmer (alexis-bruemmer) wrote :

Please reproduce this issue on 2.0 and reopen the bug if it still occurs.

Changed in juju:
status: Triaged → Incomplete
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-rc3 → 2.0.0
Changed in juju:
milestone: 2.0.0 → none
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
Changed in juju:
status: Expired → New
Peter Sabaini (peter-sabaini) wrote :

I have come across a similar phenomenon. I've added nrpe subordinates to a number of applications. On most applications this worked fine, except for a rabbitmq-server unit where the nrpe subordinate got stuck in:

waiting allocating x.x.x.x installing agent

I subsequently tried to remove the nrpe application. The subordinates for the other units are gone, but the stuck subordinate is still present.

Full status

  nrpe-lxd:
    charm: local:xenial/nrpe-0
    series: xenial
    os: ubuntu
    charm-origin: local
    charm-name: nrpe
    charm-rev: 0
    exposed: false
    life: dying
    application-status:
      current: waiting
      message: installing agent

There are no agents in error state. The containers' machine log has no errors either.

Version
juju-2.0 1:2.1.3-0ubuntu1~16.04.1~juju1

Please shout if there are any logs you need.

Peter Sabaini (peter-sabaini) wrote :

I've now managed to remove the stuck subordinates.

I first tried to remove the principal units (rabbitmqs), but those were hanging as well. Only after I also ran remove-machine --force on the principal containers were the subordinates and principals removed.
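
For anyone hitting the same state on Juju 2.x, the workaround described above can be sketched roughly as follows. This is only an illustration, not a confirmed fix: the helper name force_remove_machines and the DRY_RUN switch are hypothetical, and the container IDs in the usage line are placeholders, since the thread does not say which machines were involved. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the workaround: force-remove the machines (containers) that
# host the stuck principal units, which takes their subordinates with them.
# Assumption: DRY_RUN is a hypothetical switch; the default (1) only prints
# the commands so they can be reviewed before anything is destroyed.

force_remove_machines() {
    for machine in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Show what would be run without touching the model.
            echo "juju remove-machine --force $machine"
        else
            # Destructive: removes the machine and every unit on it.
            juju remove-machine --force "$machine"
        fi
    done
}
```

Usage, with placeholder container IDs: force_remove_machines 0/lxd/1 0/lxd/2 prints the two commands; set DRY_RUN=0 to actually run them. Note that --force discards everything on the machine, so it is a last resort once normal unit and application removal has hung.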

Christian Muirhead (2-xtian) wrote :

Hi Peter - a few problems with subordinates have been fixed recently, particularly for cases where the subordinate was related to a number of different principal applications, as you describe. None of them was specifically about units getting stuck at install, but they could have made it hard to remove the bad unit.

The fixes are in Juju 2.2.2 - could you try reproducing there to see if it's fixed in that version?

Related PRs:
https://github.com/juju/juju/pull/7369
https://github.com/juju/juju/pull/7590
https://github.com/juju/juju/pull/7603

Changed in juju:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired