Juju agent in a "failed" state after machine reboot on some charms

Bug #1649637 reported by Pen Gale
This bug affects 2 people
Affects         Status   Importance  Assigned to  Milestone
Canonical Juju  Triaged  Medium      Unassigned

Bug Description

We ran into this while building and testing Matrix, but it appears to be reproducible outside of it.

To reproduce:

1) Deploy wiki-simple (or just mysql or the wiki charm).
2) juju run 'sudo reboot' on one of the units.
3) Note that the controller marks the unit as being in a "failed" state.
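
The steps above can be sketched as a small script. This is only an illustration, not the reporter's actual procedure: the helper names `repro_commands` and `run_repro` are made up here, the `--unit` flag assumes the Juju 2.x form of `juju run`, and the default charm and unit names are placeholders. By default it only prints the commands (dry run), because a real run needs a bootstrapped controller and the unit must have finished deploying before it is rebooted.

```python
import subprocess


def repro_commands(charm="mysql", unit="mysql/0"):
    """Return the juju CLI invocations for the three steps as argv lists.

    Assumption: Juju 2.x syntax, where `juju run --unit <unit> <command>`
    runs a shell command on a unit.
    """
    return [
        ["juju", "deploy", charm],                        # step 1
        ["juju", "run", "--unit", unit, "sudo reboot"],   # step 2
        ["juju", "status", "--format=yaml"],              # step 3: inspect the agent state
    ]


def run_repro(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=False: the reboot cuts the connection, so a non-zero
            # exit code from that step is expected rather than an error.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    run_repro(repro_commands())  # dry run: prints the commands only
```

In a real run, wait between step 1 and step 2 until `juju status` shows the unit as ready; the sketch does not do that waiting itself.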

This happens even in the latest Juju 2.1 beta.

The interesting thing is that it doesn't happen for all charms. The latest 'zookeeper' charm from the store reboots without incident, for example. There are also no big obvious errors in the machine logs for the unit that the controller has marked as failed -- the agent appears to be in good shape, despite the controller thinking that it has failed.

Pen Gale (pengale) wrote :

Controller logs from a repro of this bug, for reference: http://paste.ubuntu.com/23624621/

Anastasia (anastasia-macmood) wrote :

If it does not happen for all the charms, only for latest 'zookeeper' one, maybe the problem is not with juju but with the charm?

Changed in juju-core:
status: New → Triaged
status: Triaged → Incomplete
Cory Johns (johnsca) wrote :

Anastasia,

That's backwards; it's fine on ZK but fails for both mysql and mediawiki. It's noteworthy that both mediawiki and mysql are trusty, while ZK is xenial. To test that further, I deployed the same charm, cs:ubuntu-8, on both xenial and trusty, and ran `juju run $unit 'sudo reboot'` on both, and the trusty version failed while the xenial version did not.

ubuntu-trusty/0*  active  failed  4  10.171.42.173  ready
ubuntu-xenial/0*  active  idle    3  10.171.42.78   ready

Anastasia (anastasia-macmood) wrote :

Re-targeted to "juju" project that tracks Juju 2.x issues.

Changed in juju-core:
status: Incomplete → Triaged
importance: Undecided → High
no longer affects: juju-core
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.2.0
Anastasia (anastasia-macmood) wrote :

Could you please supply the output of juju status? Is it the agent status or the charm status that reports the failure?

Changed in juju:
status: Triaged → Incomplete
Cory Johns (johnsca) wrote :

Full output of `juju status --format=yaml`: http://pastebin.ubuntu.com/23626030/

It's the juju-status field, and the message is "resolver loop error"
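
For anyone checking a deployment for the same symptom, here is a rough sketch of how the failed agents could be picked out of parsed status output. It assumes the status has already been loaded into a dict (for example from `juju status --format=json`) and that each unit carries a `juju-status` mapping with `current` and `message` keys under `applications.<app>.units`; that layout is an assumption from Juju 2.x output, not copied from the paste above. The function name `failed_agents` and the sample data are made up for illustration, with the sample modelled on the unit names and message reported in this thread.

```python
def failed_agents(status):
    """Return (unit, message) pairs for units whose juju-status is 'failed'."""
    found = []
    for app in status.get("applications", {}).values():
        # "units" may be missing or empty for some applications.
        for unit_name, unit in (app.get("units") or {}).items():
            agent = unit.get("juju-status", {})
            if agent.get("current") == "failed":
                found.append((unit_name, agent.get("message", "")))
    return found


# Hypothetical sample, shaped like the assumed status layout.
sample = {
    "applications": {
        "ubuntu-trusty": {"units": {"ubuntu-trusty/0": {
            "workload-status": {"current": "active", "message": "ready"},
            "juju-status": {"current": "failed",
                            "message": "resolver loop error"},
        }}},
        "ubuntu-xenial": {"units": {"ubuntu-xenial/0": {
            "workload-status": {"current": "active", "message": "ready"},
            "juju-status": {"current": "idle"},
        }}},
    }
}

print(failed_agents(sample))  # [('ubuntu-trusty/0', 'resolver loop error')]
```

Note that the workload status stays "active" in the sample while only the agent status is "failed", which is why the check looks at `juju-status` rather than `workload-status`.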

Changed in juju:
status: Incomplete → Triaged
Cory Johns (johnsca)
tags: added: matrix
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.2-beta1 → 2.2-beta2
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.2-beta2 → 2.2-beta3
Changed in juju:
milestone: 2.2-beta3 → 2.2-beta4
Changed in juju:
milestone: 2.2-beta4 → 2.2-rc1
Tim Penhey (thumper) wrote :

Removing the milestone to stop this just getting punted.

How often does it happen? I'm trying to get an understanding of frequency.

tags: added: agents resolver uniter
Changed in juju:
importance: High → Medium
milestone: 2.2-rc1 → none
Tim Penhey (thumper) wrote :

Possibly related to bug 1694734

Tim Penhey (thumper) wrote :

Actually, I'm pretty sure this is related. It is almost certainly timing related, which is why it only sometimes appears. Marking this bug as the dupe, as the other has more info.

Cory Johns (johnsca) wrote :

I can reproduce it with 100% consistency (or near enough that I can't tell) on trusty units with the lxd provider, but it does only happen with `juju run` and not `juju ssh`, so it being related to action status transactions seems reasonable.
