Agents are "lost" after terminating a state server in an HA env

Bug #1510651 reported by Cheryl Jennings
Affects         Status        Importance  Assigned to  Milestone
Canonical Juju  Fix Released  High        Unassigned
juju-core       Won't Fix     High        Unassigned
1.25            Fix Released  Critical    Unassigned

Bug Description

Recreate steps:

1 - Bootstrap (ec2 in my case)
2 - Deploy services and wait for completion
3 - Run ensure-availability -n 3 and wait for completion
4 - Stop the initial state server (machine 0) from the ec2 console

Some services will show as "agent lost".

Killing jujud on "lost" machines will trigger them to reconnect.
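
The steps above can be driven end to end from a shell. A rough sketch, assuming an EC2 environment and the AWS CLI for the "stop machine 0" step (the instance ID below is a placeholder for whatever EC2 reports for machine 0):

# Reproduce: bootstrap, deploy, enable HA, then kill the original state server.
juju bootstrap
juju deploy ubuntu
juju ensure-availability -n 3
# Wait for all three state servers to report "started" in `juju status`, then
# stop machine 0 out from under Juju (EC2 console, or the AWS CLI):
aws ec2 stop-instances --instance-ids i-0123456789abcdef0   # placeholder instance ID
# Units on the surviving machines will start showing "agent lost":
juju status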

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Also, even though machine 0 is stopped, it still shows as "started" in `juju status`.
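
One way to see exactly what Juju is reporting for machine 0, rather than the summary view, is the YAML status output; a sketch using stock 1.x commands (field names vary a little between 1.24 and 1.26):

# Full detail for every machine and unit, including the per-machine
# agent-state and instance-state fields that the summary view collapses.
juju status --format=yaml
# Or just the state lines:
juju status --format=yaml | grep -E 'agent-state|instance-state|dns-name'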

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Mark Ramm (mark-ramm) wrote :

I think this should be assigned to 1.25.1 and prioritized accordingly. HA issues are critical.

I also think there needs to be a test case for this in our CI infrastructure.

Changed in juju-core:
importance: High → Critical
milestone: none → 1.25.1
tags: added: bug-squad ensure-availability
Revision history for this message
Cheryl Jennings (cherylj) wrote :

When recreating this with 1.24.7, the agents do seem to eventually reconnect in my small deployment, but it took over 10 minutes.

Revision history for this message
John A Meinel (jameinel) wrote :

So is this killing the entire machine-0, or is it just stopping the API server?
If you stop a Juju API server without stopping its associated Mongo instance, then we are in "known broken but not fixable right now" territory. We select the master Juju server as the one that has a local connection to the Mongo master. (We did this to avoid writing our own election mechanism, and because it gives us DB-level guarantees that if we stop being the master, we stop being able to make changes to the DB on that local connection.)
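
A quick way to see which Mongo replica-set member is currently primary is to ask Mongo directly from one of the state servers. This is only a sketch: it assumes Juju's MongoDB listens on the usual 37017 port with SSL and that a mongo shell is available (on some releases it lives under /usr/lib/juju/bin/); the isMaster command itself does not require authentication.

# Run on a surviving state server (e.g. via `juju ssh 1`).
mongo --ssl --sslAllowInvalidCertificates --port 37017 \
    --eval 'printjson(rs.isMaster())'
# "primary" in the output names the member the master Juju server should be
# colocated with; while the replica set is re-electing, there may be no
# primary at all.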

Revision history for this message
Andrew McDermott (frobware) wrote :

James: let's focus on 1.24 first.

Changed in juju-core:
assignee: nobody → James Tunnicliffe (dooferlad)
Revision history for this message
godleon (godleon) wrote :

Hi Canonical friends,

I am Leon from QCT Taiwan. I deployed OpenStack in my environment to reproduce the issue.

I had 3 state servers (machines 0, 4 and 5); they are also my controller nodes in OpenStack.

I shut down the bootstrap server (machine 0) around 2015-10-28 08:47. Juju took about 1-2 minutes to come back, but I saw a lot of agents reported as lost in juju status.

I originally thought the lost status would only affect the services on machine 0, but that seems not to be the case: almost all services on all machines are impacted by the failure of the bootstrap server.

I attached the logs to this comment; check from "2015-10-28 08:49" to see what happened when the bootstrap server failed in my environment.

By the way, I ran all my tests on physical servers with Juju 1.24.6 and MAAS 1.8.2 on a dedicated network.

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Just tried this on EC2 with 1.24:

juju bootstrap
juju deploy ubuntu
juju ensure-availability -n 3

An approximate timeline after this point:
0 minutes:
 - terminate machine 0 from EC2 console
2 minutes:
 - juju status returns a status
 - State server (machine 3) agent-status: started
 - State server (machine 2) agent-status: down
 - ubuntu charm agent-status: lost
5 minutes:
 - ubuntu charm agent-status: idle
 - State server (machine 3) agent-status: started
 - State server (machine 2) agent-status: down
7 minutes:
 - Everything OK (apart from machine 0, which we terminated).

I didn't need to bounce any processes. It would be better if this happened in seconds rather than minutes though.
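
A timeline like the one above can be collected with a simple watch loop; a minimal sketch using only the stock client:

# Print a timestamped snapshot of agent states every 30 seconds while the
# environment recovers; stop with Ctrl-C.
while true; do
    date
    juju status --format=yaml | grep -E 'agent-state|agent-status|instance-state'
    sleep 30
done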

Revision history for this message
Cheryl Jennings (cherylj) wrote :

Bouncing jujud in my first comment was necessary for the unit agents under 1.26. There are logs from the original environment in comment #9.

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

Using 1.26 I did "juju ssh 0 sudo halt -p" and machine 0 is still showing up in the status output as started, but EC2 shows it as stopped. Interesting.

Didn't need to do anything other than wait for the Ubuntu charm to get back to idle.

Strange, but machine 0 is reporting as started >10 minutes after the machine stopped.

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

juju set-env logging-config=juju.state.presence=TRACE
juju debug-log
<snip>
machine-2: 2015-10-30 14:10:52 TRACE juju.state.presence presence.go:671 pinging "u#ubuntu/0" with seq=36
machine-2: 2015-10-30 14:10:52 TRACE juju.state.presence presence.go:671 pinging "m#3" with seq=37
machine-2: 2015-10-30 14:10:53 TRACE juju.state.presence presence.go:671 pinging "m#1" with seq=38
machine-2: 2015-10-30 14:11:04 TRACE juju.state.presence presence.go:671 pinging "m#2" with seq=30
machine-2: 2015-10-30 14:11:08 TRACE juju.state.presence presence.go:671 pinging "m#2" with seq=31
machine-1: 2015-10-30 14:11:05 TRACE juju.state.presence presence.go:671 pinging "m#1" with seq=39
machine-2: 2015-10-30 14:11:15 TRACE juju.state.presence presence.go:671 pinging "m#3" with seq=37
machine-2: 2015-10-30 14:11:15 TRACE juju.state.presence presence.go:671 pinging "u#ubuntu/0" with seq=36
machine-2: 2015-10-30 14:11:16 TRACE juju.state.presence presence.go:671 pinging "m#1" with seq=38
machine-1: 2015-10-30 14:11:28 TRACE juju.state.presence presence.go:671 pinging "m#1" with seq=39
machine-2: 2015-10-30 14:11:27 TRACE juju.state.presence presence.go:671 pinging "m#2" with seq=30
machine-2: 2015-10-30 14:11:31 TRACE juju.state.presence presence.go:671 pinging "m#2" with seq=31
machine-2: 2015-10-30 14:11:38 TRACE juju.state.presence presence.go:671 pinging "m#3" with seq=37

Looks like the pinger isn't trying to ping machine 0. Odd.
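
That can be confirmed by filtering the same TRACE output for the machine-0 presence key; a sketch that pipes debug-log through grep rather than relying on any particular debug-log flags:

# With juju.state.presence at TRACE (as set above), a live pinger for machine 0
# would keep emitting lines that mention its presence key "m#0".
juju debug-log | grep 'pinging "m#0"'
# No output here, while "m#1".."m#3" keep appearing, matches the observation
# that nothing is pinging on behalf of machine 0.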

Revision history for this message
John A Meinel (jameinel) wrote :

There was work being done (I think in 1.26) to move status into the machine document so that it could be included in the AllWatcher. (It may not have been done, but it was discussed.)
IIRC, the API server that a machine connected to was responsible for noticing when the connection went down and updating the machine doc.
However, an API server probably only talks to itself, and with it dead, nobody else is responsible for marking it dead.

That could explain 1.26 vs 1.24.

Revision history for this message
John A Meinel (jameinel) wrote :

As for the times listed in James's comment:
2 min until status: AFAIK we connect to Mongo with consistency STRONG, which means we can only read from the master. So while Mongo is re-establishing a master, there is nobody we can read from.
I'm not positive whether you can read from a slave when there is no master in other circumstances; we could investigate.
But last-login also wants to record the connection, so we do want a (soft?) write.
As for why machine 2 stayed down so long and its agent was lost, I don't have specific insights.

Revision history for this message
Cheryl Jennings (cherylj) wrote :

John - RE: comment #14, the discussions around machine presence haven't been implemented yet, so that code hasn't changed.

Revision history for this message
William Reade (fwereade) wrote :

Just to be clear: this information may be associated with the machine, but that's not a good reason to put it into the machine document. The AllWatcher can watch different collections and compose the changes into client-facing "docs" just fine; but every time we add a new write pattern for a mongodb document we (1) make O(N^2) transactions ever more likely and (2) make every other watcher for that document less efficient (perhaps significantly so at scale).

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

I have done some tests with a very simple setup that shows the "agent is lost, sorry!" message for somewhere between two and twenty minutes. I am still looking for the cause of this variability. It has always eventually settled down for me, though.

When Juju loses a state server, the units that are connected to it retry connecting, eventually give up and restart, triggering a reconnect to all state servers.

When you can't see state from "juju status", that is because your client is trying to talk to the dead state server. Once it works this out and tries another, you get your status information.

The problem is that if you don't re-run 'juju ensure-availability -n 3', you don't get a replacement state server, and Juju won't do this automatically. I think what is happening is:

A worker tries to start a watcher that is watching a machine that is down.
The watcher eventually gives up with 'ERROR juju.api.watcher watcher.go:84 error trying to stop watcher: connection is shut down'.
Next, with the api worker as an example:
2015-10-28 09:29:39 INFO juju.worker runner.go:275 stopped "api", err: watcher has been stopped
2015-10-28 09:29:42 DEBUG juju.worker runner.go:203 "api" done: watcher has been stopped
2015-10-28 09:29:45 ERROR juju.worker runner.go:223 exited "api": watcher has been stopped
2015-10-28 09:29:48 INFO juju.worker runner.go:261 restarting "api" in 3s
2015-10-28 09:29:54 INFO juju.worker runner.go:269 start "api"
2015-10-28 09:29:57 INFO juju.api apiclient.go:331 dialing "wss://10.10.197.27:17070/environment/f388baa3-77af-4b05-8f0f-5f277e626e58/api"
2015-10-28 09:29:57 INFO juju.api apiclient.go:331 dialing "wss://10.10.197.29:17070/environment/f388baa3-77af-4b05-8f0f-5f277e626e58/api"
2015-10-28 09:29:57 INFO juju.api apiclient.go:331 dialing "wss://10.10.197.30:17070/environment/f388baa3-77af-4b05-8f0f-5f277e626e58/api"
2015-10-28 09:30:03 INFO juju.api apiclient.go:339 error dialing "wss://10.10.197.27:17070/environment/f388baa3-77af-4b05-8f0f-5f277e626e58/api": websocket.Dial wss://10.10.197.27:17070/environment/f388baa3-77af-4b05-8f0f-5f277e626e58/api: dial tcp 10.10.197.27:17070: no route to host
2015-10-28 09:30:03 INFO juju.api apiclient.go:263 connection established to "wss://10.10.197.29:17070/environment/f388baa3-77af-4b05-8f0f-5f277e626e58/api"
2015-10-28 09:30:12 DEBUG juju.worker runner.go:196 "api" started

Of course, since there is nothing on the end of 10.10.197.27, we get 'no route to host'. As machines 4 and 5 (the other state servers) still think machine 0 is in the peer group, I imagine that is why we see this repeatedly in the logs (this is from machine-4.log):

2015-10-28 09:19:08 INFO juju.state.watcher watcher.go:121 watcher loop failed: watcher iteration error: EOF
2015-10-28 09:19:08 ERROR juju.worker.peergrouper worker.go:138 peergrouper loop terminated: tomb: dying
2015-10-28 09:19:08 DEBUG juju.cmd.jujud machine.go:1604 worker "peergrouper" exited with watcher iteration error: EOF
2015-10-28 09:19:08 INFO juju.worker runner.go:275 stopped "peergrouper", err: watcher iteration error: EOF
2015-10-28 09:19:08 DEBUG juju.cmd.jujud machine.go:1604 worker "certupdater" exited with watcher iterat...
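
Given the note above that a replacement state server is not created automatically, the manual workaround is to ask for HA again once the environment is reachable; a sketch (field names as in the 1.2x YAML status output):

# Re-running ensure-availability asks Juju to bring the environment back up to
# 3 healthy state servers; it should provision a replacement for the dead one.
juju ensure-availability -n 3
# Then watch for the new state server machine to reach "started":
juju status --format=yaml | grep -E 'state-server-member-status|agent-state'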


Changed in juju-core:
status: Triaged → In Progress
Revision history for this message
Katherine Cox-Buday (cox-katherine-e) wrote :

James has said we should look at the peer grouper.

Revision history for this message
Katherine Cox-Buday (cox-katherine-e) wrote :

James also has some Python scripts he will be uploading to correlate log entries with the times things happened.

Revision history for this message
Katherine Cox-Buday (cox-katherine-e) wrote :

Need to validate whether this is happening in 1.26.

Revision history for this message
James Tunnicliffe (dooferlad) wrote :

The scripts I have hacked together to debug this are here: https://github.com/dooferlad/jujuWand/tree/master

I will try not to make them any more broken or less commented! What I have now is at rev 2 in case you need to rewind to this point in time.

Changed in juju-core:
assignee: James Tunnicliffe (dooferlad) → nobody
Changed in juju-core:
assignee: nobody → Wayne Witzel III (wwitzel3)
Revision history for this message
Wayne Witzel III (wwitzel3) wrote :

I verified this is happening in 1.26 as well. I am able to reproduce using the lxd provider, which speeds up iterating on and debugging the deployments.

Changed in juju-core:
milestone: 1.25.1 → 1.25.2
Tim Penhey (thumper)
Changed in juju-core:
assignee: Wayne Witzel III (wwitzel3) → Tim Penhey (thumper)
Revision history for this message
Tim Penhey (thumper) wrote :

I have replicated this on 1.26 with lxd with the following observations:

* firstly, after about 20-30 minutes it appears to have all settled down (not actually true)
* I killed machine-0 and it always shows as started in status, never down
* both machines 3 and 4 (the new state server machines) only list the machine-0 address for state addresses
  - something running in those agents is attempting to connect to just the down machine-0 address for mongo
* the machine agent for machine-1 was never shown as lost, but for some reason took 15 minutes to return from dialing the API server addresses.

2015-11-24 01:30:58 INFO juju.worker runner.go:269 start "api"
2015-11-24 01:30:58 INFO juju.api apiclient.go:476 dialing "wss://10.0.3.49:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api"
2015-11-24 01:30:58 INFO juju.api apiclient.go:476 dialing "wss://10.0.3.202:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api"
2015-11-24 01:30:58 INFO juju.api apiclient.go:476 dialing "wss://10.0.3.218:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api"
2015-11-24 01:31:01 INFO juju.api apiclient.go:484 error dialing "wss://10.0.3.49:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api": websocket.Dial wss://10.0.3.49:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api: dial tcp 10.0.3.49:17070: getsockopt: no route to host
2015-11-24 01:44:25 INFO juju.api apiclient.go:269 connection established to "wss://10.0.3.202:17070/environment/a0b3e91b-e80f-4ac8-8264-3b250e8f7599/api"

* the dependency engine used by the unit agent seemed to get itself into a pathological state, and took some time to reconnect everything when the server went away.
* the unit agent log file has timestamps recorded out of order (weird) - although it appears that this is only during the period when machine-0 was stopped
* it appears that as the uniter restarted in the unit agent, it re-ran the config-changed hook
* the "rsyslog-config-updater" worker is stuck restarting as it is only trying to look at the old machine-0 address
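
A quick way to see which state/API addresses an agent has actually recorded, relating to the observation above that machines 3 and 4 only list the machine-0 address; a sketch assuming the 1.x agent.conf layout with its stateaddresses/apiaddresses keys:

# On one of the new state server machines, dump the addresses its agent knows about.
juju ssh 3 'sudo grep -A 4 -E "stateaddresses|apiaddresses" /var/lib/juju/agents/machine-3/agent.conf'
# If only the dead machine-0 address is listed, workers such as the
# rsyslog-config-updater above have nothing else to dial.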

Revision history for this message
Tim Penhey (thumper) wrote :

OK, well at least I have eliminated one problem. The log elements out of order seem to be somehow related to the combination of juju ssh, less, and the lxd machine.

Copying the file from the lxd machine to the host shows that the log file is in fact in the correct order.

Which is good, because I was really beginning to wonder WTF was going on.

Changed in juju-core:
milestone: 1.25.2 → 1.25.3
Revision history for this message
Tim Penhey (thumper) wrote :

The fundamental issue here is that, once the machine dies, the rest of Juju doesn't appear to realise it. This seems to be the foundation upon which the other failures are layered.

I'm looking into the instance poller and presence code to work out where the failure is.

Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25.3 → 2.0-alpha1
Changed in juju-core:
milestone: 2.0-alpha1 → 2.0-alpha2
Tim Penhey (thumper)
Changed in juju-core:
assignee: Tim Penhey (thumper) → nobody
Changed in juju-core:
status: In Progress → Triaged
milestone: 2.0-alpha2 → 2.0-alpha3
Changed in juju-core:
milestone: 2.0-alpha3 → 2.0-beta4
Changed in juju-core:
milestone: 2.0-beta4 → none
importance: Critical → High
Revision history for this message
Benjamin Kaehne (ben-kaehne) wrote :

This bug is also affecting me with juju 1.25.5 deploying xenial.

I notice that, for some reason or another, the initial state server will become unreachable; agents will be lost, and when leadership elections are performed for other parts of the environment, various machine units will hang.

I never see the machine units try to connect to the other state servers.
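
One way to check whether an affected machine agent ever attempts the other state servers is to look for the juju.api "dialing" lines (as in the logs earlier in this bug) in its local log; a sketch assuming the standard /var/log/juju location, with machine 1 as an example:

# Every connection attempt is logged as a "dialing" line; the addresses show
# which state servers the agent is actually trying.
juju ssh 1 'grep "dialing" /var/log/juju/machine-1.log | tail -n 20'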

tags: added: canonical-bootstack
affects: juju-core → juju
Changed in juju-core:
importance: Undecided → High
status: New → Triaged
Changed in juju-core:
status: Triaged → Won't Fix
Changed in juju:
milestone: none → 2.0-beta18
Changed in juju:
status: Triaged → Fix Committed
Curtis Hovey (sinzui)
Changed in juju:
status: Fix Committed → Fix Released