Destroyed leader, new leader not elected.

Bug #1511659 reported by Stuart Bishop on 2015-10-30
This bug affects 11 people

Affects         Importance  Assigned to
juju            High        Dave Cheney
juju-core       High        Dave Cheney
juju-core 1.25  High        Dave Cheney
Bug Description

I've been trying to track down a test failure in lp:~stub/charms/trusty/postgresql/rewrite, where the failover tests consistently fail on the Ecosystem Team's Jenkins (including the local provider) but always pass locally using the local provider.

I believe I have narrowed it down to Juju not promoting a surviving unit to leader after the leader is destroyed.

In the attached logs, the failing test starts around 2015-10-29 04:08:10.

First, a new unit is added to the PostgreSQL service. This works just fine.

At 2015-10-29 04:10:12, the leader (postgresql/0) is destroyed. This kicks off many, many hooks, starting with leader-settings-changed on postgresql/0 (which still thinks it is the leader) and replication-relation-departed hooks on postgresql/1 and postgresql/2.

By 2015-10-29 04:10:36 things are winding down, and you can see the final replication-relation-departed hook running on postgresql/0, which still thinks it is the leader and is happily able to change leadership settings.

At 2015-10-29 04:10:42, the stop hook is run on postgresql/0, successfully.

No further hooks run for the next 6 minutes. After this, the test suite gives up and things are torn down for the next set of tests.
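
For reference on what the failover test is waiting for, here is a minimal polling sketch of the check that never succeeds in this run; the unit names and the timeout are assumptions, not the harness's actual code:

#!/bin/bash
# Poll the surviving units until one reports leadership, giving up
# after ~6 minutes (the window in which no further hooks ran here).
for attempt in $(seq 1 36); do
    for unit in postgresql/1 postgresql/2; do
        if [ "$(juju run --unit=$unit is-leader)" = "True" ]; then
            echo "$unit is now the leader"
            exit 0
        fi
    done
    sleep 10
done
echo "no leader elected" >&2
exit 1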

Stuart Bishop (stub) wrote :

The attached log came from the lxc run at http://reports.vapour.ws/charm-test-details/charm-bundle-test-parent-3201

I believe the same failure is happening with all the providers (same test failure - service gives up waiting for a master to appear), but have not trawled through their logs to confirm.

Stuart Bishop (stub) wrote :

At 2015-10-29 03:59:44 you can see the leader-elected hook running when the service is initially set up, which confirms that it is wired up correctly and can be executed.
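
A quick way to verify this in the attached log is to grep for the hook name; the file name here is a guess at the attachment:

grep 'leader-elected' all-machines.log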

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
Stuart Bishop (stub) wrote :

I also have a staging Cassandra environment in this state (OpenStack controller, dse nodes manually provisioned, older version of Juju). I had two nodes in the service, and dropped one due to hardware failure. Now I only have dse/1 left, but it is not the leader:

$ juju run --unit=dse/1 is-leader
False
$ juju --version
1.24.4-trusty-amd64
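
To check every unit of the service in one go, the same pattern used in later comments works here too:

juju run --service dse is-leader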

David Ames (thedac) wrote :

I have also seen this with rabbitmq-server.

Even after adding a new node, no election takes place. I left this alone for over 12 hours (in case leadership elections happen on a longer timescale), and still no election took place.

I have observed this with wily juju-core 1.24.6-0ubuntu3 and trusty juju-core 1.25.0-0ubuntu1~14.04.1~juju1.

Note this issue is intermittent. Occasionally a new leader is elected, but often one is not.

Process to recreate:
1. Deploy 3 nodes of a charm with a peer relationship
2. Determine the leader node (a scripted way to do this is sketched below)
3. Destroy the leader node
4. Check for a leader node
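
To script step 2, the YAML that juju run emits can be filtered for the unit reporting True; this is a hypothetical helper (the service name is an example), matching the output layout shown below:

# Print the unit that currently reports is-leader == True.
juju run --service rabbitmq-server is-leader | \
    awk '/^    True/ {found=1} found && /UnitId:/ {print $2; found=0}'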

Example with rabbitmq-server:
Juju status: http://pastebin.ubuntu.com/13093778/

ubuntu@thedac-bastion:~/rabbitmq-server$ juju run --service rabbitmq-server is-leader
- MachineId: "4"
  Stdout: |
    True
  UnitId: rabbitmq-server/0
- MachineId: "5"
  Stdout: |
    False
  UnitId: rabbitmq-server/1
- MachineId: "6"
  Stdout: |
    False
  UnitId: rabbitmq-server/2

# rabbitmq-server/0 is the leader. Destroy it.
ubuntu@thedac-bastion:~/rabbitmq-server$ juju destroy-unit rabbitmq-server/0
ubuntu@thedac-bastion:~/rabbitmq-server$ juju run --service rabbitmq-server is-leader
- MachineId: "5"
  Stdout: |
    False
  UnitId: rabbitmq-server/1
- MachineId: "6"
  Stdout: |
    False
  UnitId: rabbitmq-server/2
# No leader exists

ubuntu@thedac-bastion:~/rabbitmq-server$ juju add-unit rabbitmq-server
ubuntu@thedac-bastion:~/rabbitmq-server$ juju run --service rabbitmq-server is-leader
- MachineId: "5"
  Stdout: |
    False
  UnitId: rabbitmq-server/1
- MachineId: "6"
  Stdout: |
    False
  UnitId: rabbitmq-server/2
# No leader exists

ubuntu@thedac-bastion:~/rabbitmq-server$ juju add-unit rabbitmq-server
ubuntu@thedac-bastion:~/rabbitmq-server$ juju run --service rabbitmq-server is-leader
- MachineId: "5"
  Stdout: |
    False
  UnitId: rabbitmq-server/1
- MachineId: "6"
  Stdout: |
    False
  UnitId: rabbitmq-server/2
- MachineId: "7"
  Stdout: |
    False
  UnitId: rabbitmq-server/3
# No leader exists

Changed in juju-core:
milestone: none → 1.26.0
Adam Collard (adam-collard) wrote :

I think the logging request in https://bugs.launchpad.net/juju-core/+bug/1488166/comments/4 is still relevant
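
For anyone collecting those logs, a more targeted configuration than <root>=DEBUG might look like this; the logger names are assumptions about juju's module hierarchy, not taken from the linked comment:

# Logger names are assumptions; fall back to <root>=DEBUG if they don't match.
juju set-env logging-config="juju.worker.leadership=TRACE;juju.worker.uniter=DEBUG"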

tags: added: bug-squad leadership
Changed in juju-core:
milestone: 1.26.0 → 2.0-alpha2
Edward Hope-Morley (hopem) wrote :

I too have seen this with the latest Juju i.e. 1.25.0-0ubuntu1~14.04.1~juju1. I deployed 3 units of the swift-proxy charm with the openstack provider, powered off the leader unit and waited for a re-election to occur. Even after 15 minutes a new leader had not been elected (so there was no leader).

tags: added: sts
Edward Hope-Morley (hopem) wrote :

As a follow-on to my previous comment, here is what seems to be a fairly reliable reproducer:

bzr branch https://code.launchpad.net/~openstack-charm-testers/+junk/swift-rings swift-test
cd swift-test
juju-deployer -c swift-next -d trusty-liberty

[wait for deployment to complete]

# work out who the leader is
juju run --service swift-proxy "is-leader"
...

# then poweroff the leader unit (assuming /0 here)
juju ssh swift-proxy/0 sudo poweroff

Now wait at least 60 seconds then re-run the is-leader check. When I do this, even after 15 minutes, I see that neither of the remaining units is now the leader.
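
To keep re-running that check hands-free, something like this works (a convenience wrapper, not part of the original reproducer):

# Re-run the leadership check every 60 seconds:
watch -n 60 'juju run --service swift-proxy is-leader'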

Edward Hope-Morley (hopem) wrote :

oh and enable debug logs prior to doing the test by doing:

juju set-env logging-config="<root>=DEBUG;"
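
With debug logging enabled, the relevant lines can be captured while the test runs; the grep filter is just one suggestion:

# Stream the consolidated environment logs, keeping leadership-related lines:
juju debug-log | grep -i leader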

Ursula Junque (ursinha) wrote :

I can reproduce the issue consistently. Once I poweroff the leader, the other unit just won't show up as leader, even long after 60s.
Juju 1.25.0 in wily.

William Reade (fwereade) on 2016-01-12
Changed in juju-core:
assignee: nobody → William Reade (fwereade)
Nate Finch (natefinch) wrote :

I've done some testing with rabbitmq-server, and I keep getting errors from the hooks when I kill a unit. One time I got 'hook failed: "leader-elected"' and one time I got 'hook failed: "cluster-relation-changed"' ... so I'm not super confident about using rabbitmq as a test bed, if it can't even handle killing a unit.

I'll try it with another charm.

Changed in juju-core:
assignee: William Reade (fwereade) → Dave Cheney (dave-cheney)
Changed in juju-core:
status: Triaged → In Progress
Dave Cheney (dave-cheney) wrote :

Fix committed to master, will retest today and propose backport to 1.25 if successful

Changed in juju-core:
status: In Progress → Fix Committed
Dave Cheney (dave-cheney) wrote :

Due to various bugs it took all day to replicate the juju-deployer environment above, but I was able to reproduce the scenario and confirm that with this fix applied, a new leader is elected.

I will work on backporting this fix to 1.25.

Curtis Hovey (sinzui) on 2016-02-11
Changed in juju-core:
status: Fix Committed → Fix Released
affects: juju-core → juju
Changed in juju:
milestone: 2.0-alpha2 → none
milestone: none → 2.0-alpha2
Changed in juju-core:
assignee: nobody → Dave Cheney (dave-cheney)
importance: Undecided → High
status: New → Fix Released