enable-ha using existing machines breaks agents

Bug #1642618 reported by Mick Gregg
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Andrew Wilkins

Bug Description

When `enable-ha` is passed existing juju machines to use as new controllers/mongodb hosts, the agents on those machines break with a panic trying to read from a nil state field.

To reproduce
1. Add two new machines to a newly bootstrapped environment (`juju add-machine`)
2. Enable HA using these two new machines (`juju enable-ha --to 1,2`)
  * The Juju agents on these two new machines will have stopped, or at least not have restarted
  * juju-mongodb* will not be installed on the new machines, though rs.config() will show them expected in the replica set and unreachable
3. Manually restart the agents on the new machines
  * The agents will start and run for a while
  * juju-mongodb* will be installed on each new node and rs.config() will show a good replica set
  * The agents will soon die with a panic in the machine log when MongoInfo() fails to access a state object
  * agent.conf will be updated on the new machines, but not with `stateaddresses` or `statepassword`

I've seen this using the manual provider and reproduced using gce with a build from ef17f71281d245540a9e5ed54a00095610ac0797.

As a workaround, adding the `stateaddresses` (`- localhost:37017`) and `statepassword` (same value as `apipassword`) fields to agent.conf before restarting the agents allows them to restart without error.
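The workaround above amounts to editing the agent's config by hand. A minimal sketch of the added fields is below; the file path follows the usual Juju convention but may differ, and the password value is a placeholder to be copied from the existing `apipassword` field in the same file:

```yaml
# Fields added to /var/lib/juju/agents/machine-N/agent.conf (path
# illustrative) before restarting the agent. The statepassword value
# must be copied from the apipassword field already in the file.
stateaddresses:
- localhost:37017
statepassword: <same value as apipassword>
```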

Tags: enable-ha
Mick Gregg (macgreagoir)
description: updated
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 2.2.0
Curtis Hovey (sinzui)
tags: added: enable-ha
Revision history for this message
Mick Gregg (macgreagoir) wrote :

Related to bug 1648375.

Changed in juju:
milestone: 2.2-beta1 → none
Revision history for this message
Andrew Wilkins (axwalk) wrote :

Not sure why this was dropped; it means you cannot enable HA when using the manual provider.

Changed in juju:
importance: Medium → High
milestone: none → 2.3.0
Revision history for this message
Andrew Wilkins (axwalk) wrote :

Seems to be worse now, or maybe there's a race. In my case, adding `stateaddresses` and `statepassword` is not enough, because the replica set has not been initialised.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Mongo HA was "fixed" in https://github.com/juju/juju/commit/a9fb7fe787cae8db3a7652250ad9172016a37f2a, breaking this particular scenario.

Revision history for this message
Tim Penhey (thumper) wrote :

Manual provisioning is how we are handling some s390 work, so we should fix this.

Changed in juju:
milestone: 2.3.0 → 2.3-rc1
Andrew Wilkins (axwalk)
Changed in juju:
assignee: nobody → Andrew Wilkins (axwalk)
status: Triaged → In Progress
Revision history for this message
Andrew Wilkins (axwalk) wrote :

I was wrong in comment #4; that's not a problem AFAICT. The peergrouper is failing to add the new machines to the replica set configuration:

2017-11-14 03:45:19 ERROR juju.worker.peergrouper worker.go:211 cannot set replicaset: Found two member configurations with same host field, members.1.host == members.2.host == 127.0.0.1:37017

Something is not filtering out localhost addresses.

John A Meinel (jameinel)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released