enable-ha using existing machines breaks agents

Bug #1642618 reported by Mick Gregg
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Andrew Wilkins

Bug Description

When `enable-ha` is passed existing juju machines to use as new controllers/mongodb hosts, the agents on those machines break with a panic trying to read from a nil state field.

To reproduce
1. Add two new machines to a newly bootstrapped environment (`juju add-machine`)
2. Enable HA using these two new machines (`juju enable-ha --to 1,2`)
  * The Juju agents on these two new machines will have stopped, or at least not have restarted
  * juju-mongodb* will not be installed on the new machines, though rs.config() will show them expected in the replica set and unreachable
3. Manually restart the agents on the new machines
  * The agents will start and run for a while
  * juju-mongodb* will be installed on each new node and rs.config() will show a good replica set
  * The agents will soon die with a panic in the machine log when MongoInfo() fails to access a state object
  * agent.conf will be updated on the new machines, but not with `stateaddresses` or `statepassword`

I've seen this using the manual provider and reproduced using gce with a build from ef17f71281d245540a9e5ed54a00095610ac0797.

As a workaround, adding the `stateaddresses` (`- localhost:37017`) and `statepassword` (same value as `apipassword`) fields to agent.conf before restarting the agents allows them to restart without error.
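The workaround above amounts to editing the agent's config by hand. A minimal sketch of the added fields is below; the file path follows the usual Juju convention but may differ, and the password value is a placeholder to be copied from the existing `apipassword` field in the same file:

```yaml
# Fields added to /var/lib/juju/agents/machine-N/agent.conf (path
# illustrative) before restarting the agent. The statepassword value
# must be copied from the apipassword field already in the file.
stateaddresses:
- localhost:37017
statepassword: <same value as apipassword>
```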

Tags: enable-ha
Mick Gregg (macgreagoir)
description: updated
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 2.2.0
Curtis Hovey (sinzui)
tags: added: enable-ha
Revision history for this message
Mick Gregg (macgreagoir) wrote :

Related to bug 1648375.

Changed in juju:
milestone: 2.2-beta1 → none
Revision history for this message
Andrew Wilkins (axwalk) wrote :

Not sure why this was dropped; it means you cannot enable HA when using the manual provider.

Changed in juju:
importance: Medium → High
milestone: none → 2.3.0
Revision history for this message
Andrew Wilkins (axwalk) wrote :

Seems to be worse now, or maybe there's a race. In my case, adding `stateaddresses` and `statepassword` is not enough, because the replica set has not been initialised.

Revision history for this message
Andrew Wilkins (axwalk) wrote :

Mongo HA was "fixed" in https://github.com/juju/juju/commit/a9fb7fe787cae8db3a7652250ad9172016a37f2a, breaking this particular scenario.

Revision history for this message
Tim Penhey (thumper) wrote :

Manual provisioning is how we are handling some s390 work, so we should fix this.

Changed in juju:
milestone: 2.3.0 → 2.3-rc1
Andrew Wilkins (axwalk)
Changed in juju:
assignee: nobody → Andrew Wilkins (axwalk)
status: Triaged → In Progress
Revision history for this message
Andrew Wilkins (axwalk) wrote :

I was wrong in comment #4; that's not a problem AFAICT. The peergrouper is failing to add the new machines to the replica set configuration:

2017-11-14 03:45:19 ERROR juju.worker.peergrouper worker.go:211 cannot set replicaset: Found two member configurations with same host field, members.1.host == members.2.host == 127.0.0.1:37017

Something is not filtering out localhost addresses.

John A Meinel (jameinel)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released