multiple addresses in a space confuses peergrouper

Bug #1726317 reported by John A Meinel on 2017-10-23
Affects: juju
Importance: High
Assigned to: Joseph Phillips

Bug Description

While digging through issues at a client site that was going into HA, I think I've uncovered an issue with how the peergrouper determines which space to use when configuring mongo.

The autodetection code is:
https://github.com/juju/juju/blob/f5763bb9ae7bafbb1f7ebec2f128eb11f981493b/worker/peergrouper/worker.go#L581

The issue there is that it iterates all addresses, and just increments the reference count for any space it finds.

However, if one machine had 2 addresses in the same space and a second machine had none, the code would conclude that "LargestSpaceContainsAll" holds when it doesn't. Similarly, if any machine is counted more than once, it will conclude that the space *doesn't* contain all machines when it really does.
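
To make the miscount concrete, here is a minimal sketch in Go. It is not the peergrouper code: the `Address` type, the function names, and the assumption that "contains all" is decided by comparing a space's count against the number of machines are all hypothetical simplifications for illustration.

```go
package main

// Address is a hypothetical, simplified stand-in for a machine address
// that has been tagged with the name of the space it belongs to.
type Address struct {
	Value string
	Space string
}

// countPerAddress mirrors the approach described above: every address
// increments the count for its space, so one machine with two addresses
// in a space is counted twice.
func countPerAddress(machines map[string][]Address) map[string]int {
	counts := make(map[string]int)
	for _, addrs := range machines {
		for _, a := range addrs {
			counts[a.Space]++
		}
	}
	return counts
}

// countPerMachine counts each space at most once per machine, which is
// one possible way to make the count comparable to the machine total.
func countPerMachine(machines map[string][]Address) map[string]int {
	counts := make(map[string]int)
	for _, addrs := range machines {
		seen := make(map[string]bool)
		for _, a := range addrs {
			if !seen[a.Space] {
				seen[a.Space] = true
				counts[a.Space]++
			}
		}
	}
	return counts
}

// spaceContainsAll assumes the check is a simple equality between the
// count recorded for a space and the number of machines.
func spaceContainsAll(counts map[string]int, space string, machineCount int) bool {
	return counts[space] == machineCount
}
```

With machine "0" holding two addresses in a space and machine "1" holding none, the per-address count for that space is 2, which matches the machine total, so the space wrongly appears to contain every machine. The per-machine count is 1, so the check correctly fails.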

We need a better check around what space to use.

John A Meinel (jameinel) wrote :

From what I could see of the concrete database records, there did not appear to actually be that problem. They all seemed to have 3 concrete addresses.

Maybe the issue is with unqualified spaces (""), where '127.0.0.1' gets tied up with other local-only addresses.

At the very least, 127.0.0.1 is *never* the correct address to use, because the other machines will never be able to contact you on that address. So even if we selected the "" space, we should still use an address that isn't localhost.
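
One way to rule out localhost candidates is sketched below in Go, using the standard library's `net.ParseIP` and `IP.IsLoopback`. The helper names `isLocalOnly` and `dropLocalOnly` are hypothetical and are not taken from juju; treating the literal hostname "localhost" as local-only is also an assumption of this sketch.

```go
package main

import "net"

// isLocalOnly reports whether addr can only be reached from the machine
// itself: either the literal hostname "localhost" or a loopback IP such
// as 127.0.0.1 or ::1. Values that are not IPs are otherwise kept.
func isLocalOnly(addr string) bool {
	if addr == "localhost" {
		return true
	}
	ip := net.ParseIP(addr)
	return ip != nil && ip.IsLoopback()
}

// dropLocalOnly returns the addresses that peers could plausibly reach,
// preserving their original order.
func dropLocalOnly(addrs []string) []string {
	var usable []string
	for _, a := range addrs {
		if !isLocalOnly(a) {
			usable = append(usable, a)
		}
	}
	return usable
}
```

Applying a filter like this before any space selection means that even an unqualified ("") space can never yield 127.0.0.1 as the address advertised to the other machines.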

Changed in juju:
status: In Progress → Triaged
John A Meinel (jameinel) wrote :

The actual fix for them was to force the space with "mongo-space-name", which we should make sure we can do after the fact.
Unfortunately, we can't really block "juju enable-ha" if we want to use auto-detection, because the other machines won't be up yet, so we have nothing to auto-detect from. Otherwise we could do something like "you must set mongo-space-name before you can enable-ha".

Tilman Baumann (tilmanbaumann) wrote :

juju status -m controller --format=yaml

tags: added: 4010
John A Meinel (jameinel) wrote :

Joe is currently working on this, changing the 'enable-ha' code so that it refuses to guess (rather than guessing, and doing so incorrectly). Instead:
a) There is a "juju controller-config ha-space=XXX" that allows administrators to set what they want
b) We will allow 'juju enable-ha' to work if there is only a single cloud-local address to use, but
c) if there is >1, refuse to guess and require the administrator to specify the space to use for enable-ha.
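
The a/b/c rules above can be sketched as a small selection function in Go. This is an illustration under stated assumptions, not the code from the eventual patch: the `HAAddress` type, its `CloudLocal` flag, `selectHAAddress`, and the error values are all hypothetical, and picking the first address found in the configured space is a simplification.

```go
package main

import (
	"errors"
	"fmt"
)

// HAAddress is a hypothetical, simplified view of a controller address.
type HAAddress struct {
	Value      string
	Space      string
	CloudLocal bool
}

// errAmbiguous is returned when the code refuses to guess (rule c).
var errAmbiguous = errors.New("multiple cloud-local addresses: set ha-space to choose one")

// selectHAAddress applies the three rules in order:
//   a) if haSpace is set, use an address in that space;
//   b) otherwise, if exactly one cloud-local address exists, use it;
//   c) otherwise, refuse to guess and return an error.
func selectHAAddress(addrs []HAAddress, haSpace string) (string, error) {
	if haSpace != "" {
		for _, a := range addrs {
			if a.Space == haSpace {
				// Assumption: the first match is good enough for this sketch.
				return a.Value, nil
			}
		}
		return "", fmt.Errorf("no address found in space %q", haSpace)
	}

	var local []HAAddress
	for _, a := range addrs {
		if a.CloudLocal {
			local = append(local, a)
		}
	}
	switch len(local) {
	case 0:
		return "", errors.New("no cloud-local address available")
	case 1:
		return local[0].Value, nil
	default:
		return "", errAmbiguous
	}
}
```

The point of the design is that an ambiguous configuration produces an explicit error that tells the administrator to set ha-space, instead of a silent and possibly wrong choice.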

We need these addresses for other purposes in 2.4 which is why the name is 'ha-space' rather than something about mongo.

Changed in juju:
assignee: John A Meinel (jameinel) → Joseph Phillips (manadart)
status: Triaged → In Progress
Joseph Phillips (manadart) wrote :

Mongo space name and status are deprecated by https://github.com/juju/juju/pull/8425, and there is no longer any space auto-detection.

There are more patches to come in this same series around doing some validation earlier than in the peergrouper loop, but the a/b/c steps in John's comment above are current behaviour.

Changed in juju:
milestone: none → 2.4-beta1
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released