bootstrap issue with replicasets on 1.20.1 with VM on MAAS provider

Bug #1340663 reported by Adam Collard
This bug affects 6 people
Affects: juju-core
Status: Invalid
Importance: High
Assigned to: Unassigned
Milestone: none

Bug Description

I'm trying to bootstrap Juju 1.20.1 using a MAAS provider. The node I'm bootstrapping on is a VM on an OrangeBox.

I hit repeated warnings:
2014-07-11 10:25:08 WARNING juju.replicaset replicaset.go:86 Initiate: fetching replication status failed: cannot get replica set status: can't get local.system.replset config from self or any seed (EMPTYCONFIG)

Followed by:

2014-07-11 10:25:08 INFO juju.worker.peergrouper initiate.go:94 finished MaybeInitiateMongoServer
2014-07-11 10:25:08 ERROR juju.cmd supercommand.go:323 cannot initiate replica set: cannot get replica set status: can't get local.system.replset config from self or any seed (EMPTYCONFIG)
ERROR bootstrap failed: subprocess encountered error code 1
Stopping instance...
Bootstrap failed, destroying environment
ERROR subprocess encountered error code 1

Full log from the bootstrap: http://paste.ubuntu.com/7779621/

I was asked to grab some mongo logs, but I'm struggling to do that since the bootstrap node is torn down. So far this is 100% reproducible.
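
For context, the failing status check is roughly equivalent to running the replSetGetStatus admin command against the local mongod. A minimal sketch using the mgo driver is below; it assumes plain, unauthenticated access for illustration (juju's mongod actually requires SSL and credentials) and is not the juju-core code itself:

    // Hypothetical sketch (not juju-core source): the failing check is roughly
    // this admin command, which returns an EMPTYCONFIG-style error until the
    // replica set has been initiated.
    package main

    import (
        "fmt"
        "log"
        "time"

        "gopkg.in/mgo.v2"
        "gopkg.in/mgo.v2/bson"
    )

    func main() {
        // Juju's mongod listens on 37017 rather than the default 27017.
        session, err := mgo.DialWithInfo(&mgo.DialInfo{
            Addrs:   []string{"localhost:37017"},
            Direct:  true, // talk only to this node; it is not in a replica set yet
            Timeout: 10 * time.Second,
        })
        if err != nil {
            log.Fatalf("cannot dial mongod: %v", err)
        }
        defer session.Close()

        var status bson.M
        if err := session.DB("admin").Run(bson.D{{"replSetGetStatus", 1}}, &status); err != nil {
            log.Fatalf("fetching replication status failed: %v", err)
        }
        fmt.Printf("replica set status: %+v\n", status)
    }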

description: updated
Revision history for this message
Adam Collard (adam-collard) wrote :

Using a loopback mount of the VM's image, I can see the bootstrap errors in cloud-init-output.log, but there are no logs from mongo (/var/log/mongo doesn't exist).

Revision history for this message
Adam Collard (adam-collard) wrote :

 # sudo find . -name '*mongo*'
./etc/default/mongodb
./usr/share/doc/juju-mongodb
./usr/share/vim/vim74/keymap/mongolian_utf-8.vim
./usr/lib/juju/bin/mongos
./usr/lib/juju/bin/mongoimport
./usr/lib/juju/bin/mongod
./usr/lib/juju/bin/mongoexport
./usr/lib/juju/bin/mongodump
./usr/lib/juju/bin/mongorestore
./tmp/mongodb-37017.sock
./var/lib/dpkg/info/juju-mongodb.list
./var/lib/dpkg/info/juju-mongodb.md5sums
./var/lib/juju/db/mongod.lock
./var/cache/apt/archives/juju-mongodb_2.4.9-0ubuntu3_amd64.deb

This looks odd to me: a distinct lack of mongo bits?

Revision history for this message
Adam Collard (adam-collard) wrote :

Found mongo entries in /var/log/syslog. http://paste.ubuntu.com/7779659/

Revision history for this message
Adam Collard (adam-collard) wrote :

Note that I only see this when bootstrapping to a VM. I don't see it for physical nodes.

summary: - bootstrap issue with replicasets on 1.20.1 with MAAS provider
+ bootstrap issue with replicasets on 1.20.1 with VM on MAAS provider
Revision history for this message
Adam Collard (adam-collard) wrote :

I always see it for this particular VM, but can't reproduce it on other VMs in an (almost) parallel setup.

Curtis Hovey (sinzui)
tags: added: bootstrap mongodb
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.21-alpha1
Revision history for this message
Michael Foord (mfoord) wrote :

From the syslog entries it appears that the real error is that mongodb can't reach the host machine via the address we provide to it when creating (initiating) the replica set, so it's a networking issue. However, this error is masked by our bad error reporting; I've raised bug 1340749 to address that problem.
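
For illustration, the initiate step amounts to something like the following sketch (again assuming the mgo driver and unauthenticated access; not the actual juju-core code). If mongod cannot match the configured host address against one of its own interfaces, replSetInitiate fails with "can't find self in the replset config", which is exactly what the syslog shows:

    // Hypothetical sketch (not juju-core source): initiating the replica set
    // with an explicit member address. mongod must be able to match "host"
    // against one of its own interfaces; with the bogus 10.14.4.59 it cannot,
    // and replSetInitiate fails with "can't find self in the replset config".
    package main

    import (
        "log"

        "gopkg.in/mgo.v2"
        "gopkg.in/mgo.v2/bson"
    )

    func main() {
        session, err := mgo.DialWithInfo(&mgo.DialInfo{
            Addrs:  []string{"localhost:37017"},
            Direct: true,
        })
        if err != nil {
            log.Fatalf("cannot dial mongod: %v", err)
        }
        defer session.Close()

        cfg := bson.M{
            "_id":     "juju",
            "version": 1,
            "members": []bson.M{{
                "_id":  1,
                "host": "10.14.4.59:37017", // address reported by MAAS; the VM is really 10.14.4.76
            }},
        }
        var res bson.M
        if err := session.DB("admin").Run(bson.D{{"replSetInitiate", cfg}}, &res); err != nil {
            log.Fatalf("cannot initiate replica set: %v", err)
        }
    }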

Revision history for this message
Ian Booth (wallyworld) wrote :

Marking this bug as invalid as the root cause is a networking setup issue external to juju-core.

Changed in juju-core:
status: Triaged → Invalid
Revision history for this message
Martin Packman (gz) wrote :

The underlying issue here is that juju is being given a bogus address via the MAAS API.

On bootstrap, we cope with this, because we attempt to connect to each address in turn, and the 10.14.4.76 one is correct and succeeds:

    Attempting to connect to node0vm0.maas:22
    Attempting to connect to 10.14.4.59:22
    Attempting to connect to 10.14.4.76:22
    ...
    ci-info: | eth0 | True | 10.14.4.76 | 255.255.255.0 | 52:54:00:7b:ea:23 |
    ...
    DHCPACK of 10.14.4.76 from 10.14.4.1
    bound to 10.14.4.76 -- renewal in 17171 seconds.

However, when juju comes to tell mongo what its public-facing address is, we pick 10.14.4.59, which is bogus but first in the list.

    Jul 11 10:24:15 node0vm0 mongod.37017[12806]: Fri Jul 11 10:24:15.576 [conn1] replSet replSetInitiate exception: can't find self in the replset config my port: 37017
    Jul 11 10:24:15 node0vm0 mongod.37017[12806]: Fri Jul 11 10:24:15.577 [conn1] command admin.$cmd command: { replSetInitiate: { _id: "juju", version: 1, members: [ { _id: 1, host: "10.14.4.59:37017", tags: { juju-machine-id: "0" } } ] } } ntoreturn:1 keyUpdates:0 locks(micros) W:1257 reslen:122 3002ms

I don't think we have a reasonable course of action if the cloud is giving us bogus information here. There's no sane way of retrying this operation, and testing each address beforehand just in case the API is lying about an address is also not robust - not all addresses are valid in all contexts, but we do want to keep them around.
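
For illustration only, the kind of pre-check mentioned above might look like the hypothetical sketch below (firstReachable is an invented helper, not proposed juju-core code). It would happen to pick 10.14.4.76 in this log, but for the reasons given it is not a robust general fix:

    // Hypothetical sketch of the pre-check discussed above: probe each
    // provider-reported address before advertising one to mongod. It would
    // happen to pick 10.14.4.76 here, but it is not robust in general - an
    // address can be unreachable from where the probe runs and still be the
    // right one to keep.
    package main

    import (
        "fmt"
        "net"
        "time"
    )

    // firstReachable returns the first address accepting a TCP connection on
    // the given port, or "" if none respond within the timeout.
    func firstReachable(addrs []string, port string, timeout time.Duration) string {
        for _, a := range addrs {
            conn, err := net.DialTimeout("tcp", net.JoinHostPort(a, port), timeout)
            if err != nil {
                continue
            }
            conn.Close()
            return a
        }
        return ""
    }

    func main() {
        candidates := []string{"10.14.4.59", "10.14.4.76"} // as reported by the MAAS API
        fmt.Println("first reachable:", firstReachable(candidates, "22", 3*time.Second))
    }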

Martin Packman (gz)
Changed in juju-core:
milestone: 1.21-alpha1 → none
Revision history for this message
Patrick Hetu (patrick-hetu) wrote :

I think I've got this bug using the local provider.

I've got an external network (eth0) on 10.36.0.0/24 and an internal one (lxcbr0) on 10.3.0.0/24, and I got this error:

[initandlisten] connection accepted from 10.36.0.35:42103 #2 (1 connection now open)
[conn2] assertion 16550 not authorized for query on local.system.replset ns:local.system.replset query:{}
[conn2] ntoskip:0 ntoreturn:-1
[conn2] end connection 10.36.0.35:42103 (0 connections now open)

It looks like juju is picking the 10.36.0.0/24 network instead of 10.3.0.0/24, like you said.
I was wondering if there is anything I can do to fix that?
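
One quick way to see what juju has to choose between on such a machine is to list the addresses on each interface. A hypothetical debugging sketch (it does not reproduce juju's actual address-selection logic):

    // Hypothetical debugging sketch: list the addresses on each local
    // interface (e.g. eth0 on 10.36.0.0/24 vs lxcbr0 on 10.3.0.0/24) to see
    // what juju has to choose between. It does not reproduce juju's actual
    // address-selection logic.
    package main

    import (
        "fmt"
        "log"
        "net"
    )

    func main() {
        ifaces, err := net.Interfaces()
        if err != nil {
            log.Fatal(err)
        }
        for _, iface := range ifaces {
            addrs, err := iface.Addrs()
            if err != nil {
                continue
            }
            for _, addr := range addrs {
                fmt.Printf("%-8s %s\n", iface.Name, addr.String())
            }
        }
    }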

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

It looks like you might be hitting the same issue as in bug 1416928. Can you retry with the latest 1.21.3 proposed release and see if that resolves the issue? https://launchpad.net/~juju/+archive/ubuntu/proposed
