juju-db.service fails to start after update to juju-mongo-tools3.2:amd64 (3.2.4+ds-0ubuntu1)

Bug #1743901 reported by John George on 2018-01-17
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
juju
High
John A Meinel
2.2
High
John A Meinel
2.3
High
John A Meinel

Bug Description

Juju controller agent-version: 2.1.2 becomes unresponsive (e.g. juju status times out).
This happened just after an apt upgrade installed juju-mongo-tools3.2

Upon inspection of the controller logs show:

Jan 17 22:37:21 juju-3bc6d2-controller-0 systemd[1]: Started juju state database.
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod[1519]: 2018-01-17T22:37:21.860+0000 W CONTROL [main] No SSL certificate validation can be performed since no CA file has been provided; please specify an sslCAFile parameter
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] MongoDB starting : pid=1519 port=37017 dbpath=/var/lib/juju/db 64-bit host=juju-3bc6d2-controller-0
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] db version v3.2.15
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] git version: e11e3c1b9c9ce3f7b4a79493e16f5e4504e01140
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] allocator: tcmalloc
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] modules: none
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] build environment:
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] distarch: x86_64
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] target_arch: x86_64
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] options: { net: { ipv6: true, port: 37017, ssl: { PEMKeyFile: "/var/lib/juju/server.pem", PEMKeyPassword: "<password>", mode: "requireSSL" } }, replication: { oplogSizeMB: 1024, replSet: "juju" }, security: { authorization: "enabled", keyFile: "/var/lib/juju/shared-secret" }, storage: { dbPath: "/var/lib/juju/db", engine: "wiredTiger", journal: { enabled: true } }, systemLog: { destination: "syslog", quiet: true } }
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] exception in initAndListen: 28662 Cannot start server. Detected data files in /var/lib/juju/db created by the 'mmapv1' storage engine, but the specified storage engine was 'wiredTiger'., terminating
Jan 17 22:37:21 juju-3bc6d2-controller-0 mongod.37017[1519]: [initandlisten] dbexit: rc: 100
Jan 17 22:37:21 juju-3bc6d2-controller-0 systemd[1]: juju-db.service: Main process exited, code=exited, status=100/n/a
Jan 17 22:37:21 juju-3bc6d2-controller-0 systemd[1]: juju-db.service: Unit entered failed state.
Jan 17 22:37:21 juju-3bc6d2-controller-0 systemd[1]: juju-db.service: Failed with result 'exit-code'.
Jan 17 22:37:22 juju-3bc6d2-controller-0 systemd[1]: juju-db.service: Service hold-off time over, scheduling restart.
Jan 17 22:37:22 juju-3bc6d2-controller-0 systemd[1]: Stopped juju state database.
Jan 17 22:37:22 juju-3bc6d2-controller-0 systemd[1]: juju-db.service: Start request repeated too quickly.
Jan 17 22:37:22 juju-3bc6d2-controller-0 systemd[1]: Failed to start juju state database.

John George (jog) wrote :
tags: added: cdo-qa
John George (jog) wrote :
Ante Karamatić (ivoks) wrote :

This is an interesting part (and probably a separate bug):

2017-04-14 16:17:51 INFO juju.utils.packaging.manager utils.go:98 Retrying: &{/usr/bin/apt-get [apt-get --option=Dpkg::Options::=--force-confold --option=Dpkg::options::=--force-unsafe-io --ass
ume-yes --quiet install juju-mongodb3.2] [] <nil> Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  libboost-chrono1.58.0 libboost-regex1.58.0 libyaml-cpp0.5v5
The following NEW packages will be installed:
  juju-mongodb3.2 libboost-chrono1.58.0 libboost-regex1.58.0 libyaml-cpp0.5v5
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 24.4 MB/24.8 MB of archives.
After this operation, 86.9 MB of additional disk space will be used.
Err:1 http://nova.clouds.archive.ubuntu.com/ubuntu xenial-updates/universe amd64 juju-mongodb3.2 amd64 3.2.9-0ubuntu1~16.04
  404 Not Found [IP: 91.189.92.201 80]
E: Failed to fetch http://nova.clouds.archive.ubuntu.com/ubuntu/pool/universe/j/juju-mongodb3.2/juju-mongodb3.2_3.2.9-0ubuntu1~16.04_amd64.deb 404 Not Found [IP: 91.189.92.201 80]

Does juju not do apt-get update before trying to install?

John A Meinel (jameinel) wrote :

I believe we apt update but as part of "enable-os-update" that can be set to False.

As for what is specifically failing, it is this line:
[initandlisten] exception in initAndListen: 28662 Cannot start server. Detected data files in /var/lib/juju/db created by the 'mmapv1' storage engine, but the specified storage engine was 'wiredTiger'., terminating

AFAICT, we have always started Mongo 3.2 with the --storageEngine wiredTiger (it is part of the systemd script).

So I don't know how the directory could have been created with an mmap storage pool.

Has this been running a while? Is there some reason why a different mongo would have been installed on the machine?
(Its possible that some other package decided it wanted to have /usr/bin/mongo installed, and we had failed to install /usr/lib/juju/mongo3.2/bin/mongo, and thus we fell back to the system mongo during startup, which seemed to be working, until you installed the real mongo, and it decided to try to use it?)

John A Meinel (jameinel) wrote :

While debugging, it seems we had both juju-mongodb and juju-mongodb3.2 installed. And the database on disk was *actually* juju-mongodb and a 2.4 version database.
From what I can put together in the code we have:
func packagesForSeries(series string) ([]string, []string) {
 switch series {
 case "precise", "quantal", "raring", "saucy", "centos7":
  return []string{"mongodb-server"}, []string{}
 case "trusty", "wily", "xenial":
  return []string{JujuMongoPackage, JujuMongoToolsPackage}, []string{"juju-mongodb"}

And we have code that says "if there is any error trying to install the first list of packages fallback to the second list of packages".
So the timeline as near as I can tell is that on this machine

1) There was something wrong with the cloud archive telling us about juju-mongodb3.2 so we failed to install it.
2) We triggered the fallback code to install juju-mongodb instead, and somehow that succeeded.
3) We then configured everything to run using mongo 2.4 and mmapv1 storage engine.
4) Many moons later someone came along to install juju-mongo-tools3.2, which probably has a dependency on juju-mongodb3.2
5) When Juju restarted, it noticed that it now had mongo-3.2 available, and so changed the systemd startup script to use it, and thought it could just run in the same place, which it cannot.
We'll need a bunch of upgrade steps to actually get from 2.4 on-disk-w-mmapv1 to 3.2-on-disk-with-wiredTiger.

John A Meinel (jameinel) wrote :

Note the official mongo upgrade steps are:
https://docs.mongodb.com/manual/release-notes/2.6-upgrade/
https://docs.mongodb.com/manual/release-notes/3.0-upgrade/
https://docs.mongodb.com/manual/release-notes/3.2-upgrade/

All of those say "to upgrade to 2.6 you must be running 2.4", "to upgrade to 3.0 you must be running 2.6", "to upgrade to 3.2 you must be running 3.0".
And there is one extra wrinkle that 2.6 introduced a different user authentication mechanism than 2.4. And 3.0 only supports the new one. So you have to go from 2.4 to 2.6, then fix the user authentication, and then go 2.6 to 3.0, and then 3.0 to 3.2.

The actual *Juju* recommendation for how to handle this is to bootstrap a new controller and then migrate the models to the Xenial one.

John A Meinel (jameinel) wrote :
Changed in juju:
status: New → In Progress
importance: Undecided → High
assignee: nobody → John A Meinel (jameinel)
milestone: none → 2.4-beta1
John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/8302

Is a branch targeting 2.2, which is the oldest that we might *possibly* support backporting. It doesn't fix this particular site as it stands, but it does prevent us from accidentally causing this in the future.

John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/8303 is bringing this into 2.3

Anastasia (anastasia-macmood) wrote :

Bigger merge (https://github.com/juju/juju/pull/8296) is bringing this change into develop (heading into 2.4.x)

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers