restore-backup operation doesn't finish when HA is enabled.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Low | Unassigned |
2.1 | Won't Fix | Undecided | Unassigned |
Bug Description
Hi,
Steps to reproduce:
1) Bootstrap a controller.
2) Enable HA.
3) Create a backup.
4) Restore the backup.
Expected: the operation succeeds.
Got: the primary node's API server died and the procedure hung for more than 30 minutes.
More output below:
$ juju bootstrap --build-agent localhost ant-controller
Creating Juju controller "ant-controller" on localhost/localhost
Building local Juju agent binary version 2.1.4 for amd64
To configure your system to better support LXD containers, please see: https:/
Launching controller instance(s) on localhost/
- juju-a20416-0 (arch=amd64)
Fetching Juju GUI 2.9.2
Waiting for address
Attempting to connect to 192.168.0.3:22
Logging to /var/log/
Running apt-get update
Running apt-get upgrade
Installing curl, cpu-checker, bridge-utils, cloud-utils, tmux
Installing Juju machine agent
Starting Juju machine agent (service jujud-machine-0)
Bootstrap agent now started
Contacting Juju controller at 192.168.0.3 to verify accessibility...
Bootstrap complete, "ant-controller" controller now available.
Controller machines are in the "controller" model.
Initial model "default" added
$ juju enable-ha
maintaining machines: 0
adding machines: 1, 2
$ juju create-backup -m controller
20171002-
downloading to juju-backup-
$ juju restore-backup -m controller --id=20171002-
In another terminal:
$ juju ssh -m controller 0
...
root@juju-
sshd 283 root 3u IPv4 151513 0t0 TCP *:ssh (LISTEN)
sshd 283 root 4u IPv6 151648 0t0 TCP *:ssh (LISTEN)
lxd-bridg 4420 root 3u IPv6 176863 0t0 TCP [fe80::1]:13128 (LISTEN)
mongod 5137 root 6u IPv4 340754 0t0 TCP *:37017 (LISTEN)
mongod 5137 root 7u IPv6 340755 0t0 TCP *:37017 (LISTEN)
root@juju-
● jujud-machine-
Loaded: loaded (/var/lib/
Active: active (running) since Mon 2017-10-02 06:22:23 UTC; 29min ago
Main PID: 4597 (bash)
Tasks: 11
Memory: 113.7M
CGroup: /system.
├─4597 bash /var/lib/
└─4601 /var/lib/
Oct 02 06:22:23 juju-a20416-0 systemd[1]: Started juju agent for machine-0.
Warning: jujud-machine-
root@juju-
root@juju-
● jujud-machine-
Loaded: loaded (/var/lib/
Active: active (running) since Mon 2017-10-02 06:22:23 UTC; 30min ago
Main PID: 4597 (bash)
CGroup: /system.
├─4597 bash /var/lib/
└─4601 /var/lib/
Oct 02 06:22:23 juju-a20416-0 systemd[1]: Started juju agent for machine-0.
root@juju-
sshd 283 root 3u IPv4 151513 0t0 TCP *:ssh (LISTEN)
sshd 283 root 4u IPv6 151648 0t0 TCP *:ssh (LISTEN)
lxd-bridg 4420 root 3u IPv6 176863 0t0 TCP [fe80::1]:13128 (LISTEN)
mongod 5137 root 6u IPv4 340754 0t0 TCP *:37017 (LISTEN)
mongod 5137 root 7u IPv6 340755 0t0 TCP *:37017 (LISTEN)
Best regards.
José.
tags: added: 4010
tags: added: cpe-onsite
Changed in juju: status: Incomplete → Triaged; importance: Undecided → High
tags: added: restore-backup
Changed in juju: status: Triaged → Incomplete
Changed in juju: status: Expired → Triaged
tags: added: backup-restore; removed: restore-backup
I don't believe we've ever released a Juju 2.1.4, so I'm a little curious what binary you are running. It sounds like maybe you are running from source in the 2.1 branch?
Can you also describe what OS you are running on? LXD picking a 192.168.0.3 address seems a bit surprising. I'm thinking possibly Trusty?
This doesn't really seem to have much to do with HA or not HA, as much as there seems to be an issue around 'juju restore' trying to create a user that already exists. oploger@admin certainly feels like a Mongo internal account (it sounds like the account in charge of tracking the local oplog and sharing it to replicas).
Now, we always launch Mongo with --replSet and --oplog, because if you ever want to migrate from a single Mongo instance to multiple instances, you have to start the first one in a compatible mode.
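For illustration, this is roughly the shape of the mongod invocation being described; the replica-set name, oplog size, port, and dbpath below are assumptions for the sketch, not values taken from this controller:

$ mongod --port 37017 \
    --replSet juju \
    --oplogSize 1024 \
    --dbpath /var/lib/juju/db
# --replSet puts even a single node into replica-set mode, so additional
# members can be added later without restarting it in a different configuration.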
I'm also curious if there is something like you're taking a backup on Trusty but restoring to Xenial (or vice versa). Some of those can have an effect because Trusty only has Mongo 2.4, while we use Mongo 3.2 on Xenial. I would guess that mongo doesn't support downgrading (so taking a backup of a 3.2 couldn't be restored to a 2.4), but that doesn't seem to be what you're encountering.
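If a series mismatch is in play, something like the following would confirm what each side is running; treat the commands as an illustrative sketch, since the mongod binary location depends on the series and on how Juju packages Mongo:

$ juju ssh -m controller 0 'lsb_release -cs'    # series of the controller machine
$ juju ssh -m controller 0 'mongod --version'   # assumes mongod is on PATH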
Have you tried doing a backup and restore without HA in the middle, to narrow down whether it is just 'juju restore' that isn't working for you, versus restore after enabling HA?
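As a rough sketch of that narrowed-down check (the controller name here is illustrative, and the id is whatever create-backup prints):

$ juju bootstrap localhost single-controller
$ juju create-backup -m controller
$ juju restore-backup -m controller --id=<id-from-create-backup>
# If this succeeds, the failure is specific to restoring after enable-ha.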