restore-backup op doesn't finish when HA is enabled.

Bug #1720740 reported by José Pekkarinen
Affects              Status      Importance   Assigned to   Milestone
Canonical Juju       Triaged     Low          Unassigned
Canonical Juju 2.1   Won't Fix   Undecided    Unassigned

Bug Description

Hi,

Steps to reproduce:

1) Bootstrap a controller.
2) Enable HA.
3) Create a backup.
4) Restore the backup.

Expected: the operation succeeds.
Got: the primary node's API server died and the procedure hung for more than 30 minutes.
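
(A quick way to confirm the API-server symptom while the restore hangs; the default Juju API port 17070 is an assumption here, it isn't shown anywhere in the output below:)

 $ juju ssh -m controller 0 'sudo lsof -i :17070'   # no output means nothing is listening on the API port
 $ juju ssh -m controller 0 'sudo ss -lntp'         # full listener list, comparable to the lsof output further down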

More output below:

 $ juju bootstrap --build-agent localhost ant-controller
Creating Juju controller "ant-controller" on localhost/localhost
Building local Juju agent binary version 2.1.4 for amd64
To configure your system to better support LXD containers, please see: https://github.com/lxc/lxd/blob/master/doc/production-setup.md
Launching controller instance(s) on localhost/localhost...
 - juju-a20416-0 (arch=amd64)
Fetching Juju GUI 2.9.2
Waiting for address
Attempting to connect to 192.168.0.3:22
Logging to /var/log/cloud-init-output.log on the bootstrap machine
Running apt-get update
Running apt-get upgrade
Installing curl, cpu-checker, bridge-utils, cloud-utils, tmux
Installing Juju machine agent
Starting Juju machine agent (service jujud-machine-0)
Bootstrap agent now started
Contacting Juju controller at 192.168.0.3 to verify accessibility...
Bootstrap complete, "ant-controller" controller now available.
Controller machines are in the "controller" model.
Initial model "default" added

$ juju enable-ha
maintaining machines: 0
adding machines: 1, 2

$ juju create-backup -m controller
20171002-064013.b56950f7-d04a-480a-8993-0b9871a20416
downloading to juju-backup-20171002-064013.tar.gz

$ juju restore-backup -m controller --id=20171002-064013.b56950f7-d04a-480a-8993-0b9871a20416

In another terminal:

$ juju ssh -m controller 0
...
root@juju-a20416-0:~# lsof -i | grep LISTEN
sshd 283 root 3u IPv4 151513 0t0 TCP *:ssh (LISTEN)
sshd 283 root 4u IPv6 151648 0t0 TCP *:ssh (LISTEN)
lxd-bridg 4420 root 3u IPv6 176863 0t0 TCP [fe80::1]:13128 (LISTEN)
mongod 5137 root 6u IPv4 340754 0t0 TCP *:37017 (LISTEN)
mongod 5137 root 7u IPv6 340755 0t0 TCP *:37017 (LISTEN)

root@juju-a20416-0:~# systemctl status jujud-machine-0
● jujud-machine-0.service - juju agent for machine-0
   Loaded: loaded (/var/lib/juju/init/jujud-machine-0/jujud-machine-0.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2017-10-02 06:22:23 UTC; 29min ago
 Main PID: 4597 (bash)
    Tasks: 11
   Memory: 113.7M
   CGroup: /system.slice/jujud-machine-0.service
           ├─4597 bash /var/lib/juju/init/jujud-machine-0/exec-start.sh
           └─4601 /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug

Oct 02 06:22:23 juju-a20416-0 systemd[1]: Started juju agent for machine-0.
Warning: jujud-machine-0.service changed on disk. Run 'systemctl daemon-reload' to reload units.

root@juju-a20416-0:~# systemctl daemon-reload

root@juju-a20416-0:~# systemctl status jujud-machine-0
● jujud-machine-0.service - juju agent for machine-0
   Loaded: loaded (/var/lib/juju/init/jujud-machine-0/jujud-machine-0.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2017-10-02 06:22:23 UTC; 30min ago
 Main PID: 4597 (bash)
   CGroup: /system.slice/jujud-machine-0.service
           ├─4597 bash /var/lib/juju/init/jujud-machine-0/exec-start.sh
           └─4601 /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug

Oct 02 06:22:23 juju-a20416-0 systemd[1]: Started juju agent for machine-0.
root@juju-a20416-0:~# lsof -i | grep LISTEN
sshd 283 root 3u IPv4 151513 0t0 TCP *:ssh (LISTEN)
sshd 283 root 4u IPv6 151648 0t0 TCP *:ssh (LISTEN)
lxd-bridg 4420 root 3u IPv6 176863 0t0 TCP [fe80::1]:13128 (LISTEN)
mongod 5137 root 6u IPv4 340754 0t0 TCP *:37017 (LISTEN)
mongod 5137 root 7u IPv6 340755 0t0 TCP *:37017 (LISTEN)

Best regards.

José.

tags: added: 4010
Revision history for this message
John A Meinel (jameinel) wrote :

I don't believe we've ever released a Juju 2.1.4, so I'm a little curious what binary you are running. It sounds like maybe you are running from source in the 2.1 branch?

Can you also describe what OS you are running on? LXD picking a 192.168.0.3 address seems a bit surprising. I'm thinking possibly Trusty?

This doesn't really seem to have much to do with HA as such; rather, there seems to be an issue with 'juju restore' trying to create a user that already exists. oploger@admin certainly feels like a Mongo-internal account (it sounds like the account in charge of tracking the local oplog and sharing it with replicas).
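
(For anyone digging into that duplicate-user error, a hedged sketch for listing the users that already exist in the controller's MongoDB. Port 37017 matches the lsof output above, but the agent.conf path, the statepassword field, the SSL flags and the mongo shell being on PATH are assumptions for a typical Juju 2.x controller, not something confirmed in this report:)

 $ juju ssh -m controller 0
 # then, on machine 0 as root:
 $ PW=$(awk '/^statepassword:/ {print $2}' /var/lib/juju/agents/machine-0/agent.conf)
 $ mongo --ssl --sslAllowInvalidCertificates -u machine-0 -p "$PW" \
       --authenticationDatabase admin localhost:37017/admin --eval 'printjson(db.getUsers())'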

Now, we always launch Mongo with --replSet and --oplog because if you ever want to migrate from a single Mongo instance to multiple, you have to start the first one in a compatible mode.
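
(A way to verify those launch flags on the running controller; the juju-db service name is an assumption based on a typical Juju 2.x systemd setup:)

 $ juju ssh -m controller 0 'systemctl cat juju-db'     # ExecStart should show --replSet plus the oplog settings
 $ juju ssh -m controller 0 'ps -o args= -C mongod'     # the flags mongod was actually started with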

I'm also curious if there is something like you're taking a backup on Trusty but restoring to Xenial (or vice versa). Some of those can have an effect because Trusty only has Mongo 2.4, while we use Mongo 3.2 on Xenial. I would guess that mongo doesn't support downgrading (so taking a backup of a 3.2 couldn't be restored to a 2.4), but that doesn't seem to be what you're encountering.
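
(To rule a series/Mongo-version mismatch in or out, something like this should be enough; the bundled mongod path under /usr/lib/juju is an assumption, so treat it as a sketch:)

 $ juju show-machine -m controller 0 | grep series                              # series the controller machine is running
 $ juju ssh -m controller 0 'mongod --version || /usr/lib/juju/mongo*/bin/mongod --version'   # Mongo version on that machine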

Have you tried doing a backup and restore without HA in the middle, to narrow down whether it is 'juju restore' itself that isn't working for you, versus restore after enabling HA?

Changed in juju:
status: New → Incomplete
Revision history for this message
John A Meinel (jameinel) wrote :

We're unlikely to do another 2.1 release at this time, unless there is a specific support request that requires it.

Revision history for this message
José Pekkarinen (koalinux) wrote :

Hi,

I'm building it from the branch with no additional patches, as you can see
from the bootstrap (--build-agent). HEAD is pointing to:

3839803838e02e23fab51bb23125a9001d7f2b1f

Networking on my laptop is modified so that LXD and VMs can use the same bridge
as the ethernet interface when I'm using it, which may be irrelevant to this case. For
this local test I'm just using xenial images to keep things simple.

Before enabling HA I recently restored a backup in this environment without any
noticeable problems. It's actually in the output of bug 1720737:

$ juju create-backup -m controller
20171002-062133.b56950f7-d04a-480a-8993-0b9871a20416
downloading to juju-backup-20171002-062133.tar.gz
pekkari@ant ~/workspace $ juju restore-backup -m controller --id=20171002-062133.b56950f7-d04a-480a-8993-0b9871a20416
restore from "20171002-062133.b56950f7-d04a-480a-8993-0b9871a20416" completed

tags: added: cpe-onsite
Tim Penhey (thumper)
Changed in juju:
status: Incomplete → Triaged
importance: Undecided → High
tags: added: restore-backup
Changed in juju:
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
John A Meinel (jameinel)
Changed in juju:
status: Expired → Triaged
Ian Booth (wallyworld)
tags: added: backup-restore
removed: restore-backup
Revision history for this message
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: High → Low
tags: added: expirebugs-bot