restore-backup operation doesn't finish when HA is enabled.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Low | Unassigned |
2.1 | Won't Fix | Undecided | Unassigned |
Bug Description
Hi,
Steps to reproduce:
1) Bootstrap a controller.
2) Enable HA.
3) Create a backup.
4) Restore the backup.
Expected: the operation succeeds.
Got: the primary node's API server died and the procedure hung for more than 30 minutes.
More output below:
$ juju bootstrap --build-agent localhost ant-controller
Creating Juju controller "ant-controller" on localhost/localhost
Building local Juju agent binary version 2.1.4 for amd64
To configure your system to better support LXD containers, please see: https:/
Launching controller instance(s) on localhost/
- juju-a20416-0 (arch=amd64)
Fetching Juju GUI 2.9.2
Waiting for address
Attempting to connect to 192.168.0.3:22
Logging to /var/log/
Running apt-get update
Running apt-get upgrade
Installing curl, cpu-checker, bridge-utils, cloud-utils, tmux
Installing Juju machine agent
Starting Juju machine agent (service jujud-machine-0)
Bootstrap agent now started
Contacting Juju controller at 192.168.0.3 to verify accessibility...
Bootstrap complete, "ant-controller" controller now available.
Controller machines are in the "controller" model.
Initial model "default" added
$ juju enable-ha
maintaining machines: 0
adding machines: 1, 2
$ juju create-backup -m controller
20171002-
downloading to juju-backup-
$ juju restore-backup -m controller --id=20171002-
In another terminal:
$ juju ssh -m controller 0
...
root@juju-
sshd 283 root 3u IPv4 151513 0t0 TCP *:ssh (LISTEN)
sshd 283 root 4u IPv6 151648 0t0 TCP *:ssh (LISTEN)
lxd-bridg 4420 root 3u IPv6 176863 0t0 TCP [fe80::1]:13128 (LISTEN)
mongod 5137 root 6u IPv4 340754 0t0 TCP *:37017 (LISTEN)
mongod 5137 root 7u IPv6 340755 0t0 TCP *:37017 (LISTEN)
root@juju-
● jujud-machine-
Loaded: loaded (/var/lib/
Active: active (running) since Mon 2017-10-02 06:22:23 UTC; 29min ago
Main PID: 4597 (bash)
Tasks: 11
Memory: 113.7M
CGroup: /system.
├─4597 bash /var/lib/
└─4601 /var/lib/
Oct 02 06:22:23 juju-a20416-0 systemd[1]: Started juju agent for machine-0.
Warning: jujud-machine-
root@juju-
root@juju-
● jujud-machine-
Loaded: loaded (/var/lib/
Active: active (running) since Mon 2017-10-02 06:22:23 UTC; 30min ago
Main PID: 4597 (bash)
CGroup: /system.
├─4597 bash /var/lib/
└─4601 /var/lib/
Oct 02 06:22:23 juju-a20416-0 systemd[1]: Started juju agent for machine-0.
root@juju-
sshd 283 root 3u IPv4 151513 0t0 TCP *:ssh (LISTEN)
sshd 283 root 4u IPv6 151648 0t0 TCP *:ssh (LISTEN)
lxd-bridg 4420 root 3u IPv6 176863 0t0 TCP [fe80::1]:13128 (LISTEN)
mongod 5137 root 6u IPv4 340754 0t0 TCP *:37017 (LISTEN)
mongod 5137 root 7u IPv6 340755 0t0 TCP *:37017 (LISTEN)
Best regards.
José.
tags: added: 4010
tags: added: cpe-onsite
Changed in juju: status: Incomplete → Triaged; importance: Undecided → High
tags: added: restore-backup
Changed in juju: status: Triaged → Incomplete
Changed in juju: status: Expired → Triaged
tags: added: backup-restore; removed: restore-backup
I don't believe we've ever released a Juju 2.1.4, so I'm a little curious what binary you are running. It sounds like maybe you are running from source in the 2.1 branch?
Can you also describe what OS you are running on? LXD picking a 192.168.0.3 address seems a bit surprising. I'm thinking possibly Trusty?
This doesn't really seem to have much to do with HA or not HA, as much as there seems to be an issue around 'juju restore' trying to create a user that already exists. oploger@admin certainly feels like a Mongo internal account (it sounds like the account in charge of tracking the local oplog and sharing it to replicas).
Now, we always launch Mongo with --replSet and --oplog, because if you ever want to migrate from a single Mongo instance to multiple instances, you have to start the first one in a compatible mode.
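For illustration, this is roughly the shape of the mongod invocation being described; the replica-set name, oplog size, port, and dbpath below are assumptions for the sketch, not values taken from this controller:

$ mongod --port 37017 \
    --replSet juju \
    --oplogSize 1024 \
    --dbpath /var/lib/juju/db
# --replSet puts even a single node into replica-set mode, so additional
# members can be added later without restarting it in a different configuration.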
I'm also curious if there is something like you're taking a backup on Trusty but restoring to Xenial (or vice versa). Some of those can have an effect because Trusty only has Mongo 2.4, while we use Mongo 3.2 on Xenial. I would guess that mongo doesn't support downgrading (so taking a backup of a 3.2 couldn't be restored to a 2.4), but that doesn't seem to be what you're encountering.
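If a series mismatch is in play, something like the following would confirm what each side is running; treat the commands as an illustrative sketch, since the mongod binary location depends on the series and on how Juju packages Mongo:

$ juju ssh -m controller 0 'lsb_release -cs'    # series of the controller machine
$ juju ssh -m controller 0 'mongod --version'   # assumes mongod is on PATH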
Have you tried doing a backup and restore without HA in the middle, to narrow down whether it is just 'juju restore' that isn't working for you, versus restore after enabling HA?
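As a rough sketch of that narrowed-down check (the controller name here is illustrative, and the id is whatever create-backup prints):

$ juju bootstrap localhost single-controller
$ juju create-backup -m controller
$ juju restore-backup -m controller --id=<id-from-create-backup>
# If this succeeds, the failure is specific to restoring after enable-ha.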