Juju 2.4.0: the restore command hangs when the controller is unresponsive

Bug #1783340 reported by Anton Kremenetsky
This bug affects 4 people
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned

Bug Description

I made a backup and then tried to restore from it. If the controller is unresponsive, the restore command hangs. The previous version of Juju (2.3.8) does not have this problem. The logs below show what I did.

Juju 2.3.8 - https://paste.ubuntu.com/p/4wrCgHZ6kW/
Juju 2.4.0 - https://paste.ubuntu.com/p/fytBS8jcRJ/

In the Juju 2.4.0 case, I see the following errors in the logs:
Jul 24 13:41:47 juju-controller mongod.37017[5439]: [conn6044] SSL: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate
Jul 24 13:41:47 juju-controller mongod.37017[5439]: [conn6045] SSL: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate
Jul 24 13:41:47 juju-controller mongod.37017[5439]: [conn6046] SSL: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate
Jul 24 13:41:47 juju-controller mongod.37017[5439]: [conn6047] SSL: error:14094412:SSL routines:ssl3_read_bytes:sslv3 alert bad certificate

Anton Kremenetsky (akremenetsky) wrote:

This patch partially solves the issue: https://github.com/juju/juju/pull/8993

tags: added: backup-restore
Changed in juju:
status: New → Triaged
Heather Lanigan (hmlanigan) wrote:

The PR above has landed. The current error is:

$ juju restore-backup -m controller --file /tmp/juju-backup-nagios-1.tgz --debug
...
10:34:38 DEBUG juju.api.backups restore.go:183 Attempting finishRestore
10:34:38 INFO juju.juju api.go:67 connecting to API addresses: [10.63.22.57:17070]
10:34:46 DEBUG juju.api apiclient.go:855 error dialing websocket: x509: certificate signed by unknown authority
10:34:46 DEBUG juju.rpc server.go:325 error closing codec: write tcp 10.63.22.1:36666->10.63.22.57:17070: i/o timeout
ERROR could not finish restore process: : unable to connect to API: x509: certificate signed by unknown authority
10:34:46 DEBUG cmd supercommand.go:459 error stack:
x509: certificate signed by unknown authority
github.com/juju/juju/api/apiclient.go:890:
github.com/juju/juju/api/apiclient.go:856: unable to connect to API
github.com/juju/juju/api/apiclient.go:752:
github.com/juju/juju/api/apiclient.go:597:
github.com/juju/juju/api/apiclient.go:197:
github.com/juju/juju/juju/api.go:72:
github.com/juju/juju/cmd/juju/backups/backups.go:76:
github.com/juju/juju/cmd/juju/backups/restore.go:145:
github.com/juju/juju/api/backups/restore.go:187:
github.com/juju/juju/api/backups/restore.go:164: could not finish restore process:
github.com/juju/juju/cmd/juju/backups/restore.go:200:
$
$ juju status
ERROR unable to connect to API: x509: certificate signed by unknown authority

Additional information from @akremenetsky (via the PR) on next steps:

As for the error you see: it occurs because the client has the wrong certificate. When you ran the "bootstrap" command (step 5 of the QA steps), new certificates were generated on the client side. You then ran the "restore-backup" command. The Juju controller installs the certificate from the backup file, but nothing changes on the client side: the client still uses the certificate generated by "bootstrap". The result is an incorrect certificate on the client.
A possible solution: the "restore-backup" command should also update the certificates on the client side.

Felipe Reyes (freyes)
tags: added: sts
Felipe Reyes (freyes) wrote:

When using the LXD provider and Juju from the edge channel, this is what I get: https://pastebin.ubuntu.com/p/z5rd4hpqDH/

This is machine-0.log from the machine where I tried to restore the backup: http://paste.ubuntu.com/p/GtYNVGtGTt/

If I understand this correctly, the restore procedure assumes that the controller machine will be configured with the same IP addresses. This is optimistic for every provider except MAAS, and even with the MAAS provider there will be situations where the operator needs to restore the controller on a new machine. A new machine has a new NIC, connected to a different switch port, so it is assigned a new IP address and reusing the old one is not easy.

Tim Penhey (thumper)
Changed in juju:
importance: Undecided → Medium
Jeff Hillman (jhillman)
tags: added: cpe-onsite
Vern Hart (vern) wrote:

Any updates on this? We are seeing this in a customer deployment.

Canonical Juju QA Bot (juju-qa-bot) wrote:

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Medium → Low
tags: added: expirebugs-bot