juju restore failed with "error: cannot update machines: machine update failed: ssh command failed: "
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| juju-core | | Undecided | Unassigned | |
| 1.22 | | High | Eric Snow | |
Bug Description
Hi,
I failed to restore in a MAAS environment (juju version 1.21.3).
Bootstrapping from the backup file during the restore process appeared to work.
However, the restore failed while updating the service machines. The logs are below:
updating all machines
updating machine: 1
updating machine: 2
updating machine: 3
2015-03-20 06:36:45 DEBUG juju.utils.ssh ssh.go:244 using OpenSSH ssh client
2015-03-20 06:36:45 DEBUG juju.utils.ssh ssh.go:244 using OpenSSH ssh client
2015-03-20 06:36:45 DEBUG juju.utils.ssh ssh.go:244 using OpenSSH ssh client
2015-03-20 06:36:45 ERROR juju.plugins.
Before the restore, juju status showed the following:
ubuntu@maas01:~$ juju status
environment: maas
machines:
  "0":
    agent-state: started
    agent-version: 1.21.3.1
    dns-name: bootstrap01.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=2 mem=2048M tags=bootstrap,
    state-
  "1":
    agent-state: started
    agent-version: 1.21.3.1
    dns-name: ceph02.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=2 mem=2048M tags=ceph,virtual
  "2":
    agent-state: started
    agent-version: 1.21.3.1
    dns-name: ceph01.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=2 mem=2048M tags=ceph,virtual
  "3":
    agent-state: started
    agent-version: 1.21.3.1
    dns-name: ceph03.maas
    instance-id: /MAAS/api/
    series: trusty
    hardware: arch=amd64 cpu-cores=2 mem=2048M tags=ceph,virtual
services:
  ceph:
    charm: local:trusty/
    exposed: false
    relations:
      mon:
      - ceph
    units:
      ceph/0:
        machine: "1"
      ceph/1:
        machine: "2"
      ceph/2:
        machine: "3"
networks:
  maas-eth0:
    provider-id: maas-eth0
    cidr: 10.100.1.0/24
I have attached a full log of the restore.
Thanks
| Alex Kang (thkang0) wrote : | #1 |
| description: | updated |
| tags: | added: backup-restore maas-provider |
| Changed in juju-core: | |
| status: | New → Triaged |
| importance: | Undecided → High |
| milestone: | none → 1.23-beta1 |
| milestone: | 1.23-beta1 → 1.23-beta2 |
| Changed in juju-core: | |
| milestone: | 1.23-beta2 → 1.24-alpha1 |
| Changed in juju-core: | |
| assignee: | nobody → Eric Snow (ericsnowcurrently) |
| Eric Snow (ericsnowcurrently) wrote : | #2 |
| Eric Snow (ericsnowcurrently) wrote : | #3 |
The error implies that the /var/lib/
Also, is there a chance that at the time of the backup there was a machine set up in juju that has since been removed from juju (and the agents directory deleted) but is still reachable via SSH?
| Alex Kang (thkang0) wrote : | #4 |
There is no /var/lib/
I think the restore process deleted that directory while it was running.
The machines are also still reachable via SSH even though juju restore failed. I can connect to the non-state machines using their IP addresses.
By the way, how can I get juju 1.22?
I am using the PPA ppa:juju/stable.
| Alex Kang (thkang0) wrote : | #5 |
I upgraded juju from 1.21.3 to 1.22 and tried to back up and restore again, but it failed again.
I have attached logs of what I did, along with the state machine log.
The state machine could not run the state API server.
| Changed in juju-core: | |
| status: | Triaged → In Progress |
| Eric Snow (ericsnowcurrently) wrote : | #6 |
From the logs it looks like you're running into a different failure mode under 1.22 and it probably isn't restore-related. The key entry is:
2015-03-25 02:30:01 DEBUG juju.mongo open.go:122 TLS handshake failed: x509: certificate is valid for localhost, juju-apiserver, bootstrap04.maas, not juju-mongodb
We ran into this in juju CI when we upgraded to 1.22. See lp:1434680, which has since been fixed. Could you verify?
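To make the log entry above concrete: the client asked for the name juju-mongodb, but the certificate only lists localhost, juju-apiserver, and bootstrap04.maas. The following is a minimal Python sketch of that kind of name check. It is not juju's or Go's actual verification code; the function name `name_matches` and the simplified wildcard rule are assumptions made for illustration.

```python
def name_matches(requested, cert_names):
    """Return True if `requested` matches any name listed in the certificate.

    Simplified rule (an assumption for this sketch): names compare
    case-insensitively, and a leading "*." matches exactly one label.
    """
    requested = requested.lower().rstrip(".")
    for name in cert_names:
        name = name.lower().rstrip(".")
        if name == requested:
            return True
        if name.startswith("*."):
            # "*.maas" matches "ceph01.maas" but not "a.b.maas" or "maas".
            head, _, tail = requested.partition(".")
            if head and tail == name[2:]:
                return True
    return False


# Names taken from the log entry quoted above.
cert_names = ["localhost", "juju-apiserver", "bootstrap04.maas"]

print(name_matches("juju-apiserver", cert_names))  # a name the certificate lists
print(name_matches("juju-mongodb", cert_names))    # the name reported as rejected
```

Under this simplified rule the second call fails to match, which corresponds to the "not juju-mongodb" part of the log message.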
| no longer affects: | juju-core/1.23 |
| Changed in juju-core: | |
| milestone: | 1.24-alpha1 → none |
| assignee: | Eric Snow (ericsnowcurrently) → nobody |
| importance: | High → Undecided |
| status: | In Progress → Invalid |
| Alex Kang (thkang0) wrote : | #7 |
Two days ago I upgraded to version 1.22, which I understood to be the release containing the lp:1434680 fix you mentioned.
This is the normal deployment process I followed:
1. I destroyed the whole environment with "juju destroy-
2. I bootstrapped a machine: juju bootstrap --constraints tags=bootstrap --upload-tools
3. I deployed the ceph environment: juju-deployer --config bundle.yaml ceph
4. I backed it up: juju backups create
5. I simulated a state machine failure: I deleted the machine from MAAS
6. I restored the environment from the backup file: juju-restore --constraints tags=bootstrap backupfile.tar.gz
During this process I got the error shown above.
bootstrap04.maas is the old state machine, which I deleted from MAAS for this test.
Is juju restore still using the old hostname? Does that mean I have to configure the new server identically to the old one?
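For anyone trying to reproduce this, the commands from steps 2, 3, 4 and 6 above can be collected into a small script. This is only a sketch: the `STEPS` list and `run_steps` helper are hypothetical names, step 1 is left out because the command is truncated in the report, and step 5 is a manual action in MAAS. It defaults to a dry run that only prints the planned commands.

```python
import shlex
import subprocess

# Commands copied from steps 2, 3, 4 and 6 above. Step 1 is truncated in the
# report and step 5 (deleting the machine from MAAS) is a manual action, so
# neither is included here.
STEPS = [
    ("bootstrap", "juju bootstrap --constraints tags=bootstrap --upload-tools"),
    ("deploy", "juju-deployer --config bundle.yaml ceph"),
    ("backup", "juju backups create"),
    ("restore", "juju-restore --constraints tags=bootstrap backupfile.tar.gz"),
]


def run_steps(steps, dry_run=True):
    """Run each command in order, stopping at the first failure.

    With dry_run=True (the default) nothing is executed; the parsed argument
    lists are returned so the sequence can be reviewed first.
    """
    planned = []
    for label, command in steps:
        argv = shlex.split(command)
        planned.append((label, argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(argv, check=True)
    return planned


for label, argv in run_steps(STEPS):
    print(label, argv)
```

A real run would need a pause between the backup and restore steps so that the state machine can be deleted from MAAS by hand, as in step 5.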
| Eric Snow (ericsnowcurrently) wrote : | #8 |
Did you pull and build juju from source, or did you update the package via apt-get? It's possible that the package wasn't quite up to date. I ask because the message I found in the logs definitely indicates that juju failed due to lp:1434680 (or that the bug isn't actually fixed). I'm not well enough versed in the mechanisms behind the distro package release to give you a more confident expectation (you'd have to ask sinzui). However, I'm still fairly confident that the bug is fixed, which would mean the juju you ran wasn't quite up to date yet.
| Alex Kang (thkang0) wrote : | #9 |
I updated to version 1.22 from the PPA, and dpkg shows that I am using version 1.22:
ubuntu@maas01:~$ dpkg -l | grep juju
ii juju-core 1.22.0-
ii juju-deployer 0.4.3-0ubuntu1~
ii python-jujuclient 0.50.1-2 amd64 Python API client for juju-core
I also got the same error in another environment.
I deployed an environment with juju 1.22 and restarted a state machine, but it can't load the juju API service:
2015-03-26 02:28:28 INFO juju.worker runner.go:261 start "api"
2015-03-26 02:28:28 INFO juju.api apiclient.go:252 dialing "wss://
2015-03-26 02:28:28 INFO juju.api apiclient.go:260 error dialing "wss://
2015-03-26 02:28:28 ERROR juju.worker runner.go:219 exited "api": unable to connect to "wss://
2015-03-26 02:28:28 INFO juju.worker runner.go:253 restarting "api" in 3s
2015-03-26 02:28:28 DEBUG juju.mongo open.go:122 TLS handshake failed: x509: certificate is valid for localhost, juju-apiserver, bootstrap05.maas, not juju-mongodb
2015-03-26 02:28:29 DEBUG juju.mongo open.go:122 TLS handshake failed: x509: certificate is valid for localhost, juju-apiserver, bootstrap05.maas, not juju-mongodb
2015-03-26 02:28:30 DEBUG juju.mongo open.go:122 TLS handshake failed: x509: certificate is valid for localhost, juju-apiserver, bootstrap05.maas, not juju-mongodb
2015-03-26 02:28:30 DEBUG juju.mongo open.go:122 TLS handshake failed: x509: certificate is valid for localhost, juju-apiserver, bootstrap05.maas, not juju-mongodb
2015-03-26 02:28:30 DEBUG juju.mongo open.go:122 TLS handshake failed: x509: certificate is valid for localhost, juju-apiserver, bootstrap05.maas, not juju-mongodb
2015-03-26 02:28:31 DEBUG juju.mongo open.go:122 TLS handshake failed: x509: certificate is valid for localhost, juju-apiserver, bootstrap05.maas, not juju-mongodb
2015-03-26 02:28:31 INFO juju.worker runner.go:261 start "api"
2015-03-26 02:28:31 INFO juju.api apiclient.go:252 dialing "wss://
2015-03-26 02:28:31 INFO juju.api apiclient.go:260 error dialing "wss://
Is there any way to verify that juju is the latest version?
| Eric Snow (ericsnowcurrently) wrote : | #10 |
Thanks for your patience on this. I've verified with Curtis (sinzui) that the fix for lp:1434680 will be in 1.20.1, which will be released in the next few days. Sorry for the confusion.
| Alex Kang (thkang0) wrote : | #11 |
Thanks Eric.
I will test it again when you release 1.20.1 and let you know if I have a problem.
| Eric Snow (ericsnowcurrently) wrote : | #12 |
Sorry, Alex. I meant 1.22.1, not 1.20.1.
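Given that the fix is expected in 1.22.1, one rough way to check an installed version string against that release is sketched below. This is an illustration, not an official juju tool: the helper names `version_tuple` and `includes_fix` are hypothetical, and the sketch assumes version strings begin with dotted numbers, as the ones quoted in this report do.

```python
import re


def version_tuple(version):
    """Turn a string such as "1.22.1-utopic" into (1, 22, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("no leading version number in %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def includes_fix(version, fixed_in="1.22.1"):
    """Return True if `version` is at least the release carrying the fix."""
    return version_tuple(version) >= version_tuple(fixed_in)


# Version strings in the formats that appear in this report.
for candidate in ["1.21.3.1", "1.22.0", "1.22.1-utopic"]:
    print(candidate, includes_fix(candidate))
```

Only the leading numeric part is compared, so packaging suffixes are ignored. The check applies to whichever component's version string is passed in, client or state server.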
| Aaron Bentley (abentley) wrote : | #13 |
Landed for 1.22 in https:/
| Andrew Love (andrew-love) wrote : | #14 |
I have the exact same issue in an environment I cannot destroy.
I am now at juju version 1.22.1-utopic on a management server. Should this version have the fix included?
How do I fix an existing state server (the bootstrap server) in a non-destructive fashion? At the moment I cannot get the state server service to listen on port 17070 due to the error loop:
2015-04-14 13:47:10 DEBUG juju.mongo open.go:122 TLS handshake failed: x509: certificate is valid for localhost, juju-apiserver, cloud-node-03.maas, not juju-mongodb
2015-04-14 13:47:11 INFO juju.worker runner.go:261 start "api"
2015-04-14 13:47:11 INFO juju.api apiclient.go:252 dialing "wss://
2015-04-14 13:47:11 INFO juju.api apiclient.go:260 error dialing "wss://
2015-04-14 13:47:11 ERROR juju.worker runner.go:219 exited "api": unable to connect to "wss://
2015-04-14 13:47:11 INFO juju.worker runner.go:253 restarting "api" in 3s
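When a log repeats this error many times, it can help to summarize which name was requested and which names the certificate actually lists. The sketch below parses only the x509 message format seen in the excerpts in this report; the `TLS_ERROR` pattern and `summarize_tls_failures` helper are hypothetical names, and other log formats are not handled.

```python
import re
from collections import Counter

# Matches the x509 message format seen in the log excerpts in this report.
TLS_ERROR = re.compile(
    r"TLS handshake failed: x509: certificate is valid for "
    r"(?P<valid>.+), not (?P<requested>\S+)"
)


def summarize_tls_failures(lines):
    """Count handshake failures per (requested name, names in certificate)."""
    counts = Counter()
    for line in lines:
        match = TLS_ERROR.search(line)
        if match:
            valid = tuple(name.strip() for name in match.group("valid").split(","))
            counts[(match.group("requested"), valid)] += 1
    return counts


# Lines copied from the excerpt above.
sample = [
    '2015-04-14 13:47:10 DEBUG juju.mongo open.go:122 TLS handshake failed: x509: certificate is valid for localhost, juju-apiserver, cloud-node-03.maas, not juju-mongodb',
    '2015-04-14 13:47:11 INFO juju.worker runner.go:261 start "api"',
    '2015-04-14 13:47:11 INFO juju.worker runner.go:253 restarting "api" in 3s',
]

for (requested, valid), count in summarize_tls_failures(sample).items():
    print(count, requested, valid)
```

To scan a whole log, pass an open file object in place of `sample`.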
| Andrew Love (andrew-love) wrote : | #15 |
As an addition to the above, the state server contains the following in /var/lib/
{"version"
If my juju client (management server) is at 1.22.1 and my state server is at 1.22.0, how does one push juju upgrades to the state server?
(This may be the same answer as the question I asked above.)


Could you see if this is still a problem if you run with juju 1.22? I'm checking from my side too.