Model Migration Fails on 3rd attempt
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | Christian Muirhead |
2.2 | Fix Released | High | Christian Muirhead |
Bug Description
Reproduction steps:
Using the LXD provider, bootstrap controllers a and b.
Add model foo on controller a.
juju deploy elasticsearch-
juju migrate foo b & watch --color -n1 juju status -m b:foo --color (watch the migration complete successfully)
juju switch b:controller
juju migrate foo a & watch --color -n1 juju status -m a:foo --color (watch the migration complete successfully)
juju switch a:controller
juju migrate foo b & watch --color -n1 juju status -m b:foo --color
The third migration fails with:
ERROR source prechecks failed: machine 0 not running (started)
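The reproduction bounces the model back and forth between the two controllers. As a minimal sketch, assuming controllers named a and b and a model named foo as in the steps above, the Python below only builds the command sequence (a dry run). `migration_commands` is a hypothetical helper, not part of Juju, and it does not wait for a migration to finish, so a real script would still need to watch `juju status` between steps as the manual steps do.

```python
import shlex

def migration_commands(model, controllers=("a", "b"), attempts=3):
    """Build the juju commands that bounce `model` between two controllers.

    Hypothetical helper mirroring the manual reproduction steps. Assumes the
    model starts on controllers[0]. Returns argv lists only; nothing is run.
    """
    commands = []
    for i in range(attempts):
        target = controllers[(i + 1) % 2]
        commands.append(["juju", "migrate", model, target])
        if i < attempts - 1:
            # Switch to the target controller before the next migration.
            commands.append(["juju", "switch", f"{target}:controller"])
    return commands

if __name__ == "__main__":
    for argv in migration_commands("foo"):
        print(shlex.join(argv))
```

With the defaults it prints the five commands from the steps above; the last `juju migrate foo b` is the third migration, the one that fails.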
juju status -m a:foo
Model  Controller  Cloud/Region         Version
foo    a           localhost/localhost  2.1-beta2.1

App            Version  Status   Scale  Charm          Store       Rev  OS      Notes
elasticsearch           waiting      1  elasticsearch  jujucharms   15  ubuntu
kibana                  waiting      1  kibana         jujucharms   14  ubuntu

Unit              Workload  Agent  Machine  Public address  Ports            Message
elasticsearch/0*  unknown   idle   0        192.168.88.162  9200/tcp
kibana/0*         active    idle   1        192.168.88.140  80/tcp,9200/tcp  ready

Machine  State    DNS             Inst id        Series  AZ
0        started  192.168.88.162  juju-ebe370-0  trusty
1        started  192.168.88.140  juju-ebe370-1  trusty

Relation  Provides       Consumes       Type
peer      elasticsearch  elasticsearch  peer
rest      elasticsearch  kibana         regular
Changed in juju:
status: Triaged → In Progress
assignee: Menno Smits (menno.smits) → Christian Muirhead (2-xtian)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
milestone: 2.1-rc1 → 2.1-beta3
Changed in juju:
status: Fix Committed → Fix Released
I can reproduce this. At first I was confused by the code and the error message, because the machine status when everything is fine has always been "started" rather than "running". But the code checks InstanceStatus(), not Status(). Using show-machine to see the instance status gives this on a fresh, unmigrated model:
xtian@marathe:~$ juju show-machine 0
model: m2
machines:
  "0":
    juju-status:
      current: started
      since: 01 Dec 2016 16:44:28+13:00
      version: 2.1-beta2.1
    dns-name: 10.218.39.41
    ip-addresses:
    - 10.218.39.41
    instance-id: juju-1cb8c3-0
    machine-status:
      current: running
      message: Running
      since: 01 Dec 2016 16:42:53+13:00
    series: xenial
    hardware: arch=amd64 cores=0 mem=0M
Note that the machine-status current value is "running".
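To make the distinction concrete, here is a minimal Python sketch of the kind of check described above, assuming the precheck compares each machine's instance status (machine-status in the show-machine output) against "running" and ignores the agent status (juju-status). The function name and the dictionary shape are hypothetical, modelled on the YAML above; this is not Juju's actual Go implementation. The returned text mirrors the error in the bug description.

```python
def check_machines_running(machines):
    """Return an error for the first machine whose instance status isn't "running".

    `machines` maps machine IDs to dicts shaped like `juju show-machine` output.
    Only "machine-status" (the instance status) is consulted; "juju-status"
    (the agent status, normally "started") is deliberately ignored.
    """
    for machine_id, info in machines.items():
        instance_status = info["machine-status"]["current"]
        if instance_status != "running":
            return f"machine {machine_id} not running ({instance_status})"
    return None

# Shapes taken from the show-machine outputs in this report.
fresh = {"0": {"juju-status": {"current": "started"},
               "machine-status": {"current": "running"}}}
after_two_migrations = {"0": {"juju-status": {"current": "started"},
                              "machine-status": {"current": "started"}}}

print(check_machines_running(fresh))                 # None
print(check_machines_running(after_two_migrations))  # machine 0 not running (started)
```

An agent status of "started" passes this check as long as the instance status is "running"; it only fails once the instance status itself reads "started".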
Migrating once and then showing the same machine gives:
xtian@marathe:~$ juju migrate -c A m2 B
Migration started with ID "6dd2f792-d84e-4c14-81b3-06befa1cb8c3:0"
xtian@marathe:~$ juju switch B:m2
A:admin/m2 -> B:admin/m2
xtian@marathe:~$ juju show-machine 0
model: m2
machines:
  "0":
    juju-status:
      current: started
      since: 01 Dec 2016 16:50:51+13:00
      version: 2.1-beta2.1
    dns-name: 10.218.39.41
    ip-addresses:
    - 10.218.39.41
    instance-id: juju-1cb8c3-0
    machine-status:
      current: running
      message: Running
      since: 01 Dec 2016 16:50:41+13:00
    series: xenial
    hardware: arch=amd64
Machine status is still running. Migrating back gives:
xtian@marathe:~$ juju migrate -c B m2 A
Migration started with ID "6dd2f792-d84e-4c14-81b3-06befa1cb8c3:1"
xtian@marathe:~$ juju switch A:m2
B:admin/m2 -> A:admin/m2
xtian@marathe:~$ juju show-machine 0
model: m2
machines:
  "0":
    juju-status:
      current: started
      since: 01 Dec 2016 16:59:12+13:00
      version: 2.1-beta2.1
    dns-name: 10.218.39.41
    ip-addresses:
    - 10.218.39.41
    instance-id: juju-1cb8c3-0
    machine-status:
      current: started
      since: 01 Jan 1970 12:00:00+12:00
    series: xenial
    hardware: arch=amd64
Now machine-status is "started" rather than "running", which is the source of the error message.
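One more detail in that last output: the machine-status since value, 01 Jan 1970 12:00:00+12:00, is exactly the Unix epoch (timestamp 0) rendered with a +12:00 offset, which looks like a status written without a real timestamp. A quick check in Python, parsing the strings exactly as show-machine printed them (the `parse_since` helper is just for illustration):

```python
from datetime import datetime

def parse_since(value):
    # Format as printed by show-machine above, e.g. "01 Dec 2016 16:50:41+13:00".
    # %z accepts a "+13:00" style offset on Python 3.7+.
    return datetime.strptime(value, "%d %b %Y %H:%M:%S%z")

broken = parse_since("01 Jan 1970 12:00:00+12:00")  # machine-status after migrating back
normal = parse_since("01 Dec 2016 16:50:41+13:00")  # machine-status after the first migration

print(broken.timestamp())      # 0.0, the Unix epoch
print(normal.timestamp() > 0)  # True
```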
I'm not sure why that's happening. Maybe status and instance status are being stored under the same key in a map, and the right one comes out the first time by chance?
Still chasing, anyway.
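The map-key hypothesis can be illustrated with a toy model. This is purely hypothetical: it assumes the agent status and the instance status end up written under one shared key during export, so whichever write comes last wins. In Go, map iteration order is unspecified, so if those writes were driven by ranging over a map, the surviving value could differ from one migration to the next, which would fit "the right one comes out the first time by chance". The write order is passed in explicitly here so both outcomes can be shown deterministically; the key name and functions are invented for illustration.

```python
def export_statuses(agent_status, instance_status, write_order):
    """Toy export that (buggily) stores both statuses under one shared key."""
    values = {"agent": agent_status, "instance": instance_status}
    store = {}
    for kind in write_order:
        # Shared key: the later write silently overwrites the earlier one.
        store["machine#0"] = values[kind]
    return store

def import_instance_status(store):
    # The toy importer reads the shared key back as the instance status.
    return store["machine#0"]

lucky = export_statuses("started", "running", ["agent", "instance"])
unlucky = export_statuses("started", "running", ["instance", "agent"])

print(import_instance_status(lucky))    # running
print(import_instance_status(unlucky))  # started
```

In the unlucky order the agent status "started" is read back as the instance status, matching what show-machine reported after migrating back. Keeping the two statuses under distinct keys removes the dependence on write order.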