Migrating a model between controllers on conjure-up/nova-lxd OpenStack does not complete after 3 hours; stuck on "transferring ownership of cloud resources to target controller"

Bug #1677225 reported by Gareth Woolridge
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Christian Muirhead

Bug Description

Migrating a simple model, consisting of 1 instance with 3 charms deployed, is still running and apparently stuck after 3 hours. juju show-model shows the status: transferring ownership of cloud resources to target controller

Juju status on the source controller shows:

gareth@mojo-nest-priv:~$ juju status
Model        Controller         Cloud/Region       Version  Notes
mojo-how-to  mojo-test-migrate  mystack/RegionOne  2.1.2.1  migrating: successful, timed out waiting for agents to report

App              Version  Status   Scale  Charm            Store  Rev  OS      Notes
apache2                   unknown  0/1    apache2          local  0    ubuntu  exposed
content-fetcher           unknown  0/1    content-fetcher  local  0    ubuntu
nrpe                      unknown  0/1    nrpe             local  0    ubuntu

Unit                 Workload  Agent  Machine  Public address  Ports   Message
apache2/0            unknown   lost   0        10.101.0.75     80/tcp  agent lost, see 'juju show-status-log apache2/0'
  content-fetcher/0  unknown   lost            10.101.0.75             agent lost, see 'juju show-status-log content-fetcher/0'
  nrpe/0             unknown   lost            10.101.0.75             agent lost, see 'juju show-status-log nrpe/0'

Machine  State  DNS          Inst id                               Series  AZ
0        down   10.101.0.75  291524ae-dcc6-4ddc-91d1-297743721799  xenial  nova

Relation              Provides  Consumes         Type
general-info          apache2   content-fetcher  subordinate
nrpe-external-master  apache2   nrpe             subordinate

gareth@mojo-nest-priv:~$ juju show-model mojo-how-to
mojo-how-to:
  name: mojo-how-to
  model-uuid: 5ec26299-d068-4b96-8f44-158da198d39e
  controller-uuid: 5431e276-6349-4c6a-88d8-e1ea67335bf8
  controller-name: mojo-test-migrate
  owner: admin
  cloud: mystack
  region: RegionOne
  type: openstack
  life: alive
  status:
    current: available
    since: 3 hours ago
    migration: transferring ownership of cloud resources to target controller
    migration-start: 3 hours ago
  users:
    admin:
      display-name: admin
      access: admin
      last-connection: 14 seconds ago
  machines:
    "0":
      cores: 1

If I juju switch to the "new" controller, the model status looks good and juju status etc. work:

gareth@mojo-nest-priv:~$ juju switch mojo-test
mojo-test-migrate:admin/mojo-how-to -> mojo-test (controller)
gareth@mojo-nest-priv:~$ juju switch mojo-how-to
mojo-test (controller) -> mojo-test:admin/mojo-how-to
gareth@mojo-nest-priv:~$ juju status
Model        Controller  Cloud/Region       Version
mojo-how-to  mojo-test   mystack/RegionOne  2.1.2.1

App              Version  Status   Scale  Charm            Store  Rev  OS      Notes
apache2                   waiting  1      apache2          local  0    ubuntu  exposed
content-fetcher           waiting  1      content-fetcher  local  0    ubuntu
nrpe                      waiting  1      nrpe             local  0    ubuntu

Unit                  Workload  Agent  Machine  Public address  Ports   Message
apache2/0*            unknown   idle   0        10.101.0.75     80/tcp
  content-fetcher/0*  unknown   idle            10.101.0.75
  nrpe/0*             unknown   idle            10.101.0.75

Machine  State    DNS          Inst id                               Series  AZ
0        started  10.101.0.75  291524ae-dcc6-4ddc-91d1-297743721799  xenial  nova

Relation              Provides  Consumes         Type
general-info          apache2   content-fetcher  subordinate
nrpe-external-master  apache2   nrpe             subordinate

gareth@mojo-nest-priv:~$ juju show-model mojo-how-to
mojo-how-to:
  name: mojo-how-to
  model-uuid: 5ec26299-d068-4b96-8f44-158da198d39e
  controller-uuid: 719cc4d6-2e37-449a-8998-ccd25fc856d8
  controller-name: mojo-test
  owner: admin
  cloud: mystack
  region: RegionOne
  type: openstack
  life: alive
  status:
    current: available
    since: 3 hours ago
  users:
    admin:
      display-name: admin
      access: admin
      last-connection: 4 seconds ago
  machines:
    "0":
      cores: 1
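
The two juju show-model outputs above differ in the controller UUID and in the presence of a migration entry under status on the source side. As a minimal Python sketch of that check (the function name migration_state and the dict layout are assumptions for illustration, mirroring the YAML shown above rather than any Juju API):

```python
def migration_state(model_info):
    """Return the in-progress migration message from a show-model style dict.

    Returns None when the status section carries no 'migration' entry.
    The dict layout is an assumption copied from the YAML quoted in this report.
    """
    status = model_info.get("status", {})
    return status.get("migration")


# View of the model from the source controller (values taken from the report).
source_view = {
    "controller-name": "mojo-test-migrate",
    "status": {
        "current": "available",
        "migration": "transferring ownership of cloud resources to target controller",
    },
}

# View of the same model from the target controller.
target_view = {
    "controller-name": "mojo-test",
    "status": {"current": "available"},
}

for view in (source_view, target_view):
    print(view["controller-name"], "->", migration_state(view))
```

Only the source controller still reports a migration phase, which matches the symptom described here: the target side considers the model healthy while the source side never finishes.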

Checking the machine-0 log on the old controller, this appears to be caused by volumes not being available, which is understandable given that a conjure-up/nova-lxd environment doesn't have block storage deployed.

2017-03-29 09:13:44 ERROR juju.worker.dependency engine.go:547 "migration-master" manifold worker returned unexpected error: volumes not supported (not supported)
2017-03-29 09:13:47 ERROR juju.worker.migrationmaster.98d39e worker.go:719 no agents reported in time
2017-03-29 09:13:47 ERROR juju.worker.migrationmaster.98d39e worker.go:284 successful, timed out waiting for agents to report

The above seemingly loops forever.
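
One way to confirm the loop from a saved log would be to count how often each ERROR message recurs. The following is a minimal Python sketch, not part of Juju: the regular expression is an assumption based only on the three lines quoted above, and count_errors is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line layout: date, time, level, module, file:line, message.
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<module>\S+) (?P<location>\S+:\d+) (?P<message>.*)$"
)


def count_errors(lines):
    """Count ERROR lines, keyed by (module, message); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            counts[(match.group("module"), match.group("message"))] += 1
    return counts


# The three lines quoted in this report.
SAMPLE = [
    '2017-03-29 09:13:44 ERROR juju.worker.dependency engine.go:547 "migration-master" manifold worker returned unexpected error: volumes not supported (not supported)',
    '2017-03-29 09:13:47 ERROR juju.worker.migrationmaster.98d39e worker.go:719 no agents reported in time',
    '2017-03-29 09:13:47 ERROR juju.worker.migrationmaster.98d39e worker.go:284 successful, timed out waiting for agents to report',
]

for (module, message), seen in count_errors(SAMPLE).items():
    print(seen, module, message)
```

Run over a longer capture, counts that keep growing for the same three messages would be consistent with the worker restarting in a loop.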

This shouldn't affect migration though?!?

Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.2-beta3
tags: added: eda model-migration
Menno Finlay-Smits (menno.smits) wrote :

We're looking into this. Could you provide some of the controller logs please?

First turn up the log level for the controller:
juju model-config -m controller logging-config="<root>=DEBUG"

Then wait 30s for the migrationmaster to restart a few times and grab the relevant logs:
juju debug-log -m controller --replay | grep migrat > migration.log

Please compress and attach the log here.

Christian Muirhead (2-xtian) wrote :

Once a model's migrated we update the metadata on the instances and other resources (it varies by provider) to indicate that they're now owned by the new controller. This is needed because when we're destroying a controller we clean up the resources it owns - if we leave them owned by the old controller then they'll be destroyed when it is.

It sounds like that's what's failing here - the code to update the "which controller owns this" tag doesn't handle not being able to get volumes.

Having the logs will let us confirm that.
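
To illustrate the idea, here is a minimal Python sketch of an ownership-retagging step that tolerates a provider without volume support. It is not Juju's code (Juju is written in Go) and not necessarily how the fix was made; NotSupportedError, adopt_resources, NoVolumesProvider and the retag_* methods are all hypothetical names introduced for this example.

```python
class NotSupportedError(Exception):
    """Raised by a provider that lacks a capability, for example block storage."""


def adopt_resources(provider, controller_uuid):
    """Retag instances and, where available, volumes with the new controller UUID.

    Returns the list of resource kinds skipped because the provider
    reported them as unsupported, instead of failing the whole step.
    """
    skipped = []
    steps = (
        ("instances", provider.retag_instances),
        ("volumes", provider.retag_volumes),
    )
    for kind, retag in steps:
        try:
            retag(controller_uuid)
        except NotSupportedError:
            skipped.append(kind)
    return skipped


class NoVolumesProvider:
    """Stand-in for a cloud without block storage, like the nova-lxd setup here."""

    def __init__(self):
        self.instance_owner = None

    def retag_instances(self, controller_uuid):
        # Record the new owning controller on the (single, simulated) instance.
        self.instance_owner = controller_uuid

    def retag_volumes(self, controller_uuid):
        raise NotSupportedError("volumes not supported")


provider = NoVolumesProvider()
skipped = adopt_resources(provider, "719cc4d6-2e37-449a-8998-ccd25fc856d8")
print("skipped:", skipped, "instance owner:", provider.instance_owner)
```

The design choice sketched here is to treat "not supported" as "nothing to retag" for that resource kind, while any other error still propagates.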

Changed in juju:
status: Triaged → Incomplete
milestone: 2.2-beta3 → none
Gareth Woolridge (moon127) wrote :

Attached migration.log.xz, grabbed with the above command after enabling DEBUG.

Changed in juju:
status: Incomplete → New
Changed in juju:
status: New → In Progress
assignee: nobody → Christian Muirhead (2-xtian)
Christian Muirhead (2-xtian) wrote :
Changed in juju:
milestone: none → 2.2-beta3
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released