Model migration fails on large model

Bug #1680392 reported by Sandor Zeestraten
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Canonical Juju
Fix Released
High
Tim Penhey

Bug Description

# Versions
Juju 2.1.2
MAAS 2.1.3

# Issue
When trying to migrate a larger model for an OpenStack deployment, it fails after a few seconds.

# Error message from machine-0.log on the source controller we are migrating from
2017-04-06 09:37:06 ERROR juju.worker.migrationmaster.21803f worker.go:284 model data transfer failed, model export failed: expected []string] for tags, got []interface {}

@maas:~$ juju list-controllers
Controller Model User Access Cloud/Region Models Machines HA Version
prodcont2* controller admin superuser prodmaas 2 1 none 2.1.2
prodcont3 controller admin superuser prodmaas 2 1 none 2.1.2

@maas:~$ juju list-models -c prodcont2
Controller: prodcont2
Model Cloud/Region Status Machines Cores Access Last connection
controller* prodmaas available 1 2 admin just now
default prodmaas available 0 - admin 22 hours ago
newtest prodmaas available 0 - admin 19 minutes ago
testos prodmaas available 40 362 admin 5 minutes ago

@maas:~$ juju list-models -c prodcont3
Controller: prodcont3
Model Cloud/Region Status Machines Cores Access Last connection
controller* prodmaas available 1 2 admin just now
default prodmaas available 0 - admin 13 hours ago
testmodel prodmaas available 2 2 admin 23 minutes ago

@maas:~$ juju migrate testos prodcont3 --debug --show-log
11:37:12 INFO juju.cmd supercommand.go:63 running juju [2.1.2 gc go1.6.2]
11:37:12 DEBUG juju.cmd supercommand.go:64 args: []string{"juju", "--show-log", "migrate", "testos", "prodcont3", "--debug"}
11:37:12 INFO juju.juju api.go:72 connecting to API addresses: [10.xx.y.183:17070]
11:37:12 DEBUG juju.api apiclient.go:590 dialing "wss://10.xx.y.183:17070/api"
11:37:12 DEBUG juju.api apiclient.go:593 successfully dialed "wss://10.xx.y.183:17070/api"
11:37:12 INFO juju.api apiclient.go:521 connection established to "wss://10.xx.y.183:17070/api"
11:37:13 DEBUG juju.juju api.go:263 API hostnames unchanged - not resolving
11:37:18 INFO cmd cmd.go:129 Migration started with ID "a4590c27-ad20-4d49-85b9-49716421803f:5"
11:37:18 INFO cmd supercommand.go:465 command finished

Revision history for this message
Sandor Zeestraten (szeestraten) wrote :
Revision history for this message
Sandor Zeestraten (szeestraten) wrote :
Revision history for this message
Sandor Zeestraten (szeestraten) wrote :
Revision history for this message
Tim Penhey (thumper) wrote :

Hmm... something is clearly wrong here.

Can I get you to do the following please?

  juju model-config -m controller logging-config=juju=debug

then do this:

grab the logs from the controller:

  juju debug-log --location -m controller

and in another shell, do this:

  JUJU_DEV_FEATURE_FLAGS=developer-mode juju dump-model -m testos

This should help us work out where it is failing.

Changed in juju:
status: New → Incomplete
Revision history for this message
Sandor Zeestraten (szeestraten) wrote :

It seems I cannot dump the model. Any ideas?

@maas:~/issue46$ JUJU_DEV_FEATURE_FLAGS=developer-mode juju dump-model testos --debug
12:22:15 INFO juju.cmd supercommand.go:63 running juju [2.1.2 gc go1.6.2]
12:22:15 DEBUG juju.cmd supercommand.go:64 args: []string{"juju", "dump-model", "testos", "--debug"}
12:22:15 INFO juju.juju api.go:72 connecting to API addresses: [10.xx.y.183:17070]
12:22:15 DEBUG juju.api apiclient.go:590 dialing "wss://10.xx.y.183:17070/api"
12:22:15 DEBUG juju.api apiclient.go:593 successfully dialed "wss://10.xx.y.183:17070/api"
12:22:15 INFO juju.api apiclient.go:521 connection established to "wss://10.xx.y.183:17070/api"
12:22:16 DEBUG juju.juju api.go:263 API hostnames unchanged - not resolving
12:22:17 DEBUG juju.api monitor.go:35 RPC connection died
12:22:17 ERROR cmd supercommand.go:458 failed to read status history collection: errmsg: "Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit."
12:22:17 DEBUG cmd supercommand.go:459 (error details: [{failed to read status history collection: errmsg: "Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit."}])

I'll see if I can produce the other logs from the controller.

Revision history for this message
Sandor Zeestraten (szeestraten) wrote :

Managed to reproduce this in our dev environment.

Trying to migrate the same model ends up with the following in the machine-0.log on the source controller we're trying to migrate from:

2017-04-24 14:05:25 ERROR juju.worker.migrationmaster.9394f9 worker.go:284 model data transfer failed, model export failed: failed to read status history collection: Executor error during find command: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.
2017-04-24 14:05:26 WARNING juju.worker.migrationmaster.9394f9 worker.go:592 failed to remove model from target controller, model not found (not found)

When running we run into the same issue as in #5 when trying to dump the model:

maas@ JUJU_DEV_FEATURE_FLAGS=developer-mode juju dump-model dev
ERROR failed to read status history collection: Executor error during find command: OperationFailed: Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit.

What is the next step?

Changed in juju:
status: Incomplete → New
Revision history for this message
Tim Penhey (thumper) wrote :

This has been fixed for 2.2-beta2

https://github.com/juju/juju/commit/fd20944902f03e60e8477dd6969a9d42d1c50701

The only way we'll be able to migrate the large models off a 2.1 controller is to manually go and add the index to the mongo db.

Changed in juju:
status: New → Fix Committed
milestone: none → 2.2-beta3
importance: Undecided → High
assignee: nobody → Tim Penhey (thumper)
Changed in juju:
milestone: 2.2-beta3 → 2.2-beta2
Revision history for this message
Sandor Zeestraten (szeestraten) wrote :

@thumper

1. Could you describe the workaround for a 2.1 controller?
2. How will this behave when upgrading a live model from 2.1.2 to 2.2? Will that solve the issue and allow for migration or does that also need manual fixing?

P.S. This model was just a POC OpenStack HA deployment with 3x everything. It might be worth testing with something with at least this size in the future.

Revision history for this message
Sandor Zeestraten (szeestraten) wrote :

I've tried to deploy the same model in 2.2-beta2 and can confirm that the original issue is still there so this should not be fix committed.

I managed to get the output from the controller as requested in #4 (see attachment). As juju dumps a lot I grepped the log (see beneath) and started just before I initialized the migration command.

  grep -i migration machine-0.log > controller-log.txt

Note that dump-model command requested in #4 does not work:
@maasdev:~$ JUJU_DEV_FEATURE_FLAGS=developer-mode juju dump-model default --debug
15:57:33 INFO juju.cmd supercommand.go:63 running juju [2.2-beta2 gc go1.6.2]
15:57:33 DEBUG juju.cmd supercommand.go:64 args: []string{"juju", "dump-model", "default", "--debug"}
15:57:33 INFO juju.juju api.go:72 connecting to API addresses: [10.xx.yy.251:17070]
15:57:33 DEBUG juju.api apiclient.go:678 dialing "wss://10.xx.yy.251:17070/api"
15:57:33 DEBUG juju.api apiclient.go:682 successfully dialed "wss://10.xx.yy.251:17070/api"
15:57:33 INFO juju.api apiclient.go:589 connection established to "wss://10.xx.yy.251:17070/api"
15:57:34 DEBUG juju.juju api.go:263 API hostnames unchanged - not resolving
15:57:34 DEBUG juju.api monitor.go:35 RPC connection died
15:57:34 ERROR cmd supercommand.go:458 expected []string] for tags, got []interface {}
15:57:34 DEBUG cmd supercommand.go:459 (error details: [{expected []string] for tags, got []interface {}}])

Changed in juju:
status: Fix Committed → Incomplete
Changed in juju:
status: Incomplete → New
Revision history for this message
Sandor Zeestraten (szeestraten) wrote :

Any other logs or troubleshooting I can assist with to help this along?

Tim Penhey (thumper)
Changed in juju:
status: New → Triaged
Tim Penhey (thumper)
Changed in juju:
status: Triaged → In Progress
milestone: 2.2-beta2 → 2.2-rc1
Revision history for this message
Anastasia (anastasia-macmood) wrote :

PR against develop (2.2): https://github.com/juju/juju/pull/7342

Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
Sandor Zeestraten (szeestraten) wrote :

Thanks for the fix! Will an upgrade from 2.1.2 to 2.2 resolve this issue on a live model?

Changed in juju:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.