MAAS "migrations" '0121_recompute_storage_size' fails 1.5.4 to 1.8.0
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | | Critical | Gavin Panella |
1.8 | | Critical | Gavin Panella |
Bug Description
* Ubuntu LTS 14 ('trusty'), which had MAAS 1.5.4.
* Running over a cluster of 12 hosts, with Juju 1.24.5 handling a fairly static park of a few dozen instances.
After problems attempting to hand over "metal" from MAAS 1.5.4 to Juju 1.24.5, I attempted an upgrade to MAAS 1.8.0, after checking the changelog at
http://
which indicates some significant configuration changes in 1.7.0 and some more internal ones in 1.8.0, but no known upgrade difficulties. One option was 1.7.6, which is part of the Ubuntu LTS 14 'trusty-updates' archive, but since I am also running Juju 1.24.5 from the PPA "stable", I decided to go for the PPA "stable" for MAAS too, and then follow the configuration upgrades described in the changelog. I looked at:
http://
The outcome is that one of the Django "migrations" fails as also described in:
http://
* Restarting PostgreSQL 9.3 database server
...done.
Syncing...
Creating tables ...
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)
Synced:
> django.contrib.auth
> django.
> django.
> django.
> django.
> django.
> piston
> south
Not synced (use migrations):
- maasserver
- metadataserver
(use ./manage.py migrate to migrate these)
Running migrations for maasserver:
- Migrating forwards to 0138_perf_
> maasserver:
Error in migration: maasserver:
Traceback (most recent call last):
......
File "/usr/lib/
return self.cursor.
django.
LINE 1: ..."."name", "metadataserver
Related branches
- Andres Rodriguez (community): Approve on 2015-10-06
- Blake Rouse (community): Approve on 2015-10-05
- Diff: 14 lines (+4/-0), 1 file modified: src/maasserver/migrations/0121_recompute_storage_size.py (+4/-0)
- Mike Pontillo (community): Approve on 2015-10-06
- Diff: 14 lines (+4/-0), 1 file modified: src/maasserver/migrations/0121_recompute_storage_size.py (+4/-0)
Changed in maas: | |
importance: | Undecided → Critical |
status: | New → Triaged |
Peter Grandi (pg-8) wrote : | #1 |
Mike Pontillo (mpontillo) wrote : | #2 |
It looks like that particular migration (0121_recompute
A possible workaround may be to upgrade from 1.5.x to 1.7.x (available in trusty-updates), and then upgrade to 1.8.x.
I can think of a few possible fixes:
- Migration order should be enforced where dependencies like this exist
- Migrations under 'maasserver' should not depend on 'metadataserver'
- Move logic for recomputing storage size out of migrations
Gavin Panella (allenap) wrote : | #3 |
South allows dependencies to be declared which then influence ordering:
https:/
This may be all we need here.
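To illustrate why a declared dependency would help here, the following is a minimal sketch in plain Python of dependency-aware ordering across two apps. It does not import or reproduce South; the `plan` function, the shortened migration names, and the assumption that maasserver 0121 needs metadataserver 0015 are all illustrative and inferred from this thread.

```python
# Sketch only: plain Python that imitates dependency-aware ordering.
# It does not import or reproduce South. Migration names are shortened
# placeholders (hypothetical), not the real MAAS file names.

def plan(app_migrations, depends_on, app_order):
    """Return (app, migration) pairs in an order that honours dependencies.

    app_migrations: {app: [migration names in per-app sequence]}
    depends_on: {(app, migration): [(other_app, other_migration), ...]}
    app_order: order in which the apps are visited
    """
    done, order = set(), []

    def apply(app, name, stack=()):
        key = (app, name)
        if key in done:
            return
        if key in stack:
            raise ValueError("circular dependency at %s:%s" % key)
        names = app_migrations[app]
        idx = names.index(name)
        # Implicit dependency: the previous migration of the same app.
        if idx > 0:
            apply(app, names[idx - 1], stack + (key,))
        # Explicit cross-app dependencies, applied before this migration.
        for dep_app, dep_name in depends_on.get(key, ()):
            apply(dep_app, dep_name, stack + (key,))
        done.add(key)
        order.append(key)

    for app in app_order:
        for name in app_migrations[app]:
            apply(app, name)
    return order


MIGRATIONS = {
    "maasserver": ["0120", "0121", "0122"],
    "metadataserver": ["0014", "0015"],
}
APP_ORDER = ["maasserver", "metadataserver"]

# Without a declared dependency, each app simply runs in its own sequence,
# so maasserver 0121 runs before metadataserver 0015.
naive = plan(MIGRATIONS, {}, APP_ORDER)

# Assumed dependency: maasserver 0121 needs metadataserver 0015 first.
DEPENDS_ON = {("maasserver", "0121"): [("metadataserver", "0015")]}
ordered = plan(MIGRATIONS, DEPENDS_ON, APP_ORDER)

for app, name in ordered:
    print("%s:%s" % (app, name))
```

In South itself the corresponding declaration is, to my knowledge, a `depends_on` attribute on the `Migration` class holding `(app, migration_name)` pairs; the exact names to use for the MAAS migrations are not shown in this thread.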
Mike Pontillo (mpontillo) wrote : | #4 |
Nice find! I bet that would do it.
Peter Grandi (pg-8) wrote : | #5 |
Yes, I guess it is an ordering issue. Some news on my recovery investigation:
* I have installed 1.5.4, 1.7.6 and 1.8.0 from scratch on a test system (no upgrades), and in each case the migrations all happened successfully as expected.
* The database structure at the point of failure seems pretty close to that of 1.7.6, but I have yet to double-check.
I am about to attach SQL dumps of the 1.5.4, 1.7.6 and 1.8.0 *empty* databases, and a schema only update of the local database as of the interrupted upgrade. Also the last steps of installation for 1.5.4 and 1.7.6.
Peter Grandi (pg-8) wrote : | #6 |
BTW more details on #2/#5, the two involved "migrations" are:
* 'metadataserver
* 'maasserver/
In all this, the last migrations ('maasserver', 'metadataserver') for 1.5.4 are 0074 and 0014, for 1.7.6 are 0120 and 0015, and for 1.8.0 are 0138 and 0015.
I am somewhat perplexed by the log of "migrations" for a clean install of 1.8.0 which does not fail, even if it applies all the metadata server "migrations", including 0015, after all the other ones, including 0121. I guess that 1.8.0 comes with a database schema already updated and that the "migrations" are idempotent.
Peter Grandi (pg-8) wrote : | #7 |
Now interestingly the 'south_
I have checked Django-South and their IRC channel, and already-registered "migrations" are not reapplied, so "upgrading" to 1.7.6 ought to be safe, at least as far as database structure goes, by carefully removing the 1.8.0 packages and installing the 1.7.6 ones, if that's possible. I'll do a test in a VM/LXC beforehand :-).
Peter Grandi (pg-8) wrote : | #8 |
BTW the root cause of this issue is going to potentially result in more problems later, and it is not quite the lack of dependencies, even if that can work around it.
After reading about Django and South and looking at how MAAS is/was structured the root cause is that two different Django applications 'maasserver' and 'metadataserver' use the same database, and in particular share some of the tables in that database.
The migration scripts are then generated by South (or its replacement for Django 1.7 and later) in the correct sequence for each Django application, assuming that nothing else updates them, like another application that shares the same database and tables.
Peter Grandi (pg-8) wrote : | #9 |
As per previous comments, I have tested removing the partially-installed 1.8.0 packages (carefully) and then installing 1.7.6, and that worked as expected: the 'metadataserver' app migration 0015 was applied, and this brought the set of applied migrations to 0120 for the 'maasserver' app and 0015 for the 'metadataserver' app, which are the right update levels for 1.7.6. For now I am staying on 1.7.6, as I would rather "jump" again to the 1.9.x series when it is ready.
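The outcome above follows from already-recorded migrations not being reapplied. A minimal sketch of that bookkeeping, in plain Python: the `pending` helper is hypothetical, and the sketch assumes (for illustration only) that migrations are numbered contiguously from 1 and that the interrupted upgrade left 'maasserver' at 0120 and 'metadataserver' at 0014, as inferred from this thread.

```python
def pending(available, applied):
    """Migrations shipped by a release that are not yet recorded as applied."""
    return {
        app: [n for n in numbers if n not in applied.get(app, set())]
        for app, numbers in available.items()
    }


# Assumed state after the interrupted 1.8.0 upgrade (inferred, not measured):
# maasserver stopped before 0121, metadataserver was still at 0014.
applied = {
    "maasserver": set(range(1, 121)),
    "metadataserver": set(range(1, 15)),
}

# Levels shipped by 1.7.6 according to the thread: 0120 and 0015.
release_176 = {
    "maasserver": list(range(1, 121)),
    "metadataserver": list(range(1, 16)),
}

todo = pending(release_176, applied)
print(todo)
```

Under those assumptions only 'metadataserver' 0015 remains to be applied, which matches what was observed when installing 1.7.6.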
Changed in maas: | |
assignee: | nobody → Gavin Panella (allenap) |
milestone: | none → 1.9.0 |
status: | Triaged → In Progress |
Changed in maas: | |
status: | In Progress → Fix Committed |
Changed in maas: | |
status: | Fix Committed → Fix Released |
BTW, as described in a related AskUbuntu question, I have checked the MAAS database directly, summarizing it with:
select
  l.ip as "lease",
  l.mac as "Ethernet",
  w.name as "network",
  n.hostname as "node",
  n.storage as "storage",
  n.power_parameters as "WoL"
from
  maasserver_dhcplease as l
  inner join maasserver_macaddress as m
    on l.mac = m.mac_address
  left join maasserver_macaddress_networks as m2n
    on m.id = m2n.macaddress_id
  left join maasserver_network as w
    on m2n.network_id = w.id
  left join maasserver_node as n
    on m.node_id = n.id
order by
  n.hostname
(a query which may be of general usefulness) the output looks consistent and complete, so the upgrade does not seem to have damaged the database. However, the storage size reported for the 12 nodes differs for some, even though they are all identical nodes. Some report 916913 and most report 1408. I can't easily relate either to the size of the '/' filetree (800G) or to the total storage (around 24TB).