MAAS "migrations" '0121_recompute_storage_size' fails 1.5.4 to 1.8.0

Bug #1495064 reported by Peter Grandi on 2015-09-12
Affects   Importance   Assigned to
MAAS      Critical     Gavin Panella
1.8       Critical     Gavin Panella

Bug Description

* ULTS 14 (Ubuntu 14.04 LTS), previously running MAAS 1.5.4.
* Running over a cluster of 12 hosts, with Juju 1.24.5 handling a fairly static pool of a few dozen instances.

After problems handing over "metal" from MAAS 1.5.4 to Juju 1.24.5, I attempted an upgrade to MAAS 1.8.0 after checking the changelog at

   http://maas.ubuntu.com/docs1.8/changelog.html#id2

which indicates some significant configuration changes in 1.7.0 and some more internal ones in 1.8.0, but no known upgrade difficulties. One option was 1.7.6, which is part of the ULTS 14 'trusty-updates' archive, but since I am also running Juju 1.24.5 from the PPA "stable", I decided to use the PPA "stable" for MAAS too, and then follow the configuration upgrades described in the changelog. I also looked at:

http://askubuntu.com/questions/581987/upgrading-maas-from-1-5-4-to-1-7-0-or-latest-on-ubuntu-14-04

The outcome is that one of the Django "migrations" fails as also described in:

http://askubuntu.com/questions/665170/python-error-during-maas-upgrade

* Restarting PostgreSQL 9.3 database server
   ...done.
Syncing...
Creating tables ...
Installing custom SQL ...
Installing indexes ...
Installed 0 object(s) from 0 fixture(s)

Synced:
 > django.contrib.auth
 > django.contrib.contenttypes
 > django.contrib.sessions
 > django.contrib.sites
 > django.contrib.messages
 > django.contrib.staticfiles
 > piston
 > south

Not synced (use migrations):
 - maasserver
 - metadataserver
(use ./manage.py migrate to migrate these)
Running migrations for maasserver:
 - Migrating forwards to 0138_perf_index_on_node_events.
 > maasserver:0121_recompute_storage_size
Error in migration: maasserver:0121_recompute_storage_size
Traceback (most recent call last):
......
  File "/usr/lib/python2.7/dist-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: relation "metadataserver_noderesult" does not exist
LINE 1: ..."."name", "metadataserver_noderesult"."data" FROM "metadatas...


Gavin Panella (allenap) on 2015-09-14
Changed in maas:
importance: Undecided → Critical
status: New → Triaged
Peter Grandi (pg-8) wrote :

BTW, as described in a related AskUbuntu question, I have checked the MAAS database directly, summarizing it with:

select
  l.ip as "lease",
  l.mac as "Ethernet",
  w.name as "network",
  n.hostname as "node",
  n.storage as "storage",
  n.power_parameters as "WoL"
from
  maasserver_dhcplease as l
    inner join maasserver_macaddress as m
    on l.mac = m.mac_address
      left join maasserver_macaddress_networks as m2n
      on m.id = m2n.macaddress_id
        left join maasserver_network as w
        on m2n.network_id = w.id
      left join maasserver_node as n
      on m.node_id = n.id
order by
  n.hostname

(a query which may be of general usefulness) the output looks consistent and complete, so the upgrade does not seem to have damaged the database. However, the storage size reported differs among the 12 nodes, even though they are all identical: some report 916913 and most report 1408. I can't easily relate either number to the size of the '/' filetree (800G) or the total storage (around 24TB).

Mike Pontillo (mpontillo) wrote :

It looks like that particular migration (0121_recompute_storage_size) assumes that this table under 'metadataserver' has already been migrated. However, when going directly to 1.8 from 1.5, this isn't the case.

A possible workaround may be to upgrade from 1.5.x to 1.7.x (available in trusty-updates), and then upgrade to 1.8.x.

I can think of a few possible fixes:

 - Migration order should be enforced where dependencies like this exist
 - Migrations under 'maasserver' should not depend on 'metadataserver'
 - Move logic for recomputing storage size out of migrations

Gavin Panella (allenap) wrote :

South allows dependencies to be declared which then influence ordering:
  https://south.readthedocs.org/en/latest/dependencies.html
This may be all we need here.
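
To illustrate why a declared dependency would fix the ordering, here is a minimal, self-contained sketch in plain Python. It does not use South itself; the function `plan_migrations` and the dictionaries `PENDING` and `DEPENDS_ON` are hypothetical names invented for this illustration, and only the migration names come from this report. In South the equivalent declaration is, as far as I understand the linked documentation, a `depends_on` attribute on the `Migration` class.

```python
# Illustrative model only: not South code, and no cycle detection.
# Pending migrations per app, in the order the apps are processed.
# Intermediate maasserver migrations between 0121 and 0138 are omitted.
PENDING = {
    "maasserver": [
        "0121_recompute_storage_size",
        "0138_perf_index_on_node_events",
    ],
    "metadataserver": [
        "0015_rename_nodecommissionresult_add_result_type",
    ],
}

# Cross-app dependency: 0121 needs the table renamed by metadataserver 0015.
DEPENDS_ON = {
    ("maasserver", "0121_recompute_storage_size"): [
        ("metadataserver", "0015_rename_nodecommissionresult_add_result_type"),
    ],
}


def plan_migrations(apps, depends_on):
    """Return (app, migration) pairs in an order that honours depends_on."""
    ordered = []
    done = set()

    def run_up_to(app, name):
        # Apply every migration of `app` up to and including `name`.
        for migration in apps[app]:
            key = (app, migration)
            if key not in done:
                # Satisfy cross-app dependencies before this migration.
                for dep_app, dep_name in depends_on.get(key, ()):
                    run_up_to(dep_app, dep_name)
                done.add(key)
                ordered.append(key)
            if migration == name:
                break

    for app, migrations in apps.items():
        if migrations:
            run_up_to(app, migrations[-1])
    return ordered


if __name__ == "__main__":
    print("without dependencies:", plan_migrations(PENDING, {}))
    print("with dependencies:   ", plan_migrations(PENDING, DEPENDS_ON))
```

Without the dependency the plan starts with 'maasserver' 0121, which is the failing case in this report; with it, 'metadataserver' 0015 is scheduled first.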

Mike Pontillo (mpontillo) wrote :

Nice find! I bet that would do it.

Peter Grandi (pg-8) wrote :

Yes, I guess it is an ordering issue. Some news from my recovery investigation:

* I have installed 1.5.4, 1.7.6 and 1.8.0 from scratch on a test system (no upgrades), and in each case the migrations all completed successfully, as expected.
* The database structure at the point of failure seems pretty close to that of 1.7.6, but I have yet to double-check.

I am about to attach SQL dumps of the 1.5.4, 1.7.6 and 1.8.0 *empty* databases, and a schema only update of the local database as of the interrupted upgrade. Also the last steps of installation for 1.5.4 and 1.7.6.

Peter Grandi (pg-8) wrote :

BTW more details on #2/#5, the two involved "migrations" are:

* 'metadataserver/migrations/0015_rename_nodecommissionresult_add_result_type.py' that renames 'metadataserver_nodecommissionresult' to 'metadataserver_noderesult' (and adds a field). This "migration" is the latest metadataserver one in 1.7.6 and 1.8.0.

* 'maasserver/migrations/0121_recompute_storage_size.py' that relies on the renaming done by '0015_rename_nodecommissionresult_add_result_type' but fails since all 'maasserver' "migrations" are done before all 'metadataserver' ones and '0015_rename_nodecommissionresult_add_result_type' is not in 1.5.4.

For reference, the last migrations ('maasserver', 'metadataserver') for 1.5.4 are 0074 and 0014, for 1.7.6 are 0120 and 0015, and for 1.8.0 are 0138 and 0015.

I am somewhat perplexed by the log of "migrations" for a clean install of 1.8.0, which does not fail even though it applies all the 'metadataserver' "migrations", including 0015, after all the other ones, including 0121. I guess that 1.8.0 comes with a database schema already updated and that the "migrations" are idempotent.

Peter Grandi (pg-8) wrote :

Now, interestingly, the 'south_migrationhistory' table reports that migrations have been applied up to 0120 for 'maasserver' and up to 0014 for 'metadataserver'. Thus only 'metadataserver' 0015 is missing to be at the level needed for 1.7.6.
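
For anyone wanting to run the same check, a rough sketch of the query follows. It uses an in-memory SQLite database as a stand-in for the real PostgreSQL one so that it runs anywhere; the column names 'app_name', 'migration' and 'applied' are what I believe South uses but should be treated as an assumption, and the '_placeholder' migration names are made up because the real names of 0119, 0120, 0013 and 0014 are not given in this report.

```python
import sqlite3

# Stand-in database; the real table lives in the MAAS PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE south_migrationhistory ("
    " id INTEGER PRIMARY KEY,"
    " app_name TEXT, migration TEXT, applied TEXT)"
)

# Hypothetical rows: only the numeric prefixes reflect this report.
conn.executemany(
    "INSERT INTO south_migrationhistory (app_name, migration) VALUES (?, ?)",
    [
        ("maasserver", "0119_placeholder"),
        ("maasserver", "0120_placeholder"),
        ("metadataserver", "0013_placeholder"),
        ("metadataserver", "0014_placeholder"),
    ],
)

# Latest applied migration per app. MAX() on the name works because the
# names start with a zero-padded four-digit sequence number.
latest = conn.execute(
    "SELECT app_name, MAX(migration)"
    " FROM south_migrationhistory"
    " GROUP BY app_name"
    " ORDER BY app_name"
).fetchall()

for app_name, migration in latest:
    print(app_name, migration)
```

The SELECT statement itself should be usable unchanged from 'psql' against the MAAS database, assuming the column names above are right.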

I have checked the Django South documentation and their IRC channel, and already-registered "migrations" are not reapplied, so "upgrading" to 1.7.6 ought to be safe, at least as far as database structure goes, by carefully removing the 1.8.0 packages and installing the 1.7.6 ones, if that's possible. I'll do a test in a VM/LXC beforehand :-).

Peter Grandi (pg-8) wrote :

BTW the root cause of this issue is going to potentially result in more problems later, and it is not quite the lack of dependencies, even if that can work around it.

After reading about Django and South and looking at how MAAS is/was structured, the root cause is that two different Django applications, 'maasserver' and 'metadataserver', use the same database, and in particular share some of the tables in that database.

The migration scripts are then generated by South (or its replacement for Django 1.7 and later) in the correct sequence for each Django application, assuming that nothing else updates them, like another application that shares the same database and tables.

Peter Grandi (pg-8) wrote :

As per previous comments, I have tested (carefully) removing the partially-installed 1.8.0 packages and then installing 1.7.6, and that worked as expected: the 'metadataserver' app migration 0015 was applied, which brought the set of applied migrations to 0120 for the 'maasserver' app and 0015 for the 'metadataserver' app, the right update levels for 1.7.6. For now I am staying on 1.7.6, as I would rather "jump" again to the 1.9.x series when it is ready.

Gavin Panella (allenap) on 2015-10-05
Changed in maas:
assignee: nobody → Gavin Panella (allenap)
milestone: none → 1.9.0
status: Triaged → In Progress
Gavin Panella (allenap) on 2015-10-06
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released