maas init fails; 'relation "maasserver_routable_pairs" does not exist'

Bug #1908552 reported by Jason Hobbs
Affects   Status         Importance   Assigned to      Milestone
MAAS      Fix Released   High         Alberto Donato
3.0       Fix Released   High         Alberto Donato

Bug Description

On 2.8.2, `maas init` takes 4 hours to return and then fails:

2020-12-17-01:45:50 root ERROR [root@10.244.40.32] Command failed: ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o LogLevel=ERROR ubuntu@10.244.40.32 -- sudo 'bash --login -c '"'"'sudo maas init region+rack --database-uri postgres://maas:lGu2mJ1eekOHXYH@10.244.40.34/maasdb --maas-url http://10.244.40.33:80/MAAS --force'"'"''
2020-12-17-01:45:50 root ERROR [root@10.244.40.32] STDOUT follows:
Starting services
Performing database migrations

2020-12-17-01:45:50 root ERROR [root@10.244.40.32] STDERR follows:
None

There is a SQL error here:
http://paste.ubuntu.com/p/y4X7YzQchh/

Logs are here:
https://oil-jenkins.canonical.com/artifacts/f7b80d89-8b41-4872-a115-d49173f53c4e/index.html

The relevant files are:
generated/generated/maas/logs-2020-12-17-01.46.26/10.244.40.30.tgz
generated/generated/maas/logs-2020-12-17-01.46.26/10.244.40.32.tgz
generated/generated/maas/logs-2020-12-17-01.46.26/10.244.40.31.tgz

It seems like the migrations are failing, but I don't know where to find the migration logs to see why.

description: updated
Joshua Genet (genet022) wrote :

Hit this again in this testrun:
https://solutions.qa.canonical.com/testruns/testRun/2861231a-a7fb-4e7d-a6c3-62e7abe0f64c

This time using MAAS 2.9.2

Alexander Balderson (asbalderson) wrote :

Subbing to field high

So far we've hit this bug 11 times in our tests.
All the occurrences can be found at:

https://solutions.qa.canonical.com/bugs/bugs/bug/1908552

Joshua Genet (genet022) wrote :

Subbing field crit.

We continue to hit this, including during our Kubernetes 1.21 release testing.

Joshua Genet (genet022) wrote :

Just hit this during the OpenStack release gate as well.

Alberto Donato (ack) wrote :

Out of curiosity, why the need for `--force` in the `maas init` call?

Changed in maas:
status: New → Incomplete
Alberto Donato (ack) wrote :

Also, is the database for MAAS reused across test runs?

Joshua Genet (genet022) wrote :

MAAS will complain if it has already been init'ed when we rerun the FCE MAAS setup step. So we use --force to effectively make it idempotent.

And no, the database is not reused. We clean Postgres after each test run.

Alberto Donato (ack) wrote :

Is the output of `maas init` available somewhere in logs?

Michael Skalka (mskalka) wrote :
Michael Skalka (mskalka)
Changed in maas:
status: Incomplete → New
Björn Tillenius (bjornt) wrote :

Ok, I think I see what's going on here. The .30 host successfully sets up MAAS and starts up. Then it fails, since maasserver_routable_pairs doesn't exist. Note that this is a view, not a table.

What happens next is that the .31 host runs 'maas init' and tries to run all the db migrations. The first thing it does is drop all views. After that it should run all the migrations and recreate the views, but it somehow deadlocks with the running .30 instance.

On the MAAS side, we could check whether any migrations need to be run, and do nothing if the DB already has all the migrations.

To work around it in FCE, you'd first have to stop MAAS after each 'maas init', and then restart the instances when everything is set up.
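
A minimal sketch of that workaround, as a test harness might script it. It assumes MAAS is installed as a snap (so `snap stop maas` and `snap start maas` apply); the `run` callable, the host list and `init_cmd` are hypothetical stand-ins for however FCE actually executes remote commands.

```python
from typing import Callable, Iterable, List

# Hypothetical runner type: executes `command` on `host` (e.g. over SSH).
Runner = Callable[[str, str], None]


def init_then_restart(hosts: Iterable[str], init_cmd: str, run: Runner) -> None:
    """Run 'maas init' on one host at a time, keeping MAAS stopped in between."""
    initialised: List[str] = []
    for host in hosts:
        run(host, init_cmd)
        # Assumption: MAAS is a snap, so this stops its services. With MAAS
        # stopped, the next host's migrations do not race a running instance.
        run(host, "sudo snap stop maas")
        initialised.append(host)
    # Only once every host has been initialised, bring MAAS back up.
    for host in initialised:
        run(host, "sudo snap start maas")
```

Passing the runner in as a parameter keeps the ordering logic separate from the SSH plumbing, so the sequence can be checked with a fake runner that just records the calls.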

Alberto Donato (ack)
Changed in maas:
importance: Undecided → High
status: New → Triaged
milestone: none → 3.0.1
Björn Tillenius (bjornt) wrote :

It's quite tricky to detect that dbupgrade shouldn't do anything, since views and triggers aren't modeled as db migrations.

Modelling views and triggers as db migrations is tricky in itself, and I'm not even sure it's possible to do that with Django. What we could do is manually create an empty db migration whenever we change a trigger or view. That way, dbupgrade could safely check whether all the migrations are already in the db, and exit without doing anything.
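
A rough sketch of what such a check could look like, not the actual dbupgrade code. Django records applied migrations in the `django_migrations` table (columns `app` and `name`); everything else here is an assumption for illustration: the function names, the `apply` callback, and the use of `sqlite3` in place of PostgreSQL so the example is self-contained.

```python
import sqlite3
from typing import Callable, Iterable, List, Set, Tuple

Migration = Tuple[str, str]  # (app label, migration name)


def pending_migrations(conn: sqlite3.Connection,
                       known: Iterable[Migration]) -> Set[Migration]:
    """Return the known migrations not yet recorded in django_migrations."""
    rows = conn.execute("SELECT app, name FROM django_migrations").fetchall()
    return set(known) - set(rows)


def dbupgrade(conn: sqlite3.Connection,
              known: Iterable[Migration],
              apply: Callable[[List[Migration]], None]) -> bool:
    """Hypothetical entry point: do nothing at all when no migration is pending.

    Returns True if `apply` was called, False if the db was already up to date.
    In the scheme described above, dropping and recreating views would only
    happen on the True path.
    """
    pending = pending_migrations(conn, known)
    if not pending:
        return False
    apply(sorted(pending))
    return True
```

This only works if every trigger or view change ships with a (possibly empty) migration; otherwise an up-to-date migration table would hide a stale view.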

Since this will require a db migration, which is hard to include in minor releases, it would be worth looking into where the code is hanging, and trying to exit gracefully in that case.
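
One way a graceful exit could look, assuming the hang turns out to be a wait on a database lock (not confirmed in this report). The sketch uses `sqlite3` and its busy timeout purely so it can run standalone; in PostgreSQL the comparable knob would be the `lock_timeout` setting. The helper names are hypothetical.

```python
import sqlite3


def open_with_timeout(path: str, seconds: float) -> sqlite3.Connection:
    """Open a connection that gives up on a lock after `seconds`."""
    # isolation_level=None means autocommit: transactions are opened explicitly.
    return sqlite3.connect(path, timeout=seconds, isolation_level=None)


def try_step(conn: sqlite3.Connection, sql: str) -> bool:
    """Run one schema step; return False instead of hanging if it is locked out."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.OperationalError as exc:
        # Raised once the busy timeout expires while another connection
        # still holds the lock.
        if "locked" in str(exc):
            return False
        raise
```

A caller could treat a `False` result as "another instance is using the db", log it, and exit with an error rather than blocking for hours.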

Alberto Donato (ack)
Changed in maas:
milestone: 3.0.1 → 3.0.0-rc2
assignee: nobody → Alberto Donato (ack)
status: Triaged → In Progress
Alberto Donato (ack)
Changed in maas:
milestone: 3.0.0-rc2 → 3.0.1
Changed in maas:
status: In Progress → Fix Committed
Alberto Donato (ack)
Changed in maas:
milestone: 3.0.1 → next
Changed in maas:
status: Fix Committed → Fix Released
milestone: next → none