maas config-vault migrate failed due to region not restarting

Bug #2041854 reported by Björn Tillenius
Affects  Status        Importance  Assigned to      Milestone
MAAS     Fix Released  High        Björn Tillenius

Bug Description

We had a failure in systemtests:

  http://maas-integration-ci.internal:8080/job/maas-system-tests/2021/

 2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: |Vault successfully configured for the region!
 2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: |Once all regions in cluster are configured, use the following command to migrate secrets:
 2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: |
 2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: |sudo maas config-vault migrate
 2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: |
 2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: └ ✔
 2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: ┌ maas config-vault migrate
 2023-10-29 22:25:15 INFO systemtests.fixtures.maas_region: |Send restart signal to active regions
 2023-10-29 22:25:15 INFO systemtests.fixtures.maas_region: | - Signal sent. Waiting for 5 seconds...
 2023-10-29 22:25:20 INFO systemtests.fixtures.maas_region: |
 2023-10-29 22:25:20 INFO systemtests.fixtures.maas_region: |Wait for active regions to restart (attempt 1/5)
 2023-10-29 22:25:20 INFO systemtests.fixtures.maas_region: | - Regions are still active: maas-system-maas
 2023-10-29 22:25:25 INFO systemtests.fixtures.maas_region: |
 2023-10-29 22:25:25 INFO systemtests.fixtures.maas_region: |Wait for active regions to restart (attempt 2/5)
 2023-10-29 22:25:25 INFO systemtests.fixtures.maas_region: | - Regions are still active: maas-system-maas
 2023-10-29 22:25:30 INFO systemtests.fixtures.maas_region: |
 2023-10-29 22:25:30 INFO systemtests.fixtures.maas_region: |Wait for active regions to restart (attempt 3/5)
 2023-10-29 22:25:30 INFO systemtests.fixtures.maas_region: | - Regions are still active: maas-system-maas
 2023-10-29 22:25:35 INFO systemtests.fixtures.maas_region: |
 2023-10-29 22:25:35 INFO systemtests.fixtures.maas_region: |Wait for active regions to restart (attempt 4/5)
 2023-10-29 22:25:35 INFO systemtests.fixtures.maas_region: | - Regions are still active: maas-system-maas
 2023-10-29 22:25:40 INFO systemtests.fixtures.maas_region: |
 2023-10-29 22:25:40 INFO systemtests.fixtures.maas_region: |Wait for active regions to restart (attempt 5/5)
 2023-10-29 22:25:40 INFO systemtests.fixtures.maas_region: | - Regions are still active: maas-system-maas
 2023-10-29 22:25:40 INFO systemtests.fixtures.maas_region: |ECommandError: Unable to migrate as one or more regions didn't restart when politely asked. Please shut down these regions before starting the migration process again.
 2023-10-29 22:25:40 WARNING systemtests.fixtures.maas_region: └ ❌ Return code: 1

Tags: systemtests

Changed in maas:
status: New → Triaged
importance: Undecided → High
milestone: none → 3.5.0
tags: added: systemtests
Björn Tillenius (bjornt) wrote :

Looking at the logs, there are several things going on here. First there's this:

  okt 29 22:25:10 maas-system-maas sh[12958]: maasserver.listener: [info] Listening for database notifications.

This means that MAAS was still starting up when the migrate call was made, so it could have missed the sys_vault_migration notification.

It does seem that in this case, MAAS did get the notification, since further down in the logs we can see:

  okt 29 22:25:15 maas-system-maas sh[12655]: maasserver.eventloop_12655.master: [info] Restarting region
  ...
  okt 29 22:25:41 maas-system-maas sh[12655]: maasserver.eventloop_12655.master: [info] Region restarted

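So the region got restarted, but it took 26 seconds. The migrate command made its last check at 22:25:40, 25 seconds after sending the signal, so it gave up one second before the region reported that it had restarted.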
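
To make the timing concrete, here is a small sketch that recomputes the two intervals from the timestamps in the logs above. The variable and function names are only illustrative; nothing here is taken from the MAAS code.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Return the whole number of seconds between two same-day HH:MM:SS stamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Timestamps copied from the systemtests log and the region log.
signal_sent = "22:25:15"       # "Send restart signal to active regions" / "Restarting region"
last_check = "22:25:40"        # "Wait for active regions to restart (attempt 5/5)"
region_restarted = "22:25:41"  # "Region restarted"

wait_budget = seconds_between(signal_sent, last_check)
restart_duration = seconds_between(signal_sent, region_restarted)

print(f"migrate waited {wait_budget}s, restart took {restart_duration}s")
```

With these numbers the restart finished just outside the window that migrate was willing to wait.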

I'm also not sure about the way the migrate command checks for regions that have been restarted or stopped. It checks for region controllers that don't have any processes. But what's preventing the region from starting processes again as it restarts, before the check has had a chance to see it without any?
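
A toy model of that concern, as a sketch only: it assumes the check samples every 5 seconds (as the log shows) and that a restarting region has no processes only during some window. The window times below are made up for illustration, and `observed_idle` is a hypothetical helper, not a MAAS function.

```python
def observed_idle(sample_times, idle_start, idle_end):
    """Return True if any poll falls inside the window with no processes.

    All times are seconds after the restart signal was sent.
    """
    return any(idle_start <= t < idle_end for t in sample_times)


# Polls at 5, 10, 15, 20 and 25 seconds after the signal, matching the
# five attempts in the log above.
polls = [5, 10, 15, 20, 25]

# Hypothetical window that falls entirely between two polls: the region
# would be reported as "still active" on every attempt.
missed = observed_idle(polls, idle_start=11, idle_end=14)

# Hypothetical window that happens to overlap the poll at 10 seconds.
seen = observed_idle(polls, idle_start=9, idle_end=14)

print(missed, seen)
```

If the region can start processes again between two polls, a check of this shape depends on luck rather than on whether the restart actually happened.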

Changed in maas:
status: Triaged → In Progress
assignee: nobody → Björn Tillenius (bjornt)
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
milestone: 3.5.0 → 3.5.0-beta1
Changed in maas:
status: Fix Committed → Fix Released