We had a failure in systemtests:
http://maas-integration-ci.internal:8080/job/maas-system-tests/2021/
2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: |Vault successfully configured for the region!
2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: |Once all regions in cluster are configured, use the following command to migrate secrets:
2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: |
2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: |sudo maas config-vault migrate
2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: |
2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: └ ✔
2023-10-29 22:25:10 INFO systemtests.fixtures.maas_region: ┌ maas config-vault migrate
2023-10-29 22:25:15 INFO systemtests.fixtures.maas_region: |Send restart signal to active regions
2023-10-29 22:25:15 INFO systemtests.fixtures.maas_region: | - Signal sent. Waiting for 5 seconds...
2023-10-29 22:25:20 INFO systemtests.fixtures.maas_region: |
2023-10-29 22:25:20 INFO systemtests.fixtures.maas_region: |Wait for active regions to restart (attempt 1/5)
2023-10-29 22:25:20 INFO systemtests.fixtures.maas_region: | - Regions are still active: maas-system-maas
2023-10-29 22:25:25 INFO systemtests.fixtures.maas_region: |
2023-10-29 22:25:25 INFO systemtests.fixtures.maas_region: |Wait for active regions to restart (attempt 2/5)
2023-10-29 22:25:25 INFO systemtests.fixtures.maas_region: | - Regions are still active: maas-system-maas
2023-10-29 22:25:30 INFO systemtests.fixtures.maas_region: |
2023-10-29 22:25:30 INFO systemtests.fixtures.maas_region: |Wait for active regions to restart (attempt 3/5)
2023-10-29 22:25:30 INFO systemtests.fixtures.maas_region: | - Regions are still active: maas-system-maas
2023-10-29 22:25:35 INFO systemtests.fixtures.maas_region: |
2023-10-29 22:25:35 INFO systemtests.fixtures.maas_region: |Wait for active regions to restart (attempt 4/5)
2023-10-29 22:25:35 INFO systemtests.fixtures.maas_region: | - Regions are still active: maas-system-maas
2023-10-29 22:25:40 INFO systemtests.fixtures.maas_region: |
2023-10-29 22:25:40 INFO systemtests.fixtures.maas_region: |Wait for active regions to restart (attempt 5/5)
2023-10-29 22:25:40 INFO systemtests.fixtures.maas_region: | - Regions are still active: maas-system-maas
2023-10-29 22:25:40 INFO systemtests.fixtures.maas_region: |ECommandError: Unable to migrate as one or more regions didn't restart when politely asked. Please shut down these regions before starting the migration process again.
2023-10-29 22:25:40 WARNING systemtests.fixtures.maas_region: └ ❌ Return code: 1
Looking at the logs, there are several things going on here. First there's this:
okt 29 22:25:10 maas-system-maas sh[12958]: maasserver.listener: [info] Listening for database notifications.
That line has the same timestamp as the start of the migrate call (22:25:10), so MAAS was still starting up at that point and might have missed the sys_vault_migration notification.
It does seem that in this case, MAAS did get the notification, since further down in the logs we can see:
okt 29 22:25:15 maas-system-maas sh[12655]: maasserver.eventloop_12655.master: [info] Restarting region
...
okt 29 22:25:41 maas-system-maas sh[12655]: maasserver.eventloop_12655.master: [info] Region restarted
So the region did get restarted, but it took about 26 seconds (22:25:15 to 22:25:41), which is just past the last check the migrate command made at 22:25:40.
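To make the timing concrete, here is a small sketch that recomputes it from the timestamps quoted above. The 5-second initial wait, the 5 attempts and the 5-second spacing are read off the test log, not taken from the MAAS source, so treat them as assumptions about what the command does.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

# Timestamps copied from the logs in this report.
signal_sent = datetime.strptime("22:25:15", FMT)       # "Send restart signal to active regions"
region_restarted = datetime.strptime("22:25:41", FMT)  # last region log line quoted above

# Values inferred from the test log output (assumptions, not MAAS constants).
initial_wait = timedelta(seconds=5)  # "Signal sent. Waiting for 5 seconds..."
attempts = 5                         # "attempt 1/5" through "attempt 5/5"
interval = timedelta(seconds=5)      # spacing between the attempts in the log

# Attempt k (1-based) runs at signal_sent + initial_wait + (k - 1) * interval.
last_check = signal_sent + initial_wait + (attempts - 1) * interval
restart_took = region_restarted - signal_sent
missed_by = region_restarted - last_check

print("last check at:", last_check.strftime(FMT))
print("restart took:", restart_took.total_seconds(), "seconds")
print("missed by:", missed_by.total_seconds(), "seconds")
```

With these numbers the last check lands at 22:25:40, matching the failure timestamp in the test log, one second before the region reports that it restarted.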
I'm also not sure about the way it checks for regions that have been restarted/stopped. It checks for region controllers that don't have any processes. But what's preventing the region from starting processes as it restarts?
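To illustrate the concern, here is a rough sketch comparing a "region has no processes" check with one that looks at whether the processes seen before the signal are gone. This is not how MAAS implements it. The snapshot shape, the function names and the process identifiers are all hypothetical, and the identifiers are assumed to be unique across a restart (for example database row IDs rather than reusable PIDs).

```python
from typing import Dict, Set

# Hypothetical snapshot shape: region name -> identifiers of its registered processes.
Snapshot = Dict[str, Set[int]]


def regions_with_any_process(after: Snapshot) -> Set[str]:
    """Regions a 'no processes at all' check would still report as active."""
    return {region for region, procs in after.items() if procs}


def regions_still_on_old_processes(before: Snapshot, after: Snapshot) -> Set[str]:
    """Regions that still have at least one process recorded before the signal."""
    return {
        region
        for region, old_procs in before.items()
        if old_procs & after.get(region, set())
    }


# Taken before the restart signal is sent.
before = {"maas-system-maas": {101, 102}}
# Taken at a later check: the region has restarted and already registered new processes.
after = {"maas-system-maas": {201, 202}}

print(regions_with_any_process(after))                # {'maas-system-maas'}
print(regions_still_on_old_processes(before, after))  # set()
```

In this example the first check would keep reporting the region as active even though it has restarted, while the second one treats it as done because none of the original processes remain.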