vault service not restarted to use new db after db migration
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Charm Guide | New | Undecided | Unassigned |
vault-charm | Triaged | Medium | Unassigned |
Bug Description
The issue arose after migrating vault's db from percona to innodb.
Old deployments use a percona db as vault's storage backend, while new deployments use innodb, so old percona dbs need to be migrated to innodb as per [1].
The symptom of this issue is that, after migration, the vault service is still connected to the old db (percona) even though the config file has been changed to use the new db (innodb), because the vault service is not restarted after the config file change.
In some environments the old db is not stopped and keeps running (in case a reversion is needed), so because of this issue vault carries on happily against the old db and the user is not aware of it.
To reproduce this issue, first deploy one vault unit and one percona unit as per [2].
Then unseal vault.
Next deploy the new innodb cluster (3 units) and perform the db migration as per [1].
The resulting environment may look like [4]; the vault 1.7/stable charm is used.
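The steps above can be sketched roughly as follows. The application names (percona-cluster, vault-mysql, vault-mysql-router) and endpoint names are illustrative assumptions, not the exact names from the affected environment:

```shell
# Sketch of the reproduction steps; names are examples only.
juju deploy --channel 1.7/stable vault
juju deploy percona-cluster
juju add-relation vault:shared-db percona-cluster:shared-db
# ... unseal vault here, then deploy the new innodb cluster and a router:
juju deploy -n 3 mysql-innodb-cluster vault-mysql
juju deploy mysql-router vault-mysql-router
juju add-relation vault-mysql-router:db-router vault-mysql:db-router
# migrate the data as per [1], then move the shared-db relation over.
```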
During the migration process it seems vault is never restarted.
After disconnecting vault from the old db with "juju remove-relation vault:shared-db percona-
and then adding a relation between vault and the new db with "juju add-relation vault:shared-db vault-mysql-
the netstat output shows vault still connecting to the old db.
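One way to confirm the mismatch between the config file and the live connections, assuming the vault snap's default config path (an assumption; the path may differ per install method):

```shell
# Which backend does the config file point at?
juju ssh vault/0 'sudo grep -A3 storage /var/snap/vault/common/vault.hcl'
# Which backend is the running process actually connected to?
juju ssh vault/0 'sudo ss -tnp | grep vault'
```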
In the juju log it appears that start_vault is triggered, but somehow the vault service is still not restarted [3].
The fix is easy: just restart vault after the db migration and unseal it.
But this behaviour in vault doesn't look right and it looks like a bug.
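A minimal sketch of that manual fix, assuming vault is installed as a snap on the unit (service name and addresses are assumptions):

```shell
juju ssh vault/0 'sudo snap restart vault'   # pick up the new db config
export VAULT_ADDR="http://<vault-unit-ip>:8200"
vault operator unseal                        # repeat for each required key share
```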
[1] https:/
[2] https:/
[3] https:/
[4] https:/
start_vault(...) looks like this:
@when_not( "is-update- status- hook") 'started' ) opportunistic_ restart( )
@when('configured')
@when_not(
def start_vault():
# start or restart vault
vault.
@tenacity. retry(wait= tenacity. wait_exponentia l(multiplier= 1, max=10),
stop= tenacity. stop_after_ attempt( 10),
retry= tenacity. retry_if_ result( lambda b: not b)) vault_running( ): running( 'vault' )
def _check_
return service_
if _check_ vault_running( ):
set_flag( 'started' )
clear_ flag('failed. to.start' ) 'totally- unsecure- auto-unlock' ):
vault. prepare_ vault()
set_flag( 'failed. to.start' )
if config(
else:
It's probable that opportunistic_restart() is trying very hard not to restart the vault unit, as that would seal it. This is because opportunistic_restart() -> can_restart() won't return True if the vault unit is unsealed, which it probably is if the vault unit is running.
A solution is to pause the unit (using the pause action), then un-pause it (using the resume action) and then unseal the unit. This will cause the unit to use the newly configured db.
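Sketched with the charm's pause/resume actions (the unit name is an example):

```shell
juju run-action --wait vault/0 pause
juju run-action --wait vault/0 resume
# the restart seals vault; unseal it again with the usual key shares.
```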
As to solving this properly, the charm should probably stop the service when the shared-db relation is broken; but obviously, removing shared-db causes the charm to fall back to raft as the consensus system. Hmm.