Cannot get out of blocked state (Vault failed to start; check journalctl -u vault)

Bug #1871539 reported by Trent Lloyd
This bug affects 5 people
Affects      Status     Importance  Assigned to       Milestone
vault-charm  Confirmed  Undecided   Jorge Niedbalski

Bug Description

If you reboot vault and MySQL is not yet ready, vault enters a blocked state (as per Bug #1818973, "vault fails to start when MySQL backend down").

However, in my scenario it is impossible to get the charm out of this blocked state. This environment is a single vault unit (no HA) with totally-unsecure-auto-unlock.

I tried the following things:

 (1) systemctl start vault # works
 (2) juju run --unit vault/1 ./hooks/update-status
 (3) juju run-action vault/leader resume
 (4) juju run-action vault/leader pause; and then resume again
 (5) manually unsealing the vault using "VAULT_ADDR=http://127.0.0.1:8200 vault operator unseal xxxx"
 (6) retrying 2-3 after manually unsealing the vault
 (7) rebooting the node so that vault starts successfully, and then trying 2/3/4/5 again

In all cases the charm never escapes "blocked (Vault failed to start; check journalctl -u vault)" even though vault is in fact started and even unsealed.

The debug log shows the following flags simultaneously set: started, failed.to.start, configured

Looking at the logic around these flags, the only function that clears failed.to.start is start_vault, and it only runs under @when('configured') and @when_not('started').

However, several functions, such as publish_ca_info and tune_pki_backend_config_changed, set failed.to.start without clearing started.

Additionally, the 'resume' action just invokes charmhelpers' resume_unit and doesn't clear failed.to.start or any other flag.
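To make the deadlock concrete, here is a minimal model of the gating described above. It is only a sketch: start_vault_can_run is a hypothetical helper that mimics the two decorators on start_vault, not code from the charm, and the flag set is the one reported in the debug log.

```python
def start_vault_can_run(flags):
    """Hypothetical stand-in for @when('configured') @when_not('started')."""
    return "configured" in flags and "started" not in flags


# Flags reported simultaneously in the debug log of the stuck unit.
stuck = {"started", "failed.to.start", "configured"}

# start_vault is the only handler that clears failed.to.start,
# but it is not eligible to run while 'started' is set.
print(start_vault_can_run(stuck))

# Once 'started' is cleared, the handler becomes eligible again.
print(start_vault_can_run(stuck - {"started"}))
```

With started set, nothing in this model can ever reach the code path that clears failed.to.start, which matches the behaviour observed on the unit.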

This leaves no charm scenario that can get out of this situation. Perhaps _assess_status (which checks for failed.to.start and sets blocked) could also check whether vault is actually running: if it is, set started and clear failed.to.start; if it's stopped, clear started and set failed.to.start?
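A rough sketch of that suggestion, written as a pure function so it can be read and tested without a Juju environment. The name reconcile_start_flags is hypothetical and this is not the charm's actual code; in the charm itself the equivalent would presumably use set_flag/clear_flag from charms.reactive and a service check such as charmhelpers' service_running('vault').

```python
def reconcile_start_flags(flags, service_running):
    """Return a copy of `flags` that agrees with the real service state.

    flags: iterable of flag names currently set on the unit.
    service_running: True if the vault service is actually running.
    """
    flags = set(flags)
    if service_running:
        # Vault is up: record that and drop the stale failure flag.
        flags.add("started")
        flags.discard("failed.to.start")
    else:
        # Vault is down: allow start_vault to run again and report the failure.
        flags.discard("started")
        flags.add("failed.to.start")
    return flags


stuck = {"started", "failed.to.start", "configured"}
print(sorted(reconcile_start_flags(stuck, service_running=True)))
print(sorted(reconcile_start_flags(stuck, service_running=False)))
```

Either branch leaves the flags in a state the existing handlers can act on: a running vault is no longer reported as blocked, and a stopped vault no longer has started set, so start_vault can be triggered again.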

Tags: seg
Trent Lloyd (lathiat)
tags: added: seg
Billy Olsen (billy-olsen) wrote :

As a workaround: the charm only shows the blocked state while the failed.to.start flag is set. The flag can be cleared with:

juju run --unit vault/<unit_num> -- charms.reactive clear_flag failed.to.start

Additionally, if the charm needs to restart the vault service (i.e. you haven't started it manually), you can clear the started flag as well:

juju run --unit vault/<unit_num> -- charms.reactive clear_flag started

If the vault service is already running, the charm will restart it. Any unsealing the unit needs will then have to be performed again.

Changed in vault-charm:
status: New → Confirmed
assignee: nobody → Jorge Niedbalski (niedbalski)