Vault service restarts may be required after authorize-charm action for non-leader units

Bug #1923067 reported by Paul Goins
This bug affects 2 people
Affects: vault-charm
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hello,

I've been trying to set up a Vault cluster and have had trouble getting it up and running per the documentation. I've run through the steps several times but was unable to get things working without taking extra steps.

Specifically, the documented steps seem to work perfectly up until the authorize-charm step. However, after I run the authorize-charm action, only the leader goes to a green state. The remaining two units hit an error and report the message: 'hook failed: "leader-settings-changed"'.
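
For reference, the step in question is the documented authorize-charm action; in the Juju 2.x syntax current for cs:vault-44 it looks roughly like this (the token value is assumed to come from an earlier "vault token create"):

  juju run-action --wait vault/leader authorize-charm token=$TOKEN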

The traceback encountered by the errored units looks like this:

2021-04-08 15:18:42 INFO juju-log Invoking reactive handler: hooks/relations/tls-certificates/provides.py:63:broken:certificates
2021-04-08 15:18:42 WARNING leader-settings-changed Traceback (most recent call last):
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/charm/hooks/leader-settings-changed", line 22, in <module>
2021-04-08 15:18:42 WARNING leader-settings-changed main()
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 84, in main
2021-04-08 15:18:42 WARNING leader-settings-changed hookenv._run_atexit()
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/charmhelpers/core/hookenv.py", line 1354, in _run_atexit
2021-04-08 15:18:42 WARNING leader-settings-changed callback(*args, **kwargs)
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/charm/reactive/vault_handlers.py", line 759, in _assess_status
2021-04-08 15:18:42 WARNING leader-settings-changed if not client_approle_authorized():
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/charm/reactive/vault_handlers.py", line 789, in client_approle_authorized
2021-04-08 15:18:42 WARNING leader-settings-changed vault.get_local_client()
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/tenacity/__init__.py", line 333, in wrapped_f
2021-04-08 15:18:42 WARNING leader-settings-changed return self(f, *args, **kw)
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/tenacity/__init__.py", line 423, in __call__
2021-04-08 15:18:42 WARNING leader-settings-changed do = self.iter(retry_state=retry_state)
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/tenacity/__init__.py", line 360, in iter
2021-04-08 15:18:42 WARNING leader-settings-changed return fut.result()
2021-04-08 15:18:42 WARNING leader-settings-changed File "/usr/lib/python3.6/concurrent/futures/_base.py", line 425, in result
2021-04-08 15:18:42 WARNING leader-settings-changed return self.__get_result()
2021-04-08 15:18:42 WARNING leader-settings-changed File "/usr/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
2021-04-08 15:18:42 WARNING leader-settings-changed raise self._exception
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/tenacity/__init__.py", line 426, in __call__
2021-04-08 15:18:42 WARNING leader-settings-changed result = fn(*args, **kwargs)
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/charm/lib/charm/vault.py", line 254, in get_local_client
2021-04-08 15:18:42 WARNING leader-settings-changed client.auth_approle(app_role_id)
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/hvac/v1/__init__.py", line 2072, in auth_approle
2021-04-08 15:18:42 WARNING leader-settings-changed return self.auth('/v1/auth/{0}/login'.format(mount_point), json=params, use_token=use_token)
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/hvac/v1/__init__.py", line 1729, in auth
2021-04-08 15:18:42 WARNING leader-settings-changed **kwargs
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/hvac/adapters.py", line 159, in auth
2021-04-08 15:18:42 WARNING leader-settings-changed response = self.post(url, **kwargs).json()
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/hvac/adapters.py", line 103, in post
2021-04-08 15:18:42 WARNING leader-settings-changed return self.request('post', url, **kwargs)
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/hvac/adapters.py", line 233, in request
2021-04-08 15:18:42 WARNING leader-settings-changed utils.raise_for_error(response.status_code, text, errors=errors)
2021-04-08 15:18:42 WARNING leader-settings-changed File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.6/site-packages/hvac/utils.py", line 29, in raise_for_error
2021-04-08 15:18:42 WARNING leader-settings-changed raise exceptions.InvalidRequest(message, errors=errors)
2021-04-08 15:18:42 WARNING leader-settings-changed hvac.exceptions.InvalidRequest: missing client token
2021-04-08 15:18:42 ERROR juju.worker.uniter.operation runhook.go:136 hook "leader-settings-changed" (via explicit, bespoke hook script) failed: exit status 1

I did find a workaround: if I restart vault on the two errored units and then unseal them again, I can "juju resolved" the errored units and everything goes green.
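
Concretely, the workaround was something like the following sketch, in Juju 2.x syntax (unit names and $KEY are illustrative; the keys are the ones from the original "operator init" output):

  # restart the vault service on each errored unit (vault/1 and vault/2 here)
  juju run-action --wait vault/1 restart
  juju run-action --wait vault/2 restart

  # unseal each restarted unit again (repeat per key up to the unseal threshold)
  juju run --unit vault/1 -- VAULT_ADDR=http://localhost:8200 /snap/bin/vault operator unseal $KEY
  juju run --unit vault/2 -- VAULT_ADDR=http://localhost:8200 /snap/bin/vault operator unseal $KEY

  # clear the hook errors
  juju resolved vault/1
  juju resolved vault/2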

This was tested with cs:vault-44. I'm also attaching the Juju bundle used to deploy the environment.

Revision history for this message
Trent Lloyd (lathiat) wrote :

I hit this today, but only vault/0 errored; vault/1 was OK and vault/2 (the leader) was also OK. I restarted vault on vault/0 and unsealed it, and then it was fine.

Revision history for this message
Ian Marsh (drulgaard) wrote :

Might be completely unrelated, but happens at the same point and has the same resolution, so...

juju 3.2-beta1.1-5069b69, vault charm 1.8/stable rev 100

Immediately after...
  juju run vault/leader authorize-charm token=$TOKEN
... I get:
Unit Workload Agent Machine Public address Ports Message
vault/0 blocked idle 0/lxd/16 [REDACTED] 8200/tcp Vault cannot authorize approle
vault/1* blocked idle 1/lxd/16 [REDACTED] 8200/tcp Missing CA cert
vault/2 blocked idle 2/lxd/16 [REDACTED] 8200/tcp Vault cannot authorize approle

So I...
  juju run vault/{0,2} restart
... and then unseal those two instances, and I get:
Unit Workload Agent Machine Public address Ports Message
vault/0 active idle 0/lxd/16 [REDACTED] 8200/tcp Unit is ready (active: true, mlock: disabled)
vault/1* blocked idle 1/lxd/16 [REDACTED] 8200/tcp Missing CA cert
vault/2 active idle 2/lxd/16 [REDACTED] 8200/tcp Unit is ready (active: true, mlock: disabled)

... and then I can continue with generating CA cert.
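
For reference, by "generating CA cert" I mean the charm's generate-root-ca action, i.e. roughly:

  juju run vault/leader generate-root-ca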

Revision history for this message
Adam Dyess (addyess) wrote :

I've reproduced this bug with the following:

Deploy the attached bundle and wait for the vault units to be blocked with the message:
    Vault needs to be initialized

Steps:
1) initialize vault on the leader unit

juju exec -u vault/leader -- VAULT_ADDR=http://localhost:8200 /snap/bin/vault operator init -key-shares=5 -key-threshold=3

2) ***THIS IS IMPORTANT***: unseal ALL the units

for key in ...; do
    juju exec -a vault -- VAULT_ADDR=http://localhost:8200 /snap/bin/vault operator unseal $key
done

3) Create a token for charm authorization

juju exec -u vault/leader -- VAULT_ADDR=http://localhost:8200 VAULT_TOKEN=<root-token> /snap/bin/vault token create --ttl=10m

4) Authorize the leader charm

juju run vault/leader authorize-charm token=<created token>

Profit:

vault/0* active idle 6 10.246.154.101 8200/tcp Unit is ready (active: true, mlock: disabled)
  vault-mysql-router/0* active idle 10.246.154.101 Unit is ready
vault/1 error idle 7 10.246.154.6 8200/tcp hook failed: "leader-settings-changed"
  vault-mysql-router/1 active idle 10.246.154.6 Unit is ready
vault/2 error idle 8 10.246.154.185 8200/tcp hook failed: "leader-settings-changed"
  vault-mysql-router/2 active idle 10.246.154.185 Unit is ready

---------------------------------------------

If I change the init process:

Steps:
1) initialize vault on the leader unit

juju exec -u vault/leader -- VAULT_ADDR=http://localhost:8200 /snap/bin/vault operator init -key-shares=5 -key-threshold=3

2) ***THIS IS IMPORTANT***: unseal ONLY the leader

for key in ...; do
    juju exec -u vault/leader -- VAULT_ADDR=http://localhost:8200 /snap/bin/vault operator unseal $key
done

3) Create a token for charm authorization

juju exec -u vault/leader -- VAULT_ADDR=http://localhost:8200 VAULT_TOKEN=<root-token> /snap/bin/vault token create --ttl=10m

4) Authorize the leader charm

juju run vault/leader authorize-charm token=<created token>

5) juju-wait for vault to be stable, with the standby units blocked as "sealed"
6) Unseal the other units

for key in ...; do
  juju exec -a vault -- VAULT_ADDR=http://localhost:8200 /snap/bin/vault operator unseal $key
done
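
As a side note, the init/unseal flow can be scripted end to end by asking operator init for JSON output; a rough sketch (assumes jq is available on the host running juju; the file name is illustrative):

# capture the unseal keys and root token as JSON
juju exec -u vault/leader -- VAULT_ADDR=http://localhost:8200 /snap/bin/vault operator init -key-shares=5 -key-threshold=3 -format=json > init.json

# feed the first three keys (the threshold) to every unit
for key in $(jq -r '.unseal_keys_b64[:3][]' init.json); do
    juju exec -a vault -- VAULT_ADDR=http://localhost:8200 /snap/bin/vault operator unseal $key
done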

Changed in vault-charm:
status: New → Confirmed
Revision history for this message
Adam Dyess (addyess) wrote :

This bug ONLY appears when not using the etcd relation:

If I have an active etcd available FIRST, this doesn't occur. I suppose it has something to do with the raft database not being in sync.

With an ACTIVE etcd and this relation in place:
- vault:etcd etcd:db

the above error doesn't occur.
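
For reference, that relation would be added with something like:

  juju integrate vault:etcd etcd:db    # "juju add-relation vault:etcd etcd:db" on Juju 2.x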
