One barbican-vault unit is failing with "wrapping token is not valid or does not exist"

Bug #1886424 reported by Alexander Litvinov
This bug affects 1 person
Affects                         Status   Importance  Assigned to  Milestone
OpenStack Barbican-Vault Charm  Invalid  High        Unassigned

Bug Description

This issue is very similar to the bug below:
https://bugs.launchpad.net/charm-barbican-vault/+bug/1871981

However, I already have the patch from that bug:
https://opendev.org/openstack/charm-barbican-vault/commit/f6546dda33636bca1a94cbce7736b350f64ab74a

The hvac version is 0.10.1.

In my case 2 units are in the ready state and the 3rd unit is stuck with
hook failed: "secrets-storage-relation-joined"
and the log below.
After adding a 4th and a 5th unit, both are also stuck with the same wrapping token issue.

2020-07-06 12:18:51 INFO juju-log secrets-storage:57: Retrieving secret-id from vault (http://VAULTVIP:8200)
2020-07-06 12:18:51 ERROR juju-log secrets-storage:57: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-barbican-vault-6/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-barbican-vault-6/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-barbican-vault-6/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-barbican-vault-6/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-barbican-vault-6/charm/reactive/barbican_vault_handlers.py", line 94, in plugin_info_barbican_publish
    secret_id = get_secret_id(secrets_storage, current_secret_id)
  File "/var/lib/juju/agents/unit-barbican-vault-6/charm/reactive/barbican_vault_handlers.py", line 59, in get_secret_id
    secret_id = vault_utils.retrieve_secret_id(url, token)
  File "lib/charm/vault_utils.py", line 32, in retrieve_secret_id
    response = client._post('/v1/sys/wrapping/unwrap')
  File "/var/lib/juju/agents/unit-barbican-vault-6/.venv/lib/python3.6/site-packages/hvac/utils.py", line 174, in new_func
    return method(*args, **kwargs)
  File "/var/lib/juju/agents/unit-barbican-vault-6/.venv/lib/python3.6/site-packages/hvac/v1/__init__.py", line 2579, in _post
    return self._adapter.post(*args, **kwargs)
  File "/var/lib/juju/agents/unit-barbican-vault-6/.venv/lib/python3.6/site-packages/hvac/adapters.py", line 107, in post
    return self.request('post', url, **kwargs)
  File "/var/lib/juju/agents/unit-barbican-vault-6/.venv/lib/python3.6/site-packages/hvac/adapters.py", line 304, in request
    utils.raise_for_error(response.status_code, text, errors=errors)
  File "/var/lib/juju/agents/unit-barbican-vault-6/.venv/lib/python3.6/site-packages/hvac/utils.py", line 32, in raise_for_error
    raise exceptions.InvalidRequest(message, errors=errors)
hvac.exceptions.InvalidRequest: wrapping token is not valid or does not exist

Bionic stein.
Charm barbican-vault rev 15.
juju 2.7.6.

Revision history for this message
Alexander Litvinov (alitvinov) wrote :

As suggested in the previous bug, I ran the refresh action below, but it didn't help the barbican-vault units.

$ juju run-action --wait vault/1 refresh-secrets
unit-vault-1:
  UnitId: vault/1
  status: completed

Even after running juju resolved barbican-vault/5 and re-executing the hook, the units end up in an error state.

Changed in charm-barbican-vault:
status: New → Triaged
importance: Undecided → High
description: updated
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

A pdb trace of the request made on the failing unit. The request looks OK according to the API documentation (https://www.vaultproject.io/api-docs/system/wrapping-unwrap#parameters), so the token itself is likely the issue:

root@juju-b68657-23-lxd-10:/var/lib/juju/agents/unit-barbican-vault-5/charm# ./hooks/secrets-storage-relation-joined
lib/charm/vault_utils.py:32: DeprecationWarning: Call to deprecated function '_post'. This method will be removed in version '0.8.0' Please use the 'post' method on the 'hvac.adapters' class moving forward.
  response = client._post('/v1/sys/wrapping/unwrap')
> /var/lib/juju/agents/unit-barbican-vault-5/.venv/lib/python3.6/site-packages/hvac/adapters.py(109)post()
-> return self.request('post', url, **kwargs)
(Pdb) url
'/v1/sys/wrapping/unwrap'
(Pdb) kwargs
{}

(Pdb) self.request
<bound method RawAdapter.request of <hvac.adapters.RawAdapter object at 0x7f79416a3c88>>
(Pdb) s
--Call--
> /var/lib/juju/agents/unit-barbican-vault-5/.venv/lib/python3.6/site-packages/hvac/adapters.py(248)request()
-> def request(self, method, url, headers=None, raise_exception=True, **kwargs):
(Pdb) n
> /var/lib/juju/agents/unit-barbican-vault-5/.venv/lib/python3.6/site-packages/hvac/adapters.py(266)request()
-> while '//' in url:
(Pdb) l
261 :param kwargs: Additional keyword arguments to include in the requests call.
262 :type kwargs: dict
263 :return: The response of the request.
264 :rtype: requests.Response
265 """
266 -> while '//' in url:
267 # Vault CLI treats a double forward slash ('//') as a single forward slash for a given path.
268 # To avoid issues with the requests module's redirection logic, we perform the same translation here.
269 url = url.replace('//', '/')
270
271 url = self.urljoin(self.base_uri, url)
(Pdb) url
'/v1/sys/wrapping/unwrap'
(Pdb) method
'post'
(Pdb) headers
(Pdb) kwargs
{}

(Pdb) n
> /var/lib/juju/agents/unit-barbican-vault-5/.venv/lib/python3.6/site-packages/hvac/adapters.py(271)request()
-> url = self.urljoin(self.base_uri, url)
(Pdb) n
> /var/lib/juju/agents/unit-barbican-vault-5/.venv/lib/python3.6/site-packages/hvac/adapters.py(273)request()
-> if not headers:
(Pdb) url
'http://vault.<redacted-hostname>:8200/v1/sys/wrapping/unwrap'

-> if self.token:
(Pdb) l
271 url = self.urljoin(self.base_uri, url)
272
273 if not headers:
274 headers = {}
275
276 -> if self.token:
277 headers['X-Vault-Token'] = self.token
278
279 if self.namespace:
280 headers['X-Vault-Namespace'] = self.namespace
281
(Pdb) self.token
's.QYHb0vo4v2a6RpnwaOLNlvps'

(Pdb) n
> /var/lib/juju/agents/unit-barbican-vault-5/.venv/lib/python3.6/site-packages/hvac/adapters.py(277)request()
-> headers['X-Vault-Token'] = self.token
(Pdb) n
> /var/lib/juju/agents/unit-barbican-vault-5/.venv/lib/python3.6/site-packages/hvac/adapters.py(279)request()
-> if self.namespace:
(Pdb) n
> /var/lib/juju/agents/unit-barbican-vault-5/.venv/lib/python3.6/site-packages/hvac/adapters.py(282)request()
-> wrap_ttl = kwargs.pop('wrap_ttl', No...
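
The part of the trace shown above amounts to normalising the path, joining it to the base URI and sending the wrapping token as X-Vault-Token. A minimal sketch of that assembly follows, assuming urllib's urljoin as a stand-in for the adapter's own urljoin; build_unwrap_request, the hostname and the token value are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urljoin


def build_unwrap_request(base_uri, token, path="/v1/sys/wrapping/unwrap"):
    """Return (url, headers) roughly as the traced adapter code assembles them."""
    # Collapse repeated slashes in the path, as in the pdb listing.
    while "//" in path:
        path = path.replace("//", "/")
    # Stand-in for the adapter's urljoin(self.base_uri, url).
    url = urljoin(base_uri, path)
    headers = {}
    if token:
        # The wrapping token is sent as the client token for the unwrap call.
        headers["X-Vault-Token"] = token
    return url, headers


url, headers = build_unwrap_request("http://vault.example:8200", "s.example-token")
print(url)
print(headers)
```

With no request body and only this header, the outcome of the call depends on whether Vault accepts the token itself.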


Revision history for this message
Alexander Litvinov (alitvinov) wrote :

Subscribing ~field-critical, as the deployment is blocked and the refresh-secrets workaround didn't help.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

The error response we see from Vault is returned right after wrapping token validation, when that validation fails:
https://github.com/hashicorp/vault/blob/v1.1.1/http/handler.go#L297-L303
 valid, err := core.ValidateWrappingToken(ctx, req)
 if err != nil {
  return errwrap.Wrapf("error validating wrapping token: {{err}}", err)
 }
 if !valid {
  return fmt.Errorf("wrapping token is not valid or does not exist")
 }

It is possible to set TTL for wrapping tokens:
https://www.vaultproject.io/docs/concepts/response-wrapping#response-wrapping-token-creation

The TTL for one-time tokens seems to be 1h as configured in charm-vault during the secret creation:

https://opendev.org/openstack/charm-vault/src/branch/stable/20.05/src/reactive/vault_handlers.py#L491-L499
        if new_role or refresh_secrets:
            wrapped_secret = vault.generate_role_secret_id(
                client,
                name=approle_name,
                cidr=cidr
            )
            secrets.set_role_id(unit=unit,
                                role_id=approle_id,
                                token=wrapped_secret)

Note 1h here:
https://opendev.org/openstack/charm-vault/src/branch/stable/20.05/src/lib/charm/vault.py#L395-L408
def generate_role_secret_id(client, name, cidr):
# ...
    response = client.write('auth/approle/role/{}/secret-id'.format(name),
                            wrap_ttl='1h', cidr_list=cidr)
    return response['wrap_info']['token']
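
To rule out plain expiry, the age of a token can be compared against the 1h wrap TTL. The sketch below assumes simple single-unit durations such as '1h'; parse_ttl, wrapping_token_expired and the creation time are hypothetical, and only the hook timestamp comes from the log in the bug description (its timezone is assumed to be UTC here).

```python
from datetime import datetime, timedelta, timezone

# Only single-unit durations such as '30s', '15m' or '1h' are handled here.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_ttl(ttl):
    """Parse a duration like '1h' into a timedelta."""
    value, unit = int(ttl[:-1]), ttl[-1]
    return timedelta(**{_UNITS[unit]: value})


def wrapping_token_expired(created_at, ttl, now):
    """True once the TTL has elapsed since the token was created."""
    return now >= created_at + parse_ttl(ttl)


# Hypothetical creation time; the hook time is taken from the failing unit's log.
created = datetime(2020, 7, 6, 11, 0, 0, tzinfo=timezone.utc)
hook_run = datetime(2020, 7, 6, 12, 18, 51, tzinfo=timezone.utc)
print(wrapping_token_expired(created, "1h", hook_run))  # True
```

Expiry alone would not explain freshly added units failing straight away, which is why the rest of the thread looks beyond the TTL.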

https://hvac.readthedocs.io/en/stable/source/hvac_v1.html#hvac.v1.Client.create_role_secret_id

The wrapping token is then passed through the vault-kv interface to barbican-vault:

https://github.com/openstack-charmers/charm-interface-vault-kv/blob/5e71b61c1ddb6ecaade2b6675a5d8cf26655d7b0/requires.py#L75-L88
    @property
    def all_unit_tokens(self):
        """Retrieve the one-shot token(s) for secret_id retrieval for
        all application units or empty list.
        :returns token: Vault one-shot token for secret_id response
        :rtype token: str"""
        token_key = '{}_token'.format(hookenv.local_unit())
        tokens = set()
        for relation in self.relations:
            for unit in relation.units:
                token = unit.received.get(token_key)
                if token:
                    tokens.add(token)

https://github.com/openstack-charmers/charm-interface-vault-kv/blob/5e71b61c1ddb6ecaade2b6675a5d8cf26655d7b0/provides.py#L67-L74
    def set_role_id(self, unit, role_id, token):
        """ Set the AppRole ID and token for out-of-band Secret ID retrieval
        for a specific remote unit """
        # for cmr we will need to the other end to provide their unit name
        # expicitly.
        unit_name = unit.received.get('unit_name') or unit.unit_name
        unit.relation.to_publish['{}_role_id'.format(unit_name)] = role_id
        unit.relation.to_publish['{}_token'.format(unit_name)] = token
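
The two interface snippets agree on a per-unit key naming scheme. A minimal sketch of that round trip follows, with a plain dict standing in for the relation data; publish_token, lookup_token and all values are hypothetical.

```python
def publish_token(relation_data, unit_name, role_id, token):
    """Provider side: store role_id and token under per-unit keys."""
    relation_data["{}_role_id".format(unit_name)] = role_id
    relation_data["{}_token".format(unit_name)] = token


def lookup_token(relation_data, local_unit):
    """Requirer side: read the token published for this unit, if any."""
    return relation_data.get("{}_token".format(local_unit))


data = {}
publish_token(data, "barbican-vault/5", "example-role-id", "s.example-wrapping-token")
print(lookup_token(data, "barbican-vault/5"))  # s.example-wrapping-token
print(lookup_token(data, "barbican-vault/6"))  # None
```

A unit only sees a token if the provider published it under exactly that unit's name.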

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

HA looks OK:

* 1 unit is active;
* 2 units are standby;
* one of the two standbys holds the VIP and redirects client requests to the active node (https://www.vaultproject.io/docs/concepts/ha#server-to-server-communication).

As far as I can tell, the problem is somewhere else.

ubuntu@vault-3:~$ vault status
Key             Value
---             -----
Seal Type       shamir
Initialized     true
Sealed          false
Total Shares    5
Threshold       3
Version         1.1.1
Cluster Name    vault-cluster-df1658d2
Cluster ID      5a141718-4814-507f-0e51-0cfc6ac02905
HA Enabled      true
HA Cluster      https://10.201.12.7:8201
HA Mode         active

ubuntu@vault-2:~$ vault status
Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           5
Threshold              3
Version                1.1.1
Cluster Name           vault-cluster-df1658d2
Cluster ID             5a141718-4814-507f-0e51-0cfc6ac02905
HA Enabled             true
HA Cluster             https://10.201.12.7:8201
HA Mode                standby
Active Node Address    http://10.201.12.7:8200

ubuntu@vault-1:~$ vault status
Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           5
Threshold              3
Version                1.1.1
Cluster Name           vault-cluster-df1658d2
Cluster ID             5a141718-4814-507f-0e51-0cfc6ac02905
HA Enabled             true
HA Cluster             https://10.201.12.7:8201
HA Mode                standby
Active Node Address    http://10.201.12.7:8200

ubuntu@vault-1:~$ ip -4 -o a s
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
2: eth0 inet 10.201.11.170/24 brd 10.201.11.255 scope global eth0\ valid_lft forever preferred_lft forever
3: eth1 inet 10.201.12.28/24 brd 10.201.12.255 scope global eth1\ valid_lft forever preferred_lft forever
3: eth1 inet 10.201.12.43/24 brd 10.201.12.255 scope global secondary eth1\ valid_lft forever preferred_lft forever

ubuntu@vault-1:~$ sudo crm status
Stack: corosync
Current DC: vault-3 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Mon Jul 6 17:35:48 2020
Last change: Fri Jul 3 12:28:21 2020 by hacluster via crmd on vault-3

3 nodes configured
1 resource configured

Online: [ vault-1 vault-2 vault-3 ]

Full list of resources:

 Resource Group: grp_vault-ext_vips
     res_vault-ext_b547a20_vip (ocf::heartbeat:IPaddr2): Started vault-1
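
The HA check in this comment can be written down as a small consistency test: exactly one active node, and every standby pointing at the same active address. This is only a sketch; ha_looks_ok and the dict layout are hypothetical, and the values are copied by hand from the vault status output in this comment.

```python
def ha_looks_ok(nodes):
    """nodes maps a node name to {'mode': ..., 'active_address': ...}."""
    active = [name for name, s in nodes.items() if s["mode"] == "active"]
    standby = [s for s in nodes.values() if s["mode"] == "standby"]
    # Exactly one active node, and every other node must be a standby.
    if len(active) != 1 or len(active) + len(standby) != len(nodes):
        return False
    # All standbys should report the same, non-empty active node address.
    addresses = {s.get("active_address") for s in standby}
    return len(addresses) <= 1 and None not in addresses


nodes = {
    "vault-3": {"mode": "active"},
    "vault-2": {"mode": "standby", "active_address": "http://10.201.12.7:8200"},
    "vault-1": {"mode": "standby", "active_address": "http://10.201.12.7:8200"},
}
print(ha_looks_ok(nodes))  # True
```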

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

An observation:

Adding new barbican-vault units results in the same failure during secrets-storage-relation-joined, even though charm-vault generates new wrapping tokens for them.

Revision history for this message
James Page (james-page) wrote :

For further context, the token has a 1h TTL and is limited to access from a specific IP address.
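
The IP restriction can be pictured as a CIDR membership test on the address the request comes from. The sketch below uses only the standard ipaddress module; source_allowed and both addresses are hypothetical, and it does not claim to mirror how Vault performs the check internally.

```python
import ipaddress


def source_allowed(source_ip, cidr):
    """True if source_ip falls inside the CIDR the token is restricted to."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(cidr, strict=False)


# Hypothetical: the token was issued for an address in one subnet,
# but the unit talks to Vault from an address in another.
print(source_allowed("10.201.12.50", "10.201.12.50/32"))  # True
print(source_allowed("10.201.11.50", "10.201.12.50/32"))  # False
```

If the address published over the relation and the address actually used to reach Vault come from different networks, a fresh token can still be refused.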

Revision history for this message
James Page (james-page) wrote :

This worries me:

  barbican-vault:
    charm: cs:barbican-vault-15
...
    endpoint-bindings:
      "": alpha
      certificates: alpha
      juju-info: alpha
      secrets: alpha
      secrets-storage: alpha

  vault:
    charm: cs:vault-39
...
    endpoint-bindings:
      "": internal-space
      access: internal-space
      certificates: internal-space
      cluster: internal-space
      db: internal-space
      etcd: internal-space
      external: internal-space
      ha: internal-space
      nrpe-external-master: internal-space
      secrets: internal-space
      shared-db: internal-space

The 'alpha' endpoint bindings on the barbican-vault charm mean that no explicit endpoint binding is provided, and Juju has somewhat EUNDEFINED behaviour as to what network-get returns on any given unit in this case.

Please ensure "": internal-space is used in the bundle.
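
A quick way to spot this kind of mismatch is to compare the spaces bound on both sides of the relation. The sketch below assumes the relation joins barbican-vault's secrets-storage endpoint to vault's secrets endpoint; binding_space and spaces_match are hypothetical helpers, and the dicts are abbreviated from the bindings quoted above.

```python
def binding_space(bindings, endpoint):
    """Explicit binding for the endpoint, else the application default ("")."""
    return bindings.get(endpoint, bindings.get(""))


def spaces_match(app_a, endpoint_a, app_b, endpoint_b):
    """True if both ends of the relation are bound to the same space."""
    return binding_space(app_a, endpoint_a) == binding_space(app_b, endpoint_b)


barbican_vault = {"": "alpha", "secrets-storage": "alpha"}
vault = {"": "internal-space", "secrets": "internal-space"}
print(spaces_match(barbican_vault, "secrets-storage", vault, "secrets"))  # False
```

Running such a check over a bundle before deployment would flag the barbican-vault bindings shown here.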

Revision history for this message
Alexander Litvinov (alitvinov) wrote :

After deploying with updated bindings all units are green.

juju deploy cs:barbican-vault --bind "oam-space secrets-storage:internal-space secrets:internal-space certificates:internal-space"

Thank you James and Dmitrii for helping with this issue.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Glad it helped.

Thanks to James for pointing out the token IP ACL bit.

Changed in charm-barbican-vault:
status: Triaged → Invalid