Recent backport breaks with many certificates (Implement cert cache for vault units (v2))

Bug #1983269 reported by Alex Kavanagh
Affects: vault-charm
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: (not set)

Bug Description

A recent backport to vault breaks when vault manages many certificates for charm clients:

2022-07-30 01:23:36 DEBUG unit.vault/1.juju-log server.go:319 certificates:95: Saving certificate for "gnocchi_0" (cn: "gnocchi.silo2.solutionsqa") into cache.
2022-07-30 01:23:36 ERROR unit.vault/1.juju-log server.go:319 certificates:95: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 1180, in inner_translate_exc2
    return f(*args, **kwargs)
  File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 1241, in leader_set
    subprocess.check_call(cmd)
  File "/usr/lib/python3.8/subprocess.py", line 359, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python3.8/subprocess.py", line 340, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 7] Argument list too long: 'leader-set'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-vault-1/charm/reactive/vault_handlers.py", line 1105, in create_certs
    vault_pki.update_cert_cache(request,
  File "/var/lib/juju/agents/unit-vault-1/charm/lib/charm/vault_pki.py", line 479, in update_cert_cache
    hookenv.leader_set({PKI_CACHE_KEY: json.dumps(pki_cache)})
  File "/var/lib/juju/agents/unit-vault-1/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 1182, in inner_translate_exc2
    raise to_exc
NotImplementedError

Key part:

OSError: [Errno 7] Argument list too long: 'leader-set'

This occurred due to this code:

    hookenv.log('Saving certificate for "{}" '
                '(cn: "{}") into cache.'.format(request.unit_name,
                                                request.common_name),
                hookenv.DEBUG)
    pki_cache[request.unit_name] = unit_cache
    hookenv.leader_set({PKI_CACHE_KEY: json.dumps(pki_cache)})

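That is, pki_cache grows with each new certificate, and the whole cache is serialised into the argument passed to the leader-set CLI command. Eventually that argument is larger than the operating system allows on a command line, and the call fails with "Argument list too long" (errno 7, E2BIG).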

Suggested solution:

Store one leader-settings key per unit, using PKI_CACHE_KEY as a prefix, and mangle the unit_name so that it is an acceptable key.
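
A minimal sketch of what that could look like, not the actual fix. The value of PKI_CACHE_KEY, the helper names and the mangling rule are all assumptions for illustration; leader_set is passed in as a callable so the snippet runs without charmhelpers (in the charm it would be hookenv.leader_set).

```python
import json
import re

# Assumed placeholder; the real constant is defined in vault_pki.py.
PKI_CACHE_KEY = "pki-cache"


def cache_key_for_unit(unit_name, prefix=PKI_CACHE_KEY):
    """Hypothetical helper: build a per-unit leader-settings key.

    Every run of non-alphanumeric characters becomes a single '-'.
    Which characters leader-set really accepts in a key is not checked
    here; this rule is just a conservative guess.
    """
    mangled = re.sub(r"[^A-Za-z0-9]+", "-", unit_name).strip("-")
    return "{}-{}".format(prefix, mangled)


def update_unit_cache(leader_set, unit_name, unit_cache):
    """Write only this unit's entry instead of the whole cache."""
    leader_set({cache_key_for_unit(unit_name): json.dumps(unit_cache)})


if __name__ == "__main__":
    # Stand-in for hookenv.leader_set that just prints what it is given.
    update_unit_cache(print, "gnocchi_0", {"cn": "gnocchi.silo2.solutionsqa"})
```

Two caveats with this sketch: the mangling is lossy, so names such as "a_0" and "a/0" would collide, and a single unit's entry still has to fit in one argument.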

Tags: sts
Alex Kavanagh (ajkavanagh) wrote :

See: https://review.opendev.org/q/topic:bug%252F1970888

Affects master, 1.7, 1.6 (which hasn't yet merged at the time of submitting this report).

I will revert the backports; the functionality will then need to be reimplemented.

Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

I was able to reproduce the bug on a very large model, 48 units (6 cinder units, glance, neutron-api, etc). It is unclear to me how the bug was hit in a small model by the CI. We cannot see the certificate being used in the juju crashdump.

tags: added: sts
Changed in vault-charm:
status: New → Triaged
importance: Undecided → High