Vault fails if certificates-relation-joined runs before initial setup

Bug #1970888 reported by Liam Young
This bug affects 1 person
Affects      Status         Importance  Assigned to    Milestone
vault-charm  Fix Committed  High        Martin Kalcok

Bug Description

If the certificates-relation-joined hook runs before vault has been configured, the hook fails. The charm tries to access the running vault service, but at this point vault is neither configured nor running. This regression appears to have been introduced by 1159e547 ( https://review.opendev.org/c/openstack/charm-vault/+/828885 ). That patch seems to gate incorrectly on the `certificates.available` flag. Despite the name, `certificates.available` only indicates that certificates have been requested, i.e. it means "a certificate is available to be processed"; it does not mean that vault is ready.
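The gating problem can be illustrated with a small stand-in for flag-gated handlers. This is only a sketch and not the charm's code: the `when` decorator, `dispatch`, `query_vault`, the handler names, and the `vault.running` flag are all hypothetical. Only the `certificates.available` flag name comes from this report.

```python
# Minimal stand-in for flag-gated handlers; NOT the charms.reactive API.
HANDLERS = []


def when(*flags):
    """Register a handler that runs only if all listed flags are set."""
    def decorator(func):
        HANDLERS.append((frozenset(flags), func))
        return func
    return decorator


class VaultNotRunning(Exception):
    """Raised when a handler talks to vault before it is up."""


def query_vault(state):
    # Hypothetical helper standing in for any call to the vault service.
    if not state["vault_running"]:
        raise VaultNotRunning("vault is not configured or running")
    return "ok"


@when("certificates.available")
def sync_certs_buggy(state):
    # Gated only on a pending certificate request, so it can run
    # before vault has been set up.
    return query_vault(state)


@when("certificates.available", "vault.running")  # hypothetical flag
def sync_certs_guarded(state):
    # Also requires a flag that is set only once vault is reachable.
    return query_vault(state)


def dispatch(flags, state):
    """Run every handler whose flags are all set; collect the outcomes."""
    results = {}
    for required, func in HANDLERS:
        if required <= flags:
            try:
                results[func.__name__] = func(state)
            except VaultNotRunning as exc:
                results[func.__name__] = f"hook failed: {exc}"
    return results
```

Calling `dispatch({"certificates.available"}, {"vault_running": False})` runs only the first handler, which fails because vault is not up; the guarded handler is skipped until its second flag is set.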

The issue can be reproduced with this bundle:

series: focal
applications:

  keystone-mysql-router:
    charm: ch:mysql-router
    channel: latest/edge
  vault-mysql-router:
    charm: ch:mysql-router
    channel: latest/edge

  mysql-innodb-cluster:
    charm: ch:mysql-innodb-cluster
    constraints: mem=3072M
    num_units: 3
    channel: latest/edge

  vault:
    num_units: 3
    charm: ch:vault
    channel: latest/edge

  keystone:
    charm: ch:keystone
    num_units: 1
    options:
      admin-password: openstack
    channel: latest/edge

relations:
  - - 'vault:shared-db'
    - 'vault-mysql-router:shared-db'

  - - 'keystone:shared-db'
    - 'keystone-mysql-router:shared-db'
  - - 'keystone-mysql-router:db-router'
    - 'mysql-innodb-cluster:db-router'

  - - 'vault:certificates'
    - 'keystone:certificates'

Note that the relation between vault-mysql-router and
mysql-innodb-cluster is missing from the bundle. This simulates the
situation where `certificates-relation-joined` fires before vault has
been set up, because the initial configuration of vault is gated on
the `shared-db.available` flag being set.

This bug can present itself in subtly different ways, and it may initially
look as though the db-router/shared-db relations are at fault. In the
output below, vault/0 and vault/2 are both hitting this bug. In the case
of vault/0, the bug was hit before the unit sent its db access request
to vault-mysql-router/2, which is why vault-mysql-router/2 is reporting
that it has missing data.

Unit                     Workload  Message
vault/0                  error     hook failed: "certificates-relation-joined"
  vault-mysql-router/2   waiting   'shared-db' incomplete, Waiting for proxied DB creation from cluster
vault/1*                 blocked   Vault needs to be initialized
  vault-mysql-router/1   active    Unit is ready
vault/2                  error     hook failed: "certificates-relation-joined"
  vault-mysql-router/0*  active    Unit is ready

Liam Young (gnuoy)
description: updated
Liam Young (gnuoy) wrote :

Initial guess at the fix: https://paste.ubuntu.com/p/8FkpyhXKdJ/

Liam Young (gnuoy)
Changed in vault-charm:
status: New → Confirmed
importance: Undecided → High
Changed in vault-charm:
assignee: nobody → Martin Kalcok (martin-kalcok)
Martin Kalcok (martin-kalcok) wrote :

Thanks for catching this bug, Liam. I wonder why this issue did not pop up during my development/testing, since the situation where the "certificates.available" flag is set but vault is not "operational" also occurs before vault initialization (which is common).

Regarding your proposed patch: it fixes the bug, but unfortunately it also breaks the cache functionality. The flag "charm.vault.ca.ready" seems to be set only on the leader unit, but the function "sync_cert_from_cache" needs to run on the non-leaders to sync their data with the leader. I'm looking into an alternative solution.
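To illustrate the concern, here is a rough sketch of why gating the sync on a leader-only flag would leave non-leaders out. It assumes, as suspected above, that "charm.vault.ca.ready" is set only on the leader. The `units_that_sync` helper and the unit dictionaries are hypothetical and are not taken from the charm.

```python
def units_that_sync(units, gate_flag):
    """Return the non-leader units whose flags satisfy the gate."""
    return [
        unit["name"]
        for unit in units
        if not unit["leader"] and gate_flag in unit["flags"]
    ]


# Hypothetical snapshot of a three-unit deployment with vault/1 as leader.
units = [
    {"name": "vault/0", "leader": False,
     "flags": {"certificates.available"}},
    {"name": "vault/1", "leader": True,
     "flags": {"certificates.available", "charm.vault.ca.ready"}},
    {"name": "vault/2", "leader": False,
     "flags": {"certificates.available"}},
]

# Gating on the leader-only flag: no non-leader ever syncs from the cache.
print(units_that_sync(units, "charm.vault.ca.ready"))

# Gating on the request flag: non-leaders sync, but possibly too early.
print(units_that_sync(units, "certificates.available"))
```

Under these assumptions, the first call selects no units at all, while the second selects both non-leaders, which is the trade-off between the two gates discussed here.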

Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

Martin, I pushed this patch to get started on the rework. As my -1 says, I have not tested it yet.

https://review.opendev.org/c/openstack/charm-vault/+/840710

Martin Kalcok (martin-kalcok) wrote :

Thanks Rodrigo. I used it as a base and pushed another patchset to it. As it stands right now, that PR should fix this issue and keep the cache functionality.

Corey Bryant (corey.bryant) wrote :

I believe we can mark this as Fix Committed.

commit f55055b8783ca6f3f569209b4f82285377f5ac64
Author: Martin Kalcok <email address hidden>
Date: Fri Feb 11 15:13:41 2022 +0100

    Implement cert cache for vault units (v2)

    This cache is used to store certificates and keys
    issued by the leader unit. Non-leader units read
    these certificates and keep data in their
    "tls-certificates" relations up to date.
    This ensures that charm units that receive certs
    from vault can read from relation data of any
    vault unit and receive correct data.

    This patch is the same as
    1159e547dd755af97d5eab578cdfe90abad93843
    but improved to avoid LP#1970888

    Change-Id: Ic4dd009cc18c52e1667391b00ebba9928acc5937
    Closes-Bug: #1940549
    Closes-Bug: #1970888

Changed in vault-charm:
status: Confirmed → Fix Committed