Requests for certificates hang when using vault charm in Juju Cross Model Relation

Bug #1813605 reported by Ed Stewart
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Charm Helpers | Triaged | High | Unassigned | |
| vault-charm | Fix Released | High | Chris MacNaughton | |

Bug Description

We are setting up a kubernetes cluster using Juju running on OpenStack.

Our undercloud Juju controller is running vault in a model. We have created an offer on that model to expose vault for cross-model relations.

Our overcloud Juju controller (which targets OpenStack) has the kubernetes-master, kubernetes-worker and etcd charms deployed.

We then created a cross-controller, cross-model relation so that the overcloud Juju controller could consume the undercloud vault:

```
juju consume undercloud:admin/overcloud.vault vault
```

This shows up on the overcloud Juju controller just fine. However, I then try creating a relation from etcd:certificates to the consumed vault:

```
juju add-relation etcd:certificates vault
```

This reports success; however, the unit status is stuck in maintenance:

```
Unit     Workload     Agent  Machine  Public address  Ports  Message
etcd/0*  maintenance  idle   0        172.16.20.60           Requesting tls certificates.
```

The etcd unit log shows the following in an endless loop:

```
2019-01-28 14:12:29 INFO juju-log certificates:10: Initializing Snap Layer
2019-01-28 14:12:29 DEBUG certificates-relation-changed none
2019-01-28 14:12:29 INFO juju-log certificates:10: Invoking reactive handler: reactive/tls_client.py:15:store_ca
2019-01-28 14:12:29 INFO juju-log certificates:10: Invoking reactive handler: reactive/tls_client.py:60:store_client
2019-01-28 14:12:29 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:89:set_app_version
2019-01-28 14:12:29 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:103:prepare_tls_certificates
2019-01-28 14:12:30 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:389:process_snapd_timer
2019-01-28 14:12:30 INFO juju-log certificates:10: Get config refresh.timer for snap core
2019-01-28 14:12:30 INFO juju-log certificates:10: Invoking reactive handler: hooks/relations/tls-certificates/requires.py:72:joined:certificates
2019-01-28 14:12:38 INFO juju-log certificates:10: Reactive main running for hook certificates-relation-changed
2019-01-28 14:12:38 INFO juju-log certificates:10: Initializing Leadership Layer (is leader)
2019-01-28 14:12:39 INFO juju-log certificates:10: Initializing Snap Layer
2019-01-28 14:12:39 DEBUG certificates-relation-changed none
2019-01-28 14:12:39 INFO juju-log certificates:10: Invoking reactive handler: reactive/tls_client.py:15:store_ca
2019-01-28 14:12:39 INFO juju-log certificates:10: Invoking reactive handler: reactive/tls_client.py:60:store_client
2019-01-28 14:12:39 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:89:set_app_version
2019-01-28 14:12:39 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:103:prepare_tls_certificates
2019-01-28 14:12:40 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:389:process_snapd_timer
2019-01-28 14:12:40 INFO juju-log certificates:10: Get config refresh.timer for snap core
2019-01-28 14:12:40 INFO juju-log certificates:10: Invoking reactive handler: hooks/relations/tls-certificates/requires.py:72:joined:certificates
2019-01-28 14:12:48 INFO juju-log certificates:10: Reactive main running for hook certificates-relation-changed
2019-01-28 14:12:48 INFO juju-log certificates:10: Initializing Leadership Layer (is leader)
2019-01-28 14:12:48 INFO juju-log certificates:10: Initializing Snap Layer
2019-01-28 14:12:49 DEBUG certificates-relation-changed none
2019-01-28 14:12:49 INFO juju-log certificates:10: Invoking reactive handler: reactive/tls_client.py:15:store_ca
2019-01-28 14:12:49 INFO juju-log certificates:10: Invoking reactive handler: reactive/tls_client.py:60:store_client
2019-01-28 14:12:49 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:89:set_app_version
2019-01-28 14:12:49 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:103:prepare_tls_certificates
2019-01-28 14:12:50 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:389:process_snapd_timer
2019-01-28 14:12:50 INFO juju-log certificates:10: Get config refresh.timer for snap core
2019-01-28 14:12:50 INFO juju-log certificates:10: Invoking reactive handler: hooks/relations/tls-certificates/requires.py:72:joined:certificates
2019-01-28 14:12:57 INFO juju-log certificates:10: Reactive main running for hook certificates-relation-changed
2019-01-28 14:12:58 INFO juju-log certificates:10: Initializing Leadership Layer (is leader)
2019-01-28 14:12:58 INFO juju-log certificates:10: Initializing Snap Layer
2019-01-28 14:12:58 DEBUG certificates-relation-changed none
2019-01-28 14:12:58 INFO juju-log certificates:10: Invoking reactive handler: reactive/tls_client.py:15:store_ca
2019-01-28 14:12:58 INFO juju-log certificates:10: Invoking reactive handler: reactive/tls_client.py:60:store_client
2019-01-28 14:12:59 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:89:set_app_version
2019-01-28 14:12:59 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:103:prepare_tls_certificates
2019-01-28 14:13:00 INFO juju-log certificates:10: Invoking reactive handler: reactive/etcd.py:389:process_snapd_timer
2019-01-28 14:13:00 INFO juju-log certificates:10: Get config refresh.timer for snap core
2019-01-28 14:13:00 INFO juju-log certificates:10: Invoking reactive handler: hooks/relations/tls-certificates/requires.py:72:joined:certificates
2019-01-28 14:13:07 INFO juju-log certificates:10: Reactive main running for hook certificates-relation-changed
2019-01-28 14:13:08 INFO juju-log certificates:10: Initializing Leadership Layer (is leader)
2019-01-28 14:13:08 INFO juju-log certificates:10: Initializing Snap Layer
```

The overcloud etcd LXD container can reach the undercloud vault API at the network level:

```
ubuntu@juju-8291d5-overcloud-0:~$ curl http://252.254.138.247:8200
404 page not found
```

Tags: atos
tags: added: atos
Ryan Beisner (1chb1n)
Changed in vault-charm:
assignee: nobody → Chris MacNaughton (chris.macnaughton)
milestone: none → 19.04
importance: Undecided → High
Chris MacNaughton (chris.macnaughton) wrote :

I have reproduced this and am digging in further.

In the vault charm logs, I can see:

```
2019-02-05 07:48:07 INFO juju-log certificates:3: Processing certificate request from remote-2ea1e7247f6746568fd0e43cacfdc7c0/0 for 10.5.0.11
```

where 10.5.0.11 is the etcd unit in the remote model.

James Page (james-page)
Changed in vault-charm:
status: New → Confirmed
James Page (james-page) wrote :

This issue impacts both the tls-certificates and vault-kv interfaces, as both use the remote unit name on the providing side when generating key prefixes for unit-specific responses.

In a CMR context, the providing side sees obfuscated unit names rather than the actual unit names, while the consuming side looks for its local unit name in the data bag presented, so the relation never completes.
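To make the failure mode concrete, here is a minimal Python sketch. It assumes a hypothetical response key built by replacing '/' in the unit name with '_' and appending '.server.cert'; the real interface's key prefix may differ. The obfuscated name is the one from the vault log in this thread. The lookup succeeds when both sides see the same unit name and fails when the provider only sees the CMR name.

```python
"""Sketch of why unit-name-keyed responses break under CMR.

Assumption: the key layout "<unit name with '/' -> '_'>.server.cert" is
hypothetical and stands in for whatever prefix the real interface uses.
"""


def response_key(unit_name: str) -> str:
    # Hypothetical munge: "etcd/0" -> "etcd_0.server.cert"
    return unit_name.replace("/", "_") + ".server.cert"


def provider_publish(remote_unit_name: str, cert: str) -> dict:
    # The providing side (vault) can only key on the name Juju shows it.
    # Under CMR that name is obfuscated, e.g. "remote-2ea1.../0".
    return {response_key(remote_unit_name): cert}


def requirer_lookup(local_unit_name: str, databag: dict):
    # The consuming side (etcd) looks for a key built from its *local* name.
    return databag.get(response_key(local_unit_name))


if __name__ == "__main__":
    cert = "-----BEGIN CERTIFICATE-----..."

    # Same model: both sides agree on the unit name, so the lookup succeeds.
    same_model = provider_publish("etcd/0", cert)
    print(requirer_lookup("etcd/0", same_model) is not None)  # True

    # Cross-model: vault keys on the obfuscated name; etcd never finds its cert.
    cross_model = provider_publish("remote-2ea1e7247f6746568fd0e43cacfdc7c0/0", cert)
    print(requirer_lookup("etcd/0", cross_model))  # None
```

This matches the endless `certificates-relation-changed` loop in the etcd log: the hook keeps firing, but the unit never finds a response addressed to it.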

Fixing this requires implementing a new pattern to support this type of relation behaviour; as a starter for ten:

1) The consuming unit presents a 'response_nonce' key on the relation containing a piece of unit-specific data. This could be a hash of its unit name plus the model UUID (hashing ensures we don't leak the unit name itself), generating something very specific to the unit and the model it resides in.

2) The providing unit(s) use the 'response_nonce', rather than the current munging of the remote unit name, when generating responses for a specific unit.
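The two steps of the proposed nonce pattern can be sketched in Python as follows. This illustrates the proposal only, not the code that was eventually merged; the use of SHA-256, the ':' separator, the placeholder model UUID and the '<nonce>.server.cert' response key are all assumptions.

```python
import hashlib

# Key name from the proposal; everything else in this sketch is assumed.
NONCE_KEY = "response_nonce"


def response_nonce(unit_name: str, model_uuid: str) -> str:
    """Unit- and model-specific value that does not reveal the unit name."""
    # The ":" separator is an arbitrary choice to keep the two parts distinct.
    return hashlib.sha256(f"{model_uuid}:{unit_name}".encode("utf-8")).hexdigest()


def requirer_request(unit_name: str, model_uuid: str, common_name: str) -> dict:
    # Step 1: data the consuming unit (e.g. etcd) writes to the relation.
    return {NONCE_KEY: response_nonce(unit_name, model_uuid), "common_name": common_name}


def provider_respond(remote_unit_name: str, request: dict, cert: str) -> dict:
    # Step 2: remote_unit_name may be obfuscated under CMR, so it is not used;
    # the response is keyed purely on the nonce the requirer supplied.
    return {f"{request[NONCE_KEY]}.server.cert": cert}


def requirer_lookup(unit_name: str, model_uuid: str, databag: dict):
    # The requirer recomputes its own nonce locally to find its response.
    return databag.get(f"{response_nonce(unit_name, model_uuid)}.server.cert")


if __name__ == "__main__":
    model_uuid = "00000000-0000-0000-0000-000000000000"  # placeholder UUID
    request = requirer_request("etcd/0", model_uuid, "10.5.0.11")
    # Vault only sees the obfuscated name, but that no longer matters.
    reply = provider_respond("remote-2ea1e7247f6746568fd0e43cacfdc7c0/0", request, "CERT")
    print(requirer_lookup("etcd/0", model_uuid, reply))  # CERT
```

Because the requirer can recompute its nonce locally, the provider never needs the real unit name, and including the model UUID keeps two units with the same name in different models from colliding.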

This pattern could be used for the two vault interfaces and for other interfaces broken under CMR (such as the ceph-mon and shared-db interfaces, which also use unit names in relation data).

Changed in vault-charm:
status: Confirmed → Triaged
Changed in charm-helpers:
status: New → Triaged
importance: Undecided → High
Chris MacNaughton (chris.macnaughton) wrote :
Cory Johns (johnsca) wrote :

I'm working on the fix for the interface layer.

David Ames (thedac)
Changed in vault-charm:
milestone: 19.04 → 19.07
Cory Johns (johnsca) wrote :

Fixed in https://github.com/juju-solutions/interface-tls-certificates/pull/16

Once merged, any charm using the interface (on either the provides or the requires side) will need to be rebuilt.

Chris MacNaughton (chris.macnaughton) wrote :

charm-vault has had commits land on both the stable and master branches in the two weeks since. This bug should, therefore, be marked Fix Released.

Changed in vault-charm:
status: Triaged → Fix Released
milestone: 19.07 → 19.04
Edward Hope-Morley (hopem) wrote :

@chris.macnaughton I just tried this with a vault charm that I built myself from master (so it should have the latest charm-helpers patches in it) and I get:

```
$ juju run --unit etcd/0 'relation-get -r certificates:11 - vault/2'
egress-subnets: 10.0.0.74/32
ingress-address: 10.0.0.74
private-address: 10.0.0.74
```

and therefore:

```
etcd/0*  blocked  idle  0  10.100.0.209  Missing relation to certificate authority.
```

Edward Hope-Morley (hopem) wrote :

and ^^ is using CMR

Edward Hope-Morley (hopem) wrote :

One extra piece of info: the etcd charm store version has not yet been updated to include the fix from https://github.com/juju-solutions/interface-tls-certificates/pull/16, but I tried building one myself and, despite vault receiving the new unit_name setting, it does not appear to be responding with certs.
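Judging by the unit_name setting mentioned here, the interface fix has the requiring side publish its own unit name for the provider to use. The Python sketch below shows how a provider could use such a field. It is not the code from the pull request; the response key layout and the fallback to the remote unit name are hypothetical. It also illustrates why charms on both sides of the interface needed rebuilding.

```python
"""Sketch of a provider preferring a requirer-published unit name.

Assumptions: 'unit_name' is the field mentioned in this thread; the response
key layout and the fallback behaviour are hypothetical.
"""


def response_key(unit_name: str) -> str:
    # Hypothetical munge: "etcd/0" -> "etcd_0.server.cert"
    return unit_name.replace("/", "_") + ".server.cert"


def provider_respond(remote_unit_name: str, request: dict, cert: str) -> dict:
    # Prefer the name the requirer published about itself; fall back to the
    # name Juju presents, which only matches within a single model.
    name = request.get("unit_name") or remote_unit_name
    return {response_key(name): cert}


def requirer_lookup(local_unit_name: str, databag: dict):
    # The requirer looks for a response keyed on its own local unit name.
    return databag.get(response_key(local_unit_name))


if __name__ == "__main__":
    obfuscated = "remote-2ea1e7247f6746568fd0e43cacfdc7c0/0"

    # Requirer rebuilt with the fix: it publishes unit_name, so CMR works.
    new_req = {"unit_name": "etcd/0", "common_name": "10.5.0.11"}
    print(requirer_lookup("etcd/0", provider_respond(obfuscated, new_req, "CERT")))  # CERT

    # Requirer without the fix: no unit_name, so the response is unreachable.
    old_req = {"common_name": "10.5.0.11"}
    print(requirer_lookup("etcd/0", provider_respond(obfuscated, old_req, "CERT")))  # None
```

Receiving unit_name is necessary but not sufficient: vault also needs a CA before it can issue anything, so a missing root CA would stall the relation just as silently.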

Changed in vault-charm:
milestone: 19.04 → 19.07
status: Fix Released → Confirmed
Edward Hope-Morley (hopem) wrote :

The etcd update is actually available in the edge channel (--channel edge).

Edward Hope-Morley (hopem) wrote :

OK, I have retested this and it is actually working. It needed the edge version of the etcd charm, and I had also forgotten to run the generate-root-ca action on vault. Once I did both, it worked.

Changed in vault-charm:
status: Confirmed → Fix Released