EasyRSA scale out broken

Bug #1809377 reported by Tim Van Steenburgh
This bug affects 3 people
Affects                         Status        Importance  Assigned to   Milestone
CDK Addons                      Fix Released  High        George Kraft  1.15+ck1
Calico Charm                    Fix Released  High        George Kraft  1.15+ck1
Canal Charm                     Fix Released  High        George Kraft  1.15+ck1
EasyRSA Charm                   Won't Fix     High        Cory Johns    —
Etcd Charm                      Fix Released  High        George Kraft  1.15+ck1
Flannel Charm                   Fix Released  High        George Kraft  1.15+ck1
Kubernetes API Load Balancer    Fix Released  High        George Kraft  1.15+ck1
Kubernetes Control Plane Charm  Fix Released  High        George Kraft  1.15+ck1
Kubernetes Worker Charm         Fix Released  High        George Kraft  1.15+ck1
Tigera Secure EE Charm          Fix Released  High        George Kraft  1.15+ck1

Bug Description

Opened by jacekn on 2017-03-23 10:05:00+00:00 at https://github.com/juju-solutions/layer-easyrsa/issues/9

------------------------------------------------------------

I use easyrsa with the k8s charms. I wanted to scale it out for HA, but running "juju add-unit easyrsa" caused the following hook error:
# ./hooks/client-relation-changed

Easy-RSA error:

Missing expected CA file: serial (perhaps you need to run build-ca?)
Run without commands for usage and command help.
Traceback (most recent call last):
  File "./hooks/client-relation-changed", line 19, in <module>
    main()
  File "/usr/local/lib/python3.5/dist-packages/charms/reactive/__init__.py", line 78, in main
    bus.dispatch()
  File "/usr/local/lib/python3.5/dist-packages/charms/reactive/bus.py", line 434, in dispatch
    _invoke(other_handlers)
  File "/usr/local/lib/python3.5/dist-packages/charms/reactive/bus.py", line 417, in _invoke
    handler.invoke()
  File "/usr/local/lib/python3.5/dist-packages/charms/reactive/bus.py", line 291, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-easyrsa-1/charm/reactive/easyrsa.py", line 226, in create_server_cert
    server_cert, server_key = create_server_certificate(cn, sans, name)
  File "/var/lib/juju/agents/unit-easyrsa-1/charm/reactive/easyrsa.py", line 263, in create_server_certificate
    check_call(split(server))
  File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['./easyrsa', '--batch', '--req-cn=10.25.61.29', '--subject-alt-name=IP:10.25.61.29,IP:10.25.61.29,DNS:juju-555a14-juju2-is-kubernetes-105-6', 'build-server-full', 'kubernetes-worker_0', 'nopass', '2>&1']' returned non-zero exit status 1

====================== COMMENTS ============================

Comment created by mbruzek on 2017-03-23 16:07:29+00:00

It looks like the other easyrsa unit does not have the CA. We may need to change the code so that only the leader signs keys and certs while the other easyrsa units stand by. However, I do not know whether easyrsa supports importing another server's PKI and signing certs with it. We need to come up with a strategy for how this should work; more testing and investigation are needed here.

@jacekn If you have more experience with easyrsa, or an architectural design that could help with this issue, please leave a comment; it would be appreciated.

------------------------------------------------------------

Comment created by hansbogert on 2018-01-01 22:49:34+00:00

In my case the common name of the CA is the IP address of the initial host that created the CA certificate, so if that's correct, high availability is even more problematic. In fact, I hit an "unknown certificate authority" error when trying to manually switch from one easyrsa unit to a different one on a different MAAS node.

Xav Paice (xavpaice)
tags: added: canonical-bootstack
Revision history for this message
Xav Paice (xavpaice) wrote :

I've added the canonical-bootstack tag here, as this affects our production Bootstack environments in the following way:

Firstly, we run K8s clouds and have a single easyrsa unit there, which is a single point of failure. I've not dug into recovery options for this yet.

Secondly, the OpenStack deployments use EasyRSA to provide a TLS cert for etcd, which is used by Vault, which stores the LUKS keys for Ceph. With a single unit of easyrsa, if we lose the host it resides on, redeploying a new easyrsa unit breaks the etcd cluster, rendering it unusable, which in turn would do nasty things to Vault.

Revision history for this message
Xav Paice (xavpaice) wrote :

Adding field-high, as the current implementation needs some design review.

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

Thanks Xav. This is an old bug that should actually be closed now, since in the meantime we've provided a new recommended way to establish a HA certificate authority, using Vault instead of EasyRSA. Read https://ubuntu.com/kubernetes/docs/using-vault for details on using Vault with Charmed Kubernetes.

There are no plans to provide a HA configuration for EasyRSA; Vault should be used if HA is needed.
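
For illustration only, the general shape of that setup in a Charmed Kubernetes model looks roughly like the sketch below. The relation endpoint names are assumptions based on the usual tls-certificates interface; the linked documentation (including Vault's storage backend, initialisation and unsealing) is the authoritative reference.

  # Deploy Vault and have it act as the cluster CA instead of easyrsa.
  # Vault also needs a storage backend and must be unsealed first (see docs).
  juju deploy vault

  # Point the TLS clients at Vault's certificates endpoint.
  juju add-relation vault:certificates etcd:certificates
  juju add-relation vault:certificates kubernetes-master:certificates
  juju add-relation vault:certificates kubernetes-worker:certificates
  juju add-relation vault:certificates kubeapi-load-balancer:certificates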

Changed in charm-easyrsa:
status: New → Won't Fix
Revision history for this message
James Troup (elmo) wrote : Re: [Bug 1809377] Re: Scale out broken

Tim Van Steenburgh <email address hidden> writes:

> Thanks Xav. This is an old bug that should actually be closed now, since
> in the meantime we've provided a new recommended way to establish a HA
> certificate authority, using Vault instead of EasyRSA. Read
> https://ubuntu.com/kubernetes/docs/using-vault for details on using
> Vault with Charmed Kubernetes.
>
> There are no plans to provide a HA configuration for EasyRSA; Vault
> should be used if HA is needed.

Err, I have so many questions about this, but let's put them to one
side for now.

Canonical OpenStack supports full disk encryption. It does this using
vault. vault uses etcd as a backing store. etcd gets its TLS from
easyrsa.

I don't think having an EasyRSA charm which doesn't support scale out
is a super reasonable position, TBH.

--
James

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

>
> Canonical OpenStack supports full disk encryption. It does this using
> vault. vault uses etcd as a backing store. etcd gets its TLS from
> easyrsa.
>

In this case I suspect that EasyRSA is there to bootstrap etcd certs, since
etcd must be up-and-running in order for Vault to enter HA mode. But you
can also achieve the same result without EasyRSA, by starting with a single
unit of Vault (which provides certs to etcd), and then scaling Vault up
after etcd is running.
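
In rough Juju terms the ordering would be something like the sketch below (assumed endpoint names, not a tested recipe):

  # 1. Start with a single Vault unit and let it issue etcd's certificates.
  juju deploy -n 1 vault
  juju add-relation vault:certificates etcd:certificates

  # 2. Once etcd is up and Vault has a working storage backend, scale Vault
  #    out for HA.
  juju add-unit -n 2 vault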

Revision history for this message
James Hebden (ec0) wrote : Re: Scale out broken

Tim - Are you aware of a migration path between the easyrsa charm and the vault charm? Are users wanting to migrate from a non-HA deployment to a HA deployment based on Vault considered a supported use case?

Revision history for this message
Dean Henrichsmeyer (dean) wrote :

There has to be a migration path. A solution involving new technology is not a solution for existing users unless there is a documented, tested migration path.

Changed in charm-easyrsa:
status: Won't Fix → Confirmed
Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

James H - Yes and yes, but we're missing docs for that migration. Working on that now.

Changed in charm-easyrsa:
assignee: nobody → Cory Johns (johnsca)
importance: Undecided → High
status: Confirmed → In Progress
Revision history for this message
Xav Paice (xavpaice) wrote :

For the sake of clarity, there are two models right now where we need a migration path:

- OpenStack models, where we use Vault for encryption at rest, with a backing data store of etcd (which is bootstrapped using certs from easyrsa)
- Kubernetes models, where all certs are provided by easyrsa and there's no Vault.

For the first scenario, the existing Vault could be migrated to use a fresh store, possibly even MySQL rather than etcd, which would allow us to remove the easyrsa charm when switching to HA.

For the second, easyrsa is more tightly knit into the environment, and we need to determine exactly what to migrate from easyrsa to Vault and how.

Revision history for this message
Cory Johns (johnsca) wrote :

https://github.com/charmed-kubernetes/kubernetes-docs/pull/223 adds instructions to transition a CDK cluster from EasyRSA to Vault and improves the documentation for making Vault HA.

You can preview the changes here: https://deploy-preview-223--cdk-docs-next.netlify.com/kubernetes/docs/using-vault#transitioning-an-existing-cluster-from-easyrsa-to-vault (the HA instructions are just a bit further down)
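
In outline, the transition amounts to swapping the certificates relations over from easyrsa to Vault. The following is only a sketch with assumed endpoint names (easyrsa's provider relation is "client", per the hook name in the original traceback); the preview link above has the authoritative, tested steps, including Vault setup and unsealing.

  # Drop the easyrsa relations, then serve the same clients from Vault.
  juju remove-relation easyrsa:client etcd:certificates
  juju remove-relation easyrsa:client kubernetes-master:certificates
  juju remove-relation easyrsa:client kubernetes-worker:certificates
  juju remove-relation easyrsa:client kubeapi-load-balancer:certificates

  juju add-relation vault:certificates etcd:certificates
  juju add-relation vault:certificates kubernetes-master:certificates
  juju add-relation vault:certificates kubernetes-worker:certificates
  juju add-relation vault:certificates kubeapi-load-balancer:certificates

  # Once nothing is related to easyrsa any more, it can be removed.
  juju remove-application easyrsa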

Changed in charm-easyrsa:
status: In Progress → Fix Committed
George Kraft (cynerva)
Changed in charm-calico:
status: New → In Progress
assignee: nobody → George Kraft (cynerva)
Changed in charm-canal:
assignee: nobody → George Kraft (cynerva)
George Kraft (cynerva)
Changed in charm-etcd:
assignee: nobody → George Kraft (cynerva)
Changed in charm-flannel:
assignee: nobody → George Kraft (cynerva)
Changed in charm-kubeapi-load-balancer:
assignee: nobody → George Kraft (cynerva)
Changed in charm-kubernetes-master:
assignee: nobody → George Kraft (cynerva)
Changed in charm-kubernetes-worker:
assignee: nobody → George Kraft (cynerva)
Changed in charm-tigera-secure-ee:
assignee: nobody → George Kraft (cynerva)
Changed in charm-canal:
status: New → Incomplete
status: Incomplete → In Progress
Changed in charm-etcd:
status: New → In Progress
Changed in charm-flannel:
status: New → In Progress
Changed in charm-kubeapi-load-balancer:
status: New → In Progress
Changed in charm-kubernetes-master:
status: New → In Progress
Changed in charm-kubernetes-worker:
status: New → In Progress
Changed in charm-tigera-secure-ee:
status: New → In Progress
Changed in charm-calico:
importance: Undecided → High
Changed in charm-canal:
importance: Undecided → High
Changed in charm-etcd:
importance: Undecided → High
Changed in charm-flannel:
importance: Undecided → High
Changed in charm-kubeapi-load-balancer:
importance: Undecided → High
Changed in charm-kubernetes-master:
importance: Undecided → High
Changed in charm-kubernetes-worker:
importance: Undecided → High
Changed in charm-tigera-secure-ee:
importance: Undecided → High
Changed in charm-easyrsa:
status: Fix Committed → Won't Fix
Revision history for this message
George Kraft (cynerva) wrote :

In testing, we found that the TLS client charms do not handle the transition from easyrsa to vault properly. I've added all of the affected components to this issue.

Cory opened initial PRs (linked below) but the issue has since been handed off to me. I'm still working through some issues with the transition on kubernetes-master.

PRs (WIP):
https://github.com/charmed-kubernetes/layer-etcd/pull/158
https://github.com/charmed-kubernetes/charm-kubeapi-load-balancer/pull/2
https://github.com/charmed-kubernetes/charm-kubernetes-master/pull/34
https://github.com/charmed-kubernetes/charm-kubernetes-worker/pull/22
https://github.com/charmed-kubernetes/charm-flannel/pull/55
https://github.com/charmed-kubernetes/layer-calico/pull/38
https://github.com/charmed-kubernetes/layer-canal/pull/35
https://github.com/charmed-kubernetes/layer-tigera-secure-ee/pull/17

George Kraft (cynerva)
Changed in cdk-addons:
assignee: nobody → George Kraft (cynerva)
importance: Undecided → High
status: New → In Progress
Revision history for this message
George Kraft (cynerva) wrote :

Another PR: https://github.com/charmed-kubernetes/cdk-addons/pull/133

Still working on this. Testing has been slow as there are a lot of components impacted by the CA change, many of which do not show obvious symptoms of failure until logs have been inspected.

George Kraft (cynerva)
Changed in cdk-addons:
status: In Progress → Fix Committed
Changed in charm-calico:
status: In Progress → Fix Committed
Changed in charm-canal:
status: In Progress → Fix Committed
Changed in charm-etcd:
status: In Progress → Fix Committed
Changed in charm-flannel:
status: In Progress → Fix Committed
Changed in charm-kubeapi-load-balancer:
status: In Progress → Fix Committed
Changed in charm-kubernetes-master:
status: In Progress → Fix Committed
Changed in charm-kubernetes-worker:
status: In Progress → Fix Committed
Changed in charm-tigera-secure-ee:
status: In Progress → Fix Committed
Revision history for this message
George Kraft (cynerva) wrote :
summary: - Scale out broken
+ EasyRSA scale out broken
Changed in charm-etcd:
milestone: none → 1.15+ck1
Changed in cdk-addons:
milestone: none → 1.15+ck1
Changed in charm-calico:
milestone: none → 1.15+ck1
Changed in charm-canal:
milestone: none → 1.15+ck1
Changed in charm-flannel:
milestone: none → 1.15+ck1
Changed in charm-kubeapi-load-balancer:
milestone: none → 1.15+ck1
Changed in charm-kubernetes-master:
milestone: none → 1.15+ck1
Changed in charm-kubernetes-worker:
milestone: none → 1.15+ck1
Changed in charm-tigera-secure-ee:
milestone: none → 1.15+ck1
Revision history for this message
George Kraft (cynerva) wrote :
Changed in charm-calico:
milestone: 1.15+ck1 → none
status: Fix Committed → Fix Released
Changed in charm-calico:
milestone: none → 1.15+ck1
Changed in cdk-addons:
status: Fix Committed → Fix Released
Changed in charm-canal:
status: Fix Committed → Fix Released
Changed in charm-etcd:
status: Fix Committed → Fix Released
Changed in charm-flannel:
status: Fix Committed → Fix Released
Changed in charm-kubeapi-load-balancer:
status: Fix Committed → Fix Released
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released
Changed in charm-kubernetes-worker:
status: Fix Committed → Fix Released
Changed in charm-tigera-secure-ee:
status: Fix Committed → Fix Released