Implement snap coherence

Bug #1845559 reported by Cory Johns on 2019-09-26
This bug affects 1 person
Affects                    Importance   Assigned to
Kubernetes Master Charm    Wishlist     Cory Johns
Kubernetes Worker Charm    Wishlist     Cory Johns
Snap Layer                 Undecided    Cory Johns

Bug Description

Snapd now supports cohorts (https://forum.snapcraft.io/t/managing-cohorts/8995) which can be used to enforce consistent revisions of snaps across units and applications. Charmed Kubernetes should support this, with the following properties:

* The leader of the masters should take the cohort snapshots and then provide them to the other masters (if any) and to the workers
* The masters should upgrade before the workers
* Because a cohort snapshot does not change the channel, and refreshing to a new snapshot will only pick up point releases, the refresh should not require manual intervention
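
As a rough illustration of the first property, here is a minimal sketch of how a leader might snapshot cohorts and how a follower might join one. It assumes that `snap create-cohort` prints YAML-like output with a `cohort-key:` field under each snap name, and that `snap refresh` accepts `--cohort=<key>`. The helper names are hypothetical and are not taken from layer-snap.

```python
import subprocess


def parse_cohort_keys(output):
    """Extract {snap_name: cohort_key} from `snap create-cohort` output.

    Assumed output shape (not verified against every snapd version):
        cohorts:
          kubelet:
            cohort-key: <key>
    """
    keys = {}
    current_snap = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line == "cohorts:":
            continue
        if line.startswith("cohort-key:"):
            if current_snap is not None:
                keys[current_snap] = line.split(":", 1)[1].strip()
        elif line.endswith(":"):
            # A bare "name:" line starts the entry for one snap.
            current_snap = line[:-1]
    return keys


def create_cohorts(snaps):
    """Run on the leader: snapshot the current revisions of the given snaps."""
    out = subprocess.check_output(["snap", "create-cohort", *snaps], text=True)
    return parse_cohort_keys(out)


def refresh_command(snap, cohort_key):
    """Build the command a follower would run to join the leader's cohort."""
    return ["snap", "refresh", snap, "--cohort={}".format(cohort_key)]
```

The leader would call `create_cohorts(["kubelet", "kubectl"])` and publish the resulting mapping over relation data. Each follower would then run `refresh_command(...)` for every snap it has installed.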

Cory Johns (johnsca) wrote :

Added the snap layer because the support for cohort management should be available to all charms.

Changed in charm-kubernetes-worker:
status: New → In Progress
importance: Undecided → Wishlist
Changed in layer-snap:
status: New → In Progress
Changed in charm-kubernetes-worker:
assignee: nobody → Cory Johns (johnsca)
Changed in layer-snap:
assignee: nobody → Cory Johns (johnsca)
Changed in charm-kubernetes-worker:
milestone: none → 1.17
Cory Johns (johnsca) wrote :

Did some initial testing, fixed some typos, and ran into an issue with upstream support for cohorts, which has since been fixed. This is once again ready for a full round of testing.

This can be tested without the snap store proxy, but once that works, we should also test it with the proxy. To that end, I've created a small charm to help test with the snap store proxy: https://github.com/johnsca/charm-snap-store-proxy

Cory Johns (johnsca) wrote :

One piece remains to be done: there is no communication from the non-leader masters back to the leader, so the leader currently has no way to wait until the other masters are done before sending the cohort keys to the workers. A peer interface layer will have to be created to manage that.
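
The gating the peer interface would need to provide can be sketched as follows. This is only an illustration of the idea, assuming each non-leader master reports back the cohort keys it has joined. The function name and data shapes are hypothetical and do not reflect the actual interface.

```python
def workers_may_receive_keys(leader_keys, peer_reports, peer_units):
    """Return True only when every non-leader master has joined the leader's cohorts.

    leader_keys:  {snap_name: cohort_key} created by the leader.
    peer_reports: {unit_name: {snap_name: cohort_key}} as reported by each peer.
    peer_units:   names of all non-leader master units currently in the cluster.
    """
    if not leader_keys:
        # Nothing to hand out yet.
        return False
    for unit in peer_units:
        # A missing or stale report means that peer has not caught up.
        if peer_reports.get(unit) != leader_keys:
            return False
    return True
```

With a single master, `peer_units` is empty, so the keys can go to the workers as soon as the leader has created them.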

Cory Johns (johnsca) wrote :

Any automated testing of this inherently requires the snap store proxy, since that's the only way we have to programmatically control the snap revisions.

The test would need to set up the proxy (I believe this has to be done via the snap-store-proxy model config to ensure that the proxy is honored from the very start), create an override for all of the snaps to one rev back, perform the deployment and let the cohorts be created, then remove the override and watch for the cohorts to update in a controlled fashion across the cluster.
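
One way such a test could check the "controlled fashion" part is to record when each unit reaches the new revision and then assert on the ordering. The sketch below assumes the test collects `(timestamp, unit_name)` pairs and that unit names start with `kubernetes-master/` or `kubernetes-worker/`. Both the event format and the function are hypothetical.

```python
def masters_before_workers(events):
    """Check that every master refreshed no later than the first worker.

    events: iterable of (timestamp, unit_name) pairs, one per unit, recorded
    when that unit was first observed on the new snap revision.
    Returns False if either group is empty, since the rollout cannot be
    judged without at least one master and one worker.
    """
    master_times = [t for t, unit in events if unit.startswith("kubernetes-master/")]
    worker_times = [t for t, unit in events if unit.startswith("kubernetes-worker/")]
    if not master_times or not worker_times:
        return False
    return max(master_times) <= min(worker_times)
```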

I should note that in my initial testing with the snap store proxy, I ran into a couple of issues:

  * Once the charm is deployed, checking the status with `juju ssh {unit} snap-proxy check-connections` shows an error with Postgres, because the database with a name matching the username doesn't exist. You can correct this by manually connecting to the database, but I don't think it's actually a problem since the connection string specifies the database name to use as snap-store-proxy and the overrides and all seem to work anyway.

  * I'm not sure I saw this every time, but most times when I set a machine to use the proxy and then tried to create a cohort for kubelet, I got the error "cannot create cohorts: snap not found" (which I mention in https://forum.snapcraft.io/t/error-when-joining-cohort/13463/12). It may be that the snap store does not yet support cohorts, but that's likely to be a blocker for us, particularly for automated testing, as mentioned above.

Kevin W Monroe (kwmonroe) wrote :

Also need https://github.com/charmed-kubernetes/layer-kubernetes-master-worker-base/pull/10 to include layer-coordinator for k8s master/worker base. This lets us roll the refresh out one unit at a time.

Kevin W Monroe (kwmonroe) wrote :

Comment #4 is resolved by the kube-masters peer interface. PR to merge into the index:

https://github.com/juju/layer-index/pull/100

Kevin W Monroe (kwmonroe) wrote :

Alright folks, let's wrap this up. Lots-o moving parts in this bug, so I'll break them down here. All previous comments are superseded by this one.

1. Code to monitor upstream
(a) The k8s fork of layer-snap supports cohorts. This is proposed upstream with https://code.launchpad.net/~johnsca/layer-snap/+git/layer-snap/+merge/375315 but until that lands, k8s charms have this via https://github.com/charmed-kubernetes/layer-snap/pull/1.

2. Charmed-k8s supporting cast
(a) New k8s-masters peer interface to ensure all peers know about cohorts before informing workers:
- https://github.com/charmed-kubernetes/interface-kube-masters
(b) k8s-masters peer interface added to the upstream index:
- https://github.com/juju/layer-index/pull/100
(c) k8s-masters peer interface and layer-coordinator added to the charmed-k8s index:
- https://github.com/charmed-kubernetes/jenkins/pull/546
(d) add coordinator to k8s master/worker charms
- https://github.com/charmed-kubernetes/layer-kubernetes-master-worker-base/pull/10
(e) kube-control changes to handle cohort data between master/workers
- https://github.com/juju-solutions/interface-kube-control/pull/28

3. Charmed-k8s charm changes
(a) masters figure out cohorts
- https://github.com/charmed-kubernetes/charm-kubernetes-master/pull/61
(b) workers react to master cohorts
- https://github.com/charmed-kubernetes/charm-kubernetes-worker/pull/36

Kevin W Monroe (kwmonroe) wrote :
Changed in charm-kubernetes-master:
status: In Progress → Fix Committed
Kevin W Monroe (kwmonroe) wrote :
Changed in charm-kubernetes-worker:
status: In Progress → Fix Committed