Removing application removes custom resources that should persist

Bug #1862390 reported by Kenneth Koski
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Released
Importance: Medium
Assigned to: Yang Kelvin Liu

Bug Description

In the Kubeflow bundle, there is a charm that creates a Profile object, which in turn gets an associated namespace created for it. I would like to be able to mark the Profile object as not getting removed when the associated charm is removed. This would let us redeploy that charm or Kubeflow as a whole without deleting user data, which is stored in the associated namespace.

Revision history for this message
Kenneth Koski (knkski) wrote :

I'm running into an issue where `juju destroy-model kubeflow` will remove the `kubeflow-dashboard` charm, triggering a delete of the Profile resource that it created when deployed. However, the Profile object has a finalizer on it that requires the `kubeflow-profiles` charm to clear before it can actually get deleted. That charm also got deleted with the destroy model step, so the Profile object can't get deleted, and hangs the kubeflow namespace deletion.
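
The hang follows from how Kubernetes finalizers behave: a delete request only marks the object, and the object goes away once every finalizer has been cleared. A minimal sketch of that behavior in plain Python, with no Kubernetes client involved; the `Resource` class and the finalizer name are hypothetical and only illustrate the ordering problem:

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Toy stand-in for a Kubernetes object (hypothetical, not a client type)."""
    name: str
    finalizers: list = field(default_factory=list)
    deletion_requested: bool = False

    def request_delete(self):
        # A delete only marks the object; it does not remove it yet.
        self.deletion_requested = True

    def clear_finalizer(self, finalizer):
        # Normally done by the controller that owns the finalizer.
        if finalizer in self.finalizers:
            self.finalizers.remove(finalizer)

    @property
    def gone(self):
        # The object is actually removed only when no finalizers remain.
        return self.deletion_requested and not self.finalizers


profile = Resource("admin", finalizers=["profile-finalizer"])  # made-up name
profile.request_delete()
stuck = profile.gone       # still present: nothing is running to clear the finalizer
profile.clear_finalizer("profile-finalizer")
released = profile.gone    # removed once the finalizer has been cleared
```

If the controller that clears the finalizer has already been removed, `clear_finalizer` is never called and the object stays in the "stuck" state.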

With https://bugs.launchpad.net/juju/+bug/1860688 landing, Profile resources will be cluster-scoped, and it seems like we can use the logic of deleting all namespaced resources associated with a charm when it's deleted, and leave cluster-scoped resources alone. That would prevent this namespace deletion hang issue from occurring.
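
A rough sketch of the scope-based cleanup proposed here, assuming each resource created by a charm records its namespace and that a missing namespace means cluster-scoped. The `Resource` type and `split_on_removal` function are hypothetical names for illustration, not Juju internals:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """Toy description of a resource created by a charm (hypothetical)."""
    kind: str
    name: str
    namespace: Optional[str]  # None means cluster-scoped


def split_on_removal(resources):
    """Return (to_delete, to_keep) for the moment the owning charm is removed."""
    to_delete = [r for r in resources if r.namespace is not None]
    to_keep = [r for r in resources if r.namespace is None]
    return to_delete, to_keep


created = [
    Resource("Deployment", "kubeflow-dashboard", "kubeflow"),
    Resource("Profile", "admin", None),
]
to_delete, to_keep = split_on_removal(created)
```

Under this rule the namespaced Deployment is deleted with the charm, while the cluster-scoped Profile is left in place.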

Revision history for this message
Ian Booth (wallyworld) wrote :

Juju is designed to clean up after itself when an application or model is removed. Leaving behind resources fundamentally breaks the repeatability aspect of Juju deploys. If cluster global resources are required to be persistent, one option could be to deploy those to a separate model and leave that model alone - other models can be created and removed as needed, using a cross model relation to access the required profile data.

Revision history for this message
John A Meinel (jameinel) wrote :

We do have modeling around storage that persists beyond the lifetime of the
units/applications that are consuming it. However, having a model-level
deploy create global resources but not clean up after itself feels
dangerous, as you're leaving things dirty.

Revision history for this message
Kenneth Koski (knkski) wrote :

We have to create a Profile/Namespace for Kubeflow, which leaves us with 4 options that I can see:

1. Manually create the Profile object
   This is probably the easiest route: just force users to run `kubectl apply` themselves. The downside is that we spend a whole bunch of effort creating an abstraction on top of Kubernetes, only to tell users that they have to learn Kubernetes anyway.

2. Create the Profile object in a charm, delete it with the charm
   This is the way things work now, and has the downside that you can't ever remove a Kubeflow deployment without also deleting all user data associated with that deployment. This seems like a heavy requirement, since we can't foresee all of the ways in which someone might need to administer a Kubeflow deployment. For example, say upgrade-charm breaks at some point, and the only way to upgrade a charm is to remove and redeploy it.

3. Create the Profile object in a charm, disconnect it from the charm lifecycle
   This is what this ticket is asking for. The downside is that a Juju charm might leave behind things after it's deleted. The upside is that a user doesn't have to manually muck about with kubectl, and Juju can make it obvious when a charm leaves something behind.

4. Create the Profile object separately in Juju from a charm
   This would make the most sense from a modelling perspective: we could have some sort of `juju create-resource profile/admin`, with a lifecycle that isn't tied to a charm. The downside is that it's a fair amount of work, and it likely won't be implemented for a good while even if it is the path forward.

My preference would be #3 for a short-term solution, until something like #4 could be implemented that properly models the resource creation.

Revision history for this message
Ian Booth (wallyworld) wrote :

5. Deploy the profile charm in a separate model (which would manage all required global resources) and cross model relate kubeflow workloads deployed in other models. When removing a kubeflow model, the profile model would remain behind, as would the profile info etc. A newly deployed kubeflow model would relate to the retained profile model charm as needed.

This is probably the model juju-esque approach and really shouldn't be a lot of effort.

Revision history for this message
Kenneth Koski (knkski) wrote :

I think option #5 is a variant of #2, as it wouldn't help the case where a user has to specifically remove the charm that creates the Profile object. Putting the charm in a different model would help when a user runs `juju destroy-model foo`, but if you ever have to remove the charm (different model or no), it will necessitate deleting user data.

Revision history for this message
Ian Booth (wallyworld) wrote :

The idea though is that the separate model containing the profile charm would not be deleted. It hangs around as other models around it are created and deleted. This preserves the profile data but allows kubeflow deployments to be managed as needed. Of course, you can delete the separate profile model to remove data, and this does need to be catered for - there always needs to be a way to clean stuff up.

Revision history for this message
Kenneth Koski (knkski) wrote :

Right, a separate model would let you avoid destroying the Profile resource with `juju destroy-model`, but if you ever have to remove the charm for any reason (say something's wrong with `juju upgrade-charm` and you need to just redeploy the charm), it will delete user data, and a separate model won't help with that.

Revision history for this message
Ian Booth (wallyworld) wrote :

One option could be that if a resource is cluster scoped, and it has an annotation "juju.io/persistent", juju will retain it when the charm is deleted. But only for cluster scoped resources.
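
A minimal sketch of that retention rule, assuming the mere presence of the annotation is what counts (the comment does not say whether its value matters) and that a missing namespace means cluster-scoped. The function name is hypothetical:

```python
PERSISTENT_ANNOTATION = "juju.io/persistent"  # annotation name from the proposal


def retain_on_charm_removal(namespace, annotations):
    """Return True if the resource should survive removal of its charm.

    Assumption: namespace is None for cluster-scoped resources, and only
    the presence of the annotation is checked, not its value.
    """
    cluster_scoped = namespace is None
    return cluster_scoped and PERSISTENT_ANNOTATION in annotations


keep_profile = retain_on_charm_removal(None, {PERSISTENT_ANNOTATION: "true"})
keep_unmarked = retain_on_charm_removal(None, {})
keep_namespaced = retain_on_charm_removal("kubeflow", {PERSISTENT_ANNOTATION: "true"})
```

Only the first case is retained: the unmarked cluster-scoped resource and the annotated but namespaced resource are both cleaned up as usual.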

Revision history for this message
Kenneth Koski (knkski) wrote :

Sure, sounds good to me. We'll probably need some way of adding annotations to CustomResources, I don't think Juju has that capability ATM.

Revision history for this message
Thomas Miller (tlmiller) wrote :

Just to add my two cents on the different situations that can come up, based on previous kube ops experience: when you install something that ships CRDs into a cluster, removing the application associated with those CRDs often needs to leave the CRDs in place, because other things may still have objects of those CRD types in use.

Some controllers between major version upgrades need a full removal in order to get the new version into a cluster.

Revision history for this message
John A Meinel (jameinel) wrote :

How do they have the CRD types in play if the application that was
providing it is gone?

Revision history for this message
Kenneth Koski (knkski) wrote :

I think that's the exact issue: A charm in a model might create a CRD, a charm in another model would then create custom resources based on that CRD, and then deleting the first charm would wipe out the CRD, including custom resources that the second charm relies upon.
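
To make the cascade concrete, here is a toy model in plain Python of one cluster shared by two Juju models. It relies only on the fact that deleting a CRD also deletes every custom resource of that type; the `ToyCluster` class, the model names, and the CRD name used are illustrative assumptions:

```python
class ToyCluster:
    """Toy model of one Kubernetes cluster shared by several Juju models."""

    def __init__(self):
        self.crds = set()
        self.objects = []  # (crd_name, owning_model, object_name) tuples

    def create_crd(self, crd_name):
        self.crds.add(crd_name)

    def create_object(self, crd_name, model, name):
        if crd_name not in self.crds:
            raise ValueError(f"no such CRD: {crd_name}")
        self.objects.append((crd_name, model, name))

    def delete_crd(self, crd_name):
        # Deleting a CRD also deletes every custom resource of that type,
        # regardless of which model created it.
        self.crds.discard(crd_name)
        self.objects = [o for o in self.objects if o[0] != crd_name]


cluster = ToyCluster()
cluster.create_crd("profiles.kubeflow.org")                         # charm in model A
cluster.create_object("profiles.kubeflow.org", "model-b", "admin")  # charm in model B
cluster.delete_crd("profiles.kubeflow.org")                         # model A's charm removed
remaining = cluster.objects
```

After the last call nothing owned by model B is left, even though model B itself was never touched.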

Revision history for this message
Ian Booth (wallyworld) wrote :

Models are supposed to be independent, self-contained deployments though. Having models depend on each other breaks an underlying Juju assumption.

Revision history for this message
Kenneth Koski (knkski) wrote :

Unfortunately we're running into an architectural decision on Kubernetes' part here. CustomResourceDefinitions are always cluster-wide, and we can't limit them to a single model:

https://github.com/kubernetes/kubernetes/issues/65551

Additionally, some CRDs must be configured to also create cluster-wide custom resources. An example is that some Kubeflow code expects the Profile CRD to be configured this way.

Revision history for this message
Yang Kelvin Liu (kelvin.liu) wrote :

Two PRs landed in 2.8 to fix this issue:
https://github.com/juju/juju/pull/11315 added a lifecycle for CRDs,
https://github.com/juju/juju/pull/11335 added a lifecycle for CRs.
Note: this feature is available in k8s spec V3.

Doc: https://discourse.jujucharms.com/t/k8s-spec-v3-changes/2698

Changed in juju:
status: New → Triaged
assignee: nobody → Yang Kelvin Liu (kelvin.liu)
milestone: none → 2.8-beta1
importance: Undecided → Medium
status: Triaged → In Progress
status: In Progress → Fix Committed
Harry Pidcock (hpidcock)
Changed in juju:
status: Fix Committed → Fix Released