Charm doesn't make a determination about the default storage-class

Bug #1996809 reported by Adam Dyess
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Kubernetes Control Plane Charm
Fix Released
Undecided
Adam Dyess

Bug Description

While testing the aws-k8s-storage charm, bootstrapping a k8s juju controller encountered an error
(https://bugs.launchpad.net/juju/+bug/1996808)
----------------------------------------------------------------------
2022-11-16-19:47:06 root ERROR [localhost] Command failed: juju add-k8s kubernetes_cloud --client --cloud aws --region us-east-1
2022-11-16-19:47:06 root ERROR [localhost] STDOUT follows:

2022-11-16-19:47:06 root ERROR [localhost] STDERR follows:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x1a32a70]

goroutine 1 [running]:
github.com/juju/juju/caas/kubernetes/provider.storageClassMatches({{0x45adcd7, 0xa}, {0x45e2854, 0x15}, 0x0, 0x1, {0x45db1cc, 0x14}}, 0xc00081d140?)
    /build/snapcraft-juju-f64d7e6b7ea3189aefa0ea413ed68a36/parts/juju/build/caas/kubernetes/provider/metadata.go:254 +0x70
github.com/juju/juju/caas/kubernetes/provider.(*kubernetesClient).CheckDefaultWorkloadStorage(0xc0002fc550?, {0xc0002fc550, 0x3}, 0xd?)
    /build/snapcraft-juju-f64d7e6b7ea3189aefa0ea413ed68a36/parts/juju/build/caas/kubernetes/provider/metadata.go:250 +0x131
github.com/juju/juju/caas/kubernetes/provider.UpdateKubeCloudWithStorage(0xc00081d7f0, {{0x0, 0x0}, {0xc0002fc550, 0xd}, {0x7fca318a5140, 0xc0005789c0}, 0xc0007d8290})
    /build/snapcraft-juju-f64d7e6b7ea3189aefa0ea413ed68a36/parts/juju/build/caas/kubernetes/provider/cloud.go:127 +0x3c2
github.com/juju/juju/cmd/juju/caas.(*AddCAASCommand).Run(0xc000000d20, 0xc0004cf380)
    /build/snapcraft-juju-f64d7e6b7ea3189aefa0ea413ed68a36/parts/juju/build/cmd/juju/caas/add.go:575 +0x9bd
github.com/juju/juju/cmd/modelcmd.(*baseCommandWrapper).Run(0xc0005b30f0, 0x4?)
    /build/snapcraft-juju-f64d7e6b7ea3189aefa0ea413ed68a36/parts/juju/build/cmd/modelcmd/base.go:551 +0xaf
github.com/juju/cmd/v3.(*SuperCommand).Run(0xc0005e5b80, 0xc0004cf380)
    /build/snapcraft-juju-f64d7e6b7ea3189aefa0ea413ed68a36/parts/juju/build/vendor/github.com/juju/cmd/v3/supercommand.go:523 +0x371
github.com/juju/cmd/v3.Main({0x4d94a28, 0xc0005e5b80}, 0xc0004cf380, {0xc0005a0770, 0x7, 0x7})
    /build/snapcraft-juju-f64d7e6b7ea3189aefa0ea413ed68a36/parts/juju/build/vendor/github.com/juju/cmd/v3/cmd.go:397 +0x24b
github.com/juju/juju/cmd/juju/commands.jujuMain.Run({0xc000120140?}, {0xc000122000, 0x8, 0x8})
    /build/snapcraft-juju-f64d7e6b7ea3189aefa0ea413ed68a36/parts/juju/build/cmd/juju/commands/main.go:199 +0x80f
github.com/juju/juju/cmd/juju/commands.Main(...)
    /build/snapcraft-juju-f64d7e6b7ea3189aefa0ea413ed68a36/parts/juju/build/cmd/juju/commands/main.go:122
main.main()
    /build/snapcraft-juju-f64d7e6b7ea3189aefa0ea413ed68a36/parts/juju/build/cmd/juju/main.go:27 +0x72
----------------------------------------------------------------------

Looking at the cluster there appeared two storage classes, neither of which was the default. This is likely the culprit for the juju client trying to bootstrap a controller.
----------------------------------------------------------------------
$ kubectl --kubeconfig=kube.conf get storageclass
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
cdk-ebs kubernetes.io/aws-ebs Delete WaitForFirstConsumer false 98m
csi-aws-ebs-default ebs.csi.aws.com Delete WaitForFirstConsumer false 42m
----------------------------------------------------------------------

However, there's trouble here:

1) cdk-addons is installing a cdk-ebs storage-class that really can no longer be used in 1.25 because the provisioner kubernetes.io/aws-ebs isn't available anymore (that was name of the provisioner for the in-tree storage)
2) csi-aws-ebs-default is the name of the storage class that the aws-k8s-storage charm installed -- but it's not the default storage class. SHOULD IT BE?
3) If there are multiple storage backends (cloud backed and maybe ceph storage) shouldn't the cluster admin get the choice of which on to use -- why should the charms enforce their will on the cluster?
    3a) perhaps the charms can enforce their will through the use of config -- but that just begs the question what should the default config be on each of the charms and how to interlock them?
    3b) some suggested that "first storage class installed should win" but this could lead to the same bundles yielding different default storage based on which charm installs their storage-class first

Adam Dyess (addyess)
Changed in charm-aws-k8s-storage:
status: New → Incomplete
status: Incomplete → Confirmed
Changed in charm-azure-cloud-provider:
status: New → Incomplete
status: Incomplete → New
Revision history for this message
Alexander Balderson (asbalderson) wrote :

The bug for the juju panic can be found at:
https://bugs.launchpad.net/juju/+bug/1996808

in the case of 2
you have decided the name of the storage in the charm, should the storage name also be something the user can configure? I fear that may also cause some confusion further down the line when it comes to juju add-k8s. Otherwise if the name is going to have "default" in it, I'd assume it would be the default unless I said otherwise.

If someone is using 2 kinds of storage, can you assume that they are advanced enough to also select which storage they want to use? I cant say for certain but my gut says that the average user wont be using more than 1 kind of storage

Revision history for this message
George Kraft (cynerva) wrote :

From https://kubernetes.io/docs/tasks/administer-cluster/change-default-storage-class/

> Please note that at most one StorageClass can be marked as default. If two or more of them are marked as default, a PersistentVolumeClaim without storageClassName explicitly specified cannot be created.

Given this behavior, I think that perhaps there could be a config option on charms that provide storage, something like "storage-class-default" that defaults to True.

Then in the case where you have just the one charm providing storage: there is only one default storage class, and everything "just works".

And in the case where you have more than one charm providing storage: there are multiple default storage classes, so Kubernetes treats it as if there is no default. The user is forced to specify storageClassName explicitly in their PVCs, OR update charm configuration to remove the conflicting defaults. It forces the user to make an explicit choice one way or another, which seems like a good thing to have them do.

Revision history for this message
Adam Dyess (addyess) wrote :
Changed in charm-kubernetes-master:
status: New → Fix Committed
milestone: none → 1.26
Revision history for this message
Adam Dyess (addyess) wrote :

With regard to setting the default storage-class for the cluster, George makes a great point that the control-plane charm already has a config option called 'default-storage'

Currently this setting only applies when related to ceph-mon to select cephfs, ceph-ext4, or ceph-xfs. The default is "auto" which just sets ceph-xfs as the default if ceph is enabled.

I'd propose a refactor of the logic behind `default-storage` config option. "auto" seems farfetched -- so we should probably just let it continue to mean "ceph-xfs" since it really can't mean anything else at this point[1]

[1]: https://github.com/charmed-kubernetes/cdk-addons/blob/651070382303a86c7f09cfb1048140c4a6c9b386/cdk-addons/apply#L121-L122

> on.config.changed.default-storage
> on.update-status
   list all storage-classes in the cluster
   remove default annotation from all storage-classes in the cluster
   case of default-storage
   if "auto":
     -- add annotation to storage-class/ceph-xfs if it exists in the cluster
   else:
     -- add annotation to storage-class/{default-storage} if it exists in the cluster

I don't care for the poll on update-status, sadly, I know of no way for the charm to watch for external annotation changes to all the storage classes in the cluster and to react on that change.

Adam Dyess (addyess)
Changed in charm-kubernetes-master:
assignee: nobody → Adam Dyess (addyess)
Adam Dyess (addyess)
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released
Revision history for this message
Adam Dyess (addyess) wrote :
Changed in charm-kubernetes-master:
status: Fix Released → In Progress
Changed in charm-aws-k8s-storage:
milestone: none → 1.26+ck2
Changed in charm-azure-cloud-provider:
milestone: none → 1.26+ck2
Changed in charm-gcp-k8s-storage:
milestone: none → 1.26+ck2
Changed in charm-kubernetes-master:
milestone: 1.26 → 1.26+ck2
Changed in charm-vsphere-cloud-provider:
milestone: none → 1.26+ck2
Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

Dual default storage classes in a typical CK deployment (1 storage provider) was resolved in 1.26 per comment 3 by disabling the in-tree SCs from cdk-addons.

We'll tackle more robust SC handling (as proposed in comment 5) in 1.27.

Changed in charm-aws-k8s-storage:
milestone: 1.26+ck2 → 1.27
Changed in charm-azure-cloud-provider:
milestone: 1.26+ck2 → 1.27
Changed in charm-gcp-k8s-storage:
milestone: 1.26+ck2 → 1.27
Changed in charm-kubernetes-master:
milestone: 1.26+ck2 → 1.27
Changed in charm-vsphere-cloud-provider:
milestone: 1.26+ck2 → 1.27
Changed in charm-kubernetes-master:
status: In Progress → Fix Committed
Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

Default sc handling is wholly implemented in the control plane; cloud-provider charm changes are no longer needed.

no longer affects: charm-aws-k8s-storage
no longer affects: charm-vsphere-cloud-provider
no longer affects: charm-azure-cloud-provider
no longer affects: charm-gcp-k8s-storage
Changed in charm-kubernetes-master:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.