baremetal k8s stays blocked when using ceph with full disk encryption

Bug #1949384 reported by Alexander Balderson
This bug affects 2 people

Affects: Kubernetes Control Plane Charm
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

SQA recently added the relations between ceph and k8s-master. The ceph cluster is set to use full disk encryption. The k8s-master units are all staying blocked with "Failed to configure encryption; secrets are unencrypted or inaccessible", and the logs show:

2021-10-31 12:16:51 WARNING unit.kubernetes-master/0.update-status logger.go:60 modprobe: FATAL: Module loop not found in directory /lib/modules/5.4.0-89-generic
2021-10-31 12:16:51 ERROR unit.kubernetes-master/0.juju-log server.go:327 Unable to create encrypted mount for storing encryption config.
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubernetes-master-0/charm/lib/charms/layer/vaultlocker.py", line 154, in create_encrypted_loop_mount
    check_call(['modprobe', 'loop'])
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['modprobe', 'loop']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-kubernetes-master-0/charm/reactive/kubernetes_master.py", line 3120, in create_secure_storage
    vaultlocker.create_encrypted_loop_mount(encryption_conf_dir)
  File "/var/lib/juju/agents/unit-kubernetes-master-0/charm/lib/charms/layer/vaultlocker.py", line 170, in create_encrypted_loop_mount
    raise VaultLockerError('Error configuring VaultLocker') from e
charms.layer.vaultlocker.VaultLockerError: Error configuring VaultLocker
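
For anyone triaging a similar failure, a short check along these lines (a diagnostic sketch, not part of the charm) can distinguish a loop module that is genuinely missing from a unit that is simply not permitted to load kernel modules:

#!/usr/bin/env python3
"""Diagnostic sketch: is the loop driver usable from this unit?"""
import os
import subprocess

# /dev/loop-control only exists once the loop driver is loaded (or built
# into the kernel), so its absence is consistent with the failure above.
print('loop driver loaded:', os.path.exists('/dev/loop-control'))

# Repeat the exact call the vaultlocker layer makes. Inside an LXD
# container this fails even when the host kernel ships the module.
result = subprocess.run(['modprobe', 'loop'],
                        capture_output=True, text=True)
print('modprobe loop exit status:', result.returncode)
if result.stderr:
    print(result.stderr.strip())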

I saw there is a note about setting permissions for containers when using ceph [1]; is this how that issue presents itself?

A testrun with this issue can be found at:
https://solutions.qa.canonical.com/testruns/testRun/9a396026-c4a5-4ced-b0e1-b0c8f2a4113c
with crashdump at:
https://oil-jenkins.canonical.com/artifacts/9a396026-c4a5-4ced-b0e1-b0c8f2a4113c/generated/generated/kubernetes/juju-crashdump-kubernetes-2021-10-31-12.16.06.tar.gz
and bundle at:
https://oil-jenkins.canonical.com/artifacts/9a396026-c4a5-4ced-b0e1-b0c8f2a4113c/generated/generated/kubernetes/bundle.yaml

All testruns with this issue can be found at:
https://solutions.qa.canonical.com/bugs/bugs/bug/1949384

1) https://ubuntu.com/kubernetes/docs/storage

Cory Johns (johnsca)
Changed in charm-kubernetes-master:
status: New → Incomplete
Revision history for this message
Cory Johns (johnsca) wrote:

The VaultLocker error is unrelated to Ceph, and Ceph using full-disk encryption should be completely transparent to Kubernetes. This error occurs because you're using LXD placement for the K8s master, which is called out in the docs as unsupported [1] due to this exact issue: the containerized charm cannot manage the loopback device needed to store the encrypted data.

The suggested work-arounds are to either encrypt the LXD storage pool or use full-disk encryption on the host machine. It might be possible to add support for using Juju storage with encryption [2], but there was some reason we didn't go that route originally, though I can't recall what it was, so it might not be feasible. (A sketch of detecting this unsupported placement up front follows the links below.)

[1]: https://ubuntu.com/kubernetes/docs/encryption-at-rest#known-issues
[2]: https://github.com/juju-solutions/layer-vaultlocker#using-juju-storage-annotations
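
A minimal sketch of that placement check, assuming one wanted the charm to detect LXD confinement before attempting the loop mount; the function names and the exception class here are hypothetical stand-ins, not the charm's actual API:

# Hypothetical guard, sketched against the behavior described above.
import subprocess

class VaultLockerError(Exception):
    """Local stand-in for charms.layer.vaultlocker.VaultLockerError."""

def running_in_container() -> bool:
    # systemd-detect-virt --container exits 0 when run inside a
    # container (printing e.g. 'lxc'), non-zero on bare metal or a VM.
    return subprocess.run(
        ['systemd-detect-virt', '--container'],
        capture_output=True,
    ).returncode == 0

def create_secure_storage(conf_dir: str) -> None:
    if running_in_container():
        # LXD placement: per the known-issues doc [1], the charm cannot
        # load the loop module here, so fail fast with a clear message
        # instead of letting modprobe blow up later.
        raise VaultLockerError(
            'loop devices are unavailable inside a container; '
            'encrypt the LXD pool or the host disk instead')
    # ... otherwise proceed with vaultlocker.create_encrypted_loop_mount()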

The permissions issue you mentioned with Ceph only applies in a very specific case: old versions of Ceph (Train and before) along with CephFS and pods which run as a non-root user but require RWX volumes. Obviously, it came up at least once, but it's generally a pretty rare combination of circumstances, especially now that OpenStack is several releases beyond that. Either way, it also has nothing to do with full-disk encryption for the Ceph storage.

Changed in charm-kubernetes-master:
status: Incomplete → Invalid
Revision history for this message
Cory Johns (johnsca) wrote (last edit):

Looking at the code a bit more, I think the intent [1] was to at least allow it to fall back to writing the encryption config file even if the encrypted loopback device cannot be created, but that doesn't seem to be possible currently. This would be required for either of the suggested work-arounds to help, although depending on the requirements another possible work-around would be to drop the vault-kv relation and use FDE or encrypted storage on the Etcd units. (A rough sketch of the intended fallback follows the link below.)

I'm not sure if we should rework this bug to focus on allowing the Vaultlocker failure to be overridden, or just create a new bug.

[1]: https://github.com/charmed-kubernetes/charm-kubernetes-master/blob/a5d8758/reactive/kubernetes_master.py#L3146-L3148
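
A rough sketch of that fallback, assuming writing the config unencrypted is acceptable when no encrypted mount can be created; this is not the charm's current behavior, and the file name and parameters here are illustrative only:

# Illustrative fallback: attempt the encrypted loop mount, and on
# failure still write the encryption config so the unit can proceed
# instead of staying blocked.
import os

def write_encryption_config(conf_dir: str, config_text: str,
                            create_encrypted_loop_mount=None) -> None:
    encrypted = False
    if create_encrypted_loop_mount is not None:
        try:
            # In the real charm this would be
            # vaultlocker.create_encrypted_loop_mount(conf_dir).
            create_encrypted_loop_mount(conf_dir)
            encrypted = True
        except Exception:
            # Fall back to plain storage; the secrets are then only as
            # safe as the underlying disk (host FDE or an encrypted LXD
            # pool would cover that, matching the suggested work-arounds).
            pass
    if not encrypted:
        os.makedirs(conf_dir, exist_ok=True)
    # 'encryption_config.yaml' is a hypothetical filename for this sketch.
    with open(os.path.join(conf_dir, 'encryption_config.yaml'), 'w') as f:
        f.write(config_text)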

Revision history for this message
Cory Johns (johnsca) wrote (last edit):

I opened a new bug for allowing the Vaultlocker failure to be overridden: https://bugs.launchpad.net/charm-kubernetes-master/+bug/1951876
