Deploying a new kubernetes-master unit makes pods lose connectivity to cephfs volumes

Bug #1891757 reported by Tiago Pasqualini da Silva
This bug affects 2 people
Affects: Kubernetes Control Plane Charm
Status: In Progress
Importance: High
Assigned to: Joseph Borg

Bug Description

Title says it all. I can consistently reproduce this with the following steps:

- Start with a simple deployment with cephfs, 3 ceph-monitors, and 3 kubernetes-masters
- Deploy a pod with volumes backed by cephfs and verify that they are working
- Add new ceph-monitor and kubernetes-master units
- Check the volumes on the running pod and verify that they show the error: "Transport endpoint is not connected"
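
The error string in the last step is the standard Linux message for errno ENOTCONN, so the check can be scripted from inside the pod. A minimal Python sketch, assuming it runs in the pod's container; the function name and the `stat` parameter are illustrative and not part of any charm or Kubernetes API:

```python
import errno
import os


def mount_is_disconnected(path, stat=os.stat):
    """Return True if accessing `path` fails with ENOTCONN.

    ENOTCONN is the errno behind "Transport endpoint is not connected".
    `stat` is injectable (an assumption for testability); it defaults to os.stat.
    """
    try:
        stat(path)
    except OSError as exc:
        # Other failures (e.g. a missing path) are not the error from this bug.
        return exc.errno == errno.ENOTCONN
    return False
```

Calling `mount_is_disconnected("/mnt/data")` (a hypothetical mount path) before and after adding the units would show whether the volume has dropped.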

I tried isolating the steps by leaving out the ceph-monitor changes, but the results were then inconsistent (sometimes I got the error, sometimes I didn't). I believe there is a race condition in kubernetes-master when it updates the ceph-mon addresses.

Tags: sts
tags: added: sts
George Kraft (cynerva) wrote :

Thanks for the report. I believe this is enough information for us to at least look into it.

If you can, please provide the charm revisions and kubernetes version you were running when you encountered this.

Changed in charm-kubernetes-master:
importance: Undecided → High
status: New → Triaged
Chris Johnston (cjohnston) wrote :

I am able to reproduce with Kubernetes 1.17.10 and the following charm revisions:

ceph-mon-49
ceph-fs-33
ceph-osd-304
kubeapi-load-balancer-742
kubernetes-master-865
kubernetes-worker-692

Joseph Borg (joeborg)
Changed in charm-kubernetes-master:
assignee: nobody → Joseph Borg (joeborg)
status: Triaged → In Progress
Joseph Borg (joeborg) wrote :

Hey Chris and Tiago,

Any chance you could share a pod spec that triggers this? I've tried mocking one up but it's not failing when adding ceph-mon and k8s-master so just want to make sure I'm binding to the right thing.

Tiago Pasqualini da Silva (tiago.pasqualini) wrote :

Hi Joseph,

I just used a sample from the Kubernetes documentation, modifying the claim to use the cephfs StorageClass (SC). I'm attaching the PVC and pod spec.
