Ceph PV fails to mount in pod

Bug #1820908 reported by Dan Ardelean
Affects: CDK Addons
Status: Fix Released
Importance: High
Assigned to: George Kraft
Milestone: 1.14

Bug Description

Hi,

I deployed a 1.13 CDK cluster. It runs in 5 VMs nested on one big GCE VM.

For storage classes and automatic volume provisioning, I am also deploying a Ceph cluster. The PVCs and PVs are created normally, but when I attach a PVC to a pod, the volume attaches and then fails to mount. I am attaching some outputs below. The same setup works with OpenStack.

Charm versions, k8s 1.13
https://paste.ubuntu.com/p/g4YVnn5GMQ/

SC, PVC, and PV info. I am using GCE naming for the PVC, but it uses the Ceph SC, so don't mind that.
https://paste.ubuntu.com/p/P9rc9jbpZ6/

A pod describe
https://paste.ubuntu.com/p/crvNCGPznJ/

Error in the ceph CSI pod
https://paste.ubuntu.com/p/s9pdMsntqQ/

I cannot see errors in any other pods. I've tried with k8s 1.12 but got the same result.
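
For reference, a minimal reproducer of this shape looks roughly like the following (the names and the 512Mi size are placeholders; the real specs are in the pastes above):

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-pvc              # placeholder name
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: ceph-xfs  # one of the Ceph SCs from the bundle
      resources:
        requests:
          storage: 512Mi          # placeholder size
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: test-busybox          # placeholder name
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - name: data
          mountPath: /data        # mounting is where it fails; attach succeeds
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: test-pvc
    EOF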

Changed in charm-kubernetes-master:
status: New → Triaged
importance: Undecided → High
Changed in charm-kubernetes-master:
assignee: nobody → Joseph Borg (joeborg)
description: updated
Revision history for this message
Joseph Borg (joeborg) wrote :

Hi Dan,

I've tried to replicate this on AWS with a basic PV, PVC, and a busybox-based pod, and can't reproduce it.

Could you pass the YAML spec for the busybox pod please?

If you have the capacity, could you also please try the deployment using edge?

`juju deploy canonical-kubernetes --channel edge`

The ceph-mon and ceph-osd charms can use stable (no edge channel currently exists for them).
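
Something like this should stand up the Ceph side next to CDK (a sketch; the unit counts and the osd-devices storage constraint are illustrative):

    juju deploy canonical-kubernetes --channel edge
    juju deploy -n 3 ceph-mon
    juju deploy -n 3 ceph-osd --storage osd-devices=32G,1
    juju add-relation ceph-osd ceph-mon
    # Wire Ceph into Kubernetes so the Ceph storage classes get created.
    juju add-relation ceph-mon:admin kubernetes-master
    juju add-relation ceph-mon:client kubernetes-master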

Thanks,
Joe

Revision history for this message
Dan Ardelean (danardelean) wrote :

Hi Joseph,

Here is the Pod definition
https://paste.ubuntu.com/p/97sGm3gnnh/

Unfortunately, edge will not help in our case; we need a stable version. Also, I used a bundle file to model and deploy the cluster, as it needs to resemble a production environment. Here is the bundle:
https://paste.ubuntu.com/p/tPSWymNMQK/

It closely resembles the bundles found in the juju-solutions/bundle-canonical-kubernetes GitHub wiki. The only difference is that I deploy it in 5 KVM VMs.

I redeployed the cluster, and the result is the same for both the ceph-ext4 and ceph-xfs storage classes. I also tested with an nginx pod, with the same result.

Let me know if you need anything else.

Dan

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

Dan, edge will become stable next week, so I think it's worth a shot, although I'm starting to suspect this is a bug in the Ceph CSI pod itself.

Revision history for this message
George Kraft (cynerva) wrote :

I was able to reproduce this on AWS using the provided bundle, PVC details, and pod spec. I'll see if I can narrow it down from here.

Revision history for this message
George Kraft (cynerva) wrote :

Setting the PVC's size to 1Gi or higher makes it work; anything smaller than 1Gi fails in the way described in this issue.
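
Until a fix lands, bumping the request is a workaround, e.g. (the claim name and storage class here are just examples):

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-pvc-1gi        # example name
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: ceph-xfs
      resources:
        requests:
          storage: 1Gi          # 1Gi or larger mounts fine; smaller fails
    EOF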

Revision history for this message
George Kraft (cynerva) wrote :

It looks like we have this fixed in snap channels 1.13/edge and 1.12/edge. We picked up the fix when we updated csi-rbdplugin from 0.3.0 to 1.0.0.

I tested with 1.13/edge and the issue did not occur.
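
If you want to check which plugin version a cluster actually picked up, something like this works (pod names may vary, and you may need -n <namespace> depending on where the CSI pods run):

    # List the CSI pods, then check the images they run; the fix
    # corresponds to csi-rbdplugin 1.0.0 rather than 0.3.0.
    kubectl get pods --all-namespaces | grep csi
    kubectl get pod csi-rbdplugin-attacher-0 -o jsonpath='{.spec.containers[*].image}'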

Revision history for this message
Dan Ardelean (danardelean) wrote :

Hi,

Thanks for the info. Will the fix be merged into 1.13/stable this week? If I use 1.13/edge, do I need to change the kubernetes-master charm version?

Dan

Revision history for this message
George Kraft (cynerva) wrote :

> Will the fix be merged into 1.13/stable this week?

Yes. We don't have a concrete date planned yet, but we're aiming to do so this week.

> If I am using 1.13/edge, do I need to change kubernetes-master charm version?

Nope. 1.13/edge snaps should work with stable charms.
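
The channel is charm config, so switching should look roughly like this (a sketch from memory; the upgrade action is what triggers the actual snap refresh):

    juju config kubernetes-master channel=1.13/edge
    juju config kubernetes-worker channel=1.13/edge
    # The charms don't refresh snaps on config change alone; run the
    # upgrade action on each unit to pick up the new channel.
    juju run-action kubernetes-master/0 upgrade
    juju run-action kubernetes-worker/0 upgrade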

Revision history for this message
Dan Ardelean (danardelean) wrote :

Cool. Please let me know when it is on 1.13/stable and which charm version I should use.

Dan

Changed in charm-kubernetes-master:
assignee: Joseph Borg (joeborg) → George Kraft (cynerva)
Changed in cdk-addons:
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → George Kraft (cynerva)
milestone: none → 1.14
no longer affects: charm-kubernetes-master
Revision history for this message
Dan Ardelean (danardelean) wrote :

Hi,

Will this fix be in the 1.13/stable channel? If yes, which charm version? Thanks.

Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

The fix in the cdk-addons snap channel 1.13/edge has been promoted to 1.13/stable. A new charm revision is not required.
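
You can confirm what a master unit actually has with something like this (cdk-addons is a snap on the master units):

    juju run --unit kubernetes-master/0 'snap list cdk-addons'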

Changed in cdk-addons:
status: Fix Committed → Fix Released
Revision history for this message
Tim Van Steenburgh (tvansteenburgh) wrote :

From Dan via email:

Now there is another issue. I have deployed the same bundle again, as mentioned in the bug: 1.13 with the same charm versions. The PVC and pod definitions are the same as the ones in the bug.

The problem now is with the 'csi-rbdplugin-attacher-0' pod, which has the following issue:
https://paste.ubuntu.com/p/qv4DdhHWZ9/

As a result, when I create a busybox pod with a PVC reference, this happens:
https://paste.ubuntu.com/p/4nTX656MFx/

The pod looks like this:
https://paste.ubuntu.com/p/gCQPTcZrq4/
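
For anyone following along, the usual places to pull these outputs from are roughly the following (the busybox pod name is a placeholder):

    # Events on the stuck pod show the attach/mount failure.
    kubectl describe pod busybox-test
    # Logs from the attacher pod named above.
    kubectl logs csi-rbdplugin-attacher-0
    # Per-node plugin pods, if more detail is needed (label is a guess).
    kubectl logs -l app=csi-rbdplugin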

Changed in cdk-addons:
status: Fix Released → Triaged
George Kraft (cynerva)
Changed in cdk-addons:
status: Triaged → In Progress
Revision history for this message
George Kraft (cynerva) wrote :

Ah, wonderful. I can reproduce this. It's another upstream issue:
https://github.com/ceph/ceph-csi/issues/278

Which looks to have been introduced 6 days ago:
https://github.com/ceph/ceph-csi/pull/265

And we picked this up 2 days ago in a new build of 1.13/edge, for Kubernetes 1.13.5.

Let me see what I can do.

Revision history for this message
Dan Ardelean (danardelean) wrote : Re: [Bug 1820908] Re: Ceph PV fails to mount in pod

Cool, thanks.


George Kraft (cynerva)
Changed in cdk-addons:
status: In Progress → Fix Committed
Revision history for this message
Dan Ardelean (danardelean) wrote :

Hi,

Redeployed with the same charm versions, and it works now. Thanks.

Revision history for this message
George Kraft (cynerva) wrote :

Thanks for the update. Let us know if you have any other issues.

Changed in cdk-addons:
status: Fix Committed → Fix Released