Creating a volume from an existing volume fails when the clone depth reaches rbd_max_clone_depth

Bug #1794956 reported by Boxiang Zhu
This bug affects 1 person
Affects: Cinder
Status: Fix Released
Importance: Medium
Assigned to: Jon Bernard

Bug Description

OS: CentOS 7
ENV: created by devstack, allinone
volume backend: ceph
rbd_max_clone_depth: 2 (in cinder.conf)

I created a 1 GB volume (vol). Then I created a new volume (vol-1) from vol, and another new volume (vol-2) from vol-1. Now I have three volumes and the clone depth is 2. When I try to create a new volume (vol-3) from vol-2, it fails. The error message is as follows:
Sep 28 18:47:24 dev cinder-volume[28118]: INFO cinder.volume.drivers.rbd [None req-f33c3cba-108e-4d02-a1d5-d6c135f482a1 admin None] maximum clone depth (2) has been reached - flattening dest volume
Sep 28 18:47:24 dev cinder-volume[28118]: INFO oslo_service.service [None req-3d2dd756-cbd0-4375-9731-0bdc4d02817c None None] Child 28221 killed by signal 11
Sep 28 18:47:24 dev cinder-volume[28118]: INFO cinder.service [-] Starting cinder-volume node (version 13.0.0)
Sep 28 18:47:24 dev cinder-volume[28118]: INFO cinder.volume.manager [None req-e15ae702-d5e8-438b-8e81-6b120ccab3d7 None None] Starting volume driver RBDDriver (1.2.0)
Sep 28 18:47:25 dev cinder-volume[28118]: INFO cinder.keymgr.migration [None req-6d5032b0-c42b-4f51-b221-aa6f4d3cf937 None None] Not migrating encryption keys because the ConfKeyManager is still in use.
Sep 28 18:47:25 dev cinder-volume[28118]: INFO cinder.volume.manager [None req-e15ae702-d5e8-438b-8e81-6b120ccab3d7 None None] Driver initialization completed successfully.

Boxiang Zhu (bxzhu-5355)
Changed in cinder:
assignee: nobody → Boxiang Zhu (bxzhu-5355)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/606038

Changed in cinder:
status: New → In Progress
Jay Bryant (jsbryant)
Changed in cinder:
importance: Undecided → Medium
Eric Harney (eharney) wrote :

"Child 28221 killed by signal 11" indicates some kind of nasty bug here. That probably requires a fix outside of just Cinder itself. Is 28221 the cinder volume pid?

Can you provide more info about what Ceph packages are installed on this environment?

Jon Bernard (jbernard) wrote :

Not reproducible here, need additional information.

Boxiang Zhu (bxzhu-5355) wrote :

Hi Eric, more info about the Ceph packages is as follows:
On the Ceph cluster:
ceph-common-12.2.5-0.el7.x86_64
ceph-selinux-12.2.5-0.el7.x86_64
ceph-mon-12.2.5-0.el7.x86_64
ceph-osd-12.2.5-0.el7.x86_64
ceph-radosgw-12.2.5-0.el7.x86_64
centos-release-ceph-jewel-1.0-1.el7.centos.noarch
libcephfs2-12.2.5-0.el7.x86_64
python-cephfs-12.2.5-0.el7.x86_64
ceph-base-12.2.5-0.el7.x86_64
ceph-mgr-12.2.5-0.el7.x86_64
ceph-mds-12.2.5-0.el7.x86_64
ceph-12.2.5-0.el7.x86_64
ceph-fuse-12.2.5-0.el7.x86_64

On the OpenStack cluster (devstack all-in-one):
centos-release-ceph-luminous-1.1-2.el7.centos.noarch
python-cephfs-12.2.5-0.el7.x86_64
libcephfs2-12.2.5-0.el7.x86_64
ceph-common-12.2.5-0.el7.x86_64

Boxiang Zhu (bxzhu-5355) wrote :

Hi Jon, I think you can reproduce the issue with the following steps:
- First, OpenStack must be integrated with Ceph.
- Set 'rbd_max_clone_depth' to '2' in cinder.conf, then restart the cinder-volume service.
- Create the first volume (vol01).
- Create the second volume (vol02) with vol01 as the source volume.
- Create the third volume (vol03) with vol02 as the source volume. Relative to vol01, the depth of vol03 is now 2, which equals 'rbd_max_clone_depth' in cinder.conf.
- Now create the fourth volume (vol04) with vol03 as the source volume. The cinder-volume service crashes and is restarted.
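
To make the depth arithmetic in these steps concrete, here is a small model of the parent chain. It is only a sketch and not the Cinder driver code: the names PARENTS, clone_depth and needs_flatten are made up for illustration, and the ">=" comparison is inferred from the steps and the "maximum clone depth (2) has been reached" log line.

```python
# Simplified model of clone-depth counting; NOT the Cinder driver code.
# Maps each volume to the volume it was cloned from (None = no parent).
PARENTS = {
    "vol01": None,
    "vol02": "vol01",
    "vol03": "vol02",
}


def clone_depth(name, parents):
    """Count how many ancestors `name` has by walking the parent chain."""
    depth = 0
    parent = parents.get(name)
    while parent is not None:
        depth += 1
        parent = parents.get(parent)
    return depth


def needs_flatten(src_name, parents, max_clone_depth):
    """Return True when cloning from `src_name` would hit the depth limit.

    Assumption: the limit is hit when the source depth is >= the maximum.
    """
    return clone_depth(src_name, parents) >= max_clone_depth


if __name__ == "__main__":
    for src in ("vol01", "vol02", "vol03"):
        print(src, clone_depth(src, PARENTS), needs_flatten(src, PARENTS, 2))
```

With rbd_max_clone_depth set to 2, only a clone taken from vol03 goes down the flattening path in this model, which matches the step where the service crashes.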

The root cause is that when the code calls 'unprotect_snap' on the image, the image has in fact already been closed.
https://github.com/openstack/cinder/blob/stable/rocky/cinder/volume/drivers/rbd.py#L622
https://github.com/openstack/cinder/blob/stable/rocky/cinder/volume/drivers/rbd.py#L651
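
The ordering problem described above can be sketched with a stand-in object. This is a simplified model, not the code at the linked lines: FakeImage, flatten_fails, clone_closing_too_early and clone_closing_last are hypothetical names, and the stand-in raises RuntimeError where the real binding crashed with signal 11.

```python
class FakeImage:
    """Hypothetical stand-in for an RBD image handle (not the rbd API)."""

    def __init__(self, name):
        self.name = name
        self.closed = False
        self.calls = []

    def unprotect_snap(self, snap):
        if self.closed:
            # The real binding segfaulted here; the model raises instead.
            raise RuntimeError("image %s is closed" % self.name)
        self.calls.append(("unprotect_snap", snap))

    def remove_snap(self, snap):
        if self.closed:
            raise RuntimeError("image %s is closed" % self.name)
        self.calls.append(("remove_snap", snap))

    def close(self):
        self.closed = True
        self.calls.append(("close",))


def flatten_fails():
    """Simulate a flatten operation that raises."""
    raise IOError("simulated flatten failure")


def clone_closing_too_early(src, snap, flatten):
    """Buggy ordering: the source is closed before the cleanup path runs."""
    src.close()
    try:
        flatten()
    except Exception:
        src.unprotect_snap(snap)  # operates on a closed handle
        src.remove_snap(snap)
        raise


def clone_closing_last(src, snap, flatten):
    """Safer ordering: cleanup runs first, close happens in `finally`."""
    try:
        try:
            flatten()
        except Exception:
            src.unprotect_snap(snap)
            src.remove_snap(snap)
            raise
    finally:
        src.close()
```

In the second variant the handle stays open for the whole cleanup path and is closed exactly once, whether or not flattening succeeds.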

Boxiang Zhu (bxzhu-5355) wrote :

Reproducing it in the Python interactive interpreter:
[root@dev ~]# python
Python 2.7.5 (default, Jul 13 2018, 13:06:57)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import rbd
>>> import rados
>>> cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>>> cluster.connect()
>>> ioctx = cluster.open_ioctx('volumes001')
>>> image = rbd.Image(ioctx, 'vol01')
>>> image
rbd.Image(ioctx, 'vol01')
>>> image.close()
>>> image
rbd.Image(ioctx, 'vol01')
>>> image.unprotect_snap('snap01')
Segmentation fault
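
One way a caller could protect itself against this kind of use-after-close is a thin proxy that remembers whether close() was called. The sketch below is only an illustration of that idea: ClosedGuard and DemoHandle are hypothetical names, and nothing here is part of the rbd module or of the Cinder fix.

```python
class ClosedGuard:
    """Hypothetical proxy that refuses calls once the handle is closed."""

    def __init__(self, handle):
        self._handle = handle
        self._closed = False

    def close(self):
        # Idempotent: a second close() never reaches the wrapped handle.
        if not self._closed:
            self._closed = True
            self._handle.close()

    def __getattr__(self, name):
        # Only invoked for attributes not found on the proxy itself.
        if self._closed:
            raise ValueError("cannot use %s: handle is closed" % name)
        return getattr(self._handle, name)


class DemoHandle:
    """Stand-in for a native handle; it only records the calls it gets."""

    def __init__(self):
        self.log = []

    def unprotect_snap(self, snap):
        self.log.append(snap)

    def close(self):
        self.log.append("closed")


if __name__ == "__main__":
    guard = ClosedGuard(DemoHandle())
    guard.unprotect_snap("snap01")  # forwarded to the handle
    guard.close()
    try:
        guard.unprotect_snap("snap01")  # rejected before reaching the handle
    except ValueError as exc:
        print(exc)
```

The point of the design is that the failure becomes an ordinary Python exception raised in the caller, instead of a call into a handle that no longer exists.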

Eric Harney (eharney) wrote :

Is there a corresponding bug elsewhere against librbd for this?

Changed in cinder:
assignee: Boxiang Zhu (bxzhu-5355) → Jon Bernard (jbernard)
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/606038
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=394fbd7e20dbd8fb8cc58510657753432c43fa9b
Submitter: Zuul
Branch: master

commit 394fbd7e20dbd8fb8cc58510657753432c43fa9b
Author: zhu.boxiang <zhu.boxiang@99cloud.net>
Date: Fri Sep 28 19:07:12 2018 +0800

    RBD: fix volume reference handling in clone logic

    This patch fixes a bug in volume cloning where the source volume is
    prematurely closed. If the destination volume requires flattening and
    an exception occurs during flattening, the code attempts to perform
    cleanup operations on an already closed volume. This resulted in a
    segmentation fault which causes cinder to restart.

    Co-authored-by: Jon Bernard <email address hidden>

    Change-Id: Ib713aa91b775d8ec07ffdb24dfe1db1b6ecf2921
    Closes-Bug: #1794956

Changed in cinder:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/714151

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/train)

Reviewed: https://review.opendev.org/714151
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=7bfd3c4dcf528ba900d6179ff06c227ea1016176
Submitter: Zuul
Branch: stable/train

commit 7bfd3c4dcf528ba900d6179ff06c227ea1016176
Author: zhu.boxiang <zhu.boxiang@99cloud.net>
Date: Fri Sep 28 19:07:12 2018 +0800

    RBD: fix volume reference handling in clone logic

    This patch fixes a bug in volume cloning where the source volume is
    prematurely closed. If the destination volume requires flattening and
    an exception occurs during flattening, the code attempts to perform
    cleanup operations on an already closed volume. This resulted in a
    segmentation fault which causes cinder to restart.

    Co-authored-by: Jon Bernard <email address hidden>

    Change-Id: Ib713aa91b775d8ec07ffdb24dfe1db1b6ecf2921
    Closes-Bug: #1794956
    (cherry picked from commit 394fbd7e20dbd8fb8cc58510657753432c43fa9b)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/715574

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/stein)

Reviewed: https://review.opendev.org/715574
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=47caae4cbbcacf1d81554af4ea2bfec01a0f7c54
Submitter: Zuul
Branch: stable/stein

commit 47caae4cbbcacf1d81554af4ea2bfec01a0f7c54
Author: zhu.boxiang <zhu.boxiang@99cloud.net>
Date: Fri Sep 28 19:07:12 2018 +0800

    RBD: fix volume reference handling in clone logic

    This patch fixes a bug in volume cloning where the source volume is
    prematurely closed. If the destination volume requires flattening and
    an exception occurs during flattening, the code attempts to perform
    cleanup operations on an already closed volume. This resulted in a
    segmentation fault which causes cinder to restart.

    Co-authored-by: Jon Bernard <email address hidden>

    Change-Id: Ib713aa91b775d8ec07ffdb24dfe1db1b6ecf2921
    Closes-Bug: #1794956
    (cherry picked from commit 394fbd7e20dbd8fb8cc58510657753432c43fa9b)
    (cherry picked from commit 7bfd3c4dcf528ba900d6179ff06c227ea1016176)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/716081

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 15.1.0

This issue was fixed in the openstack/cinder 15.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 14.0.4

This issue was fixed in the openstack/cinder 14.0.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/rocky)

Reviewed: https://review.opendev.org/716081
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=5d62c72d3545ed59b4cb3300b5c2b75d0a443a99
Submitter: Zuul
Branch: stable/rocky

commit 5d62c72d3545ed59b4cb3300b5c2b75d0a443a99
Author: zhu.boxiang <zhu.boxiang@99cloud.net>
Date: Fri Sep 28 19:07:12 2018 +0800

    RBD: fix volume reference handling in clone logic

    This patch fixes a bug in volume cloning where the source volume is
    prematurely closed. If the destination volume requires flattening and
    an exception occurs during flattening, the code attempts to perform
    cleanup operations on an already closed volume. This resulted in a
    segmentation fault which causes cinder to restart.

    Co-authored-by: Jon Bernard <email address hidden>

    Change-Id: Ib713aa91b775d8ec07ffdb24dfe1db1b6ecf2921
    Closes-Bug: #1794956
    (cherry picked from commit 394fbd7e20dbd8fb8cc58510657753432c43fa9b)
    (cherry picked from commit 7bfd3c4dcf528ba900d6179ff06c227ea1016176)
    (cherry picked from commit 47caae4cbbcacf1d81554af4ea2bfec01a0f7c54)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/718586

Alexey Stupnikov (astupnikov) wrote :

A fix to ceph:pybind/rbd was merged: https://tracker.ceph.com/issues/44610

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/queens)

Reviewed: https://review.opendev.org/718586
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=63ed20b7e9d9a3856c7f2444473504ae6e634e4c
Submitter: Zuul
Branch: stable/queens

commit 63ed20b7e9d9a3856c7f2444473504ae6e634e4c
Author: zhu.boxiang <zhu.boxiang@99cloud.net>
Date: Fri Sep 28 19:07:12 2018 +0800

    RBD: fix volume reference handling in clone logic

    This patch fixes a bug in volume cloning where the source volume is
    prematurely closed. If the destination volume requires flattening and
    an exception occurs during flattening, the code attempts to perform
    cleanup operations on an already closed volume. This resulted in a
    segmentation fault which causes cinder to restart.

    Co-authored-by: Jon Bernard <email address hidden>

    Change-Id: Ib713aa91b775d8ec07ffdb24dfe1db1b6ecf2921
    Closes-Bug: #1794956
    (cherry picked from commit 394fbd7e20dbd8fb8cc58510657753432c43fa9b)
    (cherry picked from commit 7bfd3c4dcf528ba900d6179ff06c227ea1016176)
    (cherry picked from commit 47caae4cbbcacf1d81554af4ea2bfec01a0f7c54)
    (cherry picked from commit 5d62c72d3545ed59b4cb3300b5c2b75d0a443a99)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder queens-eol

This issue was fixed in the openstack/cinder queens-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder rocky-eol

This issue was fixed in the openstack/cinder rocky-eol release.
