Failed to create rbd volume

Bug #1649956 reported by zhangsong
This bug affects 2 people
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: zhangsong

Bug Description

When I try to create RBD volumes, I encounter the following problems:
1. The volume stays in the 'creating' state for a long time (sometimes more than 60 seconds);
2. Sometimes it is eventually created successfully, but sometimes it is moved to the 'error' state after a long time.

Here is part of my config file (with one RBD pool):
report_interval = 10
periodic_interval = 60
enabled_backends = ceph1
[ceph1]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = true
rbd_max_clone_depth = 0
rbd_store_chunk_size = 4
rados_connect_timeout = -1
volume_backend_name = ceph1

volume log:
...
2016-12-14 17:37:56.514 DEBUG oslo_service.periodic_task [req-07616727-922d-4678-aba7-0c42ce038e23 None] Running periodic task VolumeManager._publish_service_capabilities run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2016-12-14 17:37:56.514 DEBUG cinder.manager [req-07616727-922d-4678-aba7-0c42ce038e23 None] Notifying Schedulers of capabilities ... _publish_service_capabilities /opt/stack/cinder/cinder/manager.py:175
2016-12-14 17:37:56.515 DEBUG oslo_messaging._drivers.amqpdriver [req-07616727-922d-4678-aba7-0c42ce038e23 None] CAST unique_id: 83cdd3c38437466cace844eb8c380ff0 FANOUT topic 'cinder-scheduler' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:431
2016-12-14 17:37:56.515 WARNING oslo.service.loopingcall [req-a0495cf6-fb7b-420a-a5cc-ae542c4c3985 None] Function 'cinder.service.Service.report_state' run outlasted interval by 51.39 sec
2016-12-14 17:37:56.523 DEBUG oslo_messaging._drivers.amqpdriver [req-07616727-922d-4678-aba7-0c42ce038e23 None] CAST unique_id: 50e103ab378945499e19fb0964185751 exchange 'openstack' topic 'cinder-scheduler' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:442
2016-12-14 17:37:56.528 DEBUG oslo_service.periodic_task [req-07616727-922d-4678-aba7-0c42ce038e23 None] Running periodic task VolumeManager._report_driver_status run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2016-12-14 17:37:56.528 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:37:56.555 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:37:56.585 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:37:56.763 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:37:57.456 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:37:57.744 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:37:58.046 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:37:58.351 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:37:59.073 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:38:00.930 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
...
2016-12-14 17:38:58.170 DEBUG oslo_service.periodic_task [req-07616727-922d-4678-aba7-0c42ce038e23 None] Running periodic task VolumeManager._publish_service_capabilities run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2016-12-14 17:38:58.170 DEBUG cinder.manager [req-07616727-922d-4678-aba7-0c42ce038e23 None] Notifying Schedulers of capabilities ... _publish_service_capabilities /opt/stack/cinder/cinder/manager.py:175
2016-12-14 17:38:58.171 DEBUG oslo_messaging._drivers.amqpdriver [req-07616727-922d-4678-aba7-0c42ce038e23 None] CAST unique_id: 040fae1d2cf74699a4a51ebba28a7dda FANOUT topic 'cinder-scheduler' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:431
2016-12-14 17:38:58.171 WARNING oslo.service.loopingcall [req-a0495cf6-fb7b-420a-a5cc-ae542c4c3985 None] Function 'cinder.service.Service.report_state' run outlasted interval by 51.65 sec
2016-12-14 17:38:58.177 DEBUG oslo_messaging._drivers.amqpdriver [req-07616727-922d-4678-aba7-0c42ce038e23 None] CAST unique_id: dc8b3ad123114dc782125fc03a3abfaa exchange 'openstack' topic 'cinder-scheduler' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:442
2016-12-14 17:38:58.179 DEBUG oslo_service.periodic_task [req-07616727-922d-4678-aba7-0c42ce038e23 None] Running periodic task VolumeManager._report_driver_status run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2016-12-14 17:38:58.180 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:38:58.206 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:38:58.235 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:38:58.420 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:38:59.125 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:38:59.410 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:38:59.704 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:38:59.953 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-14 17:39:00.625 DEBUG cinder.volume.drivers.rbd [req-07616727-922d-4678-aba7-0c42ce038e23 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
...

There are thousands of repeated log messages like this, which indicates that the _connect_to_rados method is called very frequently (several times per second).

There are also important warning messages like "Function 'cinder.service.Service.report_state' run outlasted interval by 51.39 sec". It is not reasonable for a periodic task to run for such a long time.

I used traceback.print_stack() to print the call stack inside the _connect_to_rados method:
  File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_service/service.py", line 722, in run_service
    service.start()
  File "/opt/stack/cinder/cinder/service.py", line 239, in start
    service_id=Service.service_id)
  File "/opt/stack/cinder/cinder/volume/manager.py", line 417, in init_host
    self.driver.init_capabilities()
  File "/opt/stack/cinder/cinder/volume/driver.py", line 705, in init_capabilities
    stats = self.get_volume_stats(True)
  File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 351, in get_volume_stats
    self._update_volume_stats()
  File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 336, in _update_volume_stats
    self._get_usage_info()
  File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 289, in _get_usage_info
    with RADOSClient(self) as client:
  File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 142, in __init__
    self.cluster, self.ioctx = driver._connect_to_rados(pool)
  File "/opt/stack/cinder/cinder/utils.py", line 802, in _wrapper
    return r.call(f, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)

This shows that _connect_to_rados is called from the _get_usage_info method.
Code URL on GitHub:
https://github.com/openstack/cinder/blob/d4fd5660736a1363a4e78480b116532c71b5ce49/cinder/volume/drivers/rbd.py#L398

Definition of the _get_usage_info() method:
    def _get_usage_info(self):
        with RADOSClient(self) as client:
            for t in self.RBDProxy().list(client.ioctx):
                if t.startswith('volume'):
                    # Only check for "volume" to allow some flexibility with
                    # non-default volume_name_template settings. Template
                    # must start with "volume".
                    with RBDVolumeProxy(self, t, read_only=True) as v:
                        v.diff_iterate(0, v.size(), None, self._iterate_cb)

All of the volumes are traversed in _get_usage_info(), and a rados connection is opened for each volume. That is why we see so many connection messages in the volume log.

I think this is the problem. If there are 100 volumes in the RBD pool, it needs to connect to rados 100 times to gather the volume info, 1000 times for 1000 volumes, 10000 times for 10000 volumes, and so on. This puts heavy pressure on both Cinder and the Ceph cluster.
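
To make the cost concrete, here is a simplified sketch of what RBDVolumeProxy roughly does on entry and exit (condensed from the driver source, not verbatim): every volume visited by the loop above pays for a full cluster connect and disconnect.

# Simplified sketch of the proxy used inside _get_usage_info() (condensed,
# not the verbatim Cinder class): each instance dials the cluster itself.
class RBDVolumeProxy(object):
    def __init__(self, driver, name, pool=None, read_only=False):
        # A brand-new rados connection is opened for every single volume.
        client, ioctx = driver._connect_to_rados(pool)
        try:
            self.volume = driver.rbd.Image(ioctx, name, read_only=read_only)
        except driver.rbd.Error:
            driver._disconnect_from_rados(client, ioctx)
            raise
        self._driver = driver
        self._client = client
        self._ioctx = ioctx

    def __getattr__(self, attrib):
        # size(), diff_iterate(), ... are forwarded to the underlying image.
        return getattr(self.volume, attrib)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        # ...and the connection is torn down again when the 'with' block ends.
        self.volume.close()
        self._driver._disconnect_from_rados(self._client, self._ioctx)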

In my system, I have 501 volumes:
# rbd ls volumes | grep volume- | wc -l
501

The periodic task needs a long time to run because so many volumes have to be traversed. What's worse, it takes longer than report_interval (default 10s), which can cause the service to be considered down; that's why I sometimes got the error state.
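
For context, this is roughly how the liveness check works (a minimal sketch under the default settings, not the exact Cinder code): report_state is supposed to refresh a heartbeat every report_interval seconds, and the service is treated as down once the heartbeat is older than service_down_time (60 seconds by default).

import datetime

SERVICE_DOWN_TIME = 60  # cinder's service_down_time default, in seconds

def service_is_up(last_heartbeat):
    """Illustrative liveness check: a service whose last report_state
    heartbeat is older than service_down_time is considered down."""
    elapsed = (datetime.datetime.utcnow() - last_heartbeat).total_seconds()
    return abs(elapsed) <= SERVICE_DOWN_TIME

When _report_driver_status blocks the loop for roughly 50 seconds per cycle, the heartbeat can easily age past that limit, the backend is treated as down, and the volume ends up in the 'error' state.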

The problem can easily be reproduced when there are enough volumes.

There is another issue: the original patch that introduced this problem aims to provide thin provisioning stats:
https://review.openstack.org/#/c/178262/11

It uses the diff_iterate() API of librbd to get the actually allocated size of volumes, but for over-subscription we just want the virtual size from v.size() as provisioned_capacity_gb. The capacity_filter of cinder-scheduler uses the size requested by the user and provisioned_capacity_gb to compute the over-subscription ratio:
https://github.com/openstack/cinder/blob/master/cinder/scheduler/filters/capacity_filter.py#L117
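
For illustration, the over-subscription check is in this spirit (a minimal sketch with my own parameter names, not the exact capacity_filter code):

def fits_over_subscription(requested_size_gb, provisioned_capacity_gb,
                           total_capacity_gb, max_over_subscription_ratio):
    """Reject a backend if accepting this request would push the virtual
    (provisioned) capacity past the allowed over-subscription ratio."""
    provisioned_ratio = (float(provisioned_capacity_gb + requested_size_gb)
                         / total_capacity_gb)
    return provisioned_ratio <= max_over_subscription_ratio

This is why provisioned_capacity_gb should be the sum of virtual sizes (v.size()), not the physically allocated bytes that diff_iterate() walks.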

Since librbd does not yet provide an API to get the provisioned capacity of a pool, we keep this method for now and can switch to such an API in the future.

zhangsong (zhangsong)
Changed in cinder:
assignee: nobody → zhangsong (zhangsong)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/410884

Changed in cinder:
status: New → In Progress
Revision history for this message
zhangsong (zhangsong) wrote :

Since cinder-scheduler only wants the virtual size for provisioned_capacity_gb to compute the over-subscription ratio, I tried v.size() instead of v.diff_iterate() to get the total usage info of the volumes, and the performance is good in this case: no periodic task warning messages like 'run outlasted interval by 51.39 sec' are printed any more:

...
2016-12-15 10:45:28.233 DEBUG oslo_service.periodic_task [req-293078dd-b94e-4638-8451-95f65dc73e86 None] Running periodic task VolumeManager._publish_service_capabilities run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2016-12-15 10:45:28.234 DEBUG cinder.manager [req-293078dd-b94e-4638-8451-95f65dc73e86 None] Notifying Schedulers of capabilities ... _publish_service_capabilities /opt/stack/cinder/cinder/manager.py:175
2016-12-15 10:45:28.234 DEBUG oslo_messaging._drivers.amqpdriver [req-293078dd-b94e-4638-8451-95f65dc73e86 None] CAST unique_id: 871a7dd974424b51b80fa8a7de3836e8 FANOUT topic 'cinder-scheduler' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:431
2016-12-15 10:45:28.236 DEBUG oslo_messaging._drivers.amqpdriver [req-293078dd-b94e-4638-8451-95f65dc73e86 None] CAST unique_id: 443f2fb0a39c46908bf8a3b1c29af068 exchange 'openstack' topic 'cinder-scheduler' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:442
2016-12-15 10:45:28.239 DEBUG oslo_service.periodic_task [req-293078dd-b94e-4638-8451-95f65dc73e86 None] Running periodic task VolumeManager._report_driver_status run_periodic_tasks /usr/lib/python2.7/site-packages/oslo_service/periodic_task.py:215
2016-12-15 10:45:28.239 DEBUG cinder.volume.drivers.rbd [req-293078dd-b94e-4638-8451-95f65dc73e86 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
2016-12-15 10:45:28.266 DEBUG cinder.volume.drivers.rbd [req-293078dd-b94e-4638-8451-95f65dc73e86 None] opening connection to ceph cluster (timeout=-1). _connect_to_rados /opt/stack/cinder/cinder/volume/drivers/rbd.py:220
...

So we can use v.size() directly to get the capacity statistics of the volumes, and this is the correct approach.
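
Roughly, the proposed change looks like this (a simplified sketch of the fix, not the exact merged code; it assumes math and oslo_utils.units are imported at the top of the driver module):

    def _get_usage_info(self):
        """Calculate provisioned volume space in GiB.

        Use the virtual size reported by v.size() instead of walking the
        allocated extents of every image with diff_iterate().
        """
        total_provisioned = 0
        with RADOSClient(self) as client:
            for t in self.RBDProxy().list(client.ioctx):
                if t.startswith('volume'):
                    with RBDVolumeProxy(self, t, read_only=True) as v:
                        total_provisioned += v.size()
        # Convert bytes to GiB for provisioned_capacity_gb.
        return math.ceil(float(total_provisioned) / units.Gi)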

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/410884
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=b907450d7d7500fbcb9e4040efc2d0e42f4a78f9
Submitter: Jenkins
Branch: master

commit b907450d7d7500fbcb9e4040efc2d0e42f4a78f9
Author: zhangsong <email address hidden>
Date: Thu Dec 15 01:22:10 2016 +0800

    Correct RBD Provision stats&fix a perf problem

    Use v.size() to get virtual size of volumes for
    provisioned_capacity_gb instead of v.diff_iterate().
    There are two reasons:

    1.The diff_iterate() API of librbd returns actually
    allocated size of volumes, but cinder-scheduler just want
    virtual size by v.size() for provisioned_capacity_gb to
    compute the ratio of over subscription.

    2.Traverse all the volumes with diff_iterate() brings
    greater pressure to both cinder and ceph cluster. It
    can cause critical problems:
    a. Run a periodic task may need a long time(may be need
    serval minutes), depend on how many volumes in backend
    pool and the load of ceph cluster;
    b. Periodic report of the service can be blocked, and
    cause service down in the end.

    Change-Id: If37e3820ebd00c85e11f1fbb1f32faabf0890bbb
    Closes-Bug: #1649956

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 10.0.0.0rc1

This issue was fixed in the openstack/cinder 10.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/500047
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=a0248f4921e6892459d1535ab98892d94ac32a5b
Submitter: Jenkins
Branch: master

commit a0248f4921e6892459d1535ab98892d94ac32a5b
Author: Anton Arefiev <email address hidden>
Date: Fri Sep 1 16:11:40 2017 +0300

    RBD: get provisioned capacity using same connection

    Creating new connection on getting rbd image is significant overhead.
    Collecting provisioned capacity spends most of its time on new
    connection establishment.

    This patch reuses previously created connection.

    Change-Id: I2c54383a865727d0035a20dcf34d860bde3db4f9
    Related-Bug: #1649956
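
In other words, the related change opens one connection per _get_usage_info() call and reuses its ioctx for every image lookup. A minimal sketch of the idea (simplified; the actual patch plumbs the existing client through the proxy, and the exact signatures may differ):

    def _get_usage_info(self):
        """Calculate provisioned volume space in GiB, reusing a single
        cluster connection for all image lookups."""
        total_provisioned = 0
        with RADOSClient(self) as client:
            for t in self.RBDProxy().list(client.ioctx):
                try:
                    # Open the image on the already-open ioctx instead of
                    # letting each volume dial its own connection.
                    v = self.rbd.Image(client.ioctx, t, read_only=True)
                except self.rbd.Error:
                    continue  # image vanished between list() and open
                try:
                    total_provisioned += v.size()
                finally:
                    v.close()
        return math.ceil(float(total_provisioned) / units.Gi)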

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/508455

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/ocata)

Change abandoned by Gyorgy Szombathelyi (<email address hidden>) on branch: stable/ocata
Review: https://review.openstack.org/508455

Revision history for this message
Jack Ivanov (gunph1ld) wrote :

What is the workaround until the patch is released in the next Ocata version?

Revision history for this message
György Szombathelyi (gyurco) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/625970

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (stable/pike)

Reviewed: https://review.openstack.org/625970
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=f2d5a9c312d3d4e35be65641f9947746d388ce94
Submitter: Zuul
Branch: stable/pike

commit f2d5a9c312d3d4e35be65641f9947746d388ce94
Author: Anton Arefiev <email address hidden>
Date: Fri Sep 1 16:11:40 2017 +0300

    RBD: get provisioned capacity using same connection

    Creating new connection on getting rbd image is significant overhead.
    Collecting provisioned capacity spends most of its time on new
    connection establishment.

    This patch reuses previously created connection.

    Change-Id: I2c54383a865727d0035a20dcf34d860bde3db4f9
    Related-Bug: #1649956
    (cherry picked from commit a0248f4921e6892459d1535ab98892d94ac32a5b)

tags: added: in-stable-pike