[RBD] rbd_store_chunk_size in megabytes is an unwanted limitation

Bug #1971154 reported by Alexander Binzxxxxxx
This bug affects 1 person
Affects: Cinder
Status: New
Importance: Wishlist
Assigned to: Eric Harney

Bug Description

In the Cinder Ceph RBD backend configuration one can set rbd_store_chunk_size as an object size hint, which overrides the pool's default object size on the Ceph side. It defaults to 4 and is given in megabytes, whereas Ceph itself takes the object size as a power of two via rbd_default_order (default 22, i.e. 4 MB objects). Ceph allows object sizes from 2^12 bytes (4 kB) to 2^25 bytes (32 MB) inclusive, but the minimum Cinder allows for this option is 1 (1 MB). Cinder also does not accept float values for this option.
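
To illustrate the relation (just a sketch of the math, not the actual driver code; the function name is made up):

import math

def chunk_size_mb_to_order(chunk_size_mb):
    # The object size in Ceph is 2**order bytes, so order = log2(size in bytes).
    # Sub-megabyte sizes (e.g. 0.0625 MB = 64 kB) would only be expressible if floats were allowed.
    return int(math.log2(chunk_size_mb * 1024 * 1024))

print(chunk_size_mb_to_order(4))       # 22, i.e. the default 4 MB objects
print(chunk_size_mb_to_order(0.0625))  # 16, i.e. 64 kB objects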

For a setup where bandwidth is what you need or want, this is perfectly fine. For my usage IOPS is king, so I would like the object size to be lower than 1 MB. (4 kB and 8 kB are very similar IOPS-wise, but 4 kB loses too much bandwidth; 32 kB or 64 kB would be optimal for my use case, and 128 kB or 256 kB would already be a big improvement.)

If there is no major reason why the configuration is this limiting, I would like the option to get the full potential out of my Ceph cluster. So I suggest two small changes:
1) use an order-based config value matching the Ceph way of configuration, or at least allow float values for rbd_store_chunk_size
2) why force the hint to be set at all, instead of just falling back to the pool configuration on the Ceph side via rbd_default_order? (a sketch follows below)
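
To make point 2 concrete, here is a rough sketch against the librbd Python bindings (not Cinder code; pool and image names are placeholders):

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('volumes')
size = 10 * 1024 ** 3  # 10 GB image

# What Cinder does today: always pass an explicit order derived from rbd_store_chunk_size.
rbd.RBD().create(ioctx, 'volume-with-hint', size, order=22)  # 4 MB objects

# What I am suggesting: omit the hint so the pool's rbd_default_order applies.
rbd.RBD().create(ioctx, 'volume-pool-default', size)

ioctx.close()
cluster.shutdown()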

The same applies to Glance.

Tags: drivers rbd xena
tags: added: xena
Changed in cinder:
importance: Undecided → Wishlist
tags: added: drivers rbd
summary: - rbd_store_chunk_size in megabytes is unwanted limitation
+ [RBD] rbd_store_chunk_size in megabytes is an unwanted limitation
Revision history for this message
Sofia Enriquez (lsofia-enriquez) wrote :

Alexander Binzxxxxxx,

We do not have much concrete performance data to analyze, so more information about the actual problem would be helpful. Please remember to update this bug report.

It is a good idea to improve this, but it needs a lot of consideration. This area is worth working on because there are still some weaknesses around RBD that we need to address, including sector sizes (512 vs 4k). However, adding a new configuration value that allows arbitrary values to be set is not necessarily the right solution.
Regarding question 2: specifying the chunk size prevents situations where images cannot be moved between pools during migration (e.g. cinder <-> glance). However, it is a good question that should be investigated further.

This bug was discussed in the bug session this week: https://meetings.opendev.org/meetings/cinder_bs/2022/cinder_bs.2022-05-04-15.01.log.html#l-31

Revision history for this message
Alexander Binzxxxxxx (devil000000) wrote :

Well, the size used is of course a tradeoff between bandwidth and IOPS, but since you can change it per pool as well as per RBD image (on the Ceph side, not in OpenStack) you could configure it to your needs.

There is plenty of Ceph performance data out there, but here are some of my own simplified and brief results:
Command used:
rados bench -p volumes 5 write -b 4096 -t 2048 -O $size_in_bytes
on the cluster here:
4k => avg iops: 32k bandwidth: 125MB/s
8k => avg iops: 34k bandwidth: 134MB/s
16k => avg iops: 38k bandwidth: 148MB/s
32k => avg iops: 40k bandwidth: 157MB/s
64k => avg iops: 41k bandwidth: 160MB/s
128k => avg iops: 39k bandwidth: 154MB/s
256k => avg iops: 36k bandwidth: 143MB/s
512k => avg iops: 32k bandwidth: 124MB/s
1M => avg iops: 25k bandwidth: 100MB/s
2M => avg iops: 18k bandwidth: 71MB/s
4M => avg iops: 14k bandwidth: 53MB/s
8M => avg iops: 10k bandwidth: 41MB/s
Note that a lot of caching is involved here, which dampens the direct influence on disk write speeds, so take my numbers with a grain (or gram) of salt. Other factors may also be involved in this test.
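
For completeness, a small loop along these lines reproduces the sweep (sketch only; pool name and flags as in the command above):

import subprocess

for kib in (4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192):
    print('--- object size %d kB ---' % kib)
    subprocess.run(
        ['rados', 'bench', '-p', 'volumes', '5', 'write',
         '-b', '4096', '-t', '2048', '-O', str(kib * 1024)],
        check=True,
    )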

Regarding question 2: as far as I know the client can override the size and transfers should be possible anyway, so a Ceph pool may already contain RBD images with different chunk sizes. Even if an RBD image needed rechunking somewhere on a pool transfer, that might still be the better solution.

Eric Harney (eharney)
Changed in cinder:
assignee: nobody → Eric Harney (eharney)