Read test against empty and thin-provisioned volume doesn't reflect the actual performance

Bug #2043167 reported by Nobuto Murata
This bug affects 2 people
Affects: Woodpecker Charm
Status: Fix Committed
Importance: Undecided
Assigned to: Unassigned

Bug Description

We need more investigation, but it looks like the 4M read test using woodpecker is too good to be true.

My gut feeling is that RBD volumes are thin-provisioned by default in Ceph, so the 4M read test is like reading from a 0-byte image file where every block is guaranteed to return zeros. So I suppose the test needs to run against a thick-provisioned volume so that the access is spread across multiple OSDs, or actual random data should be written before running the read test.

https://docs.ceph.com/en/quincy/rbd/rados-rbd-cmds/
> Ceph Block Device images are thin provisioned. They don’t actually
> use any physical storage until you begin saving data to them.

https://docs.ceph.com/en/quincy/man/8/rbd/
> create (-s | --size size-in-M/G/T) [--image-format format-id]
> [--object-size size-in-B/K/M] [--stripe-unit size-in-B/K/M
> --stripe-count num] [--thick-provision] [--no-progress]
> [--image-feature feature-name]… [--image-shared] image-spec
>
> Will create a new rbd image. You must also specify the size via
> --size. The --stripe-unit and --stripe-count arguments are optional,
> but must be used together. If the --thick-provision is enabled, it
> will fully allocate storage for the image at creation time. It will
> take a long time to do. Note: thick provisioning requires zeroing the
> contents of the entire image.
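
As a purely illustrative sketch of the second option (writing real data before measuring reads), a single fio write pass with the rbd engine would back every object with actual data; the pool and image names below are assumptions, not something taken from this report:

# Sketch only: pre-fill an existing image with fio's default (mostly random)
# buffer contents so that every object is actually allocated before the read test.
fio --ioengine=rbd \
    --rw=write --bs=4M --numjobs=1 --group_reporting=1 \
    --clientname=admin --pool=rbd \
    --rbdname=test-image \
    --name=prefill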

Nobuto Murata (nobuto)
tags: added: field-ceph-dashboard
Revision history for this message
Nobuto Murata (nobuto) wrote :
summary: - Read test against empty and thin-provisioned doesn't reflect the actual
- performance
+ Read test against empty and thin-provisioned volume doesn't reflect the
+ actual performance
Revision history for this message
Luciano Lo Giudice (lmlogiudice) wrote :

It makes sense that thin-provisioned images will turn reads into a no-op. FWIW, the Ceph team is thinking of tackling performance regression testing with the Woodpecker charm, so we'll most likely take a look at this. It seems to me that using thick-provisioned images by default would be the sanest option, as it would ensure that storage is allocated before any IO is done on it.

Revision history for this message
Nobuto Murata (nobuto) wrote :

Confirmed the theory.

[thin provisioning - 4M read]

   READ: bw=12.6GiB/s (13.5GB/s), 12.6GiB/s-12.6GiB/s (13.5GB/s-13.5GB/s), io=126GiB (135GB), run=10001-10001msec

[thick provisioning - 4M read]

   READ: bw=346MiB/s (363MB/s), 346MiB/s-346MiB/s (363MB/s-363MB/s), io=3460MiB (3628MB), run=10005-10005msec

====

rbd create --pool ceph-iscsi --size 1G volume_with_thin_provisioning

rbd create --pool ceph-iscsi --size 1G --thick-provision volume_with_thick_provisioning

fio --ioengine=rbd \
    --rw=read --bs=4M --numjobs=1 --group_reporting=1 \
    --runtime=10 --time_based \
    --clientname=admin --pool=ceph-iscsi \
    --rbdname=volume_with_thin_provisioning \
    --name=thin_provisioning

fio --ioengine=rbd \
    --rw=read --bs=4M --numjobs=1 --group_reporting=1 \
    --runtime=10 --time_based \
    --clientname=admin --pool=ceph-iscsi \
    --rbdname=volume_with_thick_provisioning \
    --name=thick_provisioning
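
One extra check (not part of the original commands) that makes the allocation difference visible is rbd du, which reports provisioned versus used size per image; note it can be slow when the fast-diff feature is not enabled:

# Sketch only: expect roughly 0 bytes USED for the thin image and the full
# 1 GiB USED for the thick-provisioned one.
rbd du --pool ceph-iscsi volume_with_thin_provisioning
rbd du --pool ceph-iscsi volume_with_thick_provisioning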

Revision history for this message
Samuel Allan (samuelallan) wrote :
Revision history for this message
Nobuto Murata (nobuto) wrote (last edit ):

Direct RBD testing was as above; I also tested the OpenStack scenario with Cinder-Ceph backed storage.

[default - 4M read]

   READ: bw=1273MiB/s (1334MB/s), 1273MiB/s-1273MiB/s (1334MB/s-1334MB/s), io=74.6GiB (80.1GB), run=60001-60001msec

[after dd if=/dev/zero - 4M read]

   READ: bw=364MiB/s (381MB/s), 364MiB/s-364MiB/s (381MB/s-381MB/s), io=21.3GiB (22.9GB), run=60002-60002msec

Properly configuring the benchmark is a bit trickier than in the direct RBD scenario, since I don't see a way to enable thick provisioning on a per-volume basis through Cinder. So all blocks in the volume have to be pre-filled before running the read benchmark, unless fio has a handy option to do so.

====

openstack volume create --size 1 volume_default_thin_provisioned

openstack server add volume test-instance-1 \
    volume_default_thin_provisioned --device /dev/vdb

# fio --ioengine=libaio --direct=1 \
    --rw=read --bs=4M \
    --numjobs=1 --group_reporting=1 \
    --runtime=60 --time_based \
    --filename /dev/vdb \
    --name=thin_provisioning

# dd if=/dev/zero of=/dev/vdb bs=4M oflag=sync status=progress

## once again

# fio --ioengine=libaio --direct=1 \
    --rw=read --bs=4M \
    --numjobs=1 --group_reporting=1 \
    --runtime=60 --time_based \
    --filename /dev/vdb \
    --name=thin_provisioning
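
Regarding the "handy option" note above: a plain fio write pass over the attached device can serve as the pre-fill as well, and since fio fills its buffers with (mostly) random data by default, it avoids any chance of zero writes being special-cased somewhere in the stack. A sketch, assuming the same /dev/vdb attachment as above:

# fio --ioengine=libaio --direct=1 \
    --rw=write --bs=4M \
    --numjobs=1 --group_reporting=1 \
    --filename /dev/vdb \
    --name=prefill_random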

Revision history for this message
Samuel Allan (samuelallan) wrote :

I opened https://github.com/openstack-charmers/charm-woodpecker/pull/8 to address the scenarios where volumes are attached via the `test-devices` Juju storage or the `disk-devices` fio action parameter.

Revision history for this message
Nobuto Murata (nobuto) wrote :
Changed in charm-woodpecker:
status: New → Fix Committed