when creating the thin lvm pool, its size is not calculated correctly

Bug #1245909 reported by Giulio Fidente on 2013-10-29
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Cinder
High
Jon Bernard
Havana
Undecided
Unassigned

Bug Description

when the thin lvm pool needs to be created, it is incorrectly sized

a 64G VG results in a 64M POOL

the offending code seems to be here: https://github.com/openstack/cinder/blob/d951222c1440b31f5e994d43b4277344e0d8f63f/cinder/brick/local_dev/lvm.py#L333

Jon Bernard (jbernard) on 2013-10-29
Changed in cinder:
assignee: nobody → Jon Bernard (jbernard)
John Griffith (john-griffith) wrote :

This should be addressed in the lvm driver similar to: https://github.com/openstack/cinder/commit/ec442e41d2d243003a42ed60ae862e96142a5cde#diff-034fc06633ae030d9ed88c5823456cdc

We should have a close look at other spots where this is still broken as well.

Changed in cinder:
status: New → Triaged
importance: Undecided → High
tags: added: havana-backport-potential
Jon Bernard (jbernard) wrote :

This bug raises two issues (that I can see):

  1. The size unit is missing. This is easy to fix, in the worst case the _sizestr() routine can be copied to the LVM driver. I would prefer not duplicating code, perhaps a class of useful LVM utility functions to house this sort of thing?

  2. When the unit is correct, there still exists a problem when the pool size falls on a 1GB boundary (e.g. 2GB). The call to lvcreate will attempt to create a pool of size 2G (which makes logical sense), but LVM translates that to extents, rounds up, and then cannot perform the allocation due to inadequate space. I would like to find an LVM option to disable this rounding or otherwise does the right thing, but I've not found anything yet. Any thoughts here? What is the exact behaviour that we expect from this routine?

Eric Harney (eharney) wrote :

We should at least first propose a fix that implements the expected current behavior (similar to _sizestr, adding a unit of GB) and gets this mostly working. This would be a good candidate for backporting to Havana.

I'm also of the opinion that this creation may be better done with extents rather than GB measurements... but that would be a further change outside of the immediate bugfix. (Involves changing update_volume_group_info() to gather free extents rather than space, etc.)

_sizestr() actually currently exists in the LVM driver (volume/drivers/lvm.py), but this change is needed in the LVM-related Brick code (brick/local_dev/lvm.py). IMO this means the LVM driver's sizestr() function should move to brick and the LVM driver can call it as lvm.sizestr(), since it already imports Brick's LVM code as "lvm".

Giulio Fidente (gfidente) wrote :

just adding some thoughts, I generally tend to agree with Eric on this "I'm also of the opinion that this creation may be better done with extents rather than GB measurements"

BUT

if we want to keep the "GB centric" behaviour which seems a recurring pattern, than maybe we can just ensure lvm is reporting the size in GB by passing the "--units G" option to vgs/lvs

Eric Harney (eharney) wrote :

Giulio: this is kind of a special case compared to the normal GB-centric behavior Cinder uses to manage volumes. The goal here is really only to determine how big of a thin pool LV we can create to consume 100% of the VG. Extents makes more sense in that context since you can be precise and not worry about rounding. Changing how Cinder volume LVs are created is a different matter and probably not something we want to do (at least not in reference to this bug).

When Cinder (Brick) collects info about space, it calls vgs --unit=g currently. (Which gives GiB measurements.)

Eric Harney (eharney) wrote :

Of course, reading the lvcreate man page, it looks like we could just do lvcreate -T -l 100%VG rather than measuring the space at all. Better route if supported on all versions of LVM...

Jon Bernard (jbernard) wrote :

What is the oldest version that openstack should reasonably support? Otherwise, that answers all of my questions. Thanks!

Eric Harney (eharney) wrote :

The -l 100%VG parameter shows up in CentOS 5 and Ubuntu 8.04 documentation, so, should be fine to use. (Newer versions of LVM than those are required to support thin-provisioning in the first place, so the ThinLVM path can certainly use this.)

https://www.centos.org/docs/5/html/Cluster_Logical_Volume_Manager/LV_create.html
http://manpages.ubuntu.com/manpages/hardy/man8/lvcreate.8.html

Jon Bernard (jbernard) wrote :

Unfortunately the '-l 100%VG' is not implemented (although it _is_ documented). It was added, and then another feature was added later that broke it. Here is the bugzilla link: https://bugzilla.redhat.com/show_bug.cgi?id=998347

So I'm currently thinking about parsing the output of 'vgs' and calculating a reasonable volume size based on the available free space and leaving a small amount for metadata. The amount of space needed for metadata is completely dependent on how much volume sharing there will be (basically snapshots). Do we expect lots of these?

Either way, the percentage of free space will be tunable with a reasonable default - that's my goal at least.

Eric Harney (eharney) wrote :

> The amount of space needed for metadata is completely dependent on how much volume sharing there will be (basically snapshots). Do we expect lots of these?

This point doesn't make sense to me in this context. We are trying to have the thin pool consume the entire VG (or close to it)... isn't any sharing/snapshotting consuming space from within that thin pool LV rather than from the VG outside of it? This may make more sense if just talking about thick LVM volumes and snapshots w/o a thin pool, but I'm not sure how it applies here.

Jon Bernard (jbernard) wrote :

You raise a good point, I really don't understand the thin pool implementation well enough to say for sure. My understanding is that metadata is stored in the volume group outside of the logical volume, including any thinly provisioned pools - but i could be wrong on that. The consensus I reached after asking on IRC was that some configurable amount of free space should remain when allocating the thin pool. I was leaning towards something like 5% for starters, what do you think?

Fix proposed to branch: master
Review: https://review.openstack.org/54833

Changed in cinder:
status: Triaged → In Progress
Eric Harney (eharney) on 2013-11-12
tags: added: drivers lvm
Eric Harney (eharney) on 2013-12-02
summary: - when creating the thin lvm pool, its size is not calculcated correctly
+ when creating the thin lvm pool, its size is not calculated correctly

Reviewed: https://review.openstack.org/54833
Committed: http://github.com/openstack/cinder/commit/d72914f739b1467ad849dd47fddd321965fed928
Submitter: Jenkins
Branch: master

commit d72914f739b1467ad849dd47fddd321965fed928
Author: Jon Bernard <email address hidden>
Date: Thu Nov 21 17:58:13 2013 -0500

    LVM: Create thin pools of adequate size

    Thin pools in LVM are quite different from volume groups or logical
    volumes and their differences must be taken into account when providing
    thin LVM support in Cinder.

    When you create a thin pool, LVM actually creates 4 block devices. You
    can see this after thin pool creation with the following command:

        $ dmsetup ls

        volumes--1-volumes--1--pool (253:4)
        volumes--1-volumes--1--pool-tpool (253:3)
        volumes--1-volumes--1--pool_tdata (253:2)
        volumes--1-volumes--1--pool_tmeta (253:1)

    In the above command, a thin pool named 'volumes-1-pool' was created in
    the 'volumes-1' volume group. Despite this, the 'lvs' command will only
    show one logical volume for the thin pool, which can be misleading if
    you aren't aware of how thin pools are implemented.

    When you create a thin pool, you specify on the command line a size for
    the pool. LVM will interpret this size as the amount of space requested
    to store data blocks only. In order to allow volume sharing and
    snapshots, some amount of metadata must be reserved in addition to the
    data request. This amount is calculated by LVM internally and varies
    depending on volume size and chunksize. This is why one cannot simply
    allocate 100% of a volume group to a thin pool - there must be some
    remaining space for metadata or you will not be able to create volumes
    and snapshots that are pool-backed.

    This patch allocates 95% of a volume group's free space to the thin
    pool. By doing this, we allow LVM to successfully allocate a region for
    metadata. Additionally, any free space remaining will by dynamically
    used by either data or metadata if capacity should become scarce.

    The 95/5 split seems like a sane default. This split can easily (and
    probably should) be made user-configurable in the future if the user
    expects an abnormal amount of volume sharing.

    Change-Id: Id461445780c1574db316ede0c0194736e71640d0
    Closes-Bug: #1245909

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2014-01-22
Changed in cinder:
milestone: none → icehouse-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2014-04-17
Changed in cinder:
milestone: icehouse-2 → 2014.1

Reviewed: https://review.openstack.org/60947
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=0553eb601fd8a1d0d365d8a4858a771efb034442
Submitter: Jenkins
Branch: stable/havana

commit 0553eb601fd8a1d0d365d8a4858a771efb034442
Author: Jon Bernard <email address hidden>
Date: Thu Nov 21 17:58:13 2013 -0500

    LVM: Create thin pools of adequate size

    Thin pools in LVM are quite different from volume groups or logical
    volumes and their differences must be taken into account when providing
    thin LVM support in Cinder.

    When you create a thin pool, LVM actually creates 4 block devices. You
    can see this after thin pool creation with the following command:

        $ dmsetup ls

        volumes--1-volumes--1--pool (253:4)
        volumes--1-volumes--1--pool-tpool (253:3)
        volumes--1-volumes--1--pool_tdata (253:2)
        volumes--1-volumes--1--pool_tmeta (253:1)

    In the above command, a thin pool named 'volumes-1-pool' was created in
    the 'volumes-1' volume group. Despite this, the 'lvs' command will only
    show one logical volume for the thin pool, which can be misleading if
    you aren't aware of how thin pools are implemented.

    When you create a thin pool, you specify on the command line a size for
    the pool. LVM will interpret this size as the amount of space requested
    to store data blocks only. In order to allow volume sharing and
    snapshots, some amount of metadata must be reserved in addition to the
    data request. This amount is calculated by LVM internally and varies
    depending on volume size and chunksize. This is why one cannot simply
    allocate 100% of a volume group to a thin pool - there must be some
    remaining space for metadata or you will not be able to create volumes
    and snapshots that are pool-backed.

    This patch allocates 95% of a volume group's free space to the thin
    pool. By doing this, we allow LVM to successfully allocate a region for
    metadata. Additionally, any free space remaining will by dynamically
    used by either data or metadata if capacity should become scarce.

    The 95/5 split seems like a sane default. This split can easily (and
    probably should) be made user-configurable in the future if the user
    expects an abnormal amount of volume sharing.

    Change-Id: Id461445780c1574db316ede0c0194736e71640d0
    Closes-Bug: #1245909
    (cherry picked from commit d72914f739b1467ad849dd47fddd321965fed928)

tags: added: in-stable-havana
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.