CapacityFilter calculates free virtual capacity incorrectly

Bug #1704786 reported by Arnon Yaari on 2017-07-17
20
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Cinder
Undecided
Arnon Yaari

Bug Description

max_oversubscription_ratio is used throughout Cinder as the ratio between the total physical capacity and the total virtual capacity (virtual capacity = thin-provisioned capacity). The function caculate_virtual_free_capacity takes a backend and calculates the virtual free capacity by taking *total_capacity* and multiplying it by max_over_subscription_ratio before subtracting the space already provisioned and the space reserved.
Even though max_over_subscription_ratio deals with the *total* virtual/physical capacity, some places in the code treat this differently:
* The Pure Storage volume driver returns the ratio between provisioned space and used (written) space. This is the ratio between currently allocated and written data, not totals available on the array.
* The Kaminario volume driver does the same, although the driver doesn't even report this capability correctly (returns "max_oversubscription_ratio" instead of "max_over_subscription_ratio")

I'm assuming that the drivers are calculating the value wrong (or I don't understand the values they use) because the rest of Cinder itself treats max_over_subscription_ratio as *total* virtual / *total* physical.

There is one other place in Cinder which uses max_over_subscription_ratio in an inconsistent manner: CapacityFilter calculates "adjusted_free_virtual" as "free * max_over_subscription_ratio" where "free" is "free physical capacity". This calculation can never be accurate because the ratio between the free physical space and the free virtual space is not constant. For example:

If max_over_subscription_ratio is 1.2 and the physical capacity of the backend is 1 TB, then the total virtual capacity of the backend is 1.2 TB. If 200 GBs are then allocated *and written*, then "free physical space" will be 800 GB and free virtual space will be 1 TB. In this case "free * max_over_subscription_ratio" (800 * 1.2) will be 960 GB - not 1 TB - so the calculation is incorrect.
The function caculate_virtual_free_capacity is made for the correct calculation, and CapacityFilter should use it.

Fix proposed to branch: master
Review: https://review.openstack.org/484354

Changed in cinder:
assignee: nobody → Arnon Yaari (arnony)
status: New → In Progress
Gorka Eguileor (gorka) wrote :

We are using max_over_subscription_ratio based on free capacity because we could have other data being stored on the same pool, and then over provisioning would be pure chaos.

I agree that we should consolidate the way we do calculations, but first we would need to make a statement on the assumptions that are made regarding the use of the pools used for cinder (ie: exclusive use by Cinder, and by 1 and only 1 service).

Changed in cinder:
status: In Progress → Opinion
Arnon Yaari (arnony) wrote :

The problem with using max_over_subscription_ratio to calculate free virtual capacity based on free physical capacity is that this ratio is not constant and will not lead to the correct value.

Here are two examples, similar to what I wrote above:

1. Physical capacity: 1 TB. Virtual capacity: 1.2 TB. Provisioned capacity: 200 GB. Written data: 100 GB. In this case: free physical: 900 GB, free virtual: 1TB. ratio would be 1000/900 = 1.11
2. Same as above but now we write more data - 200 GB. Now free physical is 800 GB and free virtual is still 1 TB. Ratio is now 1.25.

i.e. we cannot calculate free virtual capacity based on free physical capacity with a constant ratio. The calculation in CapacityFilter is not correct, and it's not a matter of opinion of what we decide this ratio is.

Since we're using a constant ratio, this ratio *has* to be between totals and not between currently allocated and written. Cinder also treats it like that in every other place in the code.

I'm not sure how other data being stored on the same pool changes that (and isn't that what "reserved" is for?)

Gorka Eguileor (gorka) wrote :

You can think of our over provisioning calculation as an adaptive calculation, where, as you say, we'll be evaluating if there will be enough capacity based on the current free space.

You may not agree with the current mechanism, but there are always cases in favor of both approaches, for example:

Physical 1TB
Virtual 1.2TB
Provisioned: 1TB
Written: 1TB

If under this circumstances you request a 1GB disk to be created from an image then with your approach it would succeed the scheduling phase but fail while trying to write data, while with current code it would fail to schedule the creation because there is not enough space.

That's why I say that changing current calculation it's just an opinion.

Reserved is used to ensure that created volumes will have enough space to add new data, not to count for another cinder volume service using the same pool and therefore reducing available space.

Arnon Yaari (arnony) wrote :

Regarding the example: over-provisioning always means that writing data may fail. That's why it's called "over". I understand now why it can be considered a matter of opinion - because in my opinion the creation should succeed in that case - but this opinion comes because that's the behavior of over-provisioning in general and storage arrays generally follow this logic and allow creation. It's also the way the feature is documented:
"Default ratio is 20.0, meaning provisioned capacity can be 20 times of the total physical capacity". It doesn't say anything about free capacities.
Should we bring this up in a Cinder meeting to get consensus?

In either case (no matter whose opinion we follow), the "adaptive" approach simply doesn't work... Again - because the ratio is not constant. There is no real arithmetical connection between physical free space and virtual free space - so I don't understand how the approach in CapacityFiler makes sense and I don't think we should leave it like that.

Gorka Eguileor (gorka) wrote :

Yes, probably best to discuss this in the meeting.

Xing Yang (xing-yang) wrote :

Please take a look of the following patch set which included a lengthy discussion on the exact same
line you are trying to change in this patch. After discussion, we decided it is safer to keep this calculation. Also we decided not to use utils.calculate_virtual_free_capacity() which is used in the weigher. In weigher, we have to use one formula to come up with a number for ranking. In filter, there are multiple steps to filter things out. So they are not exactly the same.

https://review.openstack.org/#/c/189261/3/cinder/scheduler/filters/capacity_filter.py

After discussion, we agreed just to rename the variable to adjusted_free_virtual:
https://review.openstack.org/#/c/214276/ (patch to rename variable name to adjusted_free_virtual)

Here is a list of patches from the past on oversubscription:

Oversubscription in thin provisioning:
https://blueprints.launchpad.net/cinder/+spec/over-subscription-in-thin-provisioning

Initial spec review:
https://review.openstack.org/#/c/129342/

Code in scheduler to introduce oversubscription ratio:
https://review.openstack.org/#/c/142171

Code to add provisioned_capacity:
https://review.openstack.org/147934

Update spec after code is merged:
https://review.openstack.org/153469

Fix capacity filter to allow oversubscription:
https://review.openstack.org/#/c/185764/

Attempt to not to use virtual free capacity but it is abandoned after lengthy discussion:
https://review.openstack.org/#/c/189261/

Patch to rename variable to adjusted_free_virtual following the discussion in https://review.openstack.org/#/c/189261/:
https://review.openstack.org/#/c/214276/1

Differentiate thick and thin provisioning:
https://blueprints.launchpad.net/cinder/+spec/differentiate-thick-thin-in-scheduler
https://review.openstack.org/#/c/315352/

Huang Zhiteng (zhiteng-huang) wrote :

I am re-posting my comment from the patch.

calculate_virtual_free_capacity does this:
   'free' = total * max_osr - provisioned (note it doesn't used used capacity).

There is significant difference between provisioned capacity and actual used capacity. What current behavior does is:
   'free' = total * max_osr - used * max_osr = (total - used) * max_osr

Does it make sense to count used capacity more than it is actually consumed like we are doing right now? I think so.

Let's look deeper into max_osr, and find out where it should be applied:
  1) total / provisioned <= max_osr
  2) used / provisioned <= 1/max_osr
1) is so obvious that we all agree; 2) is not mentioned in earlier discussions but it's actually quite intuitive too. Because max_osr basically is percentage that admin thinks (at certain confident level) that on average consumption ratio of all volumes don't go beyond/over, otherwise he/she would be in trouble. So 1/max_osr = max_consumption_ratio. Now, let's calculate 'free' capacity with both 1) and 2) in mind we have:
  3) 'free' <= total * max_osr - provisioned
  4) 'free' * max_consumption_ratio <= total - used
4) can be changed to
  5) 'free' <= (total - used) * max_osr

The meaningful 'free' should be the lower value calculated from 3) and 5), and that's exactly what we are doing now.

Huang Zhiteng (zhiteng-huang) wrote :

I think the fundamental disagreement between me and Yaari is the interpretation of 'over-provision'. I can accept that 'over provision' means at some point service level would become degraded. But a storage device becoming read-only is not degraded service to me, and when this happens to a storage cluster, it means read-only for everybody. That is disaster to me.

Arnon Yaari (arnony) wrote :

Your approach causes a problem where max_over_subscription_ratio is being used (and calculated, and documented) very differently in different places.
If I'm a storage admin (or storage device) who knows exactly, in absolute values, the following:
1. Total virtual capacity (amount storage backend will let me provision)
2. Total physical capacity (amount storage backend will let me write)
3. Used virtual (amount I provisioned)
4. Used physical (amount I wrote)
What should I write or return in max_over_subscription_ratio? (remember that even if the volume driver calculates this, the ratio is not really dynamic - it is an extra spec):
1. should I return total virtual / total physical to make the calculation in calculate_virtual_free_capacity correct?
2. should I return free virtual / free physical to make the calculation in CapacityFilter correct?
3. should I return used virtual / used physical to return the same as other volume drivers like Pure's and Kaminario's?

As a volume driver developer, after this discussion I'm leaning towards 4: not calculate this and just return the value in the configuration, so the operator will have to break his head about what this means. That seems to be what all the other vendors are doing.

Gorka Eguileor (gorka) wrote :

I have proposed a spec where I propose supporting standard over-provisioning calculations: https://review.openstack.org/490116

Arnon Yaari (arnony) wrote :

The inconsistency between the two calculations method has been resolved by spec:
https://specs.openstack.org/openstack/cinder-specs/specs/queens/provisioning-improvements.html
and commit https://review.openstack.org/#/c/534854/

Changed in cinder:
status: Opinion → Invalid
status: Invalid → Fix Released

Change abandoned by Arnon Yaari (<email address hidden>) on branch: master
Review: https://review.openstack.org/484354
Reason: The inconsistency between the two calculation methods has been resolved by spec:
https://specs.openstack.org/openstack/cinder-specs/specs/queens/provisioning-improvements.html
and commit https://review.openstack.org/#/c/534854/

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers