scheduler doesn't consume space properly for create_volume

Bug #1974078 reported by Walt Boring
This bug affects 1 person
Affects: Cinder
Status: New
Importance: High
Assigned to: Walt Boring
Milestone: (none listed)

Bug Description

The scheduler's host_manager method consume_from_volume ignores pools when marking allocated_capacity_gb as consumed. This allows overcommitting against a thick-provisioned pool between get_volume_stats updates from the volume manager. This is a common problem when customers create many volumes in a short period of time.

https://github.com/openstack/cinder/blob/dba355a10d1cddd1f8b428aa6a4be89c2515693b/cinder/scheduler/host_manager.py#L335-L352

The revert_volume_consumed_capacity method does use pools:
https://github.com/openstack/cinder/blob/dba355a10d1cddd1f8b428aa6a4be89c2515693b/cinder/scheduler/host_manager.py#L739-L747
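To illustrate the overcommit described above, here is a minimal toy model. It is not Cinder code: ToyPool, schedule, and the consume_per_pool flag are hypothetical names made up for this sketch, and the capacity check is a simplified stand-in for the scheduler's filters. It only shows how a burst of requests can exceed a thick pool's capacity when consumption is not recorded against the pool between stats updates.

```python
from dataclasses import dataclass


@dataclass
class ToyPool:
    """Hypothetical stand-in for a thick-provisioned pool (not Cinder's PoolState)."""
    total_capacity_gb: int
    allocated_capacity_gb: int = 0

    @property
    def free_capacity_gb(self) -> int:
        return self.total_capacity_gb - self.allocated_capacity_gb


def schedule(pool, sizes_gb, consume_per_pool):
    """Place each requested volume if the pool appears to have room.

    When consume_per_pool is False, the pool's view is never updated,
    which mimics the window between two stats reports.
    """
    placed = []
    for size in sizes_gb:
        if pool.free_capacity_gb < size:
            continue  # simplified capacity check rejects the pool
        placed.append(size)
        if consume_per_pool:
            pool.allocated_capacity_gb += size
    return placed


burst = [40, 40, 40]  # three requests arriving before the next stats update
stale = schedule(ToyPool(100), burst, consume_per_pool=False)
tracked = schedule(ToyPool(100), burst, consume_per_pool=True)

print(sum(stale), sum(tracked))
```

In this sketch the untracked run places all three volumes (120 GB on a 100 GB pool), while the tracked run stops after two.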

Changed in cinder:
assignee: nobody → Walt Boring (walter-boring)
Sofia Enriquez (lsofia-enriquez) wrote :

Hey Walt, I hope this message finds you well.

Would you please let me know the following information:
- Are you using master or another version?
- Do you have any logs in the c-sch log or any other place?
- Please let us know the steps so the team can reproduce the problem.

Thanks a lot

Changed in cinder:
importance: Undecided → High
tags: added: c-sch scheduler
Walt Boring (walter-boring) wrote :

This is running on master.

I think I have finally tracked the source of the problem here.

The allocated_capacity_gb value is updated in backend_state.allocated_capacity_gb. The problem is that get_pools only looks in the PoolState's capabilities dictionary, which is only refreshed on each get_volume_stats update.

cinder get-pools --detail eventually boils down to a call to the host_manager's get_pools().
This builds a read-only dictionary ONLY from the PoolState class's capabilities here:
https://github.com/openstack/cinder/blob/master/cinder/scheduler/host_manager.py#L818

This is completely out of date.
What the scheduler actually uses is the PoolState's attributes such as
https://github.com/openstack/cinder/blob/master/cinder/scheduler/host_manager.py#L127-L129

Those are updated, but the capabilities dictionary is not.

Walt Boring (walter-boring) wrote :

Here you can see the consume_from_volume method, which updates the attributes of the pool (PoolState) object.
It updates allocated_capacity_gb and free_capacity_gb.

https://github.com/openstack/cinder/blob/master/cinder/scheduler/host_manager.py#L336-L353

The pool's capabilities dictionary is NOT updated, and hence the get_pools call returns out-of-date values
for the pool between get_volume_stats() calls.
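A small sketch of the mismatch, under stated assumptions: ToyPoolState, report_from_capabilities, and report_from_attributes are hypothetical names for illustration, and the consume_from_volume signature here is simplified. This is not the Cinder implementation or a proposed patch. It just shows how a report built only from the capabilities dictionary lags behind one that overlays the live attributes.

```python
class ToyPoolState:
    """Hypothetical model of a pool with a capabilities dict plus live attributes."""

    def __init__(self, capabilities):
        # Snapshot from the last stats report; only refreshed on the next report.
        self.capabilities = dict(capabilities)
        # Live attributes that the scheduler adjusts as it places volumes.
        self.allocated_capacity_gb = capabilities["allocated_capacity_gb"]
        self.free_capacity_gb = capabilities["free_capacity_gb"]

    def consume_from_volume(self, size_gb):
        # Updates the attributes only; the capabilities snapshot is untouched.
        self.allocated_capacity_gb += size_gb
        self.free_capacity_gb -= size_gb

    def report_from_capabilities(self):
        # Mirrors the behaviour described in this report: snapshot only.
        return dict(self.capabilities)

    def report_from_attributes(self):
        # One possible alternative: overlay the live attributes on the snapshot.
        report = dict(self.capabilities)
        report["allocated_capacity_gb"] = self.allocated_capacity_gb
        report["free_capacity_gb"] = self.free_capacity_gb
        return report


pool = ToyPoolState({"allocated_capacity_gb": 10, "free_capacity_gb": 90})
pool.consume_from_volume(25)

stale_report = pool.report_from_capabilities()
fresh_report = pool.report_from_attributes()
print(stale_report, fresh_report)
```

After consuming 25 GB, the snapshot-based report still shows 10 GB allocated, while the attribute-based report shows 35 GB.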
