Datastore selection in cinder vmdk driver is too slow

Bug #1521894 reported by Man Li Qi
Affects                    Status        Importance  Assigned to  Milestone
Cinder                     In Progress   Wishlist    Man Li Qi
OpenStack Compute (nova)   Invalid       Undecided   Unassigned

Bug Description

Detailed description and steps:
1. Configure vmdk as the volume_driver in cinder.conf and restart the corresponding Cinder service (a fuller configuration sketch follows these steps), for example:
volume_driver=cinder.volume.drivers.vmware.vmdk.VMwareVcVmdkDriver
2. Create a volume with the "cinder create ..." command.
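
For reference, a minimal cinder.conf sketch for step 1; the vmware_host_* values below are placeholders, not values taken from this report:

volume_driver = cinder.volume.drivers.vmware.vmdk.VMwareVcVmdkDriver
vmware_host_ip = <vCenter hostname or IP>
vmware_host_username = <vCenter username>
vmware_host_password = <vCenter password>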

Current problem:
If there are many hosts in the environment, for example 200 hosts, and the hosts at the top of the host list do not meet the requirements, OpenStack ends up making many API calls to vCenter, which makes volume creation very slow. Most of the time is spent in the datastore selection phase.

Expected result:
Even when there are many hosts in the environment, the time for volume creation (datastore selection) with the vmdk driver should stay within an acceptable range.

Tags: vmware
Revision history for this message
wanghao (wanghao749) wrote :

Looks like this is an issue for the VMware driver in Cinder, not for Nova.

Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote :

@Tracy Jones: For your information. Please correct the project assignment if I got it wrong.

Changed in nova:
status: New → Invalid
Revision history for this message
Duncan Thomas (duncan-thomas) wrote :

Yes, this looks like a cinder bug. Is it possible to get the scheduler logs for a case of this happening? That would help me to understand the issue better. The VMware driver is a bit odd compared to most cinder drivers. Most of the cinder devs don't have an environment to reproduce this.

Thanks.

Setting to incomplete until we have logs, either from the reporter or somebody else.

Changed in cinder:
status: New → Incomplete
Revision history for this message
Vipin Balachandran (vbala) wrote :

The problem is due to the current way of selecting datastores. As mentioned in the bug description, we iterate over each of the ESX hosts until we find a suitable datastore for volume creation. We can work around this by using shared datastores. In fact, our datastore selection prefers shared datastores so that we can reduce the migration of volumes from one datastore to another when the instance's ESX host cannot access the volume's datastore.

Another workaround is the vmware_cluster_name option, which restricts the placement of volumes to specific vCenter clusters.
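
For example (the cluster names are placeholders; the option can be repeated to allow more than one cluster):

vmware_cluster_name = cluster-1
vmware_cluster_name = cluster-2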

Changed in cinder:
assignee: nobody → Vipin Balachandran (vbala)
status: Incomplete → Confirmed
importance: Undecided → Wishlist
Revision history for this message
Qin Zhao (zhaoqin) wrote :

@Vipin, does your "shared datastores" mean letting all ESX hosts in the cluster use one big datastore?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/252810

Changed in cinder:
assignee: Vipin Balachandran (vbala) → Man Li Qi (qimanli)
status: Confirmed → In Progress
Revision history for this message
lee jian (leejian0612) wrote :

As the bug report says, datastores are bound to ESX hosts and are polled in the order of the host list, so an available datastore gets exhausted as more and more volumes are created on it, and we need to search more and more ESX hosts to get a new available datastore. The patch (https://review.openstack.org/252810) just randomizes the order in which the ESX hosts are accessed. That is valuable when none of the datastores on the earlier hosts are satisfactory, but it depends too much on luck.

I see @Vipin added comments on the patch saying that maybe using a cache for datastore/host information is a good choice. I agree with this, but not with using a periodic call to sync the cache. My reason is that, when the environment is large enough, each sync will query all the ESX hosts on the vCenter and cost a lot of time, and if we increase the interval of the periodic call, the cache may not be accurate enough and becomes worthless.

For this bug, I think the question is how to quickly select an available datastore for the volume in a large environment, right? We don't have to select the best datastore, since that would be much more complicated than selecting an available one. To improve performance, we should focus on reducing the accesses to vCenter, which cost most of the time.

So my idea is to combine random access with a cache mechanism: we can cache the datastore/host information from the last access to vCenter and use it for the next choice, and we can also set a timeout value for each cache entry to make sure the information in the cache stays fresh. That sounds a little like the ARP cache in networking.
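
A minimal sketch of this cache idea (not existing driver code; the session object and the get_datastore_summaries helper are hypothetical placeholders):

import random
import time

class DatastoreCache(object):
    """Cache of datastore summaries with a refresh timeout, like an ARP cache."""

    def __init__(self, ttl=300):
        self._ttl = ttl        # seconds before the cached data is considered stale
        self._entries = None   # list of (datastore_ref, summary) tuples
        self._stamp = 0.0

    def _refresh_if_stale(self, session):
        if self._entries is None or time.time() - self._stamp > self._ttl:
            # get_datastore_summaries is a hypothetical helper that queries
            # vCenter once and returns (ref, summary) pairs.
            self._entries = get_datastore_summaries(session)
            self._stamp = time.time()

    def pick_random_usable(self, session, required_space):
        self._refresh_if_stale(session)
        # Random choice among cached datastores with enough free space, so
        # repeated requests do not always land on the same datastore.
        usable = [(ref, s) for ref, s in self._entries
                  if s.accessible and s.freeSpace >= required_space]
        return random.choice(usable) if usable else None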

Revision history for this message
Vipin Balachandran (vbala) wrote :

>so an available datastore gets exhausted as more and more volumes are created on it
It will not happen if you have shared datastores, which is the recommended configuration to minimize volume migrations during attach.

>I agree with this, but not with using a periodic call to sync the cache. My reason is that, when the environment is large enough, each sync will query all the ESX hosts on the vCenter and cost a lot of time,

I suggested syncing using the PropertyCollector, which is meant for monitoring changes, so we do not need to query all ESX hosts.

Revision history for this message
lee jian (leejian0612) wrote :

@Vipin, thanks for your patient explanation, but I still have some questions.

>>so an available datastore gets exhausted as more and more volumes are created on it
>It will not happen if you have shared datastores, which is the recommended configuration to minimize volume migrations during attach.
I don't agree with this. First, see the question in comment #5 posted by Qin Zhao: we cannot assume that all users use a shared datastore. From the code, we can see that the volume is created on the first available datastore, which will eventually exhaust that datastore; because of this, we communicate with vCenter more and more often to get a new available one, and that is why we reported this bug, right? So my point is that the select_datastore function should be improved.
https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/vmware/datastore.py#L215

>>I agree with this, but not with using a periodic call to sync the cache. My reason is that, when the environment is large enough, each sync will query all the ESX hosts on the vCenter and cost a lot of time,
>I suggested syncing using the PropertyCollector, which is meant for monitoring changes, so we do not need to query all ESX hosts.
For the PropertyCollector, do you mean this function?
https://github.com/openstack/oslo.vmware/blob/f79beeb23e127a2f18c6e4242a4d9a69ab34dcbe/oslo_vmware/vim_util.py#L288
If you can give more details on the PropertyCollector and how to use it to monitor datastore changes for all the ESX hosts, that would be great!
Here is the documentation I found for the PropertyCollector:
http://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.wssdk.pg.doc%2FPG_PropertyCollector.7.5.html&resultof=%22PropertyCollector%22%20%22propertycollector%22

Revision history for this message
Vipin Balachandran (vbala) wrote :

See https://bugs.launchpad.net/cinder/+bug/1556902

The fix for this bug reduced the volume creation time significantly (by almost 92% for 25 concurrent volume create operations) for large vCenter installations (tested with 270 hosts). The patch removed the logic of iterating over the host list and selecting an appropriate datastore. Instead, we retrieve all the datastores and their properties in a single vCenter API call, then filter out unusable datastores, and finally select the best datastore from the filtered list. Since the best-datastore selection uses space utilization, we will not overload a particular set of datastores. Also, the new approach does not require a cache maintained using the PropertyCollector because we query all the information in a single API call.
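
A rough, simplified sketch of this single-call approach using oslo.vmware's vim_util.get_objects; pagination (continue_retrieval) and the host-count tie-breaking used by the actual patch are omitted, so treat this as an illustration rather than the merged code:

from oslo_vmware import vim_util

def select_datastore(vim, req_size_bytes):
    # Fetch the 'summary' property of every datastore in vCenter with a
    # single PropertyCollector call.
    result = vim_util.get_objects(vim, 'Datastore', 1000,
                                  properties_to_collect=['summary'])
    candidates = []
    for obj_content in getattr(result, 'objects', []):
        summary = obj_content.propSet[0].val
        # Skip datastores that are inaccessible, in maintenance mode or
        # too small for the requested volume.
        if not summary.accessible:
            continue
        if getattr(summary, 'maintenanceMode', 'normal') != 'normal':
            continue
        if summary.freeSpace < req_size_bytes:
            continue
        candidates.append((obj_content.obj, summary))
    if not candidates:
        return None
    # Pick the datastore with the lowest space utilization so volumes are
    # spread out instead of piling up on the first usable datastore.
    return min(candidates,
               key=lambda e: 1.0 - float(e[1].freeSpace) / e[1].capacity)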

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/296934
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=12afec15f265e0c3abf681a6c295987017dab779
Submitter: Jenkins
Branch: master

commit 12afec15f265e0c3abf681a6c295987017dab779
Author: Vipin Balachandran <email address hidden>
Date: Thu Mar 17 01:08:03 2016 -0700

    VMware: Reduce volume creation time

    The volumes (virtual disks) are created on datastores connected to
    ESX hosts in vCenter clusters. Currently we create a list of hosts,
    filter out hosts in maintenance mode (using one vCenter API call per
    host), and then iterate over the filtered list to select a datastore
    which is mounted to maximum number of ESX hosts. Ties are broken
    based on space utilization.

    This approach is not scalable for vCenter installations with large
    number of ESX hosts. Also, we always consider the ESX hosts in the
    same order and may end up selecting the same set of datastores. This
    patch fixes these issues with the datastore selection. It creates a
    list of all datastores in vCenter, which is then filtered to remove
    unusable datastores and datastores not satisfying the volume
    requirements. Finally, we select a datastore from the filtered list
    based on the number of ESX hosts to which it is mounted and space
    utilization.

    Closes-bug: #1556902
    Related-bug: #1521894
    Change-Id: I0c505cbe1c82b9cbd2918a223a53a800a9bc7931

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/317405

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/mitaka)

Change abandoned by naga venkata (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/317405

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Sean McGinnis (<email address hidden>) on branch: master
Review: https://review.openstack.org/252810
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Is more work needed here?
