[VMware] Creating snapshot stuck at "Image uploading" state

Bug #1498645 reported by Giridhar Jayavelu
24
This bug affects 5 people
Affects Status Importance Assigned to Milestone
Glance
Won't Fix
Undecided
Giridhar Jayavelu

Bug Description

- Started devstack from master branch
- Created 2 instances from same image (cirros-0.3.2-i386-disk) which is backed by iscsi datastore on vsphere
Creating snapshot of instance 1 got stuck at "Image Uploading" state.

Noticed "Operation timed out" error while executing "Export OVF template" task.

When only one instance is created from the image, snapshot creation works fine.

vpxd.log
------------
2015-09-22T20:25:49.681Z info vpxd[7F5A52B6A700] [Originator@6876 sub=vpxLro opID=257026c4-1697-4696-b644-8e8533a3ef91-12119-ngc-8] [VpxLRO] -- BEGIN task-internal-209954 -- datastore-60 -- vim.Datastore.GetBrowser -- 52a48e8a-5d02-e8d3-f168-22a925987f0b(52100a84-20a1-b773-bec2-1e9f6ab7c5a4)
2015-09-22T20:25:49.682Z info vpxd[7F5A52B6A700] [Originator@6876 sub=vpxLro opID=257026c4-1697-4696-b644-8e8533a3ef91-12119-ngc-8] [VpxLRO] -- FINISH task-internal-209954
2015-09-22T20:25:49.686Z info vpxd[7F5A52C6C700] [Originator@6876 sub=vpxLro opID=257026c4-1697-4696-b644-8e8533a3ef91-12120-ngc-b4] [VpxLRO] -- BEGIN session[52a48e8a-5d02-e8d3-f168-22a925987f0b]52ff9143-1a53-1738-6c4c-4226b0667c04 -- datastoreBrowser-datastore-60 -- vim.host.DatastoreBrowser.search -- 52a48e8a-5d02-e8d3-f168-22a925987f0b(52100a84-20a1-b773-bec2-1e9f6ab7c5a4)
2015-09-22T20:25:49.806Z info vpxd[7F5A52C6C700] [Originator@6876 sub=vpxLro opID=257026c4-1697-4696-b644-8e8533a3ef91-12120-ngc-b4] [VpxLRO] -- FINISH session[52a48e8a-5d02-e8d3-f168-22a925987f0b]52ff9143-1a53-1738-6c4c-4226b0667c04
2015-09-22T20:25:50.643Z info vpxd[7F5A5AA03700] [Originator@6876 sub=VAppExport opID=5d311e85-e] [ExportTaskMo] Timed out
2015-09-22T20:25:50.643Z info vpxd[7F5A5AA03700] [Originator@6876 sub=VAppExport opID=5d311e85-e] [ExportTaskMo] Task timed out
2015-09-22T20:25:50.643Z info vpxd[7F5A5AA03700] [Originator@6876 sub=MoHttpNfcLease opID=5d311e85-e] [HttpNfcLeaseMo] Releasing HTTP-NFC ticket
2015-09-22T20:25:50.649Z warning vpxd[7F5A5AA03700] [Originator@6876 sub=VpxProfiler opID=5d311e85-e] VpxLro::LroMain [TotalTime] took 300023 ms
2015-09-22T20:25:50.649Z info vpxd[7F5A5AA03700] [Originator@6876 sub=vpxLro opID=5d311e85-e] [VpxLRO] -- FINISH task-872
2015-09-22T20:25:50.649Z info vpxd[7F5A5AA03700] [Originator@6876 sub=Default opID=5d311e85-e] [VpxLRO] -- ERROR task-872 -- vm-176 -- VirtualMachine.ExportVmLRO: vim.fault.Timedout:
--> Result:
--> (vim.fault.Timedout) {
--> faultCause = (vmodl.MethodFault) null,
--> msg = ""
--> }
--> Args:
-->

g-api.log
-------------
2015-09-22 13:23:33.232 21257 ERROR glance_store._drivers.vmware_datastore [req-d91e76c5-746d-4b0f-8942-cd5adf866b91 86ba66edc4b24e639c37e4ce992d9384 3d5d5d98dde249f08298210cb2e45866 - - -] Communication error sending http PUT requestto the url /folder/openstack_glance/89c4faf4-79d7-4593-9b6d-4ea4827e3f7c%3FdcPath%3Ddatacenter-1%26dsName%3Dshared.
Got IOError [Errno 32] Broken pipe
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils [req-d91e76c5-746d-4b0f-8942-cd5adf866b91 86ba66edc4b24e639c37e4ce992d9384 3d5d5d98dde249f08298210cb2e45866 - - -] Failed to upload image 89c4faf4-79d7-4593-9b6d-4ea4827e3f7c
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils Traceback (most recent call last):
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils File "/opt/stack/glance/glance/api/v1/upload_utils.py", line 113, in upload_data_to_store
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils context=req.context)
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils File "/usr/local/lib/python2.7/dist-packages/glance_store/backend.py", line 340, in store_add_to_backend
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils context=context)
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils File "/usr/local/lib/python2.7/dist-packages/glance_store/capabilities.py", line 226, in op_checker
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils return store_op_fun(store, *args, **kwargs)
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils File "/usr/local/lib/python2.7/dist-packages/glance_store/_drivers/vmware_datastore.py", line 522, in add
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils res = conn.getresponse()
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils File "/usr/lib/python2.7/httplib.py", line 1051, in getresponse
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils response.begin()
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils File "/usr/lib/python2.7/httplib.py", line 415, in begin
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils version, status, reason = self._read_status()
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils File "/usr/lib/python2.7/httplib.py", line 379, in _read_status
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils raise BadStatusLine(line)
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils BadStatusLine: ''
2015-09-22 13:23:33.233 21257 ERROR glance.api.v1.upload_utils

git commits used

devstack
-------------
commit c00e39901be810deb4044904734cc68af42aad8e
Merge: 7d4485c be65c6f
Author: Jenkins <email address hidden>
Date: Tue Sep 15 03:17:15 2015 +0000

    Merge "Fix typos in stackrc and unstack.sh"

glance
----------
commit 925794c63e38263198345f92124f370211a9d39e
Merge: 84437b9 5c3a3bd
Author: Jenkins <email address hidden>
Date: Tue Sep 15 02:00:35 2015 +0000

    Merge "Make task_time_to_live work"

nova
-------
commit fe1d118c0c37cd94fe05271946ded8c2b11f9482
Merge: b14bc0a 812d75e
Author: Jenkins <email address hidden>
Date: Wed Sep 16 05:13:44 2015 +0000

    Merge "Fix cells use of legacy bdms during local instance delete operations"

Revision history for this message
Giridhar Jayavelu (gjayavelu) wrote :
Revision history for this message
Giridhar Jayavelu (gjayavelu) wrote :
Changed in glance:
assignee: nobody → Giridhar Jayavelu (gjayavelu)
description: updated
Revision history for this message
Giridhar Jayavelu (gjayavelu) wrote :

Thanks to Sabari for pointing out the issue.

This seems to be due very low NFC timeout value used in VC.

Here is the workaround suggested by Sabari:

1. Make a backup copy of the vpxd.cfg file.
2. Open the vpxd.cfg file in a text editor.
3. Add the following lines between <vmacore> and </vmacore> stanza:

  <http>
      <readTimeoutMs>600000</readTimeoutMs>
  </http>

For example, a complete section appears similar to:

<config>
  <vmacore>
    <http>
      <readTimeoutMs>600000</readTimeoutMs>
    </http>
  </vmacore>
..
</config>

4. Stop and restart the VirtualCenter Server Service. This ensures it picks up the updated parameters in the vpxd.cfg file.

Revision history for this message
Sabari Murugesan (smurugesan) wrote :

Thanks Giri for filing the bug.

Yup, as I mentioned, the vCenters http read timeout needs to be increased because there may be brief periods of time without any data transfer as the ESX server is busing compressing the disk for streaming. The http pipeline looks something like this and unfortunately the whole operation is serialized :-

ESX Server --> nova-compute --> glance-api --> vCenter Server

Unfortunately, we can't fix this bug on glance's VMware store.

Changed in glance:
status: New → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.