Glance + SSL - Image download errors
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Glance |
Incomplete
|
Undecided
|
Unassigned |
Bug Description
Hello,
I have a latest stable havana (2013.2.3) openstack setup and I am noticing issues occasionally when downloading new backing files for vm's to compute nodes. I will occasionally end up with vm's that are stuck spawning, upon investigation I can see the backing file under /var/nova/
I have managed to create some scripts that will replicate the issue multiple ways. The image files that I have been testing with are 8.8gb, 8.6gb and a large 60gb image (however another larger 8gb image would also duplicate the issue).
The first script: https:/
Will take the image files that you give it and will deploy a vm per image file to the compute node that you have specified. With SSL enabled typically only 1 VM will ever boot successfully. Errors here will range from failed (md5sum mismatches) image downloads to backing files that are only partially downloaded. To narrow down the issue I switched over to using the glance client to do image downloads.
The second script: https:/
Will take the images specified on the command line and run the glance image-download command in a parallel bash subshell. This script removes nova from the mix. However, errors seen here are the same as what I have seen with the first script.
The thrid script: https:/
Uses: https:/
With all the scripts, and after a lot of testing I have found that this issue is 100% re-producible when trying to download 3 images at the same time. But I have also noticed in production that this issue happens when only downloading a single image on a compute node.
I should add some more detail about our setup. SSL is not being offloaded in any environment and is being handled via the glance-api and glance-registry services. We increased the number of workers to 40, to better handle multiple downloads/SSL overhead. In production we are using F5’s or A10’s for load balancing in our dev/test/stage environments we are using haproxy. The issue exists in all environments. Also, in testing it did not matter the number of glance-api servers we had in rotation. To simplify troubleshooting, I had disabled glance-api on all but one server. So most of the testing was done from a single compute node using multiple clients to a single glance-api instance (with 40 workers). To add some additional detail I am running on Centos 6.5, and I have already tried upgrading eventlet, greenlet, pyOpenSSL, pycryptography to their latest versions on both the client and the server and it did not help.
If we turn off ssl in glance-api and the client, then the 3 simultaneous downloads work without issue.