RPC timeouts (when using raw image backend for example)
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Dashboard (Horizon) | Invalid | Medium | Unassigned | |
Bug Description
Release found: Folsom
Environment:
A single node running all services
Ubuntu 12.04
Using Ubuntu cloud archive packages
Steps to reproduce:
1. Upload a big image, for example an 8 GB Windows 7 image (any big image will probably do, but we verified this with a big Windows 7 image)
2. Add the following flag to nova.conf: libvirt_
3. Use Horizon to launch 20 instances at once from the Windows 7 image
Current result:
Only 3 to 7 of the 20 instances launch successfully; all the others go into the error state because of an RPC timeout in nova-network
Expected result:
A higher success ratio, preferably 20 out of 20, perhaps by limiting how many instances may be started at once when the raw backend is in use. Alternatively, could this be done by limiting system I/O for the instance spawn process?
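To illustrate the first suggestion (capping how many spawns run at once), here is a minimal sketch in plain Python. It is not nova code: `spawn_instance`, `launch_all` and `MAX_CONCURRENT_SPAWNS` are hypothetical names, and the sleep only stands in for the I/O-heavy raw image copy.

```python
import threading
import time

MAX_CONCURRENT_SPAWNS = 3  # assumed limit for illustration, not a real nova option


def spawn_instance(name, work_seconds=0.01):
    """Hypothetical stand-in for the I/O-heavy raw image copy."""
    time.sleep(work_seconds)
    return name


def launch_all(names, limit=MAX_CONCURRENT_SPAWNS):
    """Start every instance, but let at most `limit` spawns run at a time."""
    gate = threading.BoundedSemaphore(limit)
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}
    finished = []

    def worker(name):
        with gate:  # blocks while `limit` spawns are already in progress
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            try:
                result = spawn_instance(name)
            finally:
                with lock:
                    state["running"] -= 1
            with lock:
                finished.append(result)

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finished, state["peak"]


if __name__ == "__main__":
    done, peak = launch_all(["win7-%d" % i for i in range(20)])
    print(len(done), "instances spawned, peak concurrency", peak)
```

All 20 requests are accepted, but the disk only ever sees a few copies at once, so other services on the same node are less likely to be starved. Whether that alone would be enough to avoid the nova-network timeout is not something this sketch can show.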
Additional info:
If we change to libvirt_
Related error log which shows the RPC timeout:
http://
When you use nova-network in multi_host mode, the same issue may be experienced when launching lots of instances. So another way to reproduce it is to deploy several nodes, all running nova-network in multi_host mode, and then launch 20 x N instances, where N is the number of compute nodes.
description: updated
Changed in horizon:
milestone: havana-1 → none
It's a known problem. You can achieve the same result by simply launching 1000 images (if your cluster is capable of that). Eventually you hit the timeout and things get messy.
I would love to see this fixed but it's not particularly simple to solve. Nova improved things somewhat in Grizzly by switching part of the launch code flow to be an async "cast", but it's still not perfect.
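As a rough illustration of why a blocking "call" times out under load while a "cast" does not, here is a toy model in plain Python. It is not nova's RPC layer: `Worker`, `RpcTimeout`, `run_calls`, `run_casts` and the timing values are all assumptions made for the sketch.

```python
import queue
import threading
import time


class RpcTimeout(Exception):
    """Raised when a blocking call gets no reply in time (hypothetical)."""


class Worker:
    """Hypothetical service that handles one slow request at a time."""

    def __init__(self, work_seconds):
        self.work_seconds = work_seconds
        self.inbox = queue.Queue()
        self.completed = []
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            msg, reply = self.inbox.get()
            time.sleep(self.work_seconds)  # simulated slow work
            self.completed.append(msg)
            if reply is not None:
                reply.put(msg)
            self.inbox.task_done()

    def call(self, msg, timeout):
        """Blocking request: wait for the reply or give up after `timeout`."""
        reply = queue.Queue(maxsize=1)
        self.inbox.put((msg, reply))
        try:
            return reply.get(timeout=timeout)
        except queue.Empty:
            raise RpcTimeout(msg)

    def cast(self, msg):
        """Fire-and-forget request: enqueue and return immediately."""
        self.inbox.put((msg, None))


def run_calls(worker, n, timeout):
    """Issue n concurrent calls and count successes and timeouts."""
    results = {"ok": 0, "timeout": 0}
    lock = threading.Lock()

    def caller(i):
        try:
            worker.call("req-%d" % i, timeout)
            key = "ok"
        except RpcTimeout:
            key = "timeout"
        with lock:
            results[key] += 1

    threads = [threading.Thread(target=caller, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_casts(worker, n):
    """Issue n casts, wait for the queue to drain, return completed count."""
    for i in range(n):
        worker.cast("req-%d" % i)
    worker.inbox.join()
    return len(worker.completed)


if __name__ == "__main__":
    print(run_calls(Worker(0.02), 8, timeout=0.07))
    print(run_casts(Worker(0.02), 8))
```

In the call case, requests queued behind slow ones exceed the caller's timeout even though the worker still processes them eventually, which is one way things get messy. In the cast case every request completes, but the caller no longer learns about failures directly.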
By and large, Horizon is at the mercy of the APIs here.