RPC timeouts (when using raw image backend for example)

Bug #1137060 reported by Sam Stoelinga
This bug affects 1 person
Affects: OpenStack Dashboard (Horizon)
Status: Invalid
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

Release found: Folsom

Environment:
A single node running all services
Ubuntu 12.04
Using Ubuntu cloud archive packages

Steps to reproduce:
1. Upload a big image, for example a Windows 7 image of 8 GB (I guess any big image will do, but we verified this with a big Win7 image)
2. Add the following flag to nova.conf: libvirt_images_type=raw, then restart nova-compute
3. Use Horizon to launch 20 instances at once, using the Win7 image

Current result:
Only 3 to 7 of the 20 instances launch successfully; all the others go into the error state because of an RPC timeout in nova-network.

Expected result:
A higher success ratio, preferably 20 out of 20, perhaps by limiting how many instances can be started at once when the raw backend is in use. Or maybe this should be done by limiting system I/O for the instance spawn process?
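
To illustrate the first suggestion, here is a minimal client-side sketch of capping how many launches are in flight at once. It is not how nova or Horizon behave; `launch_throttled`, `fake_launch`, the instance names and the limit of 4 are all hypothetical, and `fake_launch` only stands in for a real boot request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def launch_throttled(launch_one, names, max_concurrent=4):
    """Call launch_one(name) for every name, at most max_concurrent at a time."""
    # The pool size is what caps the number of launches in flight.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(launch_one, names))


# Hypothetical stand-in for a real boot request; it only records
# the highest number of launches that ran at the same time.
_lock = threading.Lock()
_active = 0
peak = 0


def fake_launch(name):
    global _active, peak
    with _lock:
        _active += 1
        peak = max(peak, _active)
    time.sleep(0.01)  # pretend to copy an image
    with _lock:
        _active -= 1
    return name


results = launch_throttled(fake_launch, [f"win7-{i}" for i in range(20)])
print(len(results), "launched, peak concurrency", peak)
```

All 20 requests are still issued, but never more than `max_concurrent` of them run at the same moment, which is the kind of limit the expected result asks for.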

Additional info:
If we change to libvirt_images_type=default, the exact same hardware and setup can launch 20 instances without any RPC timeouts. The big image copy seems to be what causes the heavy load; with qcow2 a backing file is used, so the whole image does not have to be copied.
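
A rough back-of-the-envelope calculation of the difference, using the numbers from this report (20 instances, 8 GB image). It assumes a raw disk is a full copy of the image per instance and treats a new qcow2 overlay as approximately zero bytes; `per_instance_copy_gb` is a hypothetical helper, not a nova function.

```python
def per_instance_copy_gb(instances, image_gb, backend):
    """Approximate GB written just to create the instance disks."""
    if backend == "raw":
        # Assumption: every instance gets its own full copy of the image.
        return instances * image_gb
    if backend == "qcow2":
        # Assumption: a fresh overlay on a backing file is close to empty.
        return 0
    raise ValueError(f"unknown backend: {backend}")


raw_total = per_instance_copy_gb(20, 8, "raw")
cow_total = per_instance_copy_gb(20, 8, "qcow2")
print(raw_total, cow_total)  # prints: 160 0
```

Under these assumptions the raw backend writes about 160 GB on a single node for one batch, which fits the heavy load described above.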

Related error log which shows the RPC timeout:
http://paste.openstack.org/show/32687/

When you use nova-network in multi_host mode, the same issue may be experienced when launching lots of instances. So another way to reproduce it is to deploy several nodes all running nova-network in multi_host mode and then launch 20 x N instances, where N is the number of compute nodes.

description: updated
Gabriel Hurley (gabriel-hurley) wrote :

It's a known problem. You can achieve the same result by simply launching 1000 images (if your cluster is capable of that). Eventually you hit the timeout and things get messy.

I would love to see this fixed but it's not particularly simple to solve. Nova improved things somewhat in Grizzly by switching part of the launch code flow to be an async "cast", but it's still not perfect.

Horizon is by and large at the mercy of the APIs here.

summary: - Batch instance causes rpc timeout when using raw image backend
+ RPC timeouts (when using raw image backend for example)
Changed in horizon:
importance: Undecided → Medium
milestone: none → havana-1
status: New → Confirmed
Changed in horizon:
milestone: havana-1 → none
Gary W. Smith (gary-w-smith) wrote :

This bug was last updated over 4 years ago, and as there have
been many changes to both nova and horizon since then, this is
getting marked as Invalid. If the issue still exists, please
feel free to reopen it.

Changed in horizon:
status: Confirmed → Invalid