Performance issues with 1k+ Ironic BM instances

Bug #1559246 reported by sergiiF
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

We have an Ironic deployment with about 1500 BMs, 1k+ of them are already provisioned.

The current Ironic architecture doesn't allow us to run more than one 'ironic compute node'. As a result, the nova-compute service is 100% busy with periodic tasks such as updating instance status (this task alone takes about 1.5 minutes!).
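To make the scaling problem concrete, here is a minimal back-of-the-envelope sketch (not nova source code): a single compute service iterates over every node it manages during one periodic cycle, so the cycle time grows linearly with node count. The 90 ms per-node cost is an assumed figure chosen only to match the ~1.5 minute cycle reported above.

```python
# Hypothetical illustration, not nova internals: one nova-compute
# service does O(N) work per periodic cycle when it manages N nodes.
PER_NODE_COST_MS = 90  # assumed per-node DB/RPC cost, in milliseconds

def estimated_cycle_seconds(node_count, per_node_cost_ms=PER_NODE_COST_MS):
    """Estimated duration of one periodic cycle for a single service."""
    return node_count * per_node_cost_ms / 1000

# ~1000 provisioned Ironic nodes behind one nova-compute service:
print(estimated_cycle_seconds(1000))  # 90.0 seconds, i.e. ~1.5 minutes
```

With a conventional hypervisor each nova-compute manages one node, so this linear cost is invisible; it only bites when a single service fronts hundreds of Ironic nodes.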

Tags: ironic
Revision history for this message
Matt Riedemann (mriedem) wrote :

This is lacking quite a bit of information. First, what version of nova/ironic are you on?

Have you done any profiling to see what bottlenecks there might be?

Which periodic tasks specifically are taking a long time?

Also, what is the size of the deployment (how big is the controller)? Talking CPUs/RAM here.

Changed in nova:
status: New → Invalid
Revision history for this message
Andrew Laski (alaski) wrote :

There's not enough here to classify anything as a bug, though there are surely things that could be improved. This is also related to the work proposed in https://review.openstack.org/294795

Revision history for this message
sergiiF (framin) wrote :

>>Have you done any profiling to see what bottlenecks there might be?
>>Which periodic tasks specifically are taking a long time?

The main CPU-consuming task is update_available_resource, and in particular two subroutines:
1. objects.InstanceList.get_by_host_and_node
2. objects.MigrationList.get_in_progress_by_host_and_node
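The per-node query pattern behind these two calls is what ties the cycle time to the node count: each managed node costs its own DB round trips. A toy illustration of the difference between one query per node and one batched query per host (hypothetical helper names and toy data, not nova internals):

```python
from collections import defaultdict

# Toy "instances" table: (host, node, uuid) rows. 12 instances
# spread across 4 nodes, all on one compute host.
DB = [("compute1", f"node-{i % 4}", f"uuid-{i}") for i in range(12)]

def fetch_instances_for_node(host, node):
    # One round trip per node -- the pattern that makes calls like
    # get_by_host_and_node scale linearly with node count.
    return [row for row in DB if row[0] == host and row[1] == node]

def fetch_instances_for_host(host):
    # One round trip for the whole host, grouped by node client-side.
    grouped = defaultdict(list)
    for row in DB:
        if row[0] == host:
            grouped[row[1]].append(row)
    return grouped

per_node = {n: fetch_instances_for_node("compute1", n)
            for n in ("node-0", "node-1", "node-2", "node-3")}
batched = fetch_instances_for_host("compute1")
assert dict(batched) == per_node  # same data, one query instead of N
```

Batching is one plausible mitigation direction; whether and how nova's resource tracker could adopt it is exactly the kind of design question the linked blueprint work touches.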

Revision history for this message
sergiiF (framin) wrote :

>>There's not enough here to classify anything as a bug
Kind of agree. But still, without code changes Ironic is not usable at large scale. I would say there is a bug, and it is in the design:
1. The nova-compute design is not suitable for managing hundreds of instances per compute node.
2. The Ironic design (unless the 'Ironic: Multiple compute host support' blueprint is implemented) assigns all BMs to a single compute node.

Revision history for this message
sergiiF (framin) wrote :

Btw, the mentioned blueprint expects EACH compute node to report all nodes, which doesn't really solve the issue. Resource tracking is the only performance problem we are experiencing at 1k+ node scale.
