OpenStack Compute (nova)

Docs needed for tunables at large scale

Bug #1838819 reported by Matt Riedemann on 2019-08-02

This bug affects 3 people

Affects                   Status     Importance  Assigned to  Milestone
OpenStack Compute (nova)  Confirmed  Undecided   Unassigned

Bug Description

Every once in a while, configuration options come up in IRC that operators at large scale (Blizzard, CERN, etc.) have had to tweak. Once a deployment hits hundreds or thousands of compute nodes, these options need to be changed to avoid killing the control plane.

One such option is this:

https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.heal_instance_info_cache_interval

From a Blizzard operator:

(3:04:18 PM) eandersson: mriedem, we had to set heal_instance_info_cache high because it was killing our control plane
(3:05:41 PM) eandersson: It was getting real heavy on large sites with 1k nodes
(3:06:26 PM) eandersson: We also ended up adding a variance
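
To illustrate why this interval matters at 1k nodes, here is a rough back-of-the-envelope sketch. It assumes the periodic task fires once per interval on every compute node, so the fleet-wide rate is simply nodes divided by interval. The function names, the example intervals, and the uniform jitter are all hypothetical; the log above does not say how the "variance" was actually implemented.

```python
import random


def aggregate_runs_per_second(compute_nodes: int, interval_seconds: float) -> float:
    """Fleet-wide rate of a periodic task that runs once per interval on each node."""
    if interval_seconds <= 0:
        raise ValueError("this sketch only models a positive interval")
    return compute_nodes / interval_seconds


def jittered_interval(base_seconds: float, variance: float, rng: random.Random) -> float:
    """Spread base_seconds uniformly by +/- variance (a fraction such as 0.2).

    Hypothetical illustration of "adding a variance": nodes that started at
    the same moment stop firing their periodic tasks in lockstep.
    """
    if not 0 <= variance < 1:
        raise ValueError("variance must be a fraction in [0, 1)")
    return base_seconds * (1 + rng.uniform(-variance, variance))


if __name__ == "__main__":
    nodes = 1000
    # Example intervals only; pick values that suit your deployment.
    for interval in (60, 600, 3600):
        rate = aggregate_runs_per_second(nodes, interval)
        print(f"{nodes} nodes, interval {interval}s: {rate:.2f} task runs/s fleet-wide")

    rng = random.Random(42)
    print("sample jittered intervals:",
          [round(jittered_interval(600, 0.2, rng)) for _ in range(5)])
```

Raising the interval lowers the average load linearly, while the jitter only smooths out bursts; the two are complementary.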

Similarly, CERN had to totally disable this one:

https://docs.openstack.org/nova/latest/configuration/config.html#compute.resource_provider_association_refresh

And rely on SIGHUP / restart of the service if they needed to refresh that cache.
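
As a minimal sketch of what such overrides could look like in nova.conf, the snippet below renders an INI fragment with Python's standard configparser. The helper name and the example interval are hypothetical, and it assumes that a value of 0 is what disables resource_provider_association_refresh; check the option reference linked above before relying on that.

```python
import configparser
import io


def render_scale_overrides(heal_interval_seconds: int, disable_rp_refresh: bool) -> str:
    """Render a nova.conf-style fragment with the two options discussed above."""
    cfg = configparser.ConfigParser()
    # "Set high" per the operator quote; the exact value is deployment-specific.
    cfg["DEFAULT"] = {
        "heal_instance_info_cache_interval": str(heal_interval_seconds),
    }
    cfg["compute"] = {}
    if disable_rp_refresh:
        # Assumption: 0 turns the periodic refresh off, leaving SIGHUP or a
        # service restart as the way to refresh the cache.
        cfg["compute"]["resource_provider_association_refresh"] = "0"
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_scale_overrides(heal_interval_seconds=3600, disable_rp_refresh=True))
```

The fragment is meant to be merged by hand into the configuration of the nova-compute service, not written over an existing file.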

We should put these things in the admin docs as we come across them so we don't forget about this stuff when new operators/users come along and hit scaling issues.

Matt Riedemann (mriedem) wrote on 2019-08-02:

https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.report_interval is another one, as are rpc_response_timeout and long_rpc_timeout.

Takashi Natsume (natsume-takashi) on 2019-08-03

Changed in nova:
assignee:	nobody → Takashi NATSUME (natsume-takashi)
status:	New → In Progress

Matt Riedemann (mriedem) wrote on 2019-08-03:

I meant to assign this to myself since I have some ideas in mind for what to document, or at least start a document.

Tim Bell (tim-bell) wrote on 2019-08-03:

Another one from CERN's experience concerns Placement: scheduler.max_placement_results (https://techblog.web.cern.ch/techblog/post/scheduling-optimizations/).

Are you looking to document this only for Nova, or for other areas too? We're gradually tuning Ironic/Neutron for scale as well.

Takashi Natsume (natsume-takashi) on 2019-08-04

Changed in nova:
assignee:	Takashi NATSUME (natsume-takashi) → nobody
status:	In Progress → Confirmed

Brin Zhang (zhangbailin) wrote on 2019-08-05:

I think the docs need to consider this from several angles. Configuration items for nova, glance, neutron, cinder, and the related clients should all be covered comprehensively to improve stability in large-scale scenarios.

Matt Riedemann (mriedem) wrote on 2019-08-06:

@Tim, I'm just looking for some simple pointers to options people typically tune at scale for *nova* specifically. The ops guide [1] is something more general for openstack-wide documentation. I'm not trying to boil the ocean here.

[1] https://docs.openstack.org/operations-guide/

Matt Riedemann (mriedem) wrote on 2019-08-21:

More details from CERN's nova+ironic testing at scale:

https://techblog.web.cern.ch/techblog/post/nova-ironic-at-scale/

Matt Riedemann (mriedem) wrote on 2019-08-27:

Another one: image cache management when you're not using Ceph or NFS on your computes:

https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.image_cache_subdirectory_name

The idea is to share that image cache directory across your computes, so that once the first compute downloads a new image, the others can use the shared copy instead of re-downloading the same image.

Note that there could be unexpected bugs as a result (see bug 1804262).

Sylvain Bauza (sylvain-bauza) on 2019-09-10

tags:

added: doc
removed: docs

Matt Riedemann (mriedem) wrote on 2019-10-14:

Another one ( https://review.opendev.org/#/c/623558/ ) related to https://docs.openstack.org/nova/latest/configuration/config.html#filter_scheduler.track_instance_changes
