Nova sync power state on large clusters causes poor performance

Bug #1835958 reported by tom king
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: Undecided
Assigned to: Gary Kotton
Milestone: none

Bug Description

In a vSphere-backed OpenStack, a fair number of instance actions (for example, creating an instance, live-migrating an instance, and detaching interfaces) cause a Nova sync power process to be kicked off.

Large compute clusters (thousands of VMs) suffer poor performance when an instance action kicks off multiple syncs. For example, instance creation can take anywhere from 5 to 20 minutes if many of these syncs are in flight.

Some of these syncs shouldn't process the entire cluster if they are specific to the instance, unless there are some other hidden/unknown considerations. A sync of a single instance that is having its interface detached shouldn't require a sync of the entire cluster.
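
For illustration, a hypothetical sketch of the difference being asked for (the names here are illustrative, not nova's or the VMware driver's actual API):

    def sync_single_instance(driver, db_api, instance):
        """Reconcile one instance's power state after an action on it."""
        vm_state = driver.get_power_state(instance)    # one hypervisor call
        db_api.update_power_state(instance, vm_state)  # one DB update

    def sync_whole_cluster(driver, db_api, instances):
        """What a full-cluster sync costs on a vSphere cluster with
        thousands of VMs: one round trip per instance, every time."""
        for instance in instances:
            sync_single_instance(driver, db_api, instance)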

Tags: compute vmware
Revision history for this message
Chris Dent (cdent) wrote :

There's been some work in this area to try to resolve some of those issues, but it may not have made it into any release yet. I'm looking around to find more information on which patches were made when.

In the meantime can you report on which release you are using and how you deploy your OpenStack?

Thanks.

Revision history for this message
Matt Riedemann (mriedem) wrote :

> cause a Nova sync power process to be kicked off.

I'm not sure if you're referring to the _sync_power_states periodic task:

https://github.com/openstack/nova/blob/ff0feed25d56c8ccd2298d5b5b82e636880fa986/nova/compute/manager.py#L7907

If you are, that's a mischaracterization of what happens when you create or live migrate a server. Since it's a periodic task, it runs like a cron.
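
For reference, a simplified sketch (not the exact nova code) of how such a periodic task is declared with oslo.service; the timer runs inside nova-compute on its own schedule, independent of any API request:

    from oslo_config import cfg
    from oslo_service import periodic_task

    CONF = cfg.CONF

    class ComputeManagerSketch(periodic_task.PeriodicTasks):
        def __init__(self):
            super(ComputeManagerSketch, self).__init__(CONF)

        # spacing is in seconds; nova uses CONF.sync_power_state_interval
        # (default 600) for _sync_power_states.
        @periodic_task.periodic_task(spacing=600, run_immediately=True)
        def _sync_power_states(self, context):
            # Walk every instance on this host and reconcile the power
            # state recorded in the nova DB with the hypervisor's view.
            pass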

Maybe you mean "claims" performed in the resource tracker, like during instance create?

https://github.com/openstack/nova/blob/ff0feed25d56c8ccd2298d5b5b82e636880fa986/nova/compute/manager.py#L2223

But again, those claims aren't (yet) made for live migration and interface attach/detach operations. So I'm a bit confused on what you're saying is taking a long time.

Regarding the _sync_power_states periodic task I've seen a few vcenter-specific patches to deal with a large number of instances when nova-compute is managing a large vcenter cluster, e.g.:

https://review.opendev.org/#/c/575034/

Also see bug 1513735.

tags: added: compute vmware
Revision history for this message
tom king (kingttx) wrote :

> I'm not sure if you're referring to the _sync_power_states periodic task...

Not in this bug, no. I've watched the nova-compute.log file during some of those example operations and the req-#### for the operation is added to the sync. It isn't part of the periodic sync, it gets kicked off as part of those example operations.

> Maybe you mean "claims" performed in the resource tracker, like during instance create?

No, it literally takes 5-20 minutes to create an instance, especially if we create more than one instance at a time (which happens a LOT in our environment).

Recently, I deployed three test instances:
- 1st instance took 4 minutes
- 2nd instance took 10 minutes
- 3rd instance took 11 minutes

> In the meantime can you report on which release you are using and how you deploy your OpenStack?

Queens via VMware's VIO 5.x installer.

Thanks!
Tom

Revision history for this message
Richard Boswell (richard-boswell) wrote :

Tom,

There are two things that _should_ help:

- Disable power state checks in nova.conf.
- Break up your ESXi clusters into smaller ones.

The problem is the distribution of the power state checks themselves: if an ESXi cluster is large (greater than, say, 500 VMs), the nova-compute instances can, in some cases, experience service degradation due to the sheer number of power state checks. I wrote about this tangentially at http://www.revolutionlabs.net/vio_resource_schedulers_part2/.

Revision history for this message
tom king (kingttx) wrote :

>There are two things that _should_ help:
>
>- Disable power state checks in nova.conf.
>- Break up your ESXi clusters into smaller ones.

With as much churn as our environment has, disabling the power state checks wouldn't be good. We use this cloud for dev/test, and the users do a lot of work inside the instance operating systems. We are, however, changing the periodic check to 1 hour rather than the default 10 minutes.
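
For anyone else tuning this, the relevant nova.conf option (assuming the stock option name; the value is in seconds, and a negative value disables the sync entirely):

    [DEFAULT]
    # run the periodic power-state sync hourly instead of the 600s default
    sync_power_state_interval = 3600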

Breaking up the large clusters into smaller ones will likely help a ton, but it will really just mask the underlying issue: actions taken on a specific instance cause a _full_ power sync to be kicked off for every single instance.

I know this would mean a pretty involved rewrite, but an action taken on a particular instance should only do a power sync for that particular instance rather than every single instance. Also, I'm not understanding the need/benefit of doing a whole power sync when creating instances. Scaling up a cluster would perform better if that isn't necessary.

Thanks!
Tom

Revision history for this message
Chris Dent (cdent) wrote :

Tom,

> Breaking up the large clusters into smaller ones will likely help a ton, but it will really just mask the underlying issue: actions taken on a specific instance cause a _full_ power sync to be kicked off for every single instance.

It's not clear how this could be happening. Syncing power states is supposed to happen only during an independent periodic job, without regard to incoming instances.

If your evidence of this is solely the logs and the request id that is involved, it could be that this patch <https://review.opendev.org/#/c/575034/>, which is merged in the VIO release (plus several other power-sync-related changes) but not upstream, is disrupting the context variable used to provide the request-id shown in the logs. That is, there could be a flow like this (a toy sketch follows the list):

* power sync starts
* it sleeps, relinquishing the thread
* an instance is spawned, context is updated
* power sync starts back up with the new context
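
An illustrative-only toy (not nova code; the variable names are made up) of how a cooperative yield plus a shared "current request id" can make a long-running task log under a newer request's id after it resumes:

    import eventlet

    current_request_id = None  # stand-in for the global/thread context store

    def power_sync_task():
        global current_request_id
        current_request_id = 'req-periodic'
        for batch in range(3):
            # ... sync a batch of instances ...
            eventlet.sleep(0)  # yield; another greenthread may run now and
                               # overwrite the "current" request id
            print('power sync batch %d logged as %s'
                  % (batch, current_request_id))

    def spawn_instance():
        global current_request_id
        current_request_id = 'req-api-spawn'  # API request installs its own id

    gt = eventlet.spawn(power_sync_task)
    eventlet.spawn(spawn_instance)
    gt.wait()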

However: the VIO code for _sync_power_states in 5.0 is quite a lot different from the upstream code, so it is hard to draw any satisfying conclusions without much more information. I think you've already got a support issue open with the VIO support people. If you can provide detailed logs _there_ (if you haven't already), inspecting those logs is probably going to be the most useful thing; they won't be usable by non-VMware people upstream, as they do not have access to the customised code.

If I had to guess (and that's all it is without more data) at what's going on with the performance slowdown, I'd say there's been an unintended consequence of the optimisation present in the VIO code: some situation where what normally helps ends up gumming up the works.

But finally, to support what Richard said: there are a lot of loops in nova-compute over the number of active instances. These are developed with an eye towards what many of the developers think of as the "normal" environment, libvirt+KVM on a host where the number of instances is in the tens or hundreds, not thousands. This can sometimes lead to an unfortunate but real impedance mismatch between the way nova-compute wants to behave and the way vSphere would like it to.

You're completely correct that an individual instance action shouldn't cause an update of all the instances. If that is indeed what is happening, tracking it down and fixing it would be a great improvement for any virt driver. Which is a long way of saying: thanks for working to get this fixed and providing the information needed to make it happen.

Revision history for this message
Matt Riedemann (mriedem) wrote :

The request ID from the API request getting re-used when the periodic task runs is a separate and yes very confusing serviceability issue. I thought I had reported a bug for this but can't find it. This change is related though:

https://review.opendev.org/#/c/524306/

Anyway, I think that's a red herring and has to do with the request ID being stored in a global thread context:

https://opendev.org/openstack/oslo.context/src/tag/2.22.1/oslo_context/context.py#L40
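
A minimal sketch of the oslo.context behaviour being referred to: constructing a RequestContext (overwrite=True is the default) stores it in a thread-local slot, so anything that later asks for "the current context", e.g. for the request id in log lines, sees the newest one:

    from oslo_context import context

    ctx_a = context.RequestContext(request_id='req-aaaa')  # becomes "current"
    ctx_b = context.RequestContext(request_id='req-bbbb')  # overwrites it

    print(context.get_current().request_id)  # prints req-bbbb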

Revision history for this message
tom king (kingttx) wrote :

Which process is doing this? I'm attaching the compute node's full nova-compute.log file as well.

viouser@compute01:/var/log/nova$ rgrep "Instance 004667e4-a4b4-46e9-beb0-cefd7d5f7ee2 actively managed"
nova-compute.log.1:2019-07-23 18:29:03.397 13699 DEBUG nova.compute.resource_tracker [req-ad1ccb2c-62d1-49de-a67b-2e910421e64b - - - - -] Instance 004667e4-a4b4-46e9-beb0-cefd7d5f7ee2 actively managed on this compute host and has allocations in placement: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 8192, u'DISK_GB': 240}}. _remove_deleted_instances_allocations /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:1286
nova-compute.log.1:2019-07-23 18:31:07.005 13699 DEBUG nova.compute.resource_tracker [req-ad1ccb2c-62d1-49de-a67b-2e910421e64b - - - - -] Instance 004667e4-a4b4-46e9-beb0-cefd7d5f7ee2 actively managed on this compute host and has allocations in placement: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 8192, u'DISK_GB': 240}}. _remove_deleted_instances_allocations /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:1286
nova-compute.log.1:2019-07-23 18:31:34.189 13699 DEBUG nova.compute.resource_tracker [req-ad1ccb2c-62d1-49de-a67b-2e910421e64b - - - - -] Instance 004667e4-a4b4-46e9-beb0-cefd7d5f7ee2 actively managed on this compute host and has allocations in placement: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 8192, u'DISK_GB': 240}}. _remove_deleted_instances_allocations /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:1286
nova-compute.log.1:2019-07-23 18:41:46.361 13699 DEBUG nova.compute.resource_tracker [req-ad1ccb2c-62d1-49de-a67b-2e910421e64b - - - - -] Instance 004667e4-a4b4-46e9-beb0-cefd7d5f7ee2 actively managed on this compute host and has allocations in placement: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 8192, u'DISK_GB': 240}}. _remove_deleted_instances_allocations /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:1286
nova-compute.log.1:2019-07-23 18:43:11.220 13699 DEBUG nova.compute.resource_tracker [req-ad1ccb2c-62d1-49de-a67b-2e910421e64b - - - - -] Instance 004667e4-a4b4-46e9-beb0-cefd7d5f7ee2 actively managed on this compute host and has allocations in placement: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 8192, u'DISK_GB': 240}}. _remove_deleted_instances_allocations /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:1286
nova-compute.log.1:2019-07-23 18:44:09.366 13699 DEBUG nova.compute.resource_tracker [req-ad1ccb2c-62d1-49de-a67b-2e910421e64b - - - - -] Instance 004667e4-a4b4-46e9-beb0-cefd7d5f7ee2 actively managed on this compute host and has allocations in placement: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 8192, u'DISK_GB': 240}}. _remove_deleted_instances_allocations /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:1286
nova-compute.log.1:2019-07-23 18:45:11.148 13699 DEBUG nova.compute.resource_tracker [req-ad1ccb2c-62d1-49de-a67b-2e910421e64b - - - - -] Instance 004667e4-a4b4-46e9-beb0-cefd7d5f7ee2 actively managed on this compute host and has allocations in placement: {u'resources': {u'VCPU': 8, u'MEMORY_MB': 8192, u'DISK_GB': 240}}. _remove_deleted_instances_allocations /usr/lib/python2.7/dist-packages/nova/comput...

Revision history for this message
tom king (kingttx) wrote :

Adding log file.

Revision history for this message
melanie witt (melwitt) wrote :

> The request ID from the API request getting re-used when the periodic task runs is a separate and yes very confusing serviceability issue. I thought I had reported a bug for this but can't find it. This change is related though:

> https://review.opendev.org/#/c/524306/

> Anyway, I think that's a red herring and has to do with the request ID being stored in a global thread context:

> https://opendev.org/openstack/oslo.context/src/tag/2.22.1/oslo_context/context.py#L40

I just wanted to note that, AFAICT, the request ID from the API request shouldn't be able to be re-used by a periodic task unless the deployed version of Queens is older than 17.0.6 (released on 2018-09-24), when the following bug was fixed:

https://bugs.launchpad.net/nova/+bug/1773102

Revision history for this message
tom king (kingttx) wrote :

@melanie, it does indeed look that way.

While this is still a heavy process, it's looking like the req-#### is a red herring and this ticket is no longer necessary. Feel free to change the ticket to whatever fits this situation.

If anyone is interested, we ultimately found a different issue, the fix for which should be in the next patch. I may have some details wrong here: when an instance is created or its interfaces are modified, a process goes through the _entire_ set of port groups to find the proper one(s) to drop the interface(s) into. This happens every time.

With us having hundreds (possibly thousands) of networks because of the kind of work done on our VIO, this takes a LONG time for every single one.
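
If it helps picture the cost, a hypothetical illustration (not the actual vmwareapi driver code) of why scanning every port group hurts at this scale: each boot or interface change pays for a walk over all port groups.

    def find_port_group(all_port_groups, network_name):
        """Naive lookup: O(number of port groups) per attach/boot."""
        for pg in all_port_groups:
            if pg['name'] == network_name:
                return pg
        return None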

Thank you, folks!
Tom

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/682242

Changed in nova:
assignee: nobody → Gary Kotton (garyk)
status: New → In Progress
Revision history for this message
Gary Kotton (garyk) wrote :

The root cause of the problem is the background method update_available_resources. It takes a global lock, and during that time all instances being booted wait for the lock to be released.
Patch for review - https://review.opendev.org/682242
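
For context, a hedged sketch of the locking pattern described (names illustrative; nova's resource tracker uses a similar synchronized decorator around a shared semaphore): the periodic resource update and the boot-time claim serialise on the same lock, so claims queue behind a long update pass.

    from oslo_concurrency import lockutils

    class ResourceTrackerSketch(object):

        @lockutils.synchronized('compute_resources')
        def update_available_resource(self):
            # long pass over every instance/allocation on a large cluster
            pass

        @lockutils.synchronized('compute_resources')
        def instance_claim(self, instance):
            # boot-time claim; blocks until the update pass above releases
            # the lock
            pass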
