key distribution for live migration scales poorly

Bug #1755966 reported by Paul Collins
This bug affects 3 people
Affects: OpenStack Nova Cloud Controller Charm
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

I recently upgraded an OpenStack deployment of the ~15.04 charms to 17.11. During the upgrade we noticed that nova-cloud-controller was spending a lot of time in "configuring live migration".

This turned out to be because it was spending approximately 40 seconds on each of the 60 units in our nova-compute-cache service. For each unit it ran relation-set over 1110 times, and, strangely, that count increased by 4 for each subsequent unit.

This nova-cloud-controller is also related to a nova-compute service with only four units, and when handling keys for those units it only ran relation-set 84 times for the first unit, again increasing by 4 for each subsequent unit. This all added up to an additional 30 or 40 minutes spent upgrading this one charm.
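As a rough check on the figures above, the totals can be estimated with a small arithmetic-series sketch. It assumes the per-unit call count grows linearly by exactly 4, treats "over 1110" as exactly 1110, and charges a flat 40 seconds per unit. The function names are illustrative and do not come from the charm.

```python
# Back-of-the-envelope model of the figures reported in this bug.
# Assumptions (not taken from the charm source): call counts grow linearly,
# "over 1110" is treated as exactly 1110, and each unit costs a flat 40 s.


def total_relation_set_calls(units: int, first_unit_calls: int, step: int = 4) -> int:
    """Sum the series first, first+step, first+2*step, ... with one term per unit."""
    return units * first_unit_calls + step * units * (units - 1) // 2


def total_minutes(units: int, seconds_per_unit: float = 40.0) -> float:
    """Wall-clock time if every unit takes the same number of seconds."""
    return units * seconds_per_unit / 60.0


cache_calls = total_relation_set_calls(60, 1110)  # nova-compute-cache, 60 units
compute_calls = total_relation_set_calls(4, 84)   # nova-compute, 4 units
cache_minutes = total_minutes(60)

print(f"nova-compute-cache: {cache_calls} relation-set calls, ~{cache_minutes:.0f} min")
print(f"nova-compute: {compute_calls} relation-set calls")
```

Under these assumptions the 60-unit service alone accounts for roughly 40 minutes, which is consistent with the 30 to 40 minutes observed.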

Here's a breakdown of the relation-get/relation-set calls during the first half or so of this upgrade: https://pastebin.canonical.com/p/mznDWrZqQK/ (sorry, link is Canonical-only).

Paul Collins (pjdc)
tags: added: canonical-is-ps45-1711-upgrade
Simon Monette (simon-monette) wrote :

This issue can also be reproduced during a deployment.

With cs:nova-cloud-controller-302 on xenial-newton and cs:nova-compute-279 on xenial-newton, with enable-live-migration=true.

It can add up to 4 hours of wait time on a stack with ~100 compute units.

James Page (james-page) wrote :

We might be better off reviewing the TLS configuration of libvirt and changing the way we do live migration, rather than tackling this specific issue.

With the new certificate management work being done with Vault, configuring the libvirt daemons directly with TLS via a trusted CA should be much easier, requiring a 1 -> N set of hook executions rather than the N x N that we have today.

References: https://wiki.libvirt.org/page/TLSSetup
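To illustrate the scaling difference described in this comment, the sketch below compares how the number of hook executions would grow under each scheme. It is a simplified model, not a measurement: it assumes exactly N x N executions for pairwise key distribution and exactly N for the CA-based approach, and the function names are hypothetical.

```python
# Simplified growth model; both function names are hypothetical and the
# counts are idealised (real hook counts depend on the charm implementation).


def pairwise_hook_executions(n_units: int) -> int:
    """Every unit must learn about every unit's key: N x N."""
    return n_units * n_units


def ca_hook_executions(n_units: int) -> int:
    """One trusted CA configures each unit once: 1 -> N."""
    return n_units


for n in (4, 60, 100):
    print(f"{n:>3} units: pairwise={pairwise_hook_executions(n):>5}  ca={ca_hook_executions(n):>3}")
```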

tags: added: upgrade
Changed in charm-nova-cloud-controller:
status: New → Triaged
importance: Undecided → Medium
milestone: none → 18.08
James Page (james-page)
Changed in charm-nova-cloud-controller:
milestone: 18.08 → 18.11
David Ames (thedac)
Changed in charm-nova-cloud-controller:
milestone: 18.11 → 19.04
David Ames (thedac)
Changed in charm-nova-cloud-controller:
milestone: 19.04 → 19.07
David Ames (thedac)
Changed in charm-nova-cloud-controller:
milestone: 19.07 → 19.10
Alex Kavanagh (ajkavanagh) wrote :

This is effectively solved by the 19.07 release of the nova-cloud-controller charm, which includes optimisations that reduce the time taken for upgrades (it no longer re-evaluates all of the knownhosts during an upgrade).

Changed in charm-nova-cloud-controller:
status: Triaged → Fix Released