Refactor ssh key handling logic to improve performance/efficiency

Bug #1833420 reported by Alex Kavanagh on 2019-06-19
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack nova-cloud-controller charm
Wishlist
Alex Kavanagh
OpenStack nova-compute charm
Wishlist
Alex Kavanagh

Bug Description

The current SSH key distribution code in nova-cloud-controller charm takes too long when dealing with large (>10) numbers of nova-compute units. This bug will document the changes that are introduced as a series of patches to the nova-cloud-controller and nova-compute charms to attempt to resolve the performance issues, as they are mandatory for being able to manage large clouds.

Note that this bug will not track work associated with resolving inter- nova-cloud-controller ssh key distribution (which is necessary to be able to migrate an instance between two juju nova-cloud-controller applications).

Patches will use the gerrit topic: improve-ssh-key-management

Patches will use "Related-Bug" and NOT "Closes-Bug" so as to be tracked here. The bug will be closed when agreement has been reached as to whether any more performance can be extracted using the current approach (using relations to move keys around).

The main approach taken in rationalising the way SSH keys are managed is to:

1. Ensure that operations on SSH keys are only performed ONCE in any hook execution.
2. Cache expensive operations during a hook execution that aren't going to change.
3. Cache expensive operations between hook executions (nslookup, for example) that aren't going to change during a hook execution.
4. Use goal-state to hold off setting known_hosts and authorized_keys to reduce the number of cloud-compute relation change hooks.
5. Add an action to purge the hosts cached if DNS changes to update the known_hosts to each of the compute units.

Changed in charm-nova-cloud-controller:
assignee: nobody → Alex Kavanagh (ajkavanagh)
Alex Kavanagh (ajkavanagh) wrote :

https://review.opendev.org/#/c/666374/ -- in nova-compute charm

Refactor import_authorized_keys() function for performance

The main change is to fetch all of the relation_data() at once, and then
iterate through the python dictionary. This speeds up processing of
potentially hundreds of hosts and authorized_keys.

Change-Id: I095104f535c1eae1554f842502ae93ebb92e44fe
Related-Bug: #1833420

Changed in charm-nova-cloud-controller:
status: New → In Progress
Changed in charm-nova-compute:
assignee: nobody → Alex Kavanagh (ajkavanagh)
status: New → In Progress
description: updated

Related fix proposed to branch: master
Review: https://review.opendev.org/668711

Reviewed: https://review.opendev.org/667652
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=cb60bab6f7124bf6ba42d0b4fd1470e8b6dc3991
Submitter: Zuul
Branch: master

commit cb60bab6f7124bf6ba42d0b4fd1470e8b6dc3991
Author: Alex Kavanagh <email address hidden>
Date: Wed Jun 26 16:24:54 2019 +0100

    Refactor compute_changed() hook handler

    The compute_changed() function (which handled the cloud-compute relation
    changed hook event) was also used in the config-changed and update-charm
    hooks. This meant that it did a lot of work that wasn't necessary for
    those hooks.

    This patch splits the function up into separate functions that deal with
    one thing, and then introduces a new function to call those. This means
    that the other usages compute_changed() now use the actual features that
    they need.

    Change-Id: I59e52076480729beec9e125f66714a208303908d
    Related-Bug: 1833420

Reviewed: https://review.opendev.org/668190
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=0c96b7177ac3c19665d2b8205dac32ac2096098d
Submitter: Zuul
Branch: master

commit 0c96b7177ac3c19665d2b8205dac32ac2096098d
Author: Alex Kavanagh <email address hidden>
Date: Fri Jun 28 16:20:49 2019 +0100

    Refactor update_ssh_keys_and_notify_compute_units()

    This patchset refactors the update_ssh_keys_and_notify_compute_units()
    into two separate halves: one is "update keys and hosts from a specified
    unit on a relation (or the current relation/unit for the hook)" and the
    other is update the relation_set data for the relation with the found
    keys/hosts.

    This is a precusor patch to reducing the number of dns queries and
    setting of relation data, and to eliminate repeated operations on the
    same data/result (which currently happens).

    Change-Id: I45bc9a889968796572a61c199ac25d543c064670
    Related-Bug: 1833420

Reviewed: https://review.opendev.org/668711
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=452ac31663a87fd3198896fe2ff498a1fbade769
Submitter: Zuul
Branch: master

commit 452ac31663a87fd3198896fe2ff498a1fbade769
Author: Alex Kavanagh <email address hidden>
Date: Tue Jul 2 17:43:09 2019 +0100

    Refactor region notification code to not need unit

    The (already) refactored region notification code checked for the
    'region' value in the remote unit, despite the nova-compute charm not
    setting the value. This has been removed. Now that the function only
    needs to be set for the relation, it is no longer included in 'unit'
    loops.

    The utility function is also renamed to
    set_region_on_relation_from_config to better reflect it's actual
    function.

    Change-Id: I81d9924bebe4009119505b1d5dccf2e498925c7e
    Related-Bug: #1833420

Reviewed: https://review.opendev.org/669123
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=e8577fc96e7818a5cda80128eba3de75b32de448
Submitter: Zuul
Branch: master

commit e8577fc96e7818a5cda80128eba3de75b32de448
Author: Alex Kavanagh <email address hidden>
Date: Thu Jul 4 10:42:25 2019 +0100

    Use goal state to defer distributing ssh data

    If goal state is available, this patch uses it to defer distributing the
    known_hosts and authorized_keys to the related nova-compute units until
    the last nova-compute unit expected has joined and triggered a
    cloud-compute-relation-changed hook. This means that all of the hosts
    have been scanned and thus the whole group (per relation id) can be set.

    Note that this patch does not unify the known_hosts/authorized_keys
    files across relations. That's for a separate patch.

    Change-Id: I6c26ebbad2236e66c174ef4606828db834803865
    Related-Bug: #1833420

Alex Kavanagh (ajkavanagh) wrote :
Changed in charm-nova-cloud-controller:
status: In Progress → Fix Committed
milestone: none → 19.07
importance: Undecided → Wishlist
Changed in charm-nova-compute:
importance: Undecided → Wishlist
milestone: none → 19.07
status: In Progress → Fix Committed
David Ames (thedac) on 2019-08-12
Changed in charm-nova-cloud-controller:
status: Fix Committed → Fix Released
Changed in charm-nova-compute:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers