Comment 12 for bug 1978372

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/845896
Committed: https://opendev.org/openstack/nova/commit/099a6f63af7805440d91976ba0ea03bc6c278280
Submitter: "Zuul (22348)"
Branch: master

commit 099a6f63af7805440d91976ba0ea03bc6c278280
Author: Balazs Gibizer <email address hidden>
Date: Wed Jun 15 09:28:27 2022 +0200

    Optimize numa_fit_instance_to_host

    The numa_fit_instance_to_host algorithm tries all the possible
    host cell permutations to fit the instance cells. So in worst case
    scenario it does n! / (n-k)! _numa_fit_instance_cell calls
    (n=len(host_cells) k=len(instance_cells)) to find if the instance can be
    fit to the host. With 16 NUMA nodes host and 8 NUMA node guests this
    means 500 million calls to _numa_fit_instance_cell. This takes excessive
    time.

    However going through these permutations there are many repetitive
    host_cell, instance_cell pairs to try to fit.
    E.g.
      host_cells=[H1, H2, H2]
      instance_cells=[G1, G2]

    Produces pairings:

    * H1 <- G1 and H2 <- G2
    * H1 <- G1 and H3 <- G2
    ...

    Here G1 is checked to fit H1 twice. But if it does not fit in the first
    time then we know that it will not fit in the second time either. So we
    can cache the result of the first check and use that cache for the later
    permutations.

    This patch adds two caches to the algo. A fit_cache to hold
    host_cell.id, instance_cell.id pairs that we know fit, and a
    no_fit_cache for those pairs that we already know that doesn't fit.

    This change significantly boost the performance of the algorithm. The
    reproduction provided in the bug 1978372 took 6 minutes on my local
    machine to run without the optimization. With the optimization it run in
    3 seconds.

    This change increase the memory usage of the algorithm with the two
    caches. Those caches are sets of integer two tuples. And the total size
    of the cache is the total number of possible host_cell, instance_cell
    pairs which is len(host_cell) * len(instance_cells). So form the above
    example (16 host, 8 instance NUMA) it is 128 pairs of integers in the
    cache. That will not cause a significant memory increase.

    Closes-Bug: #1978372
    Change-Id: Ibcf27d741429a239d13f0404348c61e2668b4ce4