Comment 8 for bug 1903008

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/786097
Committed: https://opendev.org/openstack/neutron/commit/1f30f2dfff722ea65c811b4b99243fae51a2d688
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 1f30f2dfff722ea65c811b4b99243fae51a2d688
Author: Terry Wilson <email address hidden>
Date: Mon Dec 7 20:46:53 2020 +0000

    Rely on worker count for HashRing caching

    The current code looks at a hash ring node's created_at/updated_at
    fields and tries to determine whether the node has been updated
    based on whether updated_at - created_at > 1 second (due to the
    method that initially fills them being different by microseconds).
    Unfortunately, notify() calls the hash ring node's touch_node(),
    so a node can be updated in under a second, meaning we will
    prevent caching for much longer than we intend.

    With an in-memory sqlite db, continually re-creating the Hash
    Ring objects for every processed event exposes an issue where
    rows that should be in the db just *aren't*.

    This patch instead limits the hash ring nodes to api workers and
    prevents caching only until the number of nodes == number of api
    workers on the host. The switch from spawning hash ring nodes
    where !is_maintenance to is_api_worker is primarily because it
    seems to be difficult to get a list of *all* workers from which to
    subtract the maintenance worker so that _wait_startup_before_caching
    can wait for that specific number of workers. In practice, this
    means that RpcWorker and ServiceWorker workers would not process
    HashRing events.

    A note on bug 1903008: While this change will greatly reduce the
    likelihood of this issue taking place, we still have some work to
    do in order to fully understand why it rubs the database backend
    the wrong way. Thus, we will make this change 'related to'
    instead of closing the bug.

    Related-Bug: #1894117
    Related-Bug: #1903008
    Change-Id: Ia198d45f49bddda549a0e70a3374b8339f88887b
    (cherry picked from commit c4007b0833111a25d24f597161d39ee9ccd37189)
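
For readers skimming the commit message, here is a minimal sketch of the two startup checks it contrasts. This is not the neutron code: `looks_updated` and `StartupCacheGate` are hypothetical names, and the "once ready, stay ready" latch is an assumption read from the phrase "prevents caching only until".

```python
from datetime import datetime, timedelta


def looks_updated(created_at, updated_at):
    """Old heuristic as described above (hypothetical helper).

    A node is treated as updated only when its two timestamps differ
    by more than one second.
    """
    return updated_at - created_at > timedelta(seconds=1)


class StartupCacheGate:
    """New approach as described above (hypothetical helper).

    Caching stays disabled until the number of hash ring nodes on the
    host equals the number of API workers.
    """

    def __init__(self, api_workers):
        self.api_workers = api_workers
        self._ready = False  # assumption: latches once the counts match

    def can_cache(self, node_count):
        if not self._ready and node_count == self.api_workers:
            self._ready = True
        return self._ready


# A touch that lands 300 ms after creation is invisible to the old check.
t0 = datetime(2020, 12, 7, 20, 46, 53)
early_touch = looks_updated(t0, t0 + timedelta(milliseconds=300))

# The count-based gate ignores timestamps entirely.
gate = StartupCacheGate(api_workers=3)
before = gate.can_cache(node_count=1)
after = gate.can_cache(node_count=3)
```

The count-based gate does not depend on how quickly touch_node() runs after a node is created, which is the timing sensitivity the commit message attributes to the old check.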