Hash Ring: Race condition during service initialization

Bug #1833105 reported by Lucas Alvares Gomes on 2019-06-17
This bug affects 1 person
Affects: networking-ovn
Importance: High
Assigned to: Lucas Alvares Gomes

Bug Description

During service initialization, the Hash Ring code cleans up any existing nodes whose hostname matches the host's hostname, in case they were not cleaned up when the service was shut down (e.g. by SIGKILL). This cleanup is done in the pre_fork_initialize() method of the ML2 driver.
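To make the start-up cleanup concrete, here is a minimal sketch against an in-memory SQLite table whose columns mirror those in the INSERT statements of the log below. SQLite stands in for MySQL, and the helper names (create_schema, add_node, remove_nodes_from_host) are hypothetical illustrations, not the networking-ovn API.

```python
import sqlite3
import uuid


def create_schema(conn):
    # Columns mirror the ones visible in the ovn_hash_ring INSERTs of the bug log.
    conn.execute(
        "CREATE TABLE ovn_hash_ring ("
        " node_uuid TEXT PRIMARY KEY,"
        " hostname TEXT NOT NULL,"
        " created_at TEXT NOT NULL,"
        " updated_at TEXT NOT NULL)"
    )


def add_node(conn, hostname):
    """Register one worker in the ring and return its node UUID (hypothetical helper)."""
    node_uuid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at)"
        " VALUES (?, ?, datetime('now'), datetime('now'))",
        (node_uuid, hostname),
    )
    return node_uuid


def remove_nodes_from_host(conn, hostname):
    """Delete every ring member registered under this hostname (hypothetical helper).

    Run at start-up to drop stale rows left by a process that was killed
    (e.g. SIGKILL) without deregistering itself. Returns the rows deleted.
    """
    cur = conn.execute("DELETE FROM ovn_hash_ring WHERE hostname = ?", (hostname,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
create_schema(conn)
host = "overcloud-controller-0.localdomain"
add_node(conn, host)                        # stale row from a killed process
print(remove_nodes_from_host(conn, host))   # 1
```

The cleanup is keyed only on hostname, so it cannot tell a stale row from one a sibling worker on the same host has just inserted; that is what makes its timing matter.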

While many processes are being spawned, the pre_fork_initialize() and post_fork_initialize() methods can race, and some Hash Ring members that were just registered may be deleted in the process. The MySQL slow query log shows this: the second DELETE removes the four nodes inserted just before it:

()[root@overcloud-controller-0 /]# grep ovn_hash_ring /var/lib/mysql/overcloud-controller-0-slow.log | grep -v "SELECT" | egrep "INSERT|DELETE"
DELETE FROM ovn_hash_ring WHERE ovn_hash_ring.hostname = 'overcloud-controller-0.localdomain';
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('1583fecf-97b8-41e7-85c3-bf14702c8040', 'overcloud-controller-0.localdomain', now(), now());
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('ab46e47e-7b5e-4913-a046-3400ca7d7f07', 'overcloud-controller-0.localdomain', now(), now());
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('1f461ff0-fa48-4cc7-99f4-a35e448cd044', 'overcloud-controller-0.localdomain', now(), now());
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('514013bc-a174-43ac-a796-842fcbab1ab0', 'overcloud-controller-0.localdomain', now(), now());
DELETE FROM ovn_hash_ring WHERE ovn_hash_ring.hostname = 'overcloud-controller-0.localdomain';
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('2ad8d643-28e3-4742-a7ab-d645af995796', 'overcloud-controller-0.localdomain', now(), now());
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('86fa3549-c321-4ddf-9140-249aabb56aaf', 'overcloud-controller-0.localdomain', now(), now());
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('a61f24a6-218f-48dd-8d15-e8672a53a58e', 'overcloud-controller-0.localdomain', now(), now());
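The effect of the statement order above can be checked with a small replay. This is a sketch that models the table as an in-memory set of node UUIDs for the single host in the log; since every row shares that hostname, each DELETE simply empties the set. The replay function is a hypothetical illustration.

```python
import uuid

# Statement order copied from the slow query log above.
LOGGED_ORDER = ["DELETE"] + ["INSERT"] * 4 + ["DELETE"] + ["INSERT"] * 3


def replay(events):
    """Replay DELETE/INSERT events for one host.

    Returns (UUIDs registered by workers, UUIDs still stored at the end).
    """
    ring = set()      # node UUIDs currently stored for this hostname
    registered = []   # every UUID a worker believes it owns
    for event in events:
        if event == "DELETE":
            ring.clear()  # DELETE ... WHERE hostname = <this host>
        else:
            node_uuid = str(uuid.uuid4())
            ring.add(node_uuid)
            registered.append(node_uuid)
    return registered, ring


registered, ring = replay(LOGGED_ORDER)
orphaned = [u for u in registered if u not in ring]
print(len(registered), len(ring), len(orphaned))  # 7 3 4
```

Seven workers registered, but only three rows survive; the four orphaned workers keep running while no longer being members of the Hash Ring.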

Changed in networking-ovn:
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
importance: Undecided → High
status: New → Confirmed

Fix proposed to branch: master
Review: https://review.opendev.org/665720

Changed in networking-ovn:
status: Confirmed → In Progress

Reviewed: https://review.opendev.org/665720
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=5edf5cefc58f3303064f04b3976382727f7e513f
Submitter: Zuul
Branch: master

commit 5edf5cefc58f3303064f04b3976382727f7e513f
Author: Lucas Alvares Gomes <email address hidden>
Date: Mon Jun 17 17:02:38 2019 +0100

    Hash Ring: Fix race during service initialization

    This patch moves the call that cleans up the local hash ring workers
    (self._clean_hash_ring) so that it happens before the driver subscribes
    to the workers' pre/post fork notifications.

    Before, self._clean_hash_ring() was invoked during
    pre_fork_initialize(), which could race with post_fork_initialize()
    when multiple workers were being spawned in parallel, causing some new
    workers to be deleted from the Hash Ring (see the linked bug for more
    information and logs).

    Change-Id: I1c2903e97e7aadff830a63011ab7eff8ca03c65b
    Closes-Bug: #1833105
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
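The ordering change in the commit can be illustrated with a toy model. This is a sketch, not the real ML2 driver: the Driver and HashRingTable classes, the clean_before_subscribing flag and the spawn() helper are hypothetical, and it assumes each batch of workers is preceded by its own pre-fork notification, with batch sizes chosen to mirror the log in the report.

```python
import uuid

HOST = "overcloud-controller-0.localdomain"


class HashRingTable:
    """In-memory stand-in for ovn_hash_ring: node_uuid -> hostname."""

    def __init__(self):
        self.rows = {}

    def insert(self, hostname):
        node_uuid = str(uuid.uuid4())
        self.rows[node_uuid] = hostname
        return node_uuid

    def delete_host(self, hostname):
        for node_uuid in [u for u, h in self.rows.items() if h == hostname]:
            del self.rows[node_uuid]


class Driver:
    """Toy model of the driver's start-up hooks (hypothetical)."""

    def __init__(self, table, clean_before_subscribing):
        self.table = table
        self.fixed = clean_before_subscribing
        self.node_uuids = []
        if self.fixed:
            # After the fix: purge stale rows once, before any fork
            # notification can fire.
            self.table.delete_host(HOST)

    def pre_fork_initialize(self):
        if not self.fixed:
            # Before the fix: the purge ran here, so a later pre-fork call
            # wipes rows that already-forked workers have just added.
            self.table.delete_host(HOST)

    def post_fork_initialize(self):
        # Each worker registers itself in the ring after forking.
        self.node_uuids.append(self.table.insert(HOST))


def spawn(driver, batches):
    """Fire one pre-fork notification per batch, then its post-fork calls.

    Returns how many registered workers still have a row in the ring.
    """
    for size in batches:
        driver.pre_fork_initialize()
        for _ in range(size):
            driver.post_fork_initialize()
    return sum(u in driver.table.rows for u in driver.node_uuids)


print(spawn(Driver(HashRingTable(), clean_before_subscribing=False), [4, 3]))  # 3
print(spawn(Driver(HashRingTable(), clean_before_subscribing=True), [4, 3]))   # 7
```

Running the purge once, before the fork hooks are in play, means no cleanup can be interleaved with worker registration, so all seven workers stay in the ring.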

Changed in networking-ovn:
status: In Progress → Fix Released
tags: added: networking-ovn-proactive-backport-potential