Hash Ring: Race condition during service initialization

Bug #1833105 reported by Lucas Alvares Gomes
This bug affects 1 person
Affects: networking-ovn
Status: Fix Released
Importance: High
Assigned to: Lucas Alvares Gomes

Bug Description

During service initialization, the Hash Ring code cleans up any existing nodes that match the host's hostname (in case they were not cleaned up when the service was shut down, e.g. by SIGKILL). This operation is done in the pre_fork_initialize() method of the ML2 driver.

While many processes are being spawned, the pre_fork_initialize() and post_fork_initialize() methods can race, and some Hash Ring members that have just been spawned may be deleted in the process. See the logs from the MySQL database:

()[root@overcloud-controller-0 /]# grep ovn_hash_ring /var/lib/mysql/overcloud-controller-0-slow.log | grep -v "SELECT" | egrep "INSERT|DELETE"
DELETE FROM ovn_hash_ring WHERE ovn_hash_ring.hostname = 'overcloud-controller-0.localdomain';
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('1583fecf-97b8-41e7-85c3-bf14702c8040', 'overcloud-controller-0.localdomain', now(), now());
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('ab46e47e-7b5e-4913-a046-3400ca7d7f07', 'overcloud-controller-0.localdomain', now(), now());
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('1f461ff0-fa48-4cc7-99f4-a35e448cd044', 'overcloud-controller-0.localdomain', now(), now());
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('514013bc-a174-43ac-a796-842fcbab1ab0', 'overcloud-controller-0.localdomain', now(), now());
DELETE FROM ovn_hash_ring WHERE ovn_hash_ring.hostname = 'overcloud-controller-0.localdomain';
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('2ad8d643-28e3-4742-a7ab-d645af995796', 'overcloud-controller-0.localdomain', now(), now());
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('86fa3549-c321-4ddf-9140-249aabb56aaf', 'overcloud-controller-0.localdomain', now(), now());
INSERT INTO ovn_hash_ring (node_uuid, hostname, created_at, updated_at) VALUES ('a61f24a6-218f-48dd-8d15-e8672a53a58e', 'overcloud-controller-0.localdomain', now(), now());
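The statement order in the log above can be replayed deterministically to show its effect. This is a minimal sketch, not the networking-ovn code: it uses an in-memory SQLite table with a reduced schema, and the helpers clean_hash_ring(), register_worker() and replay_logged_interleaving() are hypothetical names. It assumes, as the description implies, that the DELETE comes from pre_fork_initialize() and the INSERTs from post_fork_initialize().

```python
import sqlite3
import uuid

HOSTNAME = "overcloud-controller-0.localdomain"


def make_db():
    # Reduced stand-in for the ovn_hash_ring table (timestamps omitted).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ovn_hash_ring (node_uuid TEXT PRIMARY KEY, hostname TEXT)"
    )
    return conn


def clean_hash_ring(conn, hostname):
    # Hypothetical stand-in for the cleanup issued from pre_fork_initialize().
    conn.execute("DELETE FROM ovn_hash_ring WHERE hostname = ?", (hostname,))


def register_worker(conn, hostname):
    # Hypothetical stand-in for the registration assumed to happen in
    # post_fork_initialize().
    node_uuid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO ovn_hash_ring (node_uuid, hostname) VALUES (?, ?)",
        (node_uuid, hostname),
    )
    return node_uuid


def replay_logged_interleaving():
    # Same order as the log: DELETE, 4 INSERTs, DELETE, 3 INSERTs.
    conn = make_db()
    registered = []
    clean_hash_ring(conn, HOSTNAME)
    for _ in range(4):
        registered.append(register_worker(conn, HOSTNAME))
    clean_hash_ring(conn, HOSTNAME)  # removes the four nodes just added
    for _ in range(3):
        registered.append(register_worker(conn, HOSTNAME))
    surviving = [
        row[0] for row in conn.execute("SELECT node_uuid FROM ovn_hash_ring")
    ]
    return registered, surviving
```

With this ordering, seven workers register but only the last three remain in the table, since the second DELETE matches every row with the same hostname.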

Changed in networking-ovn:
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
importance: Undecided → High
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.opendev.org/665720

Changed in networking-ovn:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.opendev.org/665720
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=5edf5cefc58f3303064f04b3976382727f7e513f
Submitter: Zuul
Branch: master

commit 5edf5cefc58f3303064f04b3976382727f7e513f
Author: Lucas Alvares Gomes <email address hidden>
Date: Mon Jun 17 17:02:38 2019 +0100

    Hash Ring: Fix race during service initialization

    This patch is moving the call that cleans up the local hash ring workers
    (self._clean_hash_ring) to happen prior to it subscribing to the worker's
    pre/post fork notifications.

    Before, self._clean_hash_ring() was invoked during
    pre_fork_initialize(), which could race with post_fork_initialize()
    when multiple workers were being spawned in parallel, and that could
    cause some new workers to be deleted from the Hash Ring (see the
    linked bug for more information and logs).

    Change-Id: I1c2903e97e7aadff830a63011ab7eff8ca03c65b
    Closes-Bug: #1833105
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
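
The reordering described in the commit message above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged patch: FakeRegistry, SketchDriver and simulate_startup() are hypothetical names, the shared list stands in for the ovn_hash_ring table, and the event names "pre_fork" and "post_fork" are placeholders for the worker's fork notifications. Only _clean_hash_ring, pre_fork_initialize and post_fork_initialize are names taken from the commit message.

```python
import uuid

HOSTNAME = "overcloud-controller-0.localdomain"


class FakeRegistry:
    """Hypothetical, minimal callback registry used only for this sketch."""

    def __init__(self):
        self._callbacks = {}

    def subscribe(self, event, callback):
        self._callbacks.setdefault(event, []).append(callback)

    def publish(self, event):
        for callback in self._callbacks.get(event, []):
            callback()


class SketchDriver:
    """Hypothetical driver showing cleanup before subscription."""

    def __init__(self, registry, ring, hostname):
        self.ring = ring  # list of (node_uuid, hostname) tuples
        self.hostname = hostname
        # Clean stale nodes once, before any fork notification can fire.
        self._clean_hash_ring()
        registry.subscribe("pre_fork", self.pre_fork_initialize)
        registry.subscribe("post_fork", self.post_fork_initialize)

    def _clean_hash_ring(self):
        self.ring[:] = [node for node in self.ring if node[1] != self.hostname]

    def pre_fork_initialize(self):
        # No cleanup here, so this cannot delete freshly registered workers.
        pass

    def post_fork_initialize(self):
        self.ring.append((str(uuid.uuid4()), self.hostname))


def simulate_startup(worker_count):
    # Start with one stale node left over from an unclean shutdown.
    ring = [("stale-node", HOSTNAME)]
    registry = FakeRegistry()
    SketchDriver(registry, ring, HOSTNAME)
    for _ in range(worker_count):
        registry.publish("pre_fork")
        registry.publish("post_fork")
    return ring
```

In this sketch the stale node is removed exactly once, and every worker registered afterwards stays in the ring regardless of how the pre/post fork events interleave.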

Changed in networking-ovn:
status: In Progress → Fix Released
tags: added: networking-ovn-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 7.0.0.0b1

This issue was fixed in the openstack/networking-ovn 7.0.0.0b1 development milestone.

tags: removed: networking-ovn-proactive-backport-potential