Race in mysql database initial bootstrap

Bug #1896009 reported by Damien Ciabrini
This bug affects 1 person
Affects   Status         Importance   Assigned to       Milestone
tripleo   Fix Released   Undecided    Damien Ciabrini

Bug Description

When a node that hosts MariaDB is initially set up, a container called mysql_bootstrap runs Kolla bootstrap scripts to create a blank MariaDB database, and then adds a couple of TripleO-specific users on top.

Once Kolla has created the empty MariaDB database, the remaining user setup runs in two steps:

  . Kolla starts a temporary mysql server with mysqld_safe to set the mysql root password and remove unnecessary users from the DB. It then shuts the server down with "mysqladmin shutdown".

  . Then TripleO starts a new temporary mysql server with mysqld_safe, because it needs to create additional users for monitoring. It then shuts this server down with "mysqladmin shutdown".

When "mysqladmin shutdown" returns, it only guarantees that the actual mysqld ELF binary has stopped. It does not wait for the mysqld_safe shell script that started it to exit.

This causes a race condition: once the server started by Kolla has stopped, TripleO may start a new mysqld server before the kernel reschedules Kolla's original mysqld_safe script. When that happens, Kolla's mysqld_safe sees the new mysqld process, assumes its own server did not shut down properly, and forcibly kills it with kill -9. This breaks the mysqld_safe script started by TripleO and causes the rest of the DB bootstrap to fail.
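The two possible interleavings can be replayed in a few lines. The sketch below is a simplified, deterministic model, not the real mysqld_safe code: it assumes that, after its own mysqld exits, Kolla's mysqld_safe checks for any mysqld still running and kills it with kill -9, as described above. The names `Host`, `kolla_safe_post_exit_check` and `bootstrap` are hypothetical.

```python
from typing import List, Set


class Host:
    """Hypothetical model of the mysqld servers running on one node."""

    def __init__(self) -> None:
        self.running: Set[str] = set()   # owners of live mysqld servers
        self.killed: List[str] = []      # servers killed with SIGKILL

    def start_mysqld(self, owner: str) -> None:
        self.running.add(owner)

    def mysqladmin_shutdown(self, owner: str) -> None:
        # Returns once mysqld is gone; the mysqld_safe wrapper that
        # started it may still be waiting to be scheduled.
        self.running.discard(owner)


def kolla_safe_post_exit_check(host: Host) -> None:
    """Assumed behaviour of Kolla's mysqld_safe after its mysqld exits:
    any mysqld still running is treated as hung and killed with kill -9."""
    for owner in sorted(host.running):
        host.running.discard(owner)
        host.killed.append(owner)


def bootstrap(wrapper_scheduled_late: bool) -> Host:
    """Replay one of the two possible interleavings."""
    host = Host()
    host.start_mysqld("kolla")
    host.mysqladmin_shutdown("kolla")
    if not wrapper_scheduled_late:
        kolla_safe_post_exit_check(host)  # benign order: nothing to kill
    host.start_mysqld("tripleo")
    if wrapper_scheduled_late:
        kolla_safe_post_exit_check(host)  # race: kills TripleO's server
    return host


print(bootstrap(wrapper_scheduled_late=False).running)  # {'tripleo'}
print(bootstrap(wrapper_scheduled_late=True).killed)    # ['tripleo']
```

Only the order of two steps differs between the runs, which is why the failure is intermittent: it depends on when the kernel gets around to running Kolla's mysqld_safe again.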

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/752475

Changed in tripleo:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/752618

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/752619

Marios Andreou (marios-b) wrote :

Could this be related to https://bugs.launchpad.net/tripleo/+bug/1895822 ?

In particular, see https://bugs.launchpad.net/tripleo/+bug/1895822/comments/1

It seems that after mysql has shut down (for the upgrade), we get the following errors when it starts again:

        * https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/logs/undercloud/var/log/containers/mysql/mysqld.log

                2020-09-15 14:32:08 0 [Note] InnoDB: Starting shutdown...
                2020-09-15 14:32:09 0 [Note] /usr/libexec/mysqld: Shutdown complete
                2020-09-15 14:32:30 0 [Note] WSREP: Found saved state: cebd6089-f754-11ea-ac23-9b5df17a204a:8702, safe_to_bootstrap: 1
                2020-09-15 14:32:30 0 [Note] /usr/libexec/mysqld: ready for connections.
                Version: '10.3.17-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
                2020-09-15 14:32:31 0 [Note] InnoDB: Buffer pool(s) load completed at 200915 14:32:31
                2020-09-15 14:38:29 259 [Warning] Aborted connection 259 to db: 'nova_api' user: 'nova_api' host: '192.168.24.1' (Got an error reading communication packets)

Damien Ciabrini (dciabrin) wrote :

Hey Marios, no, this mysql race is specific to the initial database creation. The mysql_bootstrap container is a no-op as soon as the DB has been initialized, so if the deployment succeeded, the upgrade won't run into the same race again.
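The reason an upgrade cannot hit the race is that the bootstrap step is idempotent. A minimal sketch of that guard, assuming the presence of the `mysql` system-schema directory inside the data directory is what marks the database as already created (the real check in the container may differ); `needs_bootstrap` and `run_bootstrap` are hypothetical names:

```python
import os
from typing import Callable


def needs_bootstrap(datadir: str) -> bool:
    """True if the data directory looks uninitialised.

    Assumption: the 'mysql' system-schema directory created during the
    initial database creation is used as the "already done" marker.
    """
    return not os.path.isdir(os.path.join(datadir, "mysql"))


def run_bootstrap(datadir: str, create_db: Callable[[str], None]) -> bool:
    """Create the database on first deployment only; return True if it ran."""
    if not needs_bootstrap(datadir):
        return False  # no-op: database already exists, so no race window
    create_db(datadir)
    return True
```

On a redeploy or upgrade the guard returns early, so neither of the two temporary mysqld_safe servers is ever started again.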

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/752475
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=e8ddc606b26b89cff4e595c27520961ae2ca7d25
Submitter: Zuul
Branch: master

commit e8ddc606b26b89cff4e595c27520961ae2ca7d25
Author: Damien Ciabrini <email address hidden>
Date: Thu Sep 17 16:09:50 2020 +0200

    Remove race during mysql database creation

    The mysql database is created by container mysql_bootstrap,
    which lets Kolla run mysqld_safe temporarily, and then
    lets TripleO run it for additional setup.

    Before running the second temporary mysqld server, make
    sure that the mysqld_safe script started by Kolla is
    always stopped, to avoid any race condition that would
    cause the second mysqld_safe server to be killed by the
    Kolla one.

    Change-Id: Id7cf45fb95d3c8a2c5519b1a13a5651cf414a115
    Co-Authored-By: Michele Baldessari <email address hidden>
    Closes-Bug: #1896009
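The fix boils down to "wait for the previous mysqld_safe to exit before starting the next one". The actual change is a shell-script edit in tripleo-heat-templates and is not reproduced here; below is a minimal Python sketch of the same waiting pattern, assuming the PID of Kolla's mysqld_safe is known (how it is obtained, e.g. with pgrep, is an assumption). `pid_alive` and `wait_for_exit` are hypothetical helper names.

```python
import os
import time
from typing import Callable


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists.

    Signal 0 only performs the existence/permission check. An exited but
    unreaped child (a zombie) is still reported as alive.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def wait_for_exit(is_alive: Callable[[], bool], timeout: float = 60.0,
                  interval: float = 0.1) -> bool:
    """Poll is_alive() until it returns False or the timeout expires.

    Returns True if the process went away, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while is_alive():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

For example, `wait_for_exit(lambda: pid_alive(kolla_safe_pid))` would run just before launching the second mysqld_safe, with `kolla_safe_pid` a hypothetical variable holding the PID; a False result means the timeout expired, and aborting the bootstrap is safer than risking the race. Probing with signal 0 works without being the parent of the Kolla process, at the cost of treating zombies as alive.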

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/752618
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=0f0e7fa1193e3dc63179a682a7256ae3368bf0be
Submitter: Zuul
Branch: stable/ussuri

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/752619
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=46b45ed6e340861d02b4280ba685d5ff1a41220d
Submitter: Zuul
Branch: stable/train

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.4.0

This issue was fixed in the openstack/tripleo-heat-templates 11.4.0 release.
