Race in mysql database initial bootstrap
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tripleo | Fix Released | Undecided | Damien Ciabrini | | |
Bug Description
When a node that hosts mariadb is initially set up, a container called mysql_bootstrap runs kolla bootstrap scripts to create a blank mariadb database, and add a couple of tripleo-specific users on top.
Once the empty mariadb DB is created by Kolla, the remaining user setup runs in two steps:
1. Kolla starts a temporary mysql server with mysqld_safe to set the mysql root password and remove unnecessary users from the DB. It then shuts down the server with "mysqladmin shutdown".
2. TripleO then starts a new temporary mysql server with mysqld_safe, because it needs to create additional users for monitoring. It then shuts this server down with "mysqladmin shutdown".
When "mysqladmin shutdown" finishes, it only guarantees that the real mysqld ELF binary is stopped. It does not wait for the mysqld_safe wrapper shell script to exit as well.
This causes a race condition: once the server created by Kolla is stopped, TripleO may start a new mysqld server before the original Kolla mysqld_safe script is rescheduled by the kernel. When that happens, Kolla's mysqld_safe sees the new mysqld process, concludes that its own server did not shut down properly, and forcibly kills it with kill -9. This breaks the TripleO-started mysqld_safe script and causes the rest of the DB bootstrap to fail.
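One general way to close a race of this kind is to wait for the wrapper process itself to exit before starting the next temporary server, rather than relying on "mysqladmin shutdown" alone. The sketch below is only an illustration and is not the change proposed in the review. It assumes the wrapper was launched from Python and is held as a subprocess.Popen handle; the helper names wait_for_wrapper_exit and stand_in_wrapper are hypothetical, and a short-lived child process stands in for mysqld_safe so the example runs without a database.

```python
import subprocess
import sys


def wait_for_wrapper_exit(wrapper: subprocess.Popen, timeout: float = 30.0) -> bool:
    """Block until the wrapper process (e.g. mysqld_safe) has exited.

    Returns True if it exited within `timeout` seconds, False otherwise.
    """
    try:
        wrapper.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        return False


def stand_in_wrapper(linger_seconds: float) -> subprocess.Popen:
    """Hypothetical stand-in for mysqld_safe.

    The child lingers for a moment, like a wrapper script that has not yet
    been rescheduled after its server process stopped.
    """
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({linger_seconds})"]
    )


if __name__ == "__main__":
    wrapper = stand_in_wrapper(0.5)
    # In the real bootstrap, "mysqladmin shutdown" would have returned by now,
    # yet the wrapper may still be alive.
    print("wrapper still running:", wrapper.poll() is None)
    # Only start the next temporary server once this reports True.
    print("wrapper exited:", wait_for_wrapper_exit(wrapper, timeout=10))
```

In a shell-based bootstrap the same idea would apply to the PID of the backgrounded mysqld_safe script: the second server should only be started once that PID is gone.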
Fix proposed to branch: master
Review: https://review.opendev.org/752475