kolla extend_start.sh doesn't check the start of MariaDB server correctly in bootstrap
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
kolla |
Fix Released
|
High
|
Unassigned | ||
tripleo |
Fix Released
|
High
|
Sergii Golovatiuk |
Bug Description
Description of problem:
kolla extend_start.sh doesn't check the start of MariaDB server correctly in bootstrap
How reproducible:
very hard to reproduce, caught on CI
Here is a log how it failed
Here is a MySQL log at the same time
http://
Steps to Reproduce:
There should be innodb recovery during bootstrap. In this case there will be delay between pid and mysql.sock
Actual results:
MariaDB bootstrap failed
Expected results:
MariaDB bootstrap happened
Additional info:
There is a wrong logic in [1]
When pid file exists it doesn't matter that MariaDB is up and running. In this case, MariaDB needed some time to recover InnoDB data, so PID file was present but MariaDB was not ready.
2018-06-08 7:33:46 140020369176768 [Note] /usr/libexec/
happened after
2018-06-08 07:33:45 | "++ sudo -E kolla_security_
so the check of MariaDB [1] didn't happen
[1] https:/
RHBZ#1589267
Changed in kolla: | |
status: | New → Confirmed |
Changed in tripleo: | |
assignee: | nobody → Sergii Golovatiuk (sgolovatiuk) |
importance: | Undecided → Medium |
importance: | Medium → High |
status: | New → Confirmed |
milestone: | none → rocky-3 |
no longer affects: | tripleo/pike |
no longer affects: | tripleo/queens |
Changed in tripleo: | |
status: | Confirmed → Triaged |
Changed in tripleo: | |
milestone: | rocky-3 → rocky-rc1 |
Changed in tripleo: | |
milestone: | rocky-rc1 → stein-1 |
Changed in tripleo: | |
milestone: | stein-1 → stein-2 |
Changed in tripleo: | |
milestone: | stein-2 → stein-3 |
Changed in tripleo: | |
status: | Triaged → Fix Released |
Changed in kolla: | |
milestone: | none → 8.0.0 |
importance: | Undecided → High |
[Current analysis of what failed in the CI job - sorry for the long post]
TL;DR using "mysqladmin ping" is probably a better way to ensure mysqld started properly during kolla bootstrap.
There's a small typo in the link in the description, the failure log is available at [1].
Quick tripleo refresher: during undercloud install, we bootstrap the mysql db by running the kolla bootstrap script [2] in a transient container (before that time no DB existed yet).
The bootstrapping happens in two steps: run mysql_install_db, and run some kolla-specific commands to set up root password.
extend_start.sh is sourced by kolla_start, which has +x and +e flags, so we trace all shell commands and stop on first error.
1) From the install log, we see that mysql_install_db went ok, because we continue and call bootstrap_db:
2018-06-05 16:24:34 | "++ mysql_install_db", 51615 adjusted to 4294967295", 51615 adjusted to 4294967295", 51615 adjusted to 4294967295",
2018-06-05 16:24:34 | "2018-06-05 16:24:14 139787632818368 [Warning] option 'open_files_limit': unsigned value 184467440737095
2018-06-05 16:24:34 | "2018-06-05 16:24:14 139787632818368 [Note] /usr/libexec/mysqld (mysqld 10.1.20-MariaDB) starting as process 46 ...",
2018-06-05 16:24:34 | "2018-06-05 16:24:18 139859719940288 [Warning] option 'open_files_limit': unsigned value 184467440737095
2018-06-05 16:24:34 | "2018-06-05 16:24:18 139859719940288 [Note] /usr/libexec/mysqld (mysqld 10.1.20-MariaDB) starting as process 75 ...",
2018-06-05 16:24:34 | "2018-06-05 16:24:21 139917192714432 [Warning] option 'open_files_limit': unsigned value 184467440737095
2018-06-05 16:24:34 | "2018-06-05 16:24:21 139917192714432 [Note] /usr/libexec/mysqld (mysqld 10.1.20-MariaDB) starting as process 104 ...",
2018-06-05 16:24:34 | "++ bootstrap_db",
(internally this caused three transient mysqld run as seen in [3]) www.percona. com) 5.6.34-79.1 started; log sequence number 0 www.percona. com) 5.6.34-79.1 started; log sequence number 1622818 www.percona. com) 5.6.34-79.1 started; log sequence number 1622828
2018-06-08 7:33:11 139673585006784 [Note] InnoDB: Percona XtraDB (http://
2018-06-08 7:33:15 139700819671232 [Note] InnoDB: Percona XtraDB (http://
2018-06-08 7:33:19 139684735236288 [Note] InnoDB: Percona XtraDB (http://
Finishing OK means any pidfile and sock file on disk were deleted when mysqld_safe stopped.
2) Then comes the kolla-specific initialization. mysql/mariadb. pid are created.
. In that step, we start a mysql server _without network connectivity_ and _witout galera replication_.
. A while loops [4] waits until it sees that both the mysql{d}.sock unix socket _and_ /var/lib/
The while loop is a fairly good guarantee that mysql is up because mysqld_safe force deletes any lingering /var/lib/ mysql/mariadb. pid [5] before starting the mysqld server. Also mysql{d}.sock is always deleted and recreated because of the way named UNIX socket are initialized.
I've tested it manually and it seems that the while loop only finished when both the socket and the pidfile are created.
From mariadb logs [3], we see that the server properly started at 7:33...