Upgrading from Rocky to Stein fails on MariaDB

Bug #1820325 reported by Mark Goddard on 2019-03-15
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
kolla-ansible
Critical
Mark Goddard
Rocky
Medium
Mark Goddard
Stein
Critical
Mark Goddard

Bug Description

MariaDB was upgraded from 10.1 to 10.3 in the Stein release. It seems upgrades do not work currently, as seen in kayobe upgrade testing and anecdotally in IRC.

There are no useful error logs either in `docker logs mariadb` or /var/log/kolla/mariadb/mariadb.log.

Using the following command I was able to get some logs on startup:

docker run --rm -it --name mariadb_test -e KOLLA_CONFIG_STRATEGY=COPY_ALWAYS -e KOLLA_BASE_DISTRO=centos -e KOLLA_INSTALL_TYPE=binary -v kolla_logs:/var/log/kolla -v /etc/localtime:/etc/localtime:ro -v mariadb:/var/lib/mysql -v /etc/kolla/mariadb:/var/lib/kolla/config_files:ro --net=host kolla/centos-binary-mariadb:master bash -c 'sudo -E kolla_set_configs && . kolla_extend_start && bash -x mysqld_safe'

For some reason the -x causes logs to be dumped to stdout/err.

Here's a dump of the output: http://paste.openstack.org/show/747900/. It's verbose due to the -x.

The pertinent logs seem to be these:

nohup /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql/ --plugin-dir=/usr/lib64/mariadb/plugin --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --wsrep_on=ON --log-error=/var/log/kolla/mariadb/mariadb.log --pid-file=/var/lib/mysql/mariadb.pid --port=3306 --wsrep_recover --disable-log-error --pid-file=/var/lib/mysql//sossidge.novalocal-recover.pid
++ grep 'WSREP: Recovered position:' /tmp/wsrep_recovery.kZaWpN
+ local rp=
+ '[' -z '' ']'
++ grep WSREP /tmp/wsrep_recovery.kZaWpN
++ grep 'skipping position recovery'
+ local skipped=
+ '[' -z '' ']'
++ cat /tmp/wsrep_recovery.kZaWpN
+ log_error 'WSREP: Failed to recover position: '\''nohup: appending output to '\''/var/lib/mysql/nohup.out'\''
2019-03-15 17:58:59 0 [Note] /usr/libexec/mysqld (mysqld 10.3.10-MariaDB-log) starting as process 184 ...
2019-03-15 17:58:59 0 [Note] InnoDB: Using Linux native AIO
2019-03-15 17:58:59 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2019-03-15 17:58:59 0 [Note] InnoDB: Uses event mutexes
2019-03-15 17:58:59 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2019-03-15 17:58:59 0 [Note] InnoDB: Number of pools: 1
2019-03-15 17:58:59 0 [Note] InnoDB: Using SSE2 crc32 instructions
2019-03-15 17:58:59 0 [Note] InnoDB: Initializing buffer pool, total size = 4G, instances = 8, chunk size = 128M
2019-03-15 17:59:00 0 [Note] InnoDB: Completed initialization of buffer pool
2019-03-15 17:59:00 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2019-03-15 17:59:00 0 [ERROR] InnoDB: Upgrade after a crash is not supported. This redo log was created before MariaDB 10.2.2.
2019-03-15 17:59:00 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2019-03-15 17:59:00 0 [Note] InnoDB: Starting shutdown...
2019-03-15 17:59:00 0 [ERROR] Plugin '\''InnoDB'\'' init function returned error.
2019-03-15 17:59:00 0 [ERROR] Plugin '\''InnoDB'\'' registration as a STORAGE ENGINE failed.
2019-03-15 17:59:00 0 [Note] Plugin '\''FEEDBACK'\'' is disabled.
2019-03-15 17:59:00 0 [ERROR] Unknown/unsupported storage engine: innodb
2019-03-15 17:59:00 0 [ERROR] Aborting'\'''

Mark Goddard (mgoddard) wrote :

Following https://serverfault.com/questions/379714/unknown-unsupported-storage-engine-innodb-mysql-ubuntu, I removed /var/lib/mysql/ib_logfile0 and /var/lib/mysql/ib_logfile1, which allowed the service to start. There are various bad looking logs, e.g.

[Warning] InnoDB: Table mysql/innodb_index_stats has length mismatch in the column name table_name. Please run mysql_upgrade

This is clearly not the right course of action, but does show what is blocking the startup.

We also see these messages, suggesting we need to run mysql_upgrade:

[Warning] InnoDB: Table mysql/innodb_table_stats has length mismatch in the column name table_name. Please run mysql_upgrade

Mark Goddard (mgoddard) wrote :

In order to run mysql_upgrade, I had to remove the directory /var/lib/mysql/.pki, otherwise we get the following error:

mysqlcheck: Got error: 1102: Incorrect database name '#mysql50#.pki' when selecting the database
FATAL ERROR: Upgrade failed

Command used was this:

mysql_upgrade -u root -p<DB root password>

Mark Goddard (mgoddard) wrote :

Looks like we're killing the maridb service, which leaves around old state, preventing the upgrade.

docker stop -t 60 mariadb

sudo journalctl -u docker | tail -n 1

Container c2a33c47667f80452ab18fe10c1564309955db0e6b4cb765348371746f11a72d failed to exit within 60 seconds of signal 15 - using the force"

Mark Goddard (mgoddard) wrote :

dumb-init is executed with --single-process so only forwards the term signal to mysqld_safe script, which apparently just ignores it. We need to call this to shutdown:

mysqladmin shutdown -p<password>

Perhaps we need to wrap mysqld_admin in a script that monitors signals and shuts down mysql as necessary.

For stopping existing containers, we might need to do something in ansible.

Mark Goddard (mgoddard) on 2019-03-17
Changed in kolla-ansible:
importance: Undecided → High
assignee: nobody → Mark Goddard (mgoddard)
Changed in kolla-ansible:
importance: High → Critical

Fix proposed to branch: master
Review: https://review.openstack.org/644250

Changed in kolla-ansible:
status: New → In Progress

Reviewed: https://review.openstack.org/644250
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=b25c0ee4772fdf04fe6f263a8e81571204f706ff
Submitter: Zuul
Branch: master

commit b25c0ee4772fdf04fe6f263a8e81571204f706ff
Author: Mark Goddard <email address hidden>
Date: Sun Mar 17 17:48:54 2019 +0000

    Fix MariaDB 10.3 upgrade

    Upgrading MariaDB from Rocky to Stein currently fails, with the new
    container left continually restarting. The problem is that the Rocky
    container does not shutdown cleanly, leaving behind state that the new
    container cannot recover. The container does not shutdown cleanly
    because we run dumb-init with a --single-child argument, causing it to
    forward signals to only the process executed by dumb-init. In our case
    this is mysqld_safe, which ignores various signals, including SIGTERM.
    After a (default 10 second) timeout, Docker then kills the container.

    A Kolla change [1] removes the --single-child argument from dumb-init
    for the MariaDB container, however we still need to support upgrading
    from Rocky images that don't have this change. To do that, we add new
    handlers to execute 'mysqladmin shutdown' to cleanly shutdown the
    service.

    A second issue with the current upgrade approach is that we don't
    execute mysql_upgrade after starting the new service. This can leave the
    database state using the format of the previous release. This patch also
    adds handlers to execute mysql_upgrade.

    [1] https://review.openstack.org/644244

    Depends-On: https://review.openstack.org/644244
    Depends-On: https://review.openstack.org/645990
    Change-Id: I08a655a359ff9cfa79043f2166dca59199c7d67f
    Closes-Bug: #1820325

Changed in kolla-ansible:
status: In Progress → Fix Released

This issue was fixed in the openstack/kolla 8.0.0.0rc1 release candidate.

This issue was fixed in the openstack/kolla-ansible 8.0.0.0rc1 release candidate.

Reviewed: https://review.openstack.org/650855
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=d93c604d7af03e8b096faea8acaaaef718df2cd5
Submitter: Zuul
Branch: master

commit d93c604d7af03e8b096faea8acaaaef718df2cd5
Author: Mark Goddard <email address hidden>
Date: Fri Apr 5 19:37:54 2019 +0100

    Remove shutdown of MariaDB

    Since we are now in the Train cycle, we can be sure that any running
    MariaDB containers can be safely stopped, and we do not need to perform
    an explicit shutdown prior to restarting them.

    Change-Id: I5450690f1cbe0c995e8e4b01a76e90dac2574d61
    Related-Bug: #1820325

This issue was fixed in the openstack/kolla 7.0.2 release.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers