standalone-upgrade job failing while upgrading mariadb

Bug #1810136 reported by yatin on 2018-12-31
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: Damien Ciabrini

Bug Description

The standalone-upgrade job is failing in both upstream[1] and the RDO promotion job[2] while running the step 2 mariadb bootstrap task, with the error:

Error: Failed to apply catalog: Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)"

Failing step:- Run docker-puppet tasks (bootstrap tasks) for step 2

Task:- /usr/bin/puppet apply --summarize --detailed-exitcodes --color=false --logdest syslog --logdest console --modulepath=/etc/puppet/modules:/usr/share/openstack-puppet/modules --tags file,file_line,concat,augeas,cron,mysql_database,mysql_grant,mysql_user /etc/config.pp

The MariaDB log contains the following:
[ERROR] InnoDB: Upgrade after a crash is not supported. This redo log was created before MariaDB 10.2.2.
[ERROR] InnoDB: Plugin initialization aborted with error Generic error
[Note] InnoDB: Starting shutdown...
[ERROR] Plugin 'InnoDB' init function returned error.
[ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
[Note] Plugin 'FEEDBACK' is disabled.
[ERROR] Unknown/unsupported storage engine: InnoDB
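
The signature above can be checked mechanically when triaging other runs of the job. This is a minimal sketch, not part of the original report: the function name has_redo_log_error is hypothetical, and it assumes a local, already decompressed copy of mariadb.log.

```shell
# Hypothetical helper (not from the bug report): succeeds when the given
# MariaDB log file contains the InnoDB redo log error quoted above.
has_redo_log_error() {
    # -q: print nothing and report only through the exit status
    # -F: treat the pattern as a literal string, not a regular expression
    grep -qF 'InnoDB: Upgrade after a crash is not supported' "$1"
}
```

For example, `has_redo_log_error mariadb.log && echo "redo log upgrade failure"` prints the message only for logs that show this failure.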

It looks like some extra step is needed before upgrading MariaDB from 10.1 to 10.3. Some upstream check jobs report SUCCESS, so the issue seems intermittent there, but the promotion job shows 6 consecutive FAILURES[2].

Example log:-
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-singlenode-featureset050-upgrades-master/131b2f6/logs/undercloud/home/zuul/undercloud_upgrade.log.txt.gz
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-singlenode-featureset050-upgrades-master/131b2f6/logs/undercloud/var/log/containers/mysql/mariadb.log.txt.gz

[1] Logstash query for upstream which shows upgrade jobs have issues:- http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22Can't%20connect%20to%20local%20MySQL%20server%20through%20socket%5C%22

[2] https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-7-singlenode-featureset050-upgrades-master

yatin (yatinkarel) on 2018-12-31
tags: added: alert ci promotion-blocker
Ronelle Landy (rlandy) on 2019-01-02
Changed in tripleo:
milestone: none → stein-2
importance: Undecided → Critical
status: New → Triaged
Ronelle Landy (rlandy) wrote :

Asking upgrades team to look at this

Luca Miccini (lmiccini2) wrote :

According to https://mariadb.com/kb/en/library/upgrading-from-mariadb-101-to-mariadb-102/+comments/2903, setting innodb_fast_shutdown=1 should work around it.
We should also check https://jira.mariadb.org/browse/MDEV-13603 (innodb_fast_shutdown=0 may fail to purge all history).
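
A minimal sketch of how that workaround might be applied by hand before the upgrade, assuming the old server is still running and that credentials come from /root/.my.cnf as in the failing puppet command. The function name clean_shutdown is hypothetical, and this is not necessarily the change that was eventually merged.

```shell
# Hypothetical helper: request an InnoDB shutdown mode, then stop the server.
clean_shutdown() {
    # Shutdown mode to request; defaults to 1, the value suggested in the comment above.
    mode="${1:-1}"
    # innodb_fast_shutdown is a dynamic global variable, so it can be set at runtime.
    mysql --defaults-extra-file=/root/.my.cnf \
        -e "SET GLOBAL innodb_fast_shutdown = ${mode};" || return 1
    # Ask the server to shut down through the normal client path.
    mysqladmin --defaults-extra-file=/root/.my.cnf shutdown
}
```

Calling `clean_shutdown` uses mode 1; `clean_shutdown 0` would request a full slow shutdown instead, which is the case MDEV-13603 is about.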

Fix proposed to branch: master
Review: https://review.openstack.org/628161

Changed in tripleo:
assignee: nobody → Jose Luis Franco (jfrancoa)
status: Triaged → In Progress
Changed in tripleo:
assignee: Jose Luis Franco (jfrancoa) → Dan Prince (dan-prince)
Changed in tripleo:
assignee: Dan Prince (dan-prince) → Jose Luis Franco (jfrancoa)
wes hayutin (weshayutin) wrote :

removing alert, it looks like Jose has this under control

tags: removed: alert
Changed in tripleo:
assignee: Jose Luis Franco (jfrancoa) → Damien Ciabrini (dciabrin)
Changed in tripleo:
milestone: stein-2 → stein-3

Reviewed: https://review.openstack.org/628161
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=0015cc74416445c3cf4e18e566a954f609f7cf0f
Submitter: Zuul
Branch: master

commit 0015cc74416445c3cf4e18e566a954f609f7cf0f
Author: Jose Luis Franco Arza <email address hidden>
Date: Thu Jan 3 13:07:26 2019 +0100

    Gracefully shutdown Mysql before upgrade.

    When upgrading from MySQL 10.1 to 10.3 a bug appears if no
    shutdown is being performed, as the redo log format has
    changed in version 10.3.2 [0].

    Make sure we always stop the MySQL server cleanly before
    upgrading to a new version, to avoid redo log issue.

    Note: to be idempotent, we need to stop the mysql container
    rather than delete it; to be able to stop the container, we
    amend the restart policy of the mysql container.

    [0] - https://jira.mariadb.org/browse/MDEV-14848

    Change-Id: Ia07b7755867858c74c7334424e8e6579ace495db
    Co-Authored-By: Damien Ciabrini <email address hidden>
    Closes-Bug: #1810136
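
The approach described in the commit message can be sketched with the plain docker CLI. This is an illustration only, not the tripleo-heat-templates change itself: the container name mysql, the 60 second grace period, and the function name stop_mysql_container are all assumptions.

```shell
# Hypothetical helper: stop the database container without deleting it.
stop_mysql_container() {
    # Container name is an assumption; pass another name as the first argument.
    name="${1:-mysql}"
    # Grace period in seconds before docker falls back to SIGKILL (arbitrary choice).
    timeout="${2:-60}"
    # Disable the restart policy first so the container is not started again automatically.
    docker update --restart=no "$name" || return 1
    # docker stop sends SIGTERM, giving mysqld a chance to shut down cleanly.
    docker stop -t "$timeout" "$name"
}
```

Stopping rather than removing the container keeps the step safe to repeat: running it again against an already stopped container changes nothing.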

Changed in tripleo:
status: In Progress → Fix Released

This issue was fixed in the openstack/tripleo-heat-templates 10.4.0 release.
