standalone-upgrade job failing while upgrading mariadb

Bug #1810136 reported by yatin
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Damien Ciabrini

Bug Description

The standalone-upgrade job is failing in both the upstream jobs [1] and the RDO promotion job [2] while running the step 2 MariaDB bootstrap task, with this error:

Error: Failed to apply catalog: Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)"

Failing step: Run docker-puppet tasks (bootstrap tasks) for step 2

Task: /usr/bin/puppet apply --summarize --detailed-exitcodes --color=false --logdest syslog --logdest console --modulepath=/etc/puppet/modules:/usr/share/openstack-puppet/modules --tags file,file_line,concat,augeas,cron,mysql_database,mysql_grant,mysql_user /etc/config.pp

The MariaDB log contains the following:
[ERROR] InnoDB: Upgrade after a crash is not supported. This redo log was created before MariaDB 10.2.2.
[ERROR] InnoDB: Plugin initialization aborted with error Generic error
[Note] InnoDB: Starting shutdown...
[ERROR] Plugin 'InnoDB' init function returned error.
[ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
[Note] Plugin 'FEEDBACK' is disabled.
[ERROR] Unknown/unsupported storage engine: InnoDB

It looks like an extra step is needed before upgrading MariaDB from 10.1 to 10.3. Some upstream check jobs still report SUCCESS, so the issue appears to be intermittent there, but the promotion job shows 6 consecutive FAILURES [2].
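
For triage, the failure signature in the MariaDB log above can be checked mechanically. This is only a minimal sketch: the function name `has_unclean_redo_log` and the idea of scanning the log line by line are assumptions for illustration, not part of the CI tooling. The matched text is copied from the log excerpt in this report.

```python
import re

# Error text copied from the mariadb.log excerpt in this bug report.
REDO_LOG_ERROR = re.compile(
    r"\[ERROR\] InnoDB: Upgrade after a crash is not supported"
)


def has_unclean_redo_log(log_lines):
    """Return True if any log line carries the InnoDB redo-log upgrade error.

    Hypothetical helper: it only looks for the signature seen in this bug,
    it does not parse the MariaDB log format in general.
    """
    return any(REDO_LOG_ERROR.search(line) for line in log_lines)


# Lines taken from the log excerpt in this report.
sample = [
    "[ERROR] InnoDB: Upgrade after a crash is not supported. "
    "This redo log was created before MariaDB 10.2.2.",
    "[ERROR] InnoDB: Plugin initialization aborted with error Generic error",
    "[Note] InnoDB: Starting shutdown...",
]
found = has_unclean_redo_log(sample)
```

A job that hits this signature failed because the old server was not shut down cleanly, as opposed to some unrelated connection problem behind the same ERROR 2002.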

Example logs:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-singlenode-featureset050-upgrades-master/131b2f6/logs/undercloud/home/zuul/undercloud_upgrade.log.txt.gz
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-singlenode-featureset050-upgrades-master/131b2f6/logs/undercloud/var/log/containers/mysql/mariadb.log.txt.gz

[1] Logstash query showing the failing upstream upgrade jobs: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22Can't%20connect%20to%20local%20MySQL%20server%20through%20socket%5C%22

[2] https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-7-singlenode-featureset050-upgrades-master

yatin (yatinkarel)
tags: added: alert ci promotion-blocker
Ronelle Landy (rlandy)
Changed in tripleo:
milestone: none → stein-2
importance: Undecided → Critical
status: New → Triaged
Ronelle Landy (rlandy) wrote :

Asking the upgrades team to look at this.

Luca Miccini (lmiccini2) wrote :

According to https://mariadb.com/kb/en/library/upgrading-from-mariadb-101-to-mariadb-102/+comments/2903 it seems that innodb_fast_shutdown=1 should work around this.
We should also check https://jira.mariadb.org/browse/MDEV-13603 (innodb_fast_shutdown=0 may fail to purge all history).
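
As a rough illustration of the workaround discussed here, the sketch below builds a mysql client invocation that sets innodb_fast_shutdown before the server is stopped. The helper name `fast_shutdown_command` is hypothetical, and reusing /root/.my.cnf is an assumption based on the puppet error quoted in the description. Only modes 0 and 1 are accepted because mode 2 skips the clean shutdown and leaves the redo log needing crash recovery.

```python
def fast_shutdown_command(mode, defaults_file="/root/.my.cnf"):
    """Build the argv for setting innodb_fast_shutdown on a running server.

    Hypothetical helper: it only builds the command, it does not run it.
    The defaults file path is an assumption taken from the error message
    in this report.
    """
    if mode not in (0, 1):
        # Mode 2 shuts down as if the server had crashed.
        raise ValueError("mode must be 0 or 1 for a clean shutdown")
    return [
        "/usr/bin/mysql",
        # --defaults-extra-file has to be the first option on the command line.
        f"--defaults-extra-file={defaults_file}",
        "-e",
        f"SET GLOBAL innodb_fast_shutdown = {mode}",
    ]


cmd = fast_shutdown_command(1)
```

The resulting list can be handed to subprocess.run on the database host; the server still has to be stopped cleanly afterwards for the setting to matter.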

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/628161

Changed in tripleo:
assignee: nobody → Jose Luis Franco (jfrancoa)
status: Triaged → In Progress
Changed in tripleo:
assignee: Jose Luis Franco (jfrancoa) → Dan Prince (dan-prince)
Changed in tripleo:
assignee: Dan Prince (dan-prince) → Jose Luis Franco (jfrancoa)
wes hayutin (weshayutin) wrote :

Removing the alert tag; it looks like Jose has this under control.

tags: removed: alert
Changed in tripleo:
assignee: Jose Luis Franco (jfrancoa) → Damien Ciabrini (dciabrin)
Changed in tripleo:
milestone: stein-2 → stein-3
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/628161
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=0015cc74416445c3cf4e18e566a954f609f7cf0f
Submitter: Zuul
Branch: master

commit 0015cc74416445c3cf4e18e566a954f609f7cf0f
Author: Jose Luis Franco Arza <email address hidden>
Date: Thu Jan 3 13:07:26 2019 +0100

    Gracefully shutdown Mysql before upgrade.

    When upgrading from MySQL 10.1 to 10.3 a bug appears if no
    shutdown is being performed, as the redo log format has
    changed in version 10.3.2 [0].

    Make sure we always stop the MySQL server cleanly before
    upgrading to a new version, to avoid redo log issue.

    Note: to be idempotent, we need to stop the mysql container
    rather than delete it; to be able to stop the container, we
    amend the restart policy of the mysql container.

    [0] - https://jira.mariadb.org/browse/MDEV-14848

    Change-Id: Ia07b7755867858c74c7334424e8e6579ace495db
    Co-Authored-By: Damien Ciabrini <email address hidden>
    Closes-Bug: #1810136
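
The approach described in the commit message (amend the restart policy, then stop the container instead of deleting it) might look roughly like the sketch below. This is not the merged tripleo-heat-templates change: the helper name `graceful_stop_commands`, the container name "mysql" and the 60 second timeout are all assumptions for illustration. The two docker subcommands used, `docker update --restart=no` and `docker stop -t`, are standard docker CLI.

```python
def graceful_stop_commands(container="mysql", timeout_s=60):
    """Return the docker commands for a clean stop of the database container.

    Hypothetical helper: the container name and timeout are assumptions,
    and the commands are only built here, not executed.
    """
    return [
        # Amend the restart policy first so the container can be stopped
        # and stays stopped.
        ["docker", "update", "--restart=no", container],
        # Stop rather than delete; -t is the grace period before SIGKILL,
        # giving the server time to shut down cleanly.
        ["docker", "stop", "-t", str(timeout_s), container],
    ]


commands = graceful_stop_commands()
```

Each entry can be run in order with subprocess.run(cmd, check=True) before the upgrade starts the new MariaDB version.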

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.4.0

This issue was fixed in the openstack/tripleo-heat-templates 10.4.0 release.
