tripleo

standalone-upgrade job failing while upgrading mariadb

Bug #1810136 reported by yatin on 2018-12-31

Affects		Status	Importance	Assigned to	Milestone
	tripleo	Fix Released	Critical	Damien Ciabrini	tripleo stein-3

Bug Description

The standalone-upgrade job is failing in both upstream[1] and the RDO promotion job[2] while running the step 2 MariaDB bootstrap task, with this error:

Error: Failed to apply catalog: Execution of '/usr/bin/mysql --defaults-extra-file=/root/.my.cnf -NBe SELECT CONCAT(User, '@',Host) AS User FROM mysql.user' returned 1: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)"

Failing step:- Run docker-puppet tasks (bootstrap tasks) for step 2

Task:- /usr/bin/puppet apply --summarize --detailed-exitcodes --color=false --logdest syslog --logdest console --modulepath=/etc/puppet/modules:/usr/share/openstack-puppet/modules --tags file,file_line,concat,augeas,cron,mysql_database,mysql_grant,mysql_user /etc/config.pp

The MariaDB log contains the following:
[ERROR] InnoDB: Upgrade after a crash is not supported. This redo log was created before MariaDB 10.2.2.
[ERROR] InnoDB: Plugin initialization aborted with error Generic error
[Note] InnoDB: Starting shutdown...
[ERROR] Plugin 'InnoDB' init function returned error.
[ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
[Note] Plugin 'FEEDBACK' is disabled.
[ERROR] Unknown/unsupported storage engine: InnoDB

It looks like some step is needed before upgrading MariaDB from 10.1 to 10.3. Some upstream check jobs succeed, so the issue seems intermittent there, but the promotion job shows 6 consecutive failures[2].
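For anyone triaging similar job failures, here is a minimal sketch of how the InnoDB error above could be spotted in a mariadb.log. The regular expression is built from the log excerpt quoted in this report; the function name and the sample list are hypothetical and only for illustration.

```python
import re

# Signature taken from the mariadb.log excerpt quoted in this report.
REDO_LOG_ERROR = re.compile(
    r"InnoDB: Upgrade after a crash is not supported\. "
    r"This redo log was created before MariaDB (?P<version>\d+(?:\.\d+)*)"
)


def find_unclean_redo_log(log_lines):
    """Return the version named in the first matching error line, or None."""
    for line in log_lines:
        match = REDO_LOG_ERROR.search(line)
        if match:
            return match.group("version")
    return None


# Hypothetical sample: two lines copied from the log excerpt above.
sample = [
    "[ERROR] InnoDB: Upgrade after a crash is not supported. "
    "This redo log was created before MariaDB 10.2.2.",
    "[ERROR] InnoDB: Plugin initialization aborted with error Generic error",
]
print(find_unclean_redo_log(sample))
```

A match means the old server left a redo log that the new InnoDB refuses to upgrade, so the data directory was not shut down cleanly before the new version started.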

Example logs:
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-singlenode-featureset050-upgrades-master/131b2f6/logs/undercloud/home/zuul/undercloud_upgrade.log.txt.gz
https://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-singlenode-featureset050-upgrades-master/131b2f6/logs/undercloud/var/log/containers/mysql/mariadb.log.txt.gz

[1] Logstash query for upstream which shows upgrade jobs have issues:- http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22Can't%20connect%20to%20local%20MySQL%20server%20through%20socket%5C%22

[2] https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-7-singlenode-featureset050-upgrades-master

yatin (yatinkarel) on 2018-12-31

tags:

added: alert ci promotion-blocker

Ronelle Landy (rlandy) on 2019-01-02

Changed in tripleo:
milestone:	none → stein-2
importance:	Undecided → Critical
status:	New → Triaged

Ronelle Landy (rlandy) wrote on 2019-01-02:

Asking upgrades team to look at this

Luca Miccini (lmiccini2) wrote on 2019-01-03:

According to https://mariadb.com/kb/en/library/upgrading-from-mariadb-101-to-mariadb-102/+comments/2903, it seems that innodb_fast_shutdown=1 should work around it.
We should also check https://jira.mariadb.org/browse/MDEV-13603 (innodb_fast_shutdown=0 may fail to purge all history).
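As a rough sketch of what the suggested workaround could look like in practice, the helper below only builds the two commands (set the variable, then shut the server down) and does not run them. The function name is hypothetical, the defaults file path is the one from the failing command in the description, and whether 0 or 1 is the right value is exactly the open question in this comment.

```python
def clean_shutdown_commands(fast_shutdown=1, defaults_file="/root/.my.cnf"):
    """Build (but do not run) the commands for a clean MariaDB shutdown.

    Only the two values discussed in this comment are accepted.
    """
    if fast_shutdown not in (0, 1):
        raise ValueError("expected innodb_fast_shutdown of 0 or 1")
    defaults = f"--defaults-extra-file={defaults_file}"
    return [
        # Set the shutdown mode on the running server.
        ["mysql", defaults, "-e",
         f"SET GLOBAL innodb_fast_shutdown = {fast_shutdown}"],
        # Ask the server to shut down so the setting takes effect.
        ["mysqladmin", defaults, "shutdown"],
    ]


for command in clean_shutdown_commands():
    print(command)
```

The commands would have to run against the old 10.1 server before the new version first touches the data directory.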

OpenStack Infra (hudson-openstack) wrote on 2019-01-03: Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/628161

Changed in tripleo:
assignee:	nobody → Jose Luis Franco (jfrancoa)
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) on 2019-01-03

Changed in tripleo:
assignee:	Jose Luis Franco (jfrancoa) → Dan Prince (dan-prince)

OpenStack Infra (hudson-openstack) on 2019-01-03

Changed in tripleo:
assignee:	Dan Prince (dan-prince) → Jose Luis Franco (jfrancoa)

wes hayutin (weshayutin) wrote on 2019-01-03:

removing alert, it looks like Jose has this under control

tags:

removed: alert

OpenStack Infra (hudson-openstack) on 2019-01-09

Changed in tripleo:
assignee:	Jose Luis Franco (jfrancoa) → Damien Ciabrini (dciabrin)

Emilien Macchi (emilienm) on 2019-01-13

Changed in tripleo:
milestone:	stein-2 → stein-3

OpenStack Infra (hudson-openstack) wrote on 2019-01-14: Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/628161
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=0015cc74416445c3cf4e18e566a954f609f7cf0f
Submitter: Zuul
Branch: master

commit 0015cc74416445c3cf4e18e566a954f609f7cf0f
Author: Jose Luis Franco Arza <email address hidden>
Date: Thu Jan 3 13:07:26 2019 +0100

    Gracefully shutdown Mysql before upgrade.

    When upgrading from MySQL 10.1 to 10.3 a bug appears if no
    shutdown is being performed, as the redo log format has
    changed in version 10.3.2 [0].

    Make sure we always stop the MySQL server cleanly before
    upgrading to a new version, to avoid redo log issue.

    Note: to be idempotent, we need to stop the mysql container
    rather than delete it; to be able to stop the container, we
    amend the restart policy of the mysql container.

    [0] - https://jira.mariadb.org/browse/MDEV-14848

    Change-Id: Ia07b7755867858c74c7334424e8e6579ace495db
    Co-Authored-By: Damien Ciabrini <email address hidden>
    Closes-Bug: #1810136
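
The approach in the commit note (amend the restart policy, then stop the container instead of deleting it) can be illustrated roughly as follows. This is not the merged tripleo-heat-templates change, only a sketch of the same idea using the docker CLI. The container name "mysql", the 60 second timeout, and the function name are assumptions.

```python
import shlex


def graceful_stop_plan(container="mysql", timeout=60):
    """Build (but do not run) docker commands to stop a container cleanly.

    The container name and timeout are illustrative assumptions.
    """
    return [
        # Disable automatic restarts so the stop is not undone by docker.
        ["docker", "update", "--restart=no", container],
        # Stop rather than remove, giving the server time to shut down.
        ["docker", "stop", "--time", str(timeout), container],
    ]


for command in graceful_stop_plan():
    print(shlex.join(command))
```

Updating the restart policy first matters because a container with an automatic restart policy could otherwise be brought back up before the upgrade replaces it.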