Comment 4 for bug 1890798

Damien Ciabrini (dciabrin) wrote :

Two notes to clarify the failure messages seen in this launchpad:

1. The error "Error: resource 'ip-192.168.24.10' not running on any node" is caused by a task that migrates away the VIP hosted by the node being updated, to avoid a potentially long service disruption caused by the stop ordering constraint in pacemaker.

We migrate the VIP away by banning it on the local node and lifting the ban once it has restarted elsewhere. However, this assumes that the control plane has more than one node, so we need to fix that eventually.
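
To make the single-node limitation concrete, here is a minimal Python sketch of the ban-then-clear sequence. It is not the actual tripleo task: the function name, the node names and the idea of returning the commands rather than running them are assumptions made for illustration. Only `pcs resource ban` and `pcs resource clear` are real pcs subcommands.

```python
from typing import List, Sequence


def vip_migration_commands(
    resource: str, local_node: str, cluster_nodes: Sequence[str]
) -> List[List[str]]:
    """Return the pcs commands that would move `resource` off `local_node`.

    Hypothetical helper: returns an empty list when no other node can host
    the VIP, which is the single-node case the ban approach does not cover.
    """
    other_nodes = [node for node in cluster_nodes if node != local_node]
    if not other_nodes:
        # Banning the VIP here would leave it "not running on any node".
        return []
    return [
        # Ban the VIP on the node being updated so pacemaker restarts it elsewhere.
        ["pcs", "resource", "ban", resource, local_node, "--wait"],
        # Lift the ban once the VIP is running on another node.
        ["pcs", "resource", "clear", resource, local_node],
    ]
```

With a one-node control plane the helper returns no commands, so a caller could skip the migration step instead of failing on it.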

2. Here the fatal error during the update task seems to be that mysql_init_bundle failed to access mysql, which caused the update to time out.
Looking at the mysql logs [1], I can see that mysql stopped on an unexpected file permission error:

2020-08-10 23:06:01 0 [ERROR] InnoDB: Cannot open '/var/lib/mysql/ib_buffer_pool.incomplete' for writing: Permission denied
2020-08-10 23:06:02 0 [Note] InnoDB: Shutdown completed; log sequence number 4053481; transaction id 2693
2020-08-10 23:06:02 0 [ERROR] InnoDB: Operating system error number 13 in a file operation.
2020-08-10 23:06:02 0 [ERROR] InnoDB: The error means mysqld does not have the access rights to the directory.
2020-08-10 23:06:02 0 [ERROR] mysqld: Error on delete of './tc.log' (Errcode: 13 "Permission denied")
2020-08-10 23:06:02 0 [Note] /usr/libexec/mysqld: Shutdown complete

That is due to a recent change in upstream master, where container images are now built with TCIB instead of kolla [2]. The UID/GID of the service user has changed, so when this minor update job updates the container image to ussuri:current-tripleo [3], the ID changes back to the kolla one, and the container cannot restart.
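
One way to confirm this kind of mismatch on a node is to compare the ownership of the files in the data directory with the UID/GID the new image runs as. The sketch below is only an illustration: the function name is hypothetical, and the expected UID/GID has to be supplied by the caller, since the actual values are not stated here.

```python
import os
from typing import List, Tuple


def find_ownership_mismatches(
    root: str, expected_uid: int, expected_gid: int
) -> List[Tuple[str, int, int]]:
    """List (path, uid, gid) for entries under `root` not owned by the expected IDs."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk visits every directory as dirpath, so checking dirpath plus
        # its files covers the whole tree.
        paths = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            st = os.lstat(path)  # lstat: do not follow symlinks
            if (st.st_uid, st.st_gid) != (expected_uid, expected_gid):
                mismatches.append((path, st.st_uid, st.st_gid))
    return mismatches
```

Run against /var/lib/mysql with the IDs of the service user inside the updated container, a non-empty result would be consistent with the "Permission denied" errors in the log above.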

Until https://review.opendev.org/#/c/745575/ is merged, this periodic job won't pass unless the initial container image used to deploy the overcloud changes.

[1] https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-ussuri/653660a/logs/subnode-1/var/log/containers/mysql/mysqld.log.txt.gz
[2] trunk.registry.rdoproject.org/tripleoussuri/centos-binary-mariadb:bb7c0317ff07c99e4bb86814b2beecc9
[3] docker.io/tripleou/centos-binary-mariadb:current-tripleo