[ocata to pike] upgrade failing at step3 during dbsync

Bug #1724636 reported by Emilien Macchi
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Sofer Athlan-Guyot
Milestone: (not listed)

Bug Description

Testing upgrades in the upstream gate: it doesn't work yet.
We now reach step 3, but it fails during the dbsyncs.

Failure during upgrade:
http://logs.openstack.org/25/500625/15/check/legacy-tripleo-ci-centos-7-containers-multinode-upgrades/2a547a7/logs/undercloud/home/zuul/overcloud_upgrade_console.log.txt.gz#_2017-10-18_15_49_11

In http://logs.openstack.org/25/500625/15/check/legacy-tripleo-ci-centos-7-containers-multinode-upgrades/2a547a7/logs/subnode-2/var/log/messages.txt.gz :

Oct 18 15:48:38 centos-7-rax-ord-0000291272 os-collect-config: "Error: /Stage[main]/Cinder::Db::Sync/Exec[cinder-manage db_sync]: Failed to call refresh: cinder-manage db sync returned 1 instead of one of [0]",

Confirmed here:
http://logs.openstack.org/25/500625/15/check/legacy-tripleo-ci-centos-7-containers-multinode-upgrades/2a547a7/logs/subnode-2/var/log/cinder/cinder-manage.log.txt.gz#_2017-10-18_15_45_50_862

DBError: (pymysql.err.InternalError) (1018, u'Can\'t read dir of \'./cinder/\' (errno: 13 "Permission denied")') [SQL: u'SHOW FULL TABLES FROM `cinder`']

Same for Heat (and probably others, but haven't reached that step yet).
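
To make the error above easier to read, here is a minimal Python sketch that pulls the OS-level errno out of a database error message and maps it to its symbolic name. The function name `explain_db_errno` and the regular expression are illustrative assumptions, not part of cinder, pymysql, or TripleO.

```python
import errno
import os
import re

# Matches the OS-level errno embedded in the MariaDB error text,
# e.g. (errno: 13 "Permission denied").
ERRNO_RE = re.compile(r"errno:\s*(\d+)")


def explain_db_errno(message):
    """Return (number, symbolic name, description) for the errno in message.

    Hypothetical helper: returns None when the message carries no errno.
    """
    match = ERRNO_RE.search(message)
    if match is None:
        return None
    number = int(match.group(1))
    name = errno.errorcode.get(number, "UNKNOWN")
    return number, name, os.strerror(number)


# Shortened copy of the message reported in this bug.
sample = 'Can\'t read dir of \'./cinder/\' (errno: 13 "Permission denied")'
print(explain_db_errno(sample))
```

On Linux, errno 13 is EACCES, which points at file ownership or permissions on the database directory rather than at the schema migration itself.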

Alex Schultz (alex-schultz) wrote :

This is likely from the user ID change as we go from non-containerized mariadb to containerized mariadb. I think the containerized version uses a different UID and so the existing ownership might need to change.
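
One way to test this hypothesis on an affected node is to list every path under the data directory whose owner differs from the UID the containerized service runs as. This is only a sketch: the helper name `find_ownership_mismatches` is made up here, and the expected UID has to be looked up in the container image, since this report does not state it.

```python
import os


def find_ownership_mismatches(root, expected_uid):
    """Return paths under root (root included) not owned by expected_uid.

    Hypothetical helper. Note that os.walk silently skips directories it
    cannot list, so run this as a user that can read the whole tree.
    """
    mismatches = []
    if os.lstat(root).st_uid != expected_uid:
        mismatches.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects the link itself instead of following symlinks
            # that may point outside the data directory.
            if os.lstat(path).st_uid != expected_uid:
                mismatches.append(path)
    return mismatches
```

For example, `find_ownership_mismatches("/var/lib/mysql", uid)` with `uid` set to the container's mysql UID should return an empty list once ownership is correct.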

Michele Baldessari (michele) wrote :

NB: Until we fix https://bugs.launchpad.net/tripleo/+bug/1713007 in pike, upgrades just won't work (see also https://bugzilla.redhat.com/show_bug.cgi?id=1475404 for some more info).

I am totally surprised that galera is even somewhat up and running (as it normally just stays in slave mode without those fixes).

I'll look some more at this tomorrow with Damien.

Damien Ciabrini (dciabrin) wrote :

I wonder if it's a dup of https://bugs.launchpad.net/tripleo/+bug/1701485

I have to check the logs but as said in comment #1, this looks like a
discrepancy between the mysql process which is running and the permissions
of /var/lib/mysql on disk.

So it's either the new containerized mysql service that tries to access the
DB on disk before it has been chowned to kolla's mysql uid, or it's the old
galera process which failed to stop during the upgrade and which cannot
access the DB on disk anymore.

Damien Ciabrini (dciabrin) wrote :

OK, so when looking at http://logs.openstack.org/25/500625/15/check/legacy-tripleo-ci-centos-7-containers-multinode-upgrades/2a547a7/logs/subnode-2/var/log/messages.txt.gz, I don't see any mention of a "mysql_data_ownership" container, which should have been started to chown /var/lib/mysql for containers. It has not been run.

Likewise, looking at http://logs.openstack.org/25/500625/15/check/legacy-tripleo-ci-centos-7-containers-multinode-upgrades/2a547a7/logs/subnode-2/var/log/cluster/corosync.log.txt.gz, I don't see any mention of pacemaker starting any containerized galera on its own.

So I think this upgrade job is not doing the thing it is intended to? If it's a multinode upgrade, it should use the containerized pacemaker settings, i.e. the contents of docker-ha.yaml should be passed to the deploy command.
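
The log checks described in this comment can be repeated with a small script. The sketch below assumes the logs were downloaded locally; the helper names `lines_mentioning` and `scan_log` are illustrative and do not come from any TripleO tooling.

```python
import gzip


def lines_mentioning(lines, needle):
    """Return the lines (without trailing newline) that contain needle."""
    return [line.rstrip("\n") for line in lines if needle in line]


def scan_log(path, needle):
    """Search a plain or gzip-compressed log file for needle.

    Hypothetical helper: the file type is guessed from the .gz suffix.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        return lines_mentioning(handle, needle)
```

An empty result from `scan_log("messages.txt.gz", "mysql_data_ownership")` would match the observation above that the ownership container never ran.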

Emilien Macchi (emilienm) wrote :

Damien, the command used to upgrade is the following:

openstack overcloud deploy \
  --templates tripleo-heat-templates \
  --libvirt-type qemu \
  --timeout 80 \
  -e /home/zuul/cloud-names.yaml \
  -e /home/zuul/tripleo-heat-templates/environments/deployed-server-environment.yaml \
  -e /home/zuul/tripleo-heat-templates/environments/deployed-server-bootstrap-environment-centos.yaml \
  --overcloud-ssh-user zuul \
  -e /home/zuul/tripleo-heat-templates/ci/environments/multinode.yaml \
  -e /home/zuul/tripleo-heat-templates/environments/low-memory-usage.yaml \
  -e /opt/stack/new/tripleo-ci/test-environments/worker-config.yaml \
  -e /home/zuul/tripleo-heat-templates/environments/debug.yaml \
  --validation-errors-nonfatal \
  --roles-file /home/zuul/overcloud_roles.yaml \
  --compute-scale 0 \
  -e tripleo-heat-templates/environments/docker.yaml \
  -e tripleo-heat-templates/ci/environments/multinode-containers.yaml \
  -e tripleo-heat-templates/environments/major-upgrade-composable-steps-docker.yaml \
  -e /home/zuul/containers-default-parameters.yaml \
  -e /home/zuul/overcloud-repo.yaml

Source: http://logs.openstack.org/25/500625/15/check/legacy-tripleo-ci-centos-7-containers-multinode-upgrades/2a547a7/logs/undercloud/home/zuul/overcloud_upgrade_console.log.txt.gz#_2017-10-18_15_14_36
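
To check which environment files a deploy command actually passes, the `-e` arguments can be extracted programmatically. This is a minimal sketch with made-up helper names (`environment_files`, `includes_environment`); it only handles the space-separated `-e FILE` form used in the command above.

```python
import os
import shlex


def environment_files(command):
    """Return the value following every -e option, in order of appearance."""
    args = shlex.split(command)
    found = []
    # Stop one short of the end so a trailing "-e" with no value is ignored.
    for index, arg in enumerate(args[:-1]):
        if arg == "-e":
            found.append(args[index + 1])
    return found


def includes_environment(command, basename):
    """True if any -e file in command has the given file name."""
    return any(
        os.path.basename(path) == basename
        for path in environment_files(command)
    )


# Abbreviated form of the command quoted in this comment.
command = (
    "openstack overcloud deploy --templates tripleo-heat-templates "
    "-e tripleo-heat-templates/environments/docker.yaml "
    "-e tripleo-heat-templates/ci/environments/multinode-containers.yaml"
)
print(includes_environment(command, "docker-ha.yaml"))
```

In the full command quoted above, docker-ha.yaml does not appear among the `-e` arguments, while multinode-containers.yaml does.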

Damien Ciabrini (dciabrin) wrote :

Thanks Emilien, it seems like tripleo-heat-templates/ci/environments/multinode-containers.yaml is meant to configure the proper containerized pacemaker services.

So I'm not really sure why I don't see any call to the new containerized services. I need to replicate the deployment locally to investigate further.

Changed in tripleo:
milestone: queens-1 → queens-2
Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :

Wondering whether that could be related [1] or not at all. Triggering a new job to check if it helps, as it has merged.

[1] https://bugs.launchpad.net/tripleo/+bug/1730349

Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :

Wrong branch, cherry pick and add depends-on.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.openstack.org/518578

Changed in tripleo:
assignee: nobody → Sofer Athlan-Guyot (sofer-athlan-guyot)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/518579

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/518578
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f48e11ee2715277bd1fe6c32580cc41a0de6ce4e
Submitter: Zuul
Branch: master

commit f48e11ee2715277bd1fe6c32580cc41a0de6ce4e
Author: Sofer Athlan-Guyot <email address hidden>
Date: Wed Nov 8 17:46:35 2017 +0100

    Make sure /var/lib/mysql rights are setup correctly.

    If you do an upgrade on then bootstrap[1] is not run, so you have to
    make sure the permission are setup right every time.

    This is duplicating what is happening in the pacemaker mysql template[2]

    Partial-Bug: #1724636

    [1] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/database/mysql.yaml#L128
    [2] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/database/mysql.yaml#L162

    Change-Id: Ib224dd10361171dfd579867be35a2c67a71fd9d5

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/pike)

Reviewed: https://review.openstack.org/518579
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=9ca5d97476da2d394fbf2485f7e6855fbbbac693
Submitter: Zuul
Branch: stable/pike

commit 9ca5d97476da2d394fbf2485f7e6855fbbbac693
Author: Sofer Athlan-Guyot <email address hidden>
Date: Wed Nov 8 17:46:35 2017 +0100

    Make sure /var/lib/mysql rights are setup correctly.

    If you do an upgrade on then bootstrap[1] is not run, so you have to
    make sure the permission are setup right every time.

    This is duplicating what is happening in the pacemaker mysql template[2]

    Partial-Bug: #1724636

    [1] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/database/mysql.yaml#L128
    [2] https://github.com/openstack/tripleo-heat-templates/blob/master/docker/services/pacemaker/database/mysql.yaml#L162

    Change-Id: Ib224dd10361171dfd579867be35a2c67a71fd9d5
    (cherry picked from commit f48e11ee2715277bd1fe6c32580cc41a0de6ce4e)

tags: added: in-stable-pike
Changed in tripleo:
milestone: queens-2 → queens-3
Changed in tripleo:
milestone: queens-3 → queens-rc1
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Jose Luis Franco (jfrancoa) wrote :

Is this still happening? I guess it's still open because the submitted patch was tagged Partial-Bug, so merging it didn't close the bug. @Sofer, is anything still missing here, or can we just close it?

Changed in tripleo:
milestone: rocky-1 → rocky-2
Sofer Athlan-Guyot (sofer-athlan-guyot) wrote :

This has been closed by fixing the CI workflow. The wrong containers were used.

Changed in tripleo:
status: In Progress → Fix Released