Minor update failure: Could not complete restart of galera-bundle

Bug #1873893 reported by Emilien Macchi
This bug affects 1 person
Affects   Status        Importance  Assigned to      Milestone
tripleo   Fix Released  High        Damien Ciabrini

Bug Description

Environment: master, centos8, pacemaker enabled, single node overcloud.

Minor updates fail at the step where the puppet containers run:

2020-04-17 15:11:49 | TASK [tripleo_container_manage : Create facts for containers which changed or failed] ***
2020-04-17 15:11:49 | task path: /usr/share/ansible/roles/tripleo_container_manage/tasks/podman/create.yml:94
2020-04-17 15:11:49 | Friday 17 April 2020 15:11:49 +0000 (0:10:42.416) 0:19:51.498 **********
2020-04-17 15:11:49 | ok: [centos-8-inap-mtl01-0016007986] => changed=false
2020-04-17 15:11:49 | ansible_facts:
2020-04-17 15:11:49 | containers_changed:
2020-04-17 15:11:49 | - keystone_init_log
2020-04-17 15:11:49 | - clustercheck
2020-04-17 15:11:49 | containers_failed:
2020-04-17 15:11:49 | - mysql_restart_bundle
2020-04-17 15:11:49 | - rabbitmq_restart_bundle
2020-04-17 15:11:49 | skip_once_per_role: Checking host centos-8-inap-mtl01-0016007986 for task Print the containers that failed to start
2020-04-17 15:11:49 | skip_once_per_role: host centos-8-inap-mtl01-0016007986 has role Controller
2020-04-17 15:11:49 | skip_once_per_role: task when evaluated to True, appending
2020-04-17 15:11:49 |
2020-04-17 15:11:49 | TASK [tripleo_container_manage : Print the containers that failed to start] ****
2020-04-17 15:11:49 | task path: /usr/share/ansible/roles/tripleo_container_manage/tasks/podman/create.yml:99
2020-04-17 15:11:49 | Friday 17 April 2020 15:11:49 +0000 (0:00:00.269) 0:19:51.768 **********
2020-04-17 15:11:49 | fatal: [centos-8-inap-mtl01-0016007986]: FAILED! => changed=false
2020-04-17 15:11:49 | msg: '[''mysql_restart_bundle'', ''rabbitmq_restart_bundle''] failed to start, check logs in /var/log/containers/stdouts/'

https://18821a32bda71023f471-164ae1091eb5a377552df92dcdcf8170.ssl.cf1.rackcdn.com/719664/6/check/tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates/1dba0d2/logs/undercloud/home/zuul/overcloud_update_run_Controller.log

2020-04-17T15:01:05.553708115+00:00 stdout F Fri Apr 17 15:01:05 UTC 2020: Restarting galera-bundle globally
2020-04-17T15:11:07.960480979+00:00 stderr F Error: Error performing operation: Timer expired
2020-04-17T15:11:07.960480979+00:00 stderr F Set 'galera-bundle' option: id=galera-bundle-meta_attributes-target-role set=galera-bundle-meta_attributes name=target-role value=stopped
2020-04-17T15:11:07.960480979+00:00 stderr F Waiting for 1 resources to stop:
2020-04-17T15:11:07.960480979+00:00 stderr F * galera-bundle
2020-04-17T15:11:07.960480979+00:00 stderr F Deleted 'galera-bundle' option: id=galera-bundle-meta_attributes-target-role name=target-role
2020-04-17T15:11:07.960480979+00:00 stderr F Waiting for 1 resources to start again:
2020-04-17T15:11:07.960480979+00:00 stderr F * galera-bundle
2020-04-17T15:11:07.960480979+00:00 stderr F Could not complete restart of galera-bundle, 1 resources remaining
2020-04-17T15:11:07.960480979+00:00 stderr F * rabbitmq-bundle
2020-04-17T15:11:07.960480979+00:00 stderr F
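
To pull the failure summary out of a restart_bundle log like the one above, the lines can be parsed mechanically. Below is a minimal Python sketch. It assumes the lines follow podman's k8s-file layout (`<timestamp> <stdout|stderr> <F|P> <message>`), as they appear to here, and that the give-up message keeps the exact wording shown. `parse_restart_failure` and `RestartFailure` are hypothetical helpers, not part of TripleO. Partial ("P") records are not reassembled.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

# Assumed record layout: <timestamp> <stdout|stderr> <F|P> <message>
# ("F" = full line, "P" = partial line; partial lines are NOT rejoined here).
LOG_LINE = re.compile(r"^(\S+) (stdout|stderr) ([FP]) ?(.*)$")

# Give-up message, wording copied from the log excerpt above.
GIVE_UP = re.compile(r"^Could not complete restart of (\S+), (\d+) resources remaining$")

# Lines such as "* rabbitmq-bundle" list the resources still being waited on.
BULLET = re.compile(r"^\s*\* (\S+)\s*$")


@dataclass
class RestartFailure:
    resource: str                                      # resource whose restart was requested
    remaining: int                                     # count reported in the give-up message
    blocking: List[str] = field(default_factory=list)  # resources listed after that message


def parse_restart_failure(lines: Iterable[str]) -> Optional[RestartFailure]:
    """Return the first restart give-up found in the log, or None."""
    failure: Optional[RestartFailure] = None
    for raw in lines:
        m = LOG_LINE.match(raw.rstrip("\n"))
        if not m:
            continue  # skip anything not in the assumed record layout
        msg = m.group(4)
        if failure is None:
            g = GIVE_UP.match(msg)
            if g:
                failure = RestartFailure(g.group(1), int(g.group(2)))
        else:
            b = BULLET.match(msg)
            if b:
                failure.blocking.append(b.group(1))
            else:
                break  # first non-bullet line ends the list of remaining resources
    return failure
```

Fed the excerpt above (presumably from the mysql_restart_bundle log under /var/log/containers/stdouts/, file name assumed), it reports `galera-bundle` as the restarted resource, blocked on `rabbitmq-bundle`. That matches the two restart_bundle containers reported as failed.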

Changed in tripleo:
milestone: none → ussuri-rc2
milestone: ussuri-rc2 → ussuri-rc1
importance: Undecided → High
status: New → Triaged
Changed in tripleo:
assignee: nobody → Damien Ciabrini (dciabrin)
Damien Ciabrini (dciabrin) wrote :

I can't reproduce it locally yet on a 1-node HA controller; the minor update runs fine.
I'm going to try with reproducer-quickstart.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/722341

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/722341
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=690124682728f56d36701e852a899aaf8ee91940
Submitter: Zuul
Branch: master

commit 690124682728f56d36701e852a899aaf8ee91940
Author: Damien Ciabrini <email address hidden>
Date: Thu Apr 23 16:42:46 2020 +0200

    Ensure <service>_restart_bundle do not run concurrently

    Now that paunch containers are run asynchronously, some
    containers used for HA orchestration can run concurrently,
    which triggers a deadlock and cause them to time out and
    exit in error.
    To overcome this concurrency deadlock, tweak the start_order
    configuration of each HA service to make sure that no
    <service>_restart_bundle container can be started at the same
    time during a given step.

    Change-Id: I9ae55101945978ea6c65d37eaa8221ed3d96a7f5
    Closes-Bug: #1873893
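
The fix relies on the start_order of each <service>_restart_bundle container, so that no two of them start together within a step. Below is a minimal Python sketch of a consistency check for that property. It rests on three assumptions. The config shape `{step: {container: {"start_order": n}}}` is a simplified, hypothetical stand-in for the per-step container definitions. Containers sharing a start_order in one step may start concurrently. A missing start_order counts as 0. `restart_bundle_collisions` is a hypothetical helper. The step name and order values in the test are illustrative, not the values chosen in review 722341.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical, simplified config shape: {step: {container_name: {"start_order": int, ...}}}
Config = Dict[str, Dict[str, dict]]


def restart_bundle_collisions(config: Config) -> List[Tuple[str, int, List[str]]]:
    """List (step, start_order, containers) where 2+ *_restart_bundle containers share an order."""
    collisions: List[Tuple[str, int, List[str]]] = []
    for step, containers in sorted(config.items()):
        # Group only the restart_bundle containers of this step by start_order.
        by_order: Dict[int, List[str]] = defaultdict(list)
        for name, spec in containers.items():
            if name.endswith("_restart_bundle"):
                # Assumption: an unset start_order behaves like 0.
                by_order[spec.get("start_order", 0)].append(name)
        for order, names in sorted(by_order.items()):
            if len(names) > 1:
                # Same step, same order: these could be started at the same time.
                collisions.append((step, order, sorted(names)))
    return collisions
```

Two restart_bundle containers at the same start_order in one step are reported as a collision. Giving each a distinct value clears it. Other containers in the step are ignored.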

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/723325

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/723325
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=765d19889a972907fe43384bcc02fdcbf046efa5
Submitter: Zuul
Branch: stable/train

commit 765d19889a972907fe43384bcc02fdcbf046efa5
Author: Damien Ciabrini <email address hidden>
Date: Thu Apr 23 16:42:46 2020 +0200

    Ensure <service>_restart_bundle do not run concurrently

    Now that paunch containers are run asynchronously, some
    containers used for HA orchestration can run concurrently,
    which triggers a deadlock and cause them to time out and
    exit in error.
    To overcome this concurrency deadlock, tweak the start_order
    configuration of each HA service to make sure that no
    <service>_restart_bundle container can be started at the same
    time during a given step.

    Change-Id: I9ae55101945978ea6c65d37eaa8221ed3d96a7f5
    Closes-Bug: #1873893
    (cherry picked from commit 690124682728f56d36701e852a899aaf8ee91940)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.4.0

This issue was fixed in the openstack/tripleo-heat-templates 11.4.0 release.
